@LMeyerov: Scientist-at-large launching a big data visualization startup.
Previous life in hacking new languages: Superconductor for hardware accelerated data visualization, Ph.D. at Berkeley on multicore web browsers, Flapjax for reactive JavaScript (FRP), and ConScript+Margrave for secure scripting.
Friday, December 25, 2009
In lieu of a dataflow analysis
In lieu of a dataflow analysis for my JS subset, I used a bit of reflection and wrote 6 regular expressions that assume a simple syntactic convention. And now it's time for automatic incrementalization...
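The post doesn't show the six regular expressions or the convention they rely on. As a hypothetical illustration of the general trick, the sketch below uses reflection (`Function.prototype.toString`, which returns a function's source text) plus one regular expression to guess which attributes a rule reads. It assumes an invented convention in which rules only read attributes as `parent.<name>`, `self.<name>`, or `child.<name>`; the names `dependencies`, `passOf`, and the example rules are made up for illustration.

```javascript
// Hypothetical convention: an attribute rule only reads other attributes
// through the identifiers `parent`, `self`, or `child`, e.g. `parent.width`.
const READ = /\b(parent|self|child)\.([A-Za-z_$][\w$]*)/g;

// Reflection: recover a rule's source text and scan it for attribute reads.
function dependencies(rule) {
  const source = rule.toString();
  const deps = { parent: new Set(), self: new Set(), child: new Set() };
  for (const [, who, attr] of source.matchAll(READ)) deps[who].add(attr);
  return {
    parent: [...deps.parent].sort(),
    self: [...deps.self].sort(),
    child: [...deps.child].sort(),
  };
}

// Classify a rule by the direction of its dependencies, which decides
// whether it belongs in a top-down or a bottom-up pass.
function passOf(rule) {
  const d = dependencies(rule);
  if (d.parent.length > 0 && d.child.length > 0) return "mixed";
  if (d.parent.length > 0) return "inherited"; // top-down
  if (d.child.length > 0) return "synthesized"; // bottom-up
  return "local";
}

// Example rules written in the assumed convention.
const rules = {
  width: (self, parent) => parent.width,
  height: (self, children) => children.reduce((sum, child) => sum + child.height, 0),
  area: (self) => self.width * self.height,
};

for (const [name, rule] of Object.entries(rules)) {
  console.log(name, passOf(rule), dependencies(rule));
}
```

The price of skipping a real dataflow analysis is fragility: the regex can't tell a read inside a comment or string literal from a real one, and it misses reads that go through an alias (say, `const p = parent; p.width`), which is why it only works under a strict syntactic convention.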
Wednesday, December 23, 2009
Browser in a browser!
Spent the evening going from
V({backgroundColor: "rgb(255, 204, 204)",
   wCnstrnt: {px: 100},
   hCnstrnt: {px: 100}},
  V({wCnstrnt: {px: 20}, hCnstrnt: {px: 20}, backgroundColor: "red"}),
  H({wCnstrnt: {px: 30}, hCnstrnt: "auto", backgroundColor: "green"},
    V({wCnstrnt: {px: 20}, hCnstrnt: {px: 20}, backgroundColor: "orange"}),
    V({wCnstrnt: {px: 5}, hCnstrnt: {px: 5}, backgroundColor: "white"})))
to

... and all within your browser!
The magic is actually what's on the inside: I really spent the evening writing a layout engine generator that takes in a layout engine specification and emits an engine that can consume a basic box layout, solve it, and render it. Tomorrow I'll add in left floats and hopefully *automatically* incrementalize it :) I probably won't get to the finale -- full specification and verification -- until I get more performance results on the basic parallel engine stuff. Still, kinda cool that it was that easy.
Check out the BSS0 and BSS1 layout specifications in the source of my toy layout interpreter. In a slicker version, the inherited/synthesized pass structure might be inferred (though being explicit about the staging helps with performance guarantees). Anyways, for this version, you need to run in Firefox with Firebug enabled (or delete all the console stuff).
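The generator itself isn't reproduced here, and BSS0/BSS1 are richer than this. As a hypothetical sketch of the idea, assuming simplified semantics where `V` stacks children top to bottom, `H` places them left to right, `{px: n}` fixes a size, and `"auto"` sizes a box from its children, the code below takes a small per-box-kind specification and generates a solver with two explicitly staged passes: a bottom-up (synthesized) pass for sizes and a top-down (inherited) pass for positions. `spec`, `generateEngine`, and the rule names are invented for illustration, and "rendering" is reduced to a list of colored rectangles.

```javascript
// Hypothetical constructors mirroring the post's notation.
const V = (attrs, ...children) => ({ kind: "V", attrs, children });
const H = (attrs, ...children) => ({ kind: "H", attrs, children });

// {px: n} is a fixed size; anything else (e.g. "auto") yields null.
const fixedSize = (c) => (c !== null && typeof c === "object" && "px" in c ? c.px : null);
const sum = (xs) => xs.reduce((s, x) => s + x, 0);
const max = (xs) => Math.max(0, ...xs);

// Hypothetical layout specification, one entry per box kind.
const spec = {
  V: {
    autoWidth: (kids) => max(kids.map((k) => k.w)),
    autoHeight: (kids) => sum(kids.map((k) => k.h)),
    advance: (kid) => ({ dx: 0, dy: kid.h }), // next child goes below
  },
  H: {
    autoWidth: (kids) => sum(kids.map((k) => k.w)),
    autoHeight: (kids) => max(kids.map((k) => k.h)),
    advance: (kid) => ({ dx: kid.w, dy: 0 }), // next child goes to the right
  },
};

// Turns a specification into a solver with two explicitly staged passes.
function generateEngine(spec) {
  // Synthesized (bottom-up) pass: sizes depend on children.
  function sizes(node) {
    const kids = node.children.map(sizes);
    const rules = spec[node.kind];
    const w = fixedSize(node.attrs.wCnstrnt) ?? rules.autoWidth(kids);
    const h = fixedSize(node.attrs.hCnstrnt) ?? rules.autoHeight(kids);
    return { node, w, h, kids };
  }
  // Inherited (top-down) pass: positions depend on the parent and earlier siblings.
  function positions(box, x, y, out) {
    out.push({ color: box.node.attrs.backgroundColor, x, y, w: box.w, h: box.h });
    let cx = x;
    let cy = y;
    for (const kid of box.kids) {
      positions(kid, cx, cy, out);
      const { dx, dy } = spec[box.node.kind].advance(kid);
      cx += dx;
      cy += dy;
    }
    return out;
  }
  return (tree) => positions(sizes(tree), 0, 0, []);
}

const solve = generateEngine(spec);
const boxes = solve(
  V({ backgroundColor: "rgb(255, 204, 204)", wCnstrnt: { px: 100 }, hCnstrnt: { px: 100 } },
    V({ wCnstrnt: { px: 20 }, hCnstrnt: { px: 20 }, backgroundColor: "red" }),
    H({ wCnstrnt: { px: 30 }, hCnstrnt: "auto", backgroundColor: "green" },
      V({ wCnstrnt: { px: 20 }, hCnstrnt: { px: 20 }, backgroundColor: "orange" }),
      V({ wCnstrnt: { px: 5 }, hCnstrnt: { px: 5 }, backgroundColor: "white" })))
);
console.log(boxes);
```

Keeping the two passes separate rather than inferring them makes the cost easy to read off: each pass visits every box exactly once.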
Thursday, December 17, 2009
Exciting
Yesterday I finished rewriting my CSS layout grammars to be functionally pure (and restrict almost all iteration to that over nodes): with a bit more work, I think you can represent them in the language of Rep's very simple grammars (though not if you want good parallelism ;-)). This is important because, if I generate solvers based on such a grammar, it's ok to only support a restricted language. Result 1: win!
Today, I've been extending my documentation on the grammars. Part of this is the negative space: what I don't support. I haven't even read the specification of margins/padding/clearance/tables. I also don't handle overflow (visible, auto, scroll, hidden) or exotic positioning styles (absolute, fixed, etc.), but I remembered that they didn't seem hard. Beyond some interactions with floats, they weren't: I haven't created new language levels with them yet, but I added the informal intuition of what changes. Result 2: win!
I already have floats (well, left-floats), inlines, and boxes. Now just tables, margins, and padding, and we'll support most websites! That means there's a pure, linear time representation of CSS with clear (speculative) data parallelism and a simple language -- making it incremental and parallel might not be so bad. Still worried about the renderer..
Almost done with our extended TR, just want to add the translation from CSS to BSS. Looking good :D Worst-case scenario, I drop out of grad school and become a CSS consultant.
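To make "pure, linear time, with data parallelism" concrete, here is a minimal sketch (not the actual BSS grammar) of an inherited, top-down attribute pass written as a pure function over nodes. It assumes a hypothetical tree of `{name, indent, children}` objects and an invented rule that narrows the available width by each box's `indent`. Each node is visited once, the input tree is never mutated, and all nodes at the same depth depend only on their parents, so the `map` over a level is where a parallel engine could split the work; here it simply runs sequentially.

```javascript
// Pure top-down pass: returns a Map from node to attribute value and never
// mutates the tree. Work is scheduled one depth level at a time.
function inheritedPass(root, rootValue, rule) {
  const values = new Map([[root, rootValue]]);
  let level = [root];
  while (level.length > 0) {
    // Pair every child in the next level with its parent.
    const pairs = level.flatMap((parent) => parent.children.map((child) => [child, parent]));
    // Each rule application reads only its parent's (already computed) value,
    // so these are independent of each other and could run in parallel.
    const computed = pairs.map(([child, parent]) => [child, rule(values.get(parent), child)]);
    for (const [child, value] of computed) values.set(child, value);
    level = pairs.map(([child]) => child);
  }
  return values;
}

// Hypothetical document tree and rule: available width shrinks by each box's indent.
const leaf = (name, indent) => ({ name, indent, children: [] });
const tree = {
  name: "body",
  indent: 0,
  children: [
    { name: "div", indent: 10, children: [leaf("p", 5)] },
    leaf("ul", 20),
  ],
};

const widths = inheritedPass(tree, 800, (parentWidth, node) => parentWidth - node.indent);
const byName = Object.fromEntries([...widths].map(([node, w]) => [node.name, w]));
console.log(byName);
```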
Monday, December 14, 2009
When should something be a language feature?
Adrienne and I had been hacking on a JavaScript library for controlling capabilities (see previously posted snippets). We have some semblance of an understanding of its pros and cons, both in abstractions provided and strategy used to implement them. My suspicion is that it belongs as a language feature. That's a big claim.
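The library itself isn't shown here. As a hypothetical illustration of what library-level capability control can look like in plain JavaScript, here is a minimal attenuation-and-revocation wrapper in the style of the well-known caretaker pattern; `makeFacet` and the `storage` object are invented for the example. It also exposes the weak spot behind the language-feature argument: the guarantee only holds if nobody else still holds a direct reference to the underlying object, and a library can't enforce that by itself.

```javascript
// Wrap `target` so holders of the facet can only call the whitelisted
// methods, and only until the owner revokes access.
function makeFacet(target, allowed) {
  let revoked = false;
  const facet = {};
  for (const name of allowed) {
    facet[name] = (...args) => {
      if (revoked) throw new Error(`capability revoked: ${name}`);
      return target[name](...args);
    };
  }
  return { facet: Object.freeze(facet), revoke: () => { revoked = true; } };
}

// Hypothetical resource: a small key-value store.
const data = new Map();
const storage = {
  get: (key) => data.get(key),
  set: (key, value) => { data.set(key, value); },
  clear: () => data.clear(),
};
storage.set("theme", "dark");

// Hand an untrusted widget a read-only, revocable view.
const { facet: readOnly, revoke } = makeFacet(storage, ["get"]);
console.log(readOnly.get("theme")); // the widget can read
console.log(typeof readOnly.set); // but it has no `set` method
revoke();
try {
  readOnly.get("theme");
} catch (e) {
  console.log(e.message); // access is gone after revocation
}
```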
I described our library to a friend who does a lot of security research. He works more with systems and dynamic analysis techniques than linguistic ones, so he didn't have an intuition for my claim. More precisely, I think he fell prey to the Blub paradox: he got along fine without it, and can hack together something if he really needs it, so why bother with the extra language complexity? When something should be a language feature is a tough but stimulating question, and I think some of my reasons for this project are representative:
- Performance. Bill Buxton, in his "order of magnitude" principle, states that changing a dimension of something by an order of magnitude makes it a new thing. In this case, if a security feature is cheap, you feel fine using it -- imagine how we'd write C code if processes and message passing were as cheap as threads with shared memory! Using language support, we can do some dirty tricks. Lightweight threads in functional languages are essentially this.
- Conciseness. Similarly, if we struggle to write something, we won't. There's a world of difference between writing functional code in C++ and writing it in Haskell or even O'Caml. If you have to write C++ code, it's possible to write most functional stuff with some encoding and library help, but if you want to write functional-style code, you'll be more productive in other languages. This is the OOM (order of magnitude) principle all over again.
- Expressiveness. At a bigger scale, abstractions like continuations and aspects are fundamentally difficult to encode locally -- shoe-horning them into legacy code is tough.
- Legibility. Most of our time is spent reading, analyzing, testing, and revisiting code: removing boilerplate and standardizing idioms strengthens large and long-term projects. Data binding -- even if not at the Flapjax level -- is a crowd favorite here. SQL is probably another good example (... until it falls down).
- Automation. Some idioms are global or standardized and spread throughout a program: handling them manually increases the risk of error. My favorite example is manual garbage collection.
- Tool support. If a feature is important, we'll probably want program analysis help (verification, testing, etc.) and IDE support for it -- both of which are easier to provide when it is a language feature. Package/import handling in Eclipse for Java apps is just the tip of the iceberg here.
- TCB. The trusted computing base should be as small as possible: if a security critical property requires a lot of funny code and usage assumptions... that's bad. The whole multi-process browser architecture movement is, in a sense, about this (... though it's not the only way once you think about it like this).
- Finally... Understanding. The approach of making a calculus based around a feature is in part due to this: what's going on at a basic level and how necessary is it? What happens if we deeply embed it -- what's the value and impact? For a research project, making an idiom a language feature is an enlightening approach, even if it'll be watered down to a library later. I find this somewhat similar to denotational and typed approaches to coding.
Why something should be a language feature is a subtle question -- I still don't really know. Another perspective is to ask when something should not be a language feature. Where is the line?
Friday, December 4, 2009
Getting closer to a quals topic
Trying to figure out what my thesis project is going to be. The theme of my research at Berkeley has shifted, overall, from AJAX application abstractions (user level, inference, etc.) to browser design (security, performance, and specification).
I think my security work has been exciting and we're close to where I want to be with it (I still want to combine our project on rich security abstractions with the one on lightweight browser security primitives for a powerful but feasible middle ground over the course of the next year). Unfortunately, I think the politics of (web) security research and web standards will limit the impact of this work in the short term, and the PL community wouldn't find it too interesting despite its being driven by new linguistic abstractions (as opposed to the trend of coarse software architectures or analysis-driven approaches nobody uses). It's getting too frustrating, given what I "know" is right. If I wanted a quick thesis, however, doing the unification, verifying a DOM policy language subset, and improving the language primitives / policy language with some guidance from more case studies would probably be the way to go.
The parallel browser stuff is getting more appealing in terms of popular interest, as a vehicle for principled work, and as a concrete set of problems. The rub is the theoretical component: what's actually new? Is it "just" engineering? I think the engineering challenge is crucial: just as building a multicore OS is recognized as a fundamental systems research question, so is building the browser, which is now essentially the 'other' stuff that's expected but not in the kernel. As I look at my Kindle and iPhone, I know I could use the speed. There's a clear problem -- but could it be solved by some guy who knows assembly, a few simple algorithms, or network-friendly compression algorithms? Luckily, answering even that is an open and crucial systems question.
However... appealing to the PLer inside, I'm finding common abstractions between my algorithms. The DOM tree is a structured model and a *lot* of computations that tax your CPU are centered around it. After writing another couple of components, I want to step back and figure out: what's an appropriate DSL for writing these browser-style algorithms? Currently, my suspicion is some sort of parallel, incremental (interactive/reactive), and continuous tree language (e.g., pipelines of attribute grammars). There is a lot of theoretical performance work in this area, but it suffers from lack of application, scaling, and unification.
Applying such a language to the variety of challenges in browser development will both make it compelling and flush out algorithmic concerns that aren't apparent when 'just' looking at traditional parsing tasks. Furthermore, hopefully I'll come out with generally useful artifacts: the next yacc and some browser components (like my layout grammars). So, "PICTL: The Parallel, Incremental, and Continuous Tree Language; or, How to Build a Parallel Browser."
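Of the three ingredients, "incremental" is the easiest to sketch in a few lines. This is a minimal sketch assuming a simple dirty-bit scheme (one standard technique, not necessarily what PICTL would use): a synthesized attribute caches its value per node, an edit marks only the edited node and its ancestors dirty, and re-evaluation reuses the cached results of clean subtrees. `makeNode`, `total`, and `setValue` are hypothetical names, and the attribute is just a subtree sum standing in for something like an auto height.

```javascript
// Hypothetical tree node with a cached synthesized attribute and a dirty bit.
function makeNode(value, children = []) {
  const node = { value, children, parent: null, dirty: true, cached: 0 };
  for (const child of children) child.parent = node;
  return node;
}

let evaluations = 0; // counts rule applications, to show how much work is skipped

// Synthesized attribute: sum of `value` over the subtree (think: auto height).
function total(node) {
  if (!node.dirty) return node.cached; // clean subtree: reuse the cached result
  evaluations += 1;
  node.cached = node.value + node.children.reduce((sum, child) => sum + total(child), 0);
  node.dirty = false;
  return node.cached;
}

// An edit invalidates the edited node and its ancestors, nothing else.
// The walk can stop at the first node that is already dirty, because an
// ancestor of a dirty node is always dirty too.
function setValue(node, value) {
  node.value = value;
  for (let n = node; n !== null && !n.dirty; n = n.parent) n.dirty = true;
}

const a = makeNode(1);
const b = makeNode(2);
const c = makeNode(3);
const mid = makeNode(10, [a, b]);
const root = makeNode(100, [mid, c]);

console.log(total(root), evaluations); // first run evaluates all 5 nodes
setValue(c, 30);
console.log(total(root), evaluations); // only root and c are re-evaluated
```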
We'll see. Should keep me busy for the next couple of years.