Friday, December 4, 2009

Getting closer to a quals topic

Trying to figure out what my thesis project is going to be. The theme of my research at Berkeley has shifted, overall, from AJAX application abstractions (user level, inference, etc.) to browser design (security, performance, and specification).

I think my security work has been exciting and we're close to where I want to be with it (still want to combine our project on rich security abstractions with the one on lightweight browser security primitives for a powerful but feasible middle ground over the course of the next year). Unfortunately, I think the politics of (web) security research and web standards will limit the impact of this work in the short-term and the PL community wouldn't find it too interesting despite being driven by new linguistic abstractions (as opposed to the trend of coarse software architectures or analysis-driven approaches nobody uses). It's getting too frustrating for what I "know" is right. If I wanted a quick thesis, however, doing the unification, verifying a DOM policy language subset, and improving the language primitives / policy language with some guidance from more case studies would probably be the way to go.

The parallel browser stuff is getting more appealing in terms of popular interest, a vehicle for principled work, and a concrete set of problems. The rub is the theoretical component: what's actually new? Is it "just" engineering? I think the engineering challenge is crucial: just like we don't know how to build a multicore OS, which is recognized as a fundamental systems research question, so is building the browser, which is now essentially the 'other' stuff that's expected but not in the kernel. As I look at my Kindle and iPhone, I know I could use the speed. There's a clear problem -- but could it be solved by some guy who knows assembly, a few simple algorithms, or network-friendly compression algorithms? Luckily, answering even that is an open and crucial systems question.

However... appealing to the PLer inside, I'm finding common abstractions between my algorithms. The DOM tree is a structured model and a *lot* of computations that tax your CPU are centered around it. After writing another couple of components, I want to step back and figure out: what's an appropriate DSL for writing these browser-style algorithms? Currently, my suspicion is some sort of parallel, incremental (interactive/reactive), and continuous tree language (e.g., pipelines of attribute grammars). There is a lot of theoretical performance work in this area, but it suffers from lack of application, scaling, and unification.


Applying such a language to the variety of challenges in browser development will both make it compelling and flush out algorithmic concerns that aren't apparent when 'just' looking at traditional parsing tasks. Furthermore, hopefully I'll come out with generally useful artifacts: the next yacc and some browser components (like my layout grammars). So, "PICTL: The Parallel, Incremental, and Continuous Tree Language; or, How to Build a Parallel Browser."

We'll see. Should keep me busy for the next couple of years.

No comments: