We finally submitted a paper about FTL, our language for specifying layout languages and synthesizing a parallel evaluator. The paper is dense, so we'd appreciate feedback on what to cut and what to expand :)
Our systems goes all the way from a high-level language specification to an optimized browser layout engine. We had to solve two basic problems:
- Strongly scaling the parallel tree traversals. Our specification analysis will take a language specification, such as for CSS, and find a sequence of parallel tree traversals that will lay out any document (the tree) in that language. However, we couldn't find any existing algorithms that would give you a 2x speedup on 2 cores, 4x speedup on 4 cores, etc. We ended up A) finding a mix of existing data layout optimizations (pointer compression, blocking, etc.) and B) inventing our own runtime task scheduler (semi-static scheduling that simulates work stealing to precompute a partitioning). Scaling plummets if you drop either. More fun, due to the locality benefits of data layout optimizations, we achieve superlinear speedups -- 9.2.x on 8 cores!
- Generating the right sequence of tree traversals. While our system will automatically find parallelism in a declarative specification, it won't find all. For example, specifying how a box lays out its children by using absolute coordinates creates a global dependency, but specifying them as coordinates relative to the box and then stitching them together to get the absolute ones will break the dependency enough for our analysis. Programmers can write queries about the set of schedules inferable from a specification to help debug this sort of problem.
We found reasoning about schedules to be such a helpful programming tool that we incorporated it into our specification language. For example, it's very convenient to make rendering calls within a specification, e.g., "box(self.x,self.y,self.w,self.h,self.color)". The specification must take care to say what order to render boxes, such as the foreground text being above the background. Z indices are ambiguous due to overlap and monadically passing around the dependency is really painful. We can take a lesson from the current informal CSS standard, which specifies the document traversal order and then locally defines order within each node. Our solution is to allow functional specifications (the grammar) to be refined with traversal order constraints ("do rendering in an inorder traversal"). We can also catch specification changes that violate such a constraint!
Synthesizing schedules yields extra benefits. When synthesizing an implementation that talks to legacy code, this has the added benefit of making it easy to avoid parallel calls into non-reentrant libraries. Furthermore, many different traversals will implement the same specification: our synthesizer can generate them and autotune over them.
I'm working with some great undergrads to move forward on several fronts. At MSR, I wrote a proof-of-concept manual implementation that uses SIMD instructions for further speedups within a core: we're now automating it as a new FTL backend. Next, Thibaud Hottelier is writing a visualization language based on bidirectional constraints that compiles to FTL, which our undergrads are using to write some stunning demos. I already wrote one to animate inspecting file sizes on your hard drive (see picture above): we get 22fps for animating 20,000 files, and we should handle bigger data sets once I fix some segfaults. [For reference -- the HTML5 version is 8fps and, on my phone, 1fps]. Finally, we're pushing FTL's expressive limits by formalizing the CSS standard in it. Fun times ahead!