Friday, July 6, 2012

Synthesis over Parallel Patterns: A Leap Beyond Parallel Frameworks, Types, and Annotations


We've been struggling with explaining the theoretical PL parts of our language research, but I think we're finally circling in. Basically, we found attribute grammars OK for defining the functional behavior of CSS and simple enough for running basic compiler analyses, but the formalism is poor for programming over the parallel patterns used for efficient runtime evaluation. That's where our language extensions for synthesizing over parallel patterns come in.

Synthesizing over / into parallel patterns breaks down to three high-level issues:

  1. The low-level language of patterns.  The low-level language has to thoroughly express how our crazy-optimized tree traversals can be used. Think of it as the assembly language for parallel algorithms.
  2. The usable high-level language. Writing in assembly sucks. Unlike traditional higher-level languages, which suffer from leaky abstractions, ours is a superset of the low-level one: you can always pick the right level. We do this by adding the simple notion of a hole. Assemble the patterns where and how you want, but whatever patterns are inferable or unimportant, just leave a hole and let the compiler figure it out.
  3. The compilation technique for going from the high-level pattern language to the low-level one. I'll try to give a more detailed explanation in a future post. In a sense, our algorithm is a form of CEGIS (counter-example guided inductive synthesis) that searches over ways to complete a partial program. Every time there is a potential candidate, it checks whether any input would trigger a bug in the candidate, and if so, prunes out this and any other completion with the same sort of breakage (a small sketch of the loop follows this list). Ours is a bit cooler than usual because we have a domain-specific constraint solver, use symbolic counterexamples that give asymptotic progress on the search, and we get a queryable debugger for free. All for another day :)
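To make the pruning loop concrete, here is a minimal Python sketch of a CEGIS-style search. The names candidates, find_counterexample, and breaks are hypothetical stand-ins for our domain-specific enumerator, checker, and failure-classification test, so treat it as an illustration rather than the actual algorithm.

# Minimal CEGIS-style loop (illustrative sketch; not our implementation).
def cegis(candidates, find_counterexample, breaks):
    remaining = list(candidates)
    while remaining:
        cand = remaining.pop(0)
        cex = find_counterexample(cand)   # None means no input breaks the candidate
        if cex is None:
            return cand                   # valid completion found
        # Prune every other completion that fails in the same way on this counterexample.
        remaining = [c for c in remaining if not breaks(c, cex)]
    return None                           # no completion of the partial program works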

First, however, a recap of our use of attribute grammars for the functional specification language, which sits alongside the pattern specification language.


Background: Attribute Grammars

The functional behavior in our language is specified with attribute grammars. For example, to define CSS, you'll say the width of an HBox is the sum of its children's widths. That says nothing about when it is evaluated; later, via our pattern specification layer, you can control that as well. In theory, logical specification languages other than attribute grammars could be used, all the way up to your favorite traditional imperative language (see Sketch). However, the road from attribute grammars is clear, with most alternative surface layers adding a painful indirection.

A traditional attribute grammar would let you declaratively define the horizontal box layout widget as follows:

HBox -> Node Node { HBox.width := max(HBox0.minWidth, Node1.width + Node2.width); ... }


Under our extensions (object-orientation, loops, etc.), we would instead write:

class HBox (paintRect) : Node
  attributes:
    input minWidth
    var intrinsWidth
  children:
    childs : [ Node ]
  constraints:
    width = max(minWidth, intrinsWidth)
    loop childs:
      intrinsWidth = fold 0 .. childs[i - 1].width

Hopefully the mapping is intuitive. Our extensions are nice, for example, because we support lists of children: instead of making an awkward NodeList tree schema with a traditional grammar, you can just have a list of Nodes ("childs : [ Node ]") in our modernized form (think EBNF). In the end, after some compiler translation, it's the same thing. After these reductions, traditional AG compiler checks ensure that every variable is assigned exactly once (a well-defined, non-ambiguous spec), and the compiler can generate a schedule that orders the assignments. Code generation (e.g., of a layout engine) is a direct step after getting the schedule.
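For intuition, here is a minimal Python sketch of those two checks, under the simplifying assumption that a rule is just an (attribute, attributes-it-reads) pair; real AG scheduling also tracks where each dependency lives in the tree, which this toy ignores.

from graphlib import TopologicalSorter   # Python 3.9+

def check_and_schedule(rules):
    targets = [attr for attr, _ in rules]
    target_set = set(targets)
    # 1. Every attribute is assigned exactly once (no duplicate assignments).
    assert len(targets) == len(target_set), "attribute assigned more than once"
    # 2. Order the assignments so each reads only already-computed values;
    #    a cyclic (ambiguous) spec raises graphlib.CycleError here.
    deps = {attr: set(reads) for attr, reads in rules}
    return [a for a in TopologicalSorter(deps).static_order() if a in target_set]

# HBox: intrinsWidth folds over the children's widths; width is computed after it.
print(check_and_schedule([
    ("intrinsWidth", ["childs[i].width"]),
    ("width", ["minWidth", "intrinsWidth"]),
]))   # ['intrinsWidth', 'width']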

In the example, our compiler realizes the width of an HBox should be computed after the loop. More subtly, when and how to perform the loops is determined as well. For example, this grammar defines a tree where some nodes implement interface Node, such as HBox ones. The width of these nodes will be computed in a bottom-up or inorder traversal over the entire tree: for an inorder traversal, the compiler would insert a "recur" right before the computation of "intrinsWidth." Writing loopy code in our language is cathartic for this reason: you just say a statement occurs in some loop, and the compiler figures out in which loop and when, reasoning about both local loops over the children of a node and global loops over the tree.
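To make that concrete, here is a hand-written Python analogue of the kind of sequential bottom-up pass the compiler could produce for this grammar. The class and field names mirror the spec above, but it is only an illustrative sketch, not generated output.

class HBox:
    def __init__(self, min_width, childs):
        self.minWidth = min_width
        self.childs = childs
        self.intrinsWidth = 0
        self.width = 0

def bottom_up(node):
    for child in node.childs:                                # recur on children first (postorder)
        bottom_up(child)
    node.intrinsWidth = sum(c.width for c in node.childs)    # the loop/fold
    node.width = max(node.minWidth, node.intrinsWidth)       # computed after the loop

tree = HBox(100, [HBox(30, []), HBox(40, [])])
bottom_up(tree)
print(tree.width)   # 100, i.e. max(minWidth=100, 30 + 40)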



To learn more about attribute grammars and our use of them for layout language specifications, I previously wrote about our declarative looping construct and an early experience in mechanizing CSS's automatic table layout specification.




The Low-Level Pattern Language

Our conceptually deep extension really starts with being able to express this new fragment:
schedule:
  (bottomUp ((HBox (intrinsWidth width)))) ; ??
I'll get into the "??" bit later, but some of the low-level language shows through here: a bottomUp (parallel postorder) tree traversal. Both bottom-up and inorder tree traversals are valid passes for laying out any tree in this widget language, so our declarative pattern scheduling construct lets you pick which one. Most layout languages have many passes: if there were another class in this language, whatever pass comes after the semicolon would presumably handle whatever that class needs to solve.
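As a rough illustration of what "bottomUp" means operationally, here is a naive Python sketch: nodes at the same depth are independent in a bottom-up computation, so each level can be visited in parallel, deepest level first. A thread pool per level is only for exposition; it is not how our runtime actually schedules the traversal.

from concurrent.futures import ThreadPoolExecutor

def levels_of(root):
    levels, frontier = [], [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for node in frontier for c in node.childs]
    return levels

def bottom_up_parallel(root, visit):
    levels = levels_of(root)
    with ThreadPoolExecutor() as pool:
        for level in reversed(levels):         # children's level finishes before the parents'
            list(pool.map(visit, level))       # nodes within a level run in parallel

# e.g., bottom_up_parallel(tree, visit_hbox), where visit_hbox computes
# intrinsWidth and width for a single node as in the earlier sketch.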

Our schedules are expressive and fine-grained. The one above says exactly which attributes will be solved in the first bottomUp pass. Furthermore, the compiler will check that these attributes really will be solved in such a traversal. This is great when someone changes code elsewhere in a way that has a non-local impact on the dependency structure, which is quite common in practice. The patterns can be complicated as well, such as using a (sequential) inorder traversal for some subtrees while doing an overall parallel bottom-up pass, so help with reasoning about dependency validity is huge.
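Here is a toy Python version of that check, under a simplified model where each dependency of an attribute is tagged as coming from a child, the parent, or the node itself; the actual checker works over the full grammar and all traversal kinds.

def solvable_bottom_up(pass_attrs, solved_earlier, deps):
    # Can pass_attrs be solved in one bottom-up pass, given what is already solved?
    ready = pass_attrs | solved_earlier
    for attr in pass_attrs:
        for direction, dep in deps[attr]:
            if direction == "parent":      # parents are not finished in a bottom-up pass
                return False
            if dep not in ready:           # nobody computes this dependency in time
                return False
    return True

# HBox: intrinsWidth reads the children's widths; width reads local attributes.
deps = {
    "intrinsWidth": [("child", "width")],
    "width": [("self", "minWidth"), ("self", "intrinsWidth")],
}
print(solvable_bottom_up({"intrinsWidth", "width"}, {"minWidth"}, deps))   # True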

The really expressive part of our parallel pattern language comes next, however.

High-Level Pattern Language: Declarative Scheduling with Holes

Imagine our HBox widget is being added to a parallel library. There might be other traversals that happen after, but we want to guarantee that its attributes will be solved in the first one. The sequencing operator ";" runs one pattern after another. Powerfully, instead of saying what pattern to use next, we leave a hole, "??". It might be instantiated as another pattern, say "topDown", or even another composition of patterns, say "topDown ; bottomUp". The synthesizer will figure out what is valid. These completions can be quite non-obvious. For example, we repeatedly found that inorder tree traversals, which sequentialize a computation, can often be eliminated: instead, a global parallel (topDown or bottomUp) traversal is run, and only certain subtrees are run as sequential subtraversals.
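Here is a toy Python sketch of the hole-filling idea at the schedule level: enumerate small compositions of traversal patterns and keep the first completion that a validity checker (like the bottom-up check sketched earlier) accepts. The real synthesizer is smarter, pruning with counterexamples rather than brute-forcing.

from itertools import product

PATTERNS = ["topDown", "bottomUp", "inorder"]

def completions(max_passes=2):
    for n in range(1, max_passes + 1):
        for combo in product(PATTERNS, repeat=n):
            yield " ; ".join(combo)

def fill_hole(prefix, is_valid):
    for rest in completions():
        schedule = prefix + " ; " + rest
        if is_valid(schedule):
            return schedule                # first valid completion of "prefix ; ??"
    return None

# e.g., fill_hole("(bottomUp ((HBox (intrinsWidth width))))", checker)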

Holes can be anywhere (see the "??" at the end of the schedule fragment above). If we don't care at all, we might replace the entire schedule with "??". More commonly, we'll specify something like a bottomUp going first and then trust the synthesizer to compute the right attributes in it: "(bottomUp ??) ; ??".

Not shown in the fragment above, and only rarely exploited, arbitrary Prolog expressions can also be used to constrain the holes. For example, the desired schedule may be one whose last traversal is an inorder one that makes sequential rendering calls. Writing "?? ; (Inorder ??)" is incorrect because it presupposes there is a preceding traversal. Prolog, for all its oddities, ended up being a great fit for such constraints.

Synthesizing over Parallel Patterns

The above declarative pattern language lets us automatically use efficient parallel patterns, with details where you care and holes where you don't. Synthesis over parallel patterns is a powerful and expressive leap over, say, "@parallel" loop annotations in C++, map/reduce/etc. skeletons in cloud computing, and type-directed data parallelism in Haskell:

  • Our synthesizer finds complicated parallel pattern usages. Though not discussed here, it also doubles as a parallelism inspector/debugger. (Topic for another day: the algorithms under the patterns, such as my cool semi-static extension to work stealing that gives temporal locality and drops overheads.)
  • Our synthesizer automatically generates code to exploit the parallelism based on an orthogonal specification. No need to group code in the spec based on the parallelization scheme!
  • Our low-level pattern language lets programmers communicate to each other and the compiler the precise parallelization scheme. 
  • Our high-level pattern language (holes via synthesis) lets programmers communicate at an arbitrarily high level. Combined with the above, we get some very different styles of parallel programming: sometimes we roll in the mud, sometimes we don't.
  • Our synthesizer checks and rejects invalid schedules. This is good both when designing the initial code and especially essential when maintaining it. Editing someone else's parallel code is normally terrible: now we can both precisely learn about and check against the previous versions!
It'll be fascinating to see if the community can get such declarative schedule synthesis for applying patterns beyond the tree traversal scenario. For example, I bet most graph computations (e.g., stencils) can have their level of abstraction similarly raised through our basic model.
