Friday, August 27, 2010

Parallel Attribute Grammars / Tree Stencil / Visitors

Got my first parallelism results for the tree solving part of my work. When computing on a (task parallel) tree, if the tree is big or the the computations per node is large, parallelization is easy: just traverse and work steal (e.g, Intel's TBB or Cilk). However, we often have small calculations, such as computing a (tree) sum. Furthermore, we also want to do many macroscopic (tree-wide) optimizations like changing its layout for better memory access patterns. I spent most of the summer doing macroscopic sequential optimizations for my tree language but now also have finally achieved parallel speedups for a minimal CSS-like layout language (vertical boxes, horizontal boxes, and colors). It is a *very* minimal language, meaning a lightweight computation (4 tree traversals, each less than 1ms), so speedups are non-trivial. However, by having very lightweight traversals, adding additional passes now has less of a sting (giving leeway to both the flexibility of CSS and the quality of our attribute grammar -> tree stencil compiler).





4-pass grammar speedup (where one of the passes is followed by parallel font handling code, but not counted). Overheads of creating and destroying pthreads are not counted (later step will be to do standard pooling and warmup tricks).

Monday, August 2, 2010

An Undercited Paper

Note: this is a rewrite of the original posting as the blogger interface confused me into thinking the edit an article button was just a comment button.

"NESL": Seminal work on data parallelism that handles regular structures like matrices very well. However, super common structures like trees, graphs, etc. are supported only in hard-to-read encoding or overly restricted (e.g., fixed depth) legible ones. Front page of google scholar: ~600 citations.

"Flattening Trees": shows that recursive types -- such as trees -- are no big deal to add at the interface level and that the flattening transformation can be modified to support them. ~22 citations and what seems like the real first step towards making flattening acceptable for general purpose languages. "More types for nested data parallel programming" finishes the job. (Nit: conflating nested data parallel with flattening is bad; these papers are really about the latter which is just *one* approach to the former).

Bonus read: "Inductive Graphs and Functional Graph Algorithms" shows how to (sequentially) support graphs. It's interesting in that 1) graph specialization is restricted to an ADT that could be optimized external to the language and 2) similarly, I suspect nested data parallelism can be reasonably achieved as library with it. At least for the near future, having the work be at the library level is a big deal: API changes are more forgiving.