Note: this is a rewrite of the original posting; the Blogger interface confused me into thinking the "edit article" button was just a comment button.
"NESL": Seminal work on data parallelism that handles regular structures like matrices very well. However, super common structures like trees, graphs, etc. are supported only in hard-to-read encodings or in legible but overly restricted (e.g., fixed-depth) ones. Front page of Google Scholar: ~600 citations.
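For intuition, here is a toy sketch of the flat representation NESL-style flattening produces (the function names are mine, not NESL syntax): a nested sequence becomes one flat data vector plus a segment descriptor, and nested operations become segmented operations over it.

```python
# Sketch of NESL-style flattening: a nested sequence is stored as one
# flat data vector plus a segment descriptor (per-subsequence lengths).
# Names here are illustrative, not NESL syntax.

def flatten(nested):
    """[[2, 1], [7], [3, 4, 5]] -> (data, segdes)."""
    data = [x for seg in nested for x in seg]
    segdes = [len(seg) for seg in nested]
    return data, segdes

def segmented_sum(data, segdes):
    """One reduction per segment; a real backend would run all
    segments in a single data-parallel step."""
    sums, i = [], 0
    for n in segdes:
        sums.append(sum(data[i:i + n]))
        i += n
    return sums

data, segdes = flatten([[2, 1], [7], [3, 4, 5]])
print(segmented_sum(data, segdes))  # -> [3, 7, 12]
```

Regular structures like matrices fit this mold trivially (all segments the same length), which is why NESL shines there.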
"Flattening Trees": shows that recursive types -- such as trees -- are no big deal to add at the interface level and that the flattening transformation can be modified to support them. ~22 citations and what seems like the real first step towards making flattening acceptable for general purpose languages. "More types for nested data parallel programming" finishes the job. (Nit: conflating nested data parallel with flattening is bad; these papers are really about the latter which is just *one* approach to the former).
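To make "recursive types are no big deal at the interface level" concrete, here is a toy flattened encoding of a rose tree (my own illustration, not the paper's exact scheme): preorder values plus per-node child counts, with a faithful round trip back to the nested form.

```python
# Toy flattened encoding of a recursive tree type (not the paper's
# scheme): preorder values plus a per-node child count. The two vectors
# fully determine the tree, so the recursive type costs nothing extra
# at the interface level.

def flatten_tree(tree):
    """tree = (value, [children]) -> (values, child_counts) in preorder."""
    values, counts = [], []
    def go(node):
        value, children = node
        values.append(value)
        counts.append(len(children))
        for c in children:
            go(c)
    go(tree)
    return values, counts

def unflatten_tree(values, counts):
    """Inverse of flatten_tree."""
    pos = 0
    def go():
        nonlocal pos
        value, n = values[pos], counts[pos]
        pos += 1
        return (value, [go() for _ in range(n)])
    return go()

t = ('a', [('b', [('d', [])]), ('c', [])])
assert unflatten_tree(*flatten_tree(t)) == t
```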
Bonus read: "Inductive Graphs and Functional Graph Algorithms" shows how to (sequentially) support graphs. It's interesting in that 1) graph specialization is restricted to an ADT that could be optimized externally to the language and 2) similarly, I suspect nested data parallelism can be reasonably achieved as a library with it. At least for the near future, having the work be at the library level is a big deal: API changes are more forgiving.
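A rough sketch of the inductive-graph idea, transliterated from the paper's functional setting (`match_node` is my paraphrase of its `match`, and the dict representation is illustrative): a graph is viewed as one node's context plus the rest of the graph, so graph algorithms read as ordinary structural recursion.

```python
# Sketch of Erwig-style inductive graphs: decompose an undirected graph
# into (context of one node, remaining graph) and recurse. The dict
# representation and names are illustrative, not the paper's ADT.

def match_node(graph, v):
    """Split off node v's context: (neighbors of v, graph without v)."""
    rest = {u: nbrs - {v} for u, nbrs in graph.items() if u != v}
    return graph[v] - {v}, rest

def dfs(graph, stack):
    """Depth-first node order, written as structural recursion."""
    if not stack or not graph:
        return []
    v, rest_stack = stack[0], stack[1:]
    if v not in graph:
        return dfs(graph, rest_stack)
    neighbors, rest_graph = match_node(graph, v)
    return [v] + dfs(rest_graph, sorted(neighbors) + rest_stack)

g = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}
print(dfs(g, [1]))  # -> [1, 2, 4, 3]
```

Because every algorithm goes through `match_node`, the representation behind it can be swapped or optimized without touching user code -- which is exactly why the ADT/library route is attractive.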
Underappreciated, indeed.
Unfortunately, the resulting encoding of trees is not as powerful as the one in Blelloch's thesis (also an MIT Press book); see section 5.2.1.
Notably, the "depth" measure (as opposed to "work") for their computations is linear in the height of the tree, so if you have a very unbalanced tree you will get a very serial computation. This is kind of a bummer, because the great thing about NESL is that the programmer doesn't have to think about balancing the workload; just make sure you keep your vectors "big" and the language does it for you. Unfortunately, the DPH algorithm turns unbalanced trees into lots of small vectors.
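To make the depth complaint concrete, here is a toy level-synchronous evaluator (my own illustration, not DPH's actual scheduler): each tree level is one data-parallel step, so the step count equals the height -- n steps for a degenerate chain versus ~log n for a balanced tree of the same size.

```python
# Toy model of level-synchronous tree reduction: each level of the tree
# is one data-parallel step, so depth = tree height. A degenerate chain
# of n nodes takes n steps; a balanced tree takes ~log2(n).

def levelwise_sum_steps(tree):
    """tree = (value, [children]); returns (sum, number of parallel steps)."""
    frontier, total, steps = [tree], 0, 0
    while frontier:
        steps += 1                       # one parallel step per level
        total += sum(v for v, _ in frontier)
        frontier = [c for _, cs in frontier for c in cs]
    return total, steps

chain = (1, [])
for _ in range(7):                       # degenerate 8-node chain
    chain = (1, [chain])

balanced = (1, [(1, [(1, []), (1, [])]),
                (1, [(1, []), (1, [])])])  # 7 nodes, 3 levels

print(levelwise_sum_steps(chain))     # -> (8, 8): depth linear in height
print(levelwise_sum_steps(balanced))  # -> (7, 3): depth ~ log n
```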
The hand-derived encoding Blelloch used doesn't have this problem -- you can do a "leaffix" on a completely unbalanced tree in O(1) depth. Unfortunately, Blelloch's approach required the programmer to do a lot of low-level vector manipulation.
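For intuition about why a vector encoding can be shape-independent, here is a subtree-sum (leaffix-style) computation over an Euler-tour layout (my own sequential sketch, not Blelloch's exact construction): the single prefix sum is the only non-trivial step, and in a scan-vector model scans are primitives, so the depth doesn't depend on how balanced the tree is.

```python
# Subtree sums via an Euler-tour layout: place each node's value at its
# tour entry position, prefix-sum once, and read each subtree's total
# off as a difference. The scan is the only non-trivial step, so the
# depth is independent of tree shape. Sequential sketch; in a
# scan-vector model the prefix sum is a primitive.

from itertools import accumulate

def subtree_sums(tree):
    """tree = (value, [children]); returns (value, subtree_sum) pairs
    in preorder."""
    tour, enter, exit_, vals = [], {}, {}, []
    def go(node):
        value, children = node
        i = len(vals)
        vals.append(value)
        enter[i] = len(tour); tour.append(value)   # value at tour entry
        for c in children:
            go(c)
        exit_[i] = len(tour); tour.append(0)       # zero at tour exit
    go(tree)
    prefix = [0] + list(accumulate(tour))          # exclusive prefix sums
    return [(vals[i], prefix[exit_[i] + 1] - prefix[enter[i]])
            for i in range(len(vals))]

t = (1, [(2, [(4, [])]), (3, [])])
print(subtree_sums(t))  # -> [(1, 10), (2, 6), (4, 4), (3, 3)]
```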
Bridging the gap would be a major publishable result.
Have you considered pinging languages@cs.berkeley to see who else is working on this? You might be surprised.
I'm not so sure the problem is as bad as you state. If we're 1) vectorizing and 2) have data dependencies, the result is reasonable and I don't think you'll do much better. For more SIMT or multicore parallelism, there's the Data Parallel Haskell form, which is the "real world" successor to this research: it uses work stealing, where each segment records how much weight is below it. There are publications here, but I don't think that part has been as interesting as the less-analyzed impact (and challenges) of integrating into an existing general-purpose compiler's passes (GHC's).
My interest is actually at a tuning level. For example, the tree should be blocked, not entirely segmented. I currently achieve this essentially in the ADT way -- users never see the blocking of data structures, though I hand-wave the correctness proofs -- but I might now be able to do fun things like representing blocking as a type isomorphism. Additional macroscopic and tuning transforms, such as pointer compaction, might be treated similarly.
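The "blocking as a type isomorphism" idea can be sketched as a pair of mutually inverse conversions (entirely my own toy version, with an arbitrary block size): the blocked form is just another representation of the same vector, and the round-trip law is the correctness property being hand-waved above.

```python
# Toy version of blocking as a type isomorphism: a flat vector and its
# blocked form are two representations of the same value, witnessed by
# a pair of mutually inverse functions. Block size 4 is arbitrary.

BLOCK = 4

def to_blocked(xs):
    """Flat vector -> (full blocks, ragged tail)."""
    n = len(xs) - len(xs) % BLOCK
    blocks = [xs[i:i + BLOCK] for i in range(0, n, BLOCK)]
    return blocks, xs[n:]

def from_blocked(blocks, tail):
    """Inverse of to_blocked."""
    return [x for b in blocks for x in b] + tail

xs = list(range(10))
assert from_blocked(*to_blocked(xs)) == xs   # the isomorphism law
```

Once the iso is a first-class value, tuning transforms like blocking (or pointer compaction) become representation changes the user never observes.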