Given a set of reals, how do you bin them in a histogram?
E.g., if splitting on 3 and 6, {1.1, 2.2, 3.3, 6.6} => { [1.1, 2.2], [3.3], [6.6] }
We can reduce it to sorting -- first sort, then cut. The PRAM model is a myth so the constants are probably too high.
We can create a concurrent histogram with locks (or transactions) on rows. If there are only a few bins, there'll be too many collisions.
We can partition on data, creating separate histograms, and then merge. Each core will yield a histogram where the rows are sorted. The sequential bottleneck is then really the merge. If you have nice data structures, the merge should just be O(bins), which, by premise, is small: just chain together your linked list.
... now how do you do this in a task parallel setting? Enter Cilk's hyperobjects (and hope merge only gets called when cores merge, not just tasks). Except... what do I do for the TBB version?
There are a lot of 'canonical' parallel problems. We solved finite state machines in lexing and a parallel (tree) map in css selector matching; now it's a histogram for a redundancy eliminator ( in my css selector preprocessor), which is the last bit I'm thinking about parallelizing for now. Assuming I get something running tomorrow... onto a fixed-point computation for layout solving (which I already solved, in theory.)
Here's an alternate way to do binning in parallel - should work with any parallel system, though it was written and tested for Cilk++.
ReplyDeleteCilk++ blog post on binning.
Wow, great! I might have to do this sooner rather than later -- I've over-optimized the original bottleneck for a CSS selector engine, so Amdahl's law has made this binning the new interesting part. Hopefully the solution will carry over to the un-distilled problem!
ReplyDelete