Monday, May 25, 2009

Mixing thread-aware and thread-agnostic code

For almost all of the algorithms I've been playing with for the parallel browser, Cilk-style parallelism matches. My development pattern is to do a sequential version, do a Cilk++ sketch, and then, for final tweaking, convert to TBB. (... and a lot of iteration involving hawkish monitoring of KCacheGrind statistics). However, invariably, something always goes wrong.

This week, it's using task-parallelism with a multi-threaded library. Task parallelism gets you away from the notion of a thread: whenever you have a unit of work, you just spawn it off, and thus may have many tasks for only a few processors. With threads, assuming you're CPU bound, you have as many threads as processors. FreeType2 is written for threaded use: each thread gets its own Library. However, task parallel usage (I'm rendering a bunch of glyphs: turning one character into a pixel can be thought of as a task) doesn't map nicely -- if I were to create a Library per task, I'd have to create thousands of Libraries instead of, say, 8.

The naive solution is to set up a resource pool: a task asks the pool for a Library when it starts, and returns it when it finishes. If there is no Library available, it gets created. If tasks are really small (e.g., individual characters, as opposed to, say, words), there'll be a lot of chatter when trying to get these Libraries (and, even if not, locks waste cycles, which is still a penalty proportional to task size).

TBB, because it is a library level solution, has actual task objects (Cilk should just puts some sort of continuation mark on the stack) and therefore faces the same problem all the time. It provides conveniences for reusing task objects (think of it like a manual TCO or trampoline). When reusing a task, a good habit can be to reuse data within it. In this case, when a task completes, it passes off its Library object to the next one that gets/becomes the reified task object.

Unfortunately, I don't think the code will work out that well. In reality, there's a hierarchy of resources (Library -> {Font}, Font -> {Glyph}). It'll work, but the impedance mismatch will cause some slowdowns.
Post a Comment