Tuesday, January 20, 2009

More on using Cilk++

An update to the earlier post.

A bit of expression reordering eliminated 30-40% of the computation time when using 2 cores, though showed no extra benefits when using more than 2 cores. I should be getting a ~80%, so my position that we want something more seems to stand.

The tweak was pretty interesting.

Before: (pseudo code)

function traverse (node) {
work();
cilk_for (child in node.children) traverse(child);
}


After:

function traverse (node) {
cilk_for (child in node.children) traverse(child);
work();
}


Normally, you want to prevent over-saturating your runtime with too many tasks, so you mix spawning tasks and doing the work. However, here, as the problem size is sufficiently small (600-4,000 tasks), and the beauty of work stealing, it's better to let the scheduler do its magic. It was only able to do so much, but this was a start.

End result? I have running code that chops 100-200ms off of your web page time, the approach will likely apply to other parts of the browser (I want the 1s processing time to be chopped down to 100ms!), and I'm probably not going to enjoy finding another approach that goes from 2-way parallelism to the potential 5-way.
Post a Comment