Tuesday, September 28, 2010

1/3rd solved

The theory behind an LL(k) version of worst-case expressive typical-case efficient parsing is easy... almost a restatement of how to do LL(k):



The proper way to state this might be an interesting form of abstract interpretation, but, in that case, my arrows are upside down.

Now to earn the bacon by replacing the LLker boxes with LRker ones. Way better than proposal writing and apartment hunting. Tomorrow will be my last Philz coffee for a long time -- hello Seattle!

Thursday, September 23, 2010

Affinity Matters

Took me awhile to find this, but to pin the current thread to some core on Linux 2.6+ kernels:


cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET( SOME_CORE_INT, &mask);
int error = sched_setaffinity(0, sizeof mask, &mask);


Using this significantly improved the scaling of my multicore tree visitors. I still haven't found a good way to do it on OS X systems, which only seems to support affinity across units with distinct L2 caches.

I believe it's more general than pinning in that it could be used for affinity by setting multiple potential cores. Tuning for hyperthreads is iffy in that I believe the hardware context -> logical cpu mapping is unreliable (e.g., no guarantee the HTs are contiguously named).

Edit: also, all you need to do is link in -lpthread and include "pthread.h" ; no more funny business.

Wednesday, September 8, 2010

Back from San Diego

Seth and I invented an awesome new multicore+simd parsing algorithm on the flight back. Sweet.

Monday, September 6, 2010

One scalable memory allocator later...

... and we now have great utilization on a quadcore. Colors represent sequential workload times in milliseconds: we're trying to speed up a bunch of tiny passes.



The first two dips occur at socket jumps, the second dip worsens due to Amdahl's law (a buggy work partitioner), and from there the performance soon peaks before burning out.

An interesting challenge I don't see often discussed is that I'm benchmarking the solving time and I needed to pull quite a few tricks to warm up before then (time isn't included here). Some time before a pass actually starts, e.g., near the end of the previous pass, the final implementation should actually start warming it up. That's a tuning nightmare! However, more fun, I want to first experiment with tree SIMDization.