Particularly fun in our submission is a 4x speedup on a canonically sequential algorithm, as well as some more browser hardware acceleration tricks. Weirdest yet, in our approach, the clustering ratio (how well you can relate objects in a data-parallel computation) is inversely proportional to our expected speedup from vectorization and the like! Somehow, it all makes sense.
Sadly, we couldn't do our two coolest algorithms: one for legal reasons, the other because the vector hardware I have is missing a permute instruction. So it goes.