The Good Soldier LMeyerov: April 2008

Thursday, April 17, 2008

Research = Homework??

I need to convert JavaScript to static single assignment form. I couldn't find anybody that has already done that, which means it's back to undergraduate compilers courses for me. However, it's for a good cause.

The plan:
Phase 1 (2 weeks). SSA / constraint solver / event simulator (partial order reducer?)
Phase 2 (2 weeks). parallelization + load balancing
Phase 3 (summer): impact analysis, hybrid instrumentation, css, search, Napa Valley
Phase 4 (fall): IPO (or at least a paper)

Tuesday, April 15, 2008

GPGPUs vs Manycores: A Day with the Nvidia CUDA Team

Along with some other very fortunate Berkeley graduate students and professors, I spent a (mostly) revealing day learning about CUDA systems for general purpose GPU (GPGPU) programming at NVidia last Friday. The ragtag of Berkeley folk - architecture, autotuning, program analysis, language design, and operating systems researchers all participating in the ParLab - were well matched by the NVidia team - architecture, library tuning, and compiler speakers, in addition to teammates that sat in the back to help clear up confusion. Much of the day was introductory - most students I know with significant experience stayed home - but given my background, exactly what I had hoped for. It helped me answer what GPUs are, and what they aren't.

Almost two years ago, when as part of my NSF fellowship application, I proposed getting an FRP-like data flow system to run on a GPU, despite my very little understanding of them at the time and the nascency of CUDA-like projects. My reasoning was that GPUs are here, and as the data flow is rather explicit in FRP, I could easily partition a computation over tens of thousands of concurrent threads (the concern is more of what to do with the remainder of computations). I think a lot of people are in that boat right now, but with the interest in manycore, there's a danger that we will confuse which models are appropriate for what. A language designed for a GPU would likely be a lot less constrained and usable when remade for manycores.

GPUs are crazy parallel - but little more! In terms of hardware, there are 8 basic areas. Each area has 2 streaming multiprocessors (SMs). Each SM has 8 streaming processors and 16KB of shared memory ('scratchpad'). This sounds paltry - 8 * 2 * 8 = 128 - but when SIMD and pipelining tricks are factored in, Nvidia sets it at 768 threads per SM. Then, it looks like we have 8 * 2 * 768 = 12288 threads for full utilization.

Sounds nice, right? To put that in scale, my dinky Windows machine is running 82 processes right now, which would give each about 150 threads without needing to swap out. As most processes likely don't need that many, in general, virtual threads really will no longer be so virtual. However, reality starts to set in. Each SM does this funky SIMD-like thing the folks at NVidia dub SIMT where a 'warp' of 32 threads fetch instructions in lockstep. If threads in the same warp diverge, such as one threads takes the true branch of an if and the others don't, all of the others get the true branch instruction and do nothing with it, effectively waiting on the first thread. This wait would ultimately occur for every instruction in the if statement, and if it was a loop, multiplied by the iterations of the loop, etc. So, for non-GPUish programs, we divide thread count by 32: 12288/32 = 384 hardware threads. Luckily, it sounds like the architecture won't punish us with respect to power during such low utilization.

We've still ignored memory. 16Kb of shared memory between 768 threads? That's 2-3 bytes per thread! That's why excess utilized memory bleeds onto DRAM from registers, skipping the scratchpad. This is fine for pixel shaders and some lucky scientific apps, but imagine a multithreaded JavaScript: where would the stack go? The CUDA language extension of C does not permit recursion, likely for this reason. I don't have a feeling for the average stack size of a JavaScript app, but I'm guessing the previous estimate of 384 hardware threads for a JSGPU language would significantly drop further. Life really isn't that bad; at a 100x time penalty, we can go to DRAM, stores are non-blocking, and loads only block at first use. To properly load, however, we need to stripe / coalesce reads in an aligned, contiguous, increasing, and a specifically sized manner: thread 1 gets address 0, thread 2 address 2, ... etc, with fenceposts at the half-warp. I forgot the word alignment and load size, but you get the point; efficient memory reads are hard.

Once you factor in these limitations, and look at the language the CUDA folks have been making, the notion of GPGPU programming, as opposed to traditional general programming, makes a lot of sense. The CUDA team is making a great contribution to the computing community by allowing us to write and run highly parallel data driven code on commodity hardware (... and they're even making G80 racks now, so Blue Gene may need to be careful about where it walks at night). I see writing a language for GPUs to have a different set of concerns as that for a manycore architecture, but still important and interesting, and while these concerns are different in significant ways, there is still overlap. Writing as stackless yet higher order language as possible (take that, Joy!) is an interesting challenge, as is dealing with memory locality. I'm actually still fairly optimistic about functional languages in this context, and even impure ones with vigilant points-to analyses and developer annotations. However, it is important to keep in mind that these would be languages for GPU tasks with low memory / computation ration, including the stuff that goes on the stack.

Wednesday, April 9, 2008

IDL Error when Building Firefox on Windows : Social Error Search

I seem to be using a different computer setup every time I build Firefox. This time, for Firefox 3.05b, Windows XP (but Windows Vista SDK) with Visual C++ 8 (Visual C++ 2005 Express Edition).

I followed the instructions on the Firefox build page: download source, install the Mozilla build tools, install the Windows SDK and also Visual C++, create a .mozconfig file, and then run 'make -f client.mk build'. And then, big surprise, I got: "midl : command line error MIDL1001 : cannot open input file oleacc.idl". This was somewhere around the ia2 folder with a bunch of Accessibility IDL files.

Long story short, I searched around the website and found the fix of extra things to disable in your .mozconfig file.

My final .mozconfig is as follows:

. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/ff-build
ac_add_options --disable-optimize
ac_add_options --enable-debug
ac_add_options --enable-static
ac_add_options --disable-shared
ac_add_options --disable-libxul
ac_add_options --disable-tests
ac_add_options --disable-accessibility
ac_add_options --disable-activex
ac_add_options --disable-activex-scripting
ac_add_options --disable-xpconnect-idispatch

I forgot to add --disable-mochitests which would have been a good idea as well. Only --disable-accessibility and the ones below it are relevant to the fix; the rest speed up basic building and debugging.

Awhile back I wrote a little bit about search and error messages, whether by searching for a fix (taking your code from not type checking to type checking, or finding signatures for error profiles and building wikis out of them). An idea for awesomeness: inferring the search hit that fixes the bug. When trying to figure this build error out, I basically started a search session with the exact error message, and then went into a feedback loop between search engines and web pages to refine answers, until, finally, I finished and my searches were about something completely different.

Some other first-years are trying to work on semantics-aware code search, which is neat (I admit Dave's type graph work in Prospector is more in my style than their natural language approach); I want better searching of errors. Until somebody figures out a better way, I'll keep writing posts of brain dead errors like these that take a disproportionate amount of time to figure out to Google. Perhaps it's worth trying to streamline this, say by linking my browser search history to compiler output.

Monday, April 7, 2008

jQuery et al Considered Harmful

Hopefully posts on lessons from Japan, thoughts on realistic collaborative computing, and costs of proxy servers later, but for now: what's the deal with the chaining so popularly touted in JS frameworks?

In 12 year old Dmitri Gaskin's Google TechTalk on jQuery, he shows the basic model:

$('DOM collection selector').
filter(fn).parents().selector('some refinement').
customEffect(params).function(fn);

Here, you build up a collection of display objects by traversing downwards using a path selector, filter out some things, then go upwards in the tree and back down again, apply some custom jQuery effects (.animate(basic,tween,parameters)), and maybe actually more, such as by applying some custom function fn to everything, and, to top it all off, allow it all to happen in any order.

Neat, right? The framework endorses a style where DOM selections are threaded between side effectors and selectors. Upon seeing this touted as a big feature, I gave it more thought than I did before: it's sugar for ";", "with", selectors, and assignment. Later in the talk, the "ends()" function was introduced, indeed highlighting the stack based style of this (mixing "with" and ";").

Let's compare a JS/XPath fragment to a jQuery fragment:

with (body.selectNodes('td/a')) {
for (var i in this) i.style.backgroundColor = '#CCC';
for (var j in this) with (j.parent)
if (this.style.height > 100)
this.move(200);
for (var k in this) k.move(300); }

with

$('td a').
css({backgroundColor: '#CCC'}).
parents().
filter(function(d){ return d.style.height > 100; }.
move(200).ends().
move(300);

The above baked-in imperative style isn't very modern. XPath is dissonant with CSS selectors more common to DOM selection, so I'll reuse jQuery there (Prototype et al work here as well). Point-free style in JavaScript is verbose (think the lambda term in map/filter/fold), so I'll reuse jQuery shortcuts. This is common too; we have a version for Flapjax extended for time-varying values to let you write:

div(
{width: $E('click').collect(add1, 0)},
'click to expand!');

So, in a more modern style, but still without the glue:

with ($('td a')) {
this.css({backgroundColor:'#CCC'});
with (this.parents())
if (this.style.height > 100)
this.move(200);
this.move(300); }

As I learned in Flapjax, using such dots is a poor man's encoding of macros (ex: we lose syntax tracking). We can pass whatever extra info around that we want as part of composition. In the case of older school non-FrTime-esque FRP, we could run exec at the end, though it is implicit in Flapjax. Thus, we're really talking about syntax and style here in terms of how to thread 'this', assuming you're ok with the problems of 'with'.

The difference between the modern JS version and the canonical jQuery version is that we use the language to show context in the former, but not in the latter. To determine the context of 'this' for any particular statement in the former, 'just' follow the with (or function) statements at the top of automatically indented blocks.

Two things make this difficult in jQuery.

The ability to mix selectors and effects in a chain makes it more of a chore to pick out what is actually being selected for a particular effect. A type-based solution to providing support for this doesn't immediately jump out at me, but I suspect something using variants on returns might be doable for IDE support once JS2 arrives. More custom program analysis might be feasible, as would custom (but unsafe) baked in IDE support, but I don't see any of this happening in practice.

The usage of ends makes this stack based, which introduces a cognitive load of where in the tree you are, not just what to look at from the preceding chain. The style promoted by allowing ends makes legibility go down the drain. Again, blocks, as used in the imperative form, make this explicit.

Hypocritically, you might argue, we took the dot strategy in Flapjax/FIRE!

For example, to get a count of how many bunches of rapid clicks occurred (with at least 300ms breaking apart each bunch), we would write:

var x = $E('click').calm(300).collect(add1, 0);

The difference between this sort of code, and that of above, is that we aren't bringing assignments into the picture. Something like ends or multiple effects would make no sense as we are building one value: the point is that we reuse the language. In our case, pipe and filter style of shells provide a nice explicit sugar for the threading of the stream fragment to be transformed:

var x <- $E('click') | calm 300 | collect add1 0;

Each piped expression creates a new value to be used by the next, so there is no backtracking, and to have an observable effect, you must assign the resultant value somewhere.

Further, you might argue that this already is doable in JS! In JS (and thus Flapjax), you could write:

var x, y, z;
var z = bar(z = foo(y = bar()));

But that doesn't mean you should -- I find more let* and destructuring sugar more legible due to a clear separation of left and right values. In practice, I at most see "var x = y = foo()" style usage. As noted, adding in the vagaries of assignment (point-free jQuery assignment sugar) and branching makes this even worse.

The title of this article was too strong: it is the glue promoted by jQuery style libraries that I find suspect. The dynamic nature makes IDE support difficult, and at the syntactic levels, these frameworks promote saving a few keystrokes at the expense of significant code hints in the glue layer. As APIs, they are incredibly clean -- point-free code is verbose in JS (so we promote common forms in Flapjax as well) -- and different path selection methods across browsers have different speeds, so a library to pick which is used is currently for the best. Further, language integrated querying is a big issue -- query reuse, within the client and event between clients and servers, still hasn't really been addressed yet -- so I think it's still unclear how to integrate it in the long run. However, mixing effects, selectors, and branching a la ends in the currently popular manner seems to be against the simplicity principles promoted by JS frameworks.