Friday, December 7, 2007

Setting Closure Properties and Points-To Analysis

I often return a closure and want to have properties associated with it, but not in any significant way rely upon the scope chain. This is either for optimization - in the hopes the compiler rocks - or because I really do want to expose the field. However, my typical pattern for this is to create a temporary variable with the closure to return, then modify the closure once I have a reference, and finally return the closure. That's ugly.

Some flavors of JavaScript support named lambdas:

var x = function foo () { ... };

However, that is 1) not valid ECMAScript, and 2) ambiguous not consistently supported by browsers (credit Brendan Eich). Does the foo scoped into the body of the function refer to all foos created at that point, or just the instance (so a true named lambda)? Using arguments.callee does exactly what I want, however: a handle for the closure accessible within the body of the function. Modifying fields of the closure instance later in code works as the desired communication channel. As a neat extra (as shown in an earlier constructor vs function post), the source code of the script should also be available via a string coercion of arguments.callee.

The below adds the disabled attribute to a closure in a pretty way. A function is returned with logging on by default, and per-instance disable-able. Doing so via prototypes may get the intensional semantics across better, so the example is mostly demonstrative.

var log = [];
function toggloid (f) {
return function () {
if (!arguments.callee.disabled) {
log.push('hit: ' + (new Date()).getTime());
return f.apply(this, arguments);

var foo = toggloid(function(x,y) { return x + y; });
foo.disabled = true;
print(log); // hit: 1197084163187

<Correction> After Brendan's comment, I reexamined it a bit more closely. Named anonymous functions should be in JS1.5+ according to the ES3 specification, but they just aren't universally implemented (quick scan online says Safari recently fixed them but IE is still quirky). x = function foo () { ... f ... } should act the same as x = function () { ... arguments.callee ... }, so the closure instance is being bound in both cases ("hurray!"). The named lambda holds through function boundaries, which can be duplicated via function () { return (function (f) { ... f ... })(arguments.callee); }, except, again, for the weirdness of constructors vs functions.

Fun point 1: I gave my last talk today, this time on points-to analysis. I never fully followed details for implementing a Prolog analyzer (exponential blowup) or the CYK parsing approach before, so it was neat going over it. The real "Aha!" moment is when we look at a Prolog inference rule for fields (a little longer than necessary, but illustrative):

%% put/assign/get facts are generated from this program
%% var x = {} %% r1 = o1 %% introduce object + aliasing
%% x.f = x %% r1.f1 = r1 %% field takes a value
%% var y = x %% r2 = r1 %% arbitrary aliasing
%% var z = y.f %% r3 = r2.f %% field value flows out
%% var q = z %% r4 = r3 %% arbitrary aliasing

%% Assume Alias and other FlowsTo fact inference rules exists
%% Final inference rule to show how values flow through fields:
FlowsTo(o, a) :- FlowsTo(o, b), Putf(b, c),
Alias(c, d), Getf(d, e), Alias(e, h)

The corresponding CFG parsing rule would be:

FlowsTo := FlowsTo Putf Alias Getf Alias

Every atom corresponds to a node in the program flow graph. Every edge between two adjacent atom is a ground fact (Put, Get, Assign, New). The inference rules build up the intermediate nodes.

But what are the (atom1, atom2) bits doing?

They actually exist in normal context-free grammars, except we drop them! Normally, they'd be numbers. The root node would be (o, strLen): a relation holds if there is a parse from the beginning atom to the end atom. When we have multiple nonterminals together on the right hand side, we implicitly have the following form: rel1(a, b) rel2(b, c) rel3(c, d). The matching of atoms between relations means the parse is over contiguous text. In Points-To analysis, we want to make sure we're talking about the same objects between relations, and the same inductive argument holds.

The other initial subtlety in the analysis for me was that we're doing CYK parsing on graphs, not linear strings. However, that makes sense. For strings, we first parse piece-wise (adjacent edges). Then, we parse all 3-edges, etc to strLen-edges. To parse a tree, we mean parse all possible paths in the tree. We just do the same thing: all 2-edges, 3-edges, ... etc. How about a cyclic graph? Once we add one edge in the cycle, the edge holds for all loops through the cycle, and we don't need to keep going. Thus, flows-to (and the equivalent inverse, points-to) analysis can be thought of a CFL-reachability on a graph between the producer and the receiver.

Fun point 2:

Kuang bought a XBox 360 Halo Edition for the DB lab. Apparently it didn't come with Halo 3, but the basic idea is there.
Post a Comment