Friday, December 7, 2007

Setting Closure Properties and Points-To Analysis

I often return a closure and want to have properties associated with it, but not in any significant way rely upon the scope chain. This is either for optimization - in the hopes the compiler rocks - or because I really do want to expose the field. However, my typical pattern for this is to create a temporary variable with the closure to return, then modify the closure once I have a reference, and finally return the closure. That's ugly.

Some flavors of JavaScript support named lambdas:


var x = function foo () { ... };


However, that is 1) not valid ECMAScript, and 2) ambiguous not consistently supported by browsers (credit Brendan Eich). Does the foo scoped into the body of the function refer to all foos created at that point, or just the instance (so a true named lambda)? Using arguments.callee does exactly what I want, however: a handle for the closure accessible within the body of the function. Modifying fields of the closure instance later in code works as the desired communication channel. As a neat extra (as shown in an earlier constructor vs function post), the source code of the script should also be available via a string coercion of arguments.callee.

The below adds the disabled attribute to a closure in a pretty way. A function is returned with logging on by default, and per-instance disable-able. Doing so via prototypes may get the intensional semantics across better, so the example is mostly demonstrative.


var log = [];
function toggloid (f) {
return function () {
if (!arguments.callee.disabled) {
log.push('hit: ' + (new Date()).getTime());
}
return f.apply(this, arguments);
};
}

var foo = toggloid(function(x,y) { return x + y; });
foo(1,2);
foo.disabled = true;
foo(2,3);
print(log); // hit: 1197084163187


<Correction> After Brendan's comment, I reexamined it a bit more closely. Named anonymous functions should be in JS1.5+ according to the ES3 specification, but they just aren't universally implemented (quick scan online says Safari recently fixed them but IE is still quirky). x = function foo () { ... f ... } should act the same as x = function () { ... arguments.callee ... }, so the closure instance is being bound in both cases ("hurray!"). The named lambda holds through function boundaries, which can be duplicated via function () { return (function (f) { ... f ... })(arguments.callee); }, except, again, for the weirdness of constructors vs functions.
</Correction>


Fun point 1: I gave my last talk today, this time on points-to analysis. I never fully followed details for implementing a Prolog analyzer (exponential blowup) or the CYK parsing approach before, so it was neat going over it. The real "Aha!" moment is when we look at a Prolog inference rule for fields (a little longer than necessary, but illustrative):


%% put/assign/get facts are generated from this program
%% var x = {} %% r1 = o1 %% introduce object + aliasing
%% x.f = x %% r1.f1 = r1 %% field takes a value
%% var y = x %% r2 = r1 %% arbitrary aliasing
%% var z = y.f %% r3 = r2.f %% field value flows out
%% var q = z %% r4 = r3 %% arbitrary aliasing

%% Assume Alias and other FlowsTo fact inference rules exists
%% Final inference rule to show how values flow through fields:
FlowsTo(o, a) :- FlowsTo(o, b), Putf(b, c),
Alias(c, d), Getf(d, e), Alias(e, h)


The corresponding CFG parsing rule would be:


FlowsTo := FlowsTo Putf Alias Getf Alias


Every atom corresponds to a node in the program flow graph. Every edge between two adjacent atom is a ground fact (Put, Get, Assign, New). The inference rules build up the intermediate nodes.

But what are the (atom1, atom2) bits doing?

They actually exist in normal context-free grammars, except we drop them! Normally, they'd be numbers. The root node would be (o, strLen): a relation holds if there is a parse from the beginning atom to the end atom. When we have multiple nonterminals together on the right hand side, we implicitly have the following form: rel1(a, b) rel2(b, c) rel3(c, d). The matching of atoms between relations means the parse is over contiguous text. In Points-To analysis, we want to make sure we're talking about the same objects between relations, and the same inductive argument holds.

The other initial subtlety in the analysis for me was that we're doing CYK parsing on graphs, not linear strings. However, that makes sense. For strings, we first parse piece-wise (adjacent edges). Then, we parse all 3-edges, etc to strLen-edges. To parse a tree, we mean parse all possible paths in the tree. We just do the same thing: all 2-edges, 3-edges, ... etc. How about a cyclic graph? Once we add one edge in the cycle, the edge holds for all loops through the cycle, and we don't need to keep going. Thus, flows-to (and the equivalent inverse, points-to) analysis can be thought of a CFL-reachability on a graph between the producer and the receiver.

Fun point 2:

Kuang bought a XBox 360 Halo Edition for the DB lab. Apparently it didn't come with Halo 3, but the basic idea is there.

5 comments:

Unknown said...

Hey, cool post :) I don't have that last Alias in the equivalent FlowsTo rule I wrote for Java. (I have a separate rule FlowsTo := FlowsTo Assign though). Is that some Javascript-specific modification? --Manu

lmeyerov said...

Yep, that's exactly what I meant by "... Prolog inference rule for fields, a little longer than necessary, but illustrative". Good eye :) The JS fields thing is more in the realm of possible exam questions as it is fairly quick extension to come up with.

I haven't had an opportunity to play with higher order functions in a points-to analysis (or points-to at all, really!). I'm doing some dynamic analysis of JS programs this week, and am finding a shocking amount of regularity, esp. in HOFs, so I suspect it is a tractable problem in practice.

Unknown said...

If you get a chance, you should check out Chapter 3 of my dissertation sometime. It shows how to formulate direct handling of higher-order functions (also known as "on-the-fly call graph building") via CFL-reachability. It's a bit hairy, but kind of cool too I think :) It's formulated for Java's virtual calls, but I think handling first-class functions shouldn't be difficult. Also, you can check out this paper by Fahndrich and Rehof if you haven't before.

Brendan Eich said...

var x = function foo () { ... };

is certainly valid ECMA-262 Edition 3 code. Why do you think otherwise?

/be

lmeyerov said...

Right again :) Function expressions support optional identifiers in ES3 (page 71 of the spec), though the single hit for 'anonymous' (function) in the spec states arguments.callee "allows anonymous functions to be recursive", which, while crafty wording, I cede to your point :)

When using this in 2006, I believe I encountered 2-3 browsers that did not implement it correctly. Searching online suggests IE still treats named function expressions as function declarations, causing namespace poisoning, while an acknowledged related Safari bug appears to have been fixed. I confused "unsupported by popular implementations" with "unsupported by the spec" - it is neither ambiguous nor unsupported by the spec, but unsafe with respect to the casual browser.