Monday, November 19, 2007

Recent thoughts

Variety of thoughts recently, but, unfortunately, very little research has occurred.

At the last minute, I presented a paper on dynamically inferring object roles: finding approximations of type states for objects and determining how objects transition between them. I think it would be *very* interesting to contrast properties inferred from single threaded and multi-threaded execution. There is a trend to verify type-state nowadays, but not much on where those properties come from. If even Daniel Jackson is leaning towards specification extraction, maybe there is something to it :)

Mike said the g word on his blog with respect to Flapjax: garbage collection. There are several issues, all very close to each other in impact, but all conceptually different, contributing to the general sentiment of "yikes":
  • JS1.5 has no weak references, so switched out graph chunks persist
  • Using switch to achieve dynamic data flow graphs sucks
  • Flapjax does not have a component model
  • Conditionals persist both branches
  • Conditionals that don't are confusing
Additionally, there is a lot of object churn: whenever a computation is represented as a data flow node, and another computation is dependent upon it, a message object is sent from one to another. That's a bunch of stack frames and a heap object for every little computation - Kim's PEPM work demonstrated how expensive this gets. Essentially, constant, but significant in large apps, and annoying for animations.

I have several feelings here. Mainstream VMs are not built to support event oriented systems, and annotations for extent make a lot of sense (or should at least be more aggressively inferred, as AJ found). Additionally, while I was sensitive to the memory problem while making Flapjax demos, coding in Flex seemed a lot more manageable because I used FRP to hook fields of components together, rather than create them entirely. Large changes in control were still handled imperatively - but most of it didn't need to be.

I have 3 weeks to get some sort of research result, and then a week to write it up. Normally that would seem like a lot, but I just have no head for "dynamic probable recursive and shape structure characterization".

Monday, November 12, 2007

JavaScript Fun

In the Berkeley compilers course I am TA'ing, we were going to have the students implement a JS1.5 to Python source to source translator. This would be bottom up: they wrote their parser generator previously, so now they would write the convoluted JavaScript grammar (an ANTLR3 grammar took pages), complete with line ending sensitivities (with and without semicolons etc), and then generate python code that would access hooks into a JS pseudo-interpreter. Unfortunately, I leaked the first part by accident in the previous assignment as a test case, and while coding the reference solution of the second, I became bored and unmotivated.

Instead, on Friday evening and Saturday I modified the grammar to support ES4 (JS2) class inheritance, sans namespacing and innerclasses, and wrote the corresponding pretty printer. Today, I added the optional type annotations to the grammar and built the symbol tables necessary for type checking. Tomorrow, the type checker. While parameterized, structural, and nullary types are a bit beyond the scope of the assignment, I did add something ES4 does not have for some inexplicable reason: higher order types. Subtyping functions is a good time :)

Incidentally, on Reddit, there is a big stir right now about garbage collector in Firefox suffering from heavy fragmentation. While people are suggesting many convoluted solutions, I'm wondering if an easy interim solution is possible: take advantage of virtual memory, so that even if there is fragmentation, it'll be paged to disk. This is a big insight of generational collection, but can be achieved without explicit generational collection algorithms.

On a sadder note, Rob mentioned interest in seeing some of the digital artwork I've done through the years (with most of the highlights being concert posters for various flavor of the week rock bands while I was at WBRU), and I realized that, with all the computer migration I've been doing in these past 8 or 9 years, I have very little left to show. Eg, I could only find an early draft of a poster I did for Guster, as well as some high school era photo and flash pieces. Some signed posters are still in a box in my room, and if I can find the cable to my external HD, I may be able to find some source files (I really want to get the one for The Living End poster), I may be in some more luck. Chris wanted my concert photos for the insert art of an live in-studio recordings WBRU album he is producing, and they're all trapped on that darn drive.

Edit 12/05/2007: I could not find any examples online of higher order types in EcmaScript 4, so I assumed there was no support for it. After some scouring, I found an example of defining a function type. Running in the EcmaScript 4 reference implementation shell, we see all is well:


>type bTi = function (boolean):int
>var z : bTi = function (x:boolean):int { return 1; }
>z(true)
1
>z = function (x:boolean):int { return 5; }
>z(true)
5
>z = function (x:boolean):int { return "hello"; }
>z(true)
hello // uh oh
>z = function (x:boolean):int { var s : string = "hello"; return s; }
>z(true)
hello // hmm...
>z = function (x:boolean):string { return "hello"; }
**ERROR** ...


I would have expected a type error on cases where I annotated the return type with "int" but actually returned a string. Going further:


>z = function (x:boolean):int { return 1; }
> type bTiTi = function (bTi) : int
> var c : bTiTi = function (f:bTi) : int { return f(true); }
>c(z)
1
> c = function (f) { return f(true); }
> c = function (f:int):int { return 1; }
**ERROR** ...


So, while returns are not checked against declared return types, function parameters seem to be. I should read Cormac's stuff more closely to understand what the heck is going on - this seems to be a lot of annotation for not a lot of checking.

Saturday, November 10, 2007

A new home, a new blog

... and we're back!

Brinkster deleted my account (shame on them!) and I might have a copy of the old database, albeit a bit stale around the edges (shame on me!), that I can port over.

Recently, in Leo:

Volleyball has been going exceedingly well. I'm not really at the spiking phase, but I can block and play front rather well. Backrow still is a relatively weak for me, but then again, it is for almost everyone else too. We had spectators and beer this time, both of which helped us destroy the undergraduates. Not sure what is in store for the weekend, but hopefully some time in the city (see EunYee before she heads back to Texas? or Ilya now that he's @ Lucas Arts in the Presidio? Or finally go to Mt. Diablo and open up a bottle from Peju... ).

On the more technical side:

1. I began a research project in earnest (or as earnest as I can be if I switch to parallel frp come January). Given some dynamic data, I want to know a mixture of logical and probable truth - the dynamic nature of analysis hints at some over-approximation, except also some likeliness. Right now I'm applying this to basic schema learning (regular tree automata anyone?), but I thought of corresponding a neat trick for bottleneck/contention analysis of parallel systems. If this project works out, I would like to shoehorn it onto an implementation of CouchDB such that you can always get a likely type description of view of your data, and onto a goal-oriented concolic tester (schema evolution and bug finding becomes interesting at that point as well).

2. Last year, in cs164, the students implemented a JavaScript -> Python source-to-source translator. I started to code up a reference implementation tonight, and realized it was silly. Either you do the trivial mapping, or you waste your time. Assuming the trivial mapping, you're still wasting your time. To not waste their time entirely, they would also do some points-to etc analysis, though this year we decided to do some optional type annotations with corresponding checks. So, when coding tonight, I did the intuitive thing instead: I wrote a compiler from JS2 classes to JS1.5 . I'll finish some edge cases in the morning, but I'll also try to shoehorn in an additional embellishment the students should be able to recreate: instead of parameterized type annotations, length: 'a list -> int, expand function definitions: curriedAdd: int -> int -> int. Never really thought of how that plays with subtyping (perhaps something about covariance/contravariance?) but we'll see.


There are also two problems I got stuck on today:

1. Dynamic analysis is currently the way to go for testing parallel apps, often with static algorithms bolted on top where a valid path is found dynamically and reorderings of it are analyzed. That gives you two axis of search: along the path, and off the path. On the path is tractable, but once you're done, you must move on the other axis - except, my suspicion is that you'll likely end up on an 'equivalent' path when you randomly seed again. Last week, I wrote a concolic tester with the following results (# static branches in source covered out of 170 possible branch points, over paths through program traversed):

However, this was without reordering on a path. I'm starting to appreciate this problem more, and this particular incarnation seems particularly thorny.

2. AJ is submitting a neat paper on churn in OO: if there is constant heap allocation in your overly-structured oo program, you might find correlations for exit points, suggesting where you should automatically rewrite the byte code to allocate on the stack instead of heap, and thus garbage collect for free. Part of it raised an optimization problem: in the program tree where each node and edge has an integer value, and a path is scored by its alternating sum product (n * (e + n * ( ... ))) divided by path length, what is the maximal or minimal path? It's trivial without the division, but I didn't see a solution with it (one thought was that if you knew bounds, you could divide to get [0-1] values and thus some convergence argument).


Jono had some interesting intuitions about first class events that felt something like polymorphic exception handling that I want to muzzle over (even though we all know FRP is the way to go!). Maybe I'll pick Ben's brain a bit.