The Good Soldier LMeyerov: December 2007

Monday, December 31, 2007

Demand Driven Refinement Types?

I was a little bored last night, and while drifting through thoughts, figured I should take a look at what all this refinement type stuff was about so I'd have a better feeling for when to give it a real look.

The first example in Tim Freeman's thesis "Refinement Types in ML" demonstrates a (recursive) function to pull out the last element in a linked list (well, cons cell, I'm doing element). Using ML pattern matching sugar, we can cleanly describe the recursive and base cases, and ignore the case of an empty list (with no last element) as it is undefined in the function:

datatype 'a list = nil | cons 'a * 'a list
fun lastelt
cons(val, nil) = val
| cons(_, rest) = lastelt rest

The type of lastelt isn't " 'a list -> 'a ", because lastelt isn't defined for nil lists. We could say it is defined on non-nil lists, which is fair. How about if we returned the second to last element in our function? Given our datatype, we want to cut it up a little differently, and define refinement types. The thesis then shows a refinement type to augment the above code:

rectype 'a empty = nil
and 'a singleton = cons('a, nil)
and 'a long = cons('a, cons('a, nil))
(* ... however many other refinements ... *)
and 'a all = bottom (list) **

Now, we can say which refinements our function is defined on. In spirit, at least, refinement types provide a way to guide abstract interpretation style type inference: we track how operations (constructing cons cells and destroying them) interact in terms of our rectypes. It provides a separate, but similar, axis of hints for our inference algorithms.

While I was originally keeping this work filed away as being a potential tool to look into if I were to work on mutation/references or expressive type systems, it seems relevant to much more, and pose a few questions.

For starters, in the case of refinement types for recursive data structures, are explicit refinement types really necessary for an inference algorithm? Evan has been rocking recently with shape analysis via inductive proofs using separation logic (coming to a POPL near you); an analogue might be possible here. If inference fails to unify due to a missing case, expand to catch it. The trick then would seem to be on how to detect fixed points when expanding the refinement types. This probably stinks algorithmically, but it's a start :)

Anyways, refinement types seem neat, and this has been a pleasant reminder in why examples help, choosing a good example is important, and why, given time, I prefer to read a thesis over a conference paper. As for the example bit, a good example should help demonstrate not just the solution, but also the problem. I'm struggling a little with that in my paper right now - paper length requirements force you to think :)

**Even without having read significantly further, I suspect this last line is unnecessary wrt the lhs, and the rhs seems odd though sound in the lack of parameterization, but either way, it isn't too relevant.

Saturday, December 22, 2007

Why I Live in the Bay Area

Special emphasis on live.

QUINTESSENCE Five elements. One fantastic journey.

"The show is a spectacular feast for the senses that combines modern dancing, Russian folk music, and clownery that together create a truly unique theatrical experience."

Quintessence features the combined efforts of several renowned performing troupes:
• World famous clown-mime theater Licedei,
• Acclaimed Russian ambient folk singers Ivan Kupala,
• Pioneering folk-modern troupe Firebird Dance Theater,
• California's home grown Infamous Siberian surf rock band Red Elvises

I saw a voicemail from Ilya (geek in residence @ Lucas Arts now) about this 15 minutes ago, and, starting 5 minutes ago, I'm a proud ticket holder.

As an alternative list of why I like my life style here, my tentative winter break plans:

Take another stab at dependent type theory
Work on my guided transparency + imperative FRP paper
New Years in NYC
Fix my structural type-feedback JS analysis
ParLab retreat + poster, start thinking about frp, concurrency, asynchrony, & parallelism
Check out the last day of POPL
Either muck with Haskell or Erlang, attempt a JS concolic tester, or automatically find a particular class of security flaws in Firefox
Start reading papers for prelims (reattempt PI-calculus, Hoare logic, & abstract interpretation?)

<BEGINRANT>
I'm reviewing case studies by our compiler students of using a simplified variant of FRP for a project - very interesting reading. I rarely see PL papers that involve user studies, so this is an interesting experience. Knowing that a <pick-favorite-language>-genius can write a fully featured, verifiable, horizontally scaling web app in <small-rational>K lines of code in a language/framework they designed doesn't help me much. However, what would? Structuring a user study, getting users, and then getting meaningful results, is hard. If the interest is in getting feedback from 'average joe' programmers, not expert programming language enthusiasts, then, in general, I should only expect quantitative, not qualitative, statements from users. From these, I must also guide them in what to answer, and this implicitly already biases them in their answers. For example, I'm interested in how many errors are due to the distinction between discrete event streams and continuously valued behaviours - yet users may incorrectly attribute these because they don't know better. One difficult but effective approach I've recently seen performed by Dan Grossman for his type error finding paper was to actually instrument a compiler to record user actions, and thus he was able to do a detailed postmortem analysis to achieve a good categorization. A time suck, but I believe going this extra mile is part of what separates the scientists from the mathematicians in programming language theory. At least in the conferences proceedings and journals I tend to read, the latter far outnumber the former, and I see it as a field of artists with similar taste ("schools of thought"), not actual scientists.

At this point in time, I feel we need way more of the former than we currently have, despite the freedom of the latter and their tendency to focus on foundations (which I believe is still important). Even in good old program analysis, where one can compare the results of one's tool against others, it seems most papers make a token, biased effort - Lhotak's "Comparing Call Graphs" demonstrated to me recently again just how susceptible analyses are to environmental conditions, stressing the pains one ought to go to in preparing fair comparisons. The lack of such care in many others papers in supposedly 'top tier' conference papers embarrasses me as someone trying to enter the field. Whenever I review a paper and see someone include this sort of result (or neglects it), I am always tempted to brush it aside as no new technique is presented, though when I step back, this information may very well be much more useful to others than whatever technique is actually being presented, as it helps guide future focus. </ENDRANT>

Monday, December 17, 2007

Save an Excel Chart as a Picture, a SE Problem?

I don't want to post many of these, but as I ended up making some charts of the evaluation of my work in Excel instead of in R (sorry R folks - you need to make more friendly graphing libraries), I needed to get them into latex somehow, and there is no export or save button in Excel. Google told me how to do it via scripting, but that is silly.

So: if you want to export or otherwise save your excel chart as a jpg, png, eps, or whatnot, all you must do is copy the picture (ctrl-c ctrl-v) into PowerPoint and right click on the image there to get the expected "Save as Picture" dialog. Hopefully Google will pick up on this tidbit for other aggrieved users out there.

On a more theoretical level, Mira Dontcheva wrote a paper on macro-learning awhile back and some UTA folks on better error reporting for black box components using a social scheme that would have been a useful way for MS developers to realize that this stupidity exists, assuming it was not intentional. Features being used, not even how, is valuable information for developers - internal Adobe teams seemed to be going ga-ga over availability of such data in certain contexts - and I think it does reveal another interesting question for achieving this in a lightweight manner: assuming you're working on a popular product, can you infer feature requests by mining google searches for how to do something? Sort of like the black box testing paper, but without the heavy instrumentation and sneaky, expensive tracking.

Tuesday, December 11, 2007

Research Blogging, Dogfood, and an Extreme Distaste for Programming

I keep a list of "Leo's ideas for awesomeness" as part of my gmail account - whenever I think of something neat, I continue the conversation with my self by adding a one sentence addendum. Would it be appropriate to post such ideas here for potential discussion or a lucky google search by someone pondering the same thing? Probably not - a professor I respect critiqued another grad student in our field for keeping such a blog, mainly because we are in a business of ideas and showing our cards is dangerous. I suspect he meant this as a matter of being scooped - but there is also a potential for miscommunication, and blogs are largely a one-way medium (that's why the read-write web is scalable: we read more than we write).

However, while coding a toy dynamic analysis tool these past couple of days, I've been rediscovering Java. In particular, java.utils.* . When writing a concolic tester in OCaml earlier this semester (I wanted to learn a bit about their story on parsing), I used named records (data types), hashmaps, and lists. That's it. In some 1000 lines of various search algorithms, persistent storage manipulation, parsing, and data mining, I did nothing fancy - I came close to writing a trie (prefix tree) at one point, but decided against it midway through writing it. However, my new tool creates an augmented Rhino JS shell with hooks for my analysis. I had to write Java code to pull out my data, so at that point, I decided to quickly write my Java to process it, and then dump some to stdout/stderr for usage in bash/R. The guiding light here has been of ease. Statistical analysis is fine in R, but to generate the numbers, I do processing in Java, and there I use ArrayLists, HashMaps, and HashSets.

So, in the process, I've noticed three things.

First, doing OO in Java is burdensome, especially without an IDE. To be clear, I don't mean this as a FP vs OO argument. A lot of my transformations are tree transforms; to get type state represented correctly, each transform should probably create a new type of tree, but that's burdensome. ANTLR has some nice syntax for simple localized transforms, but that doesn't transfer over to my problem. Additionally, when instrumenting my data analysis to peak at what was happening with data as it went through the interpreter, I was repeatedly shot down due to namespaces. My code should be viewed as an augmentation, so I put it into a different namespace and thus can only access public fields, and that just wasn't sufficient. I resorted to making my shell part of the Rhino namespace. I'm not sure how typical AOP scripts function when applied over multiple namespaces, but I wager serious ones get hampered (or ignore them somehow).

Next, while I started with a few LinkedLists, iterations over code to augment capabilities (remember, I wasn't extending due to verboseness), I had the opportunity to eye-ball and clean a lot of code repeatedly. Thus, my LinkedLists were, one after another, replaced with ArrayLists. Factor in iterators that do not allow concurrent modification, all of my iterations were maps or folds with fairly separable side effects (transparent parallelism, please!).

Finally, while I could generally specify how big to initialize the above collections, I was largely left wondering - why? I suspect that, in practice, a mixture of tools like Daikon and DySy could fill it in for you. I'm not sure if this is an optimization that matters, but I do put it in the camp of things that I would prefer my compiler to do, especially as my choice in collection data structure will likely be violated by latter code.

So what's the point? At a superficial level, while my housemate Joe has already made me wary of boolean variables, I am now also suspicious about occurrences of LinkedLists. However, beyond that: the less code you write, the better, and I suspect dynamic optimizations, even if computed at compile time via driving testsuites, are still a big wide-open area in optimization.

At a higher meta-level, the above stresses the importance of dogfooding ("using your own product"). In this case, I'm interpreting it as a programming language junky using a mainstream programming language (instead of reading papers). Spike, one of my favorite professors at Brown (I never took a course with him, but he was always curious as to what was filling my nearest whiteboard), has a policy of doing some serious hacking on Tuesday nights. I always was fearful of when Shriram would write some Flapjax code, because I knew I'd have a full TODO list the next day. While at Macromedia working on Flex 2 and then moonlighting at the research lab at Adobe, while there were many great coders, many of the best feature and platform visionaries I interacted with were ones that also routinely tried using the actual product in real projects.

So, I believe dogfooding is good, and not just because bugs come out and minor features enter: I also get to learn about the domain, empowering future thoughts. I'm not lucky enough to have zillions of good ideas, but I do have a list of bad ones that I continually add to and bug folks about, and hopefully the few that I decide to act upon are worth it. Most of these ideas are probably not suitable for such a blog, but the domain experience definitely is.

Leo's ideas for awesomeness, entry #47.

Friday, December 7, 2007

Setting Closure Properties and Points-To Analysis

I often return a closure and want to have properties associated with it, but not in any significant way rely upon the scope chain. This is either for optimization - in the hopes the compiler rocks - or because I really do want to expose the field. However, my typical pattern for this is to create a temporary variable with the closure to return, then modify the closure once I have a reference, and finally return the closure. That's ugly.

Some flavors of JavaScript support named lambdas:

var x = function foo () { ... };

However, that is ~~1) not valid ECMAScript, and 2) ambiguous~~ not consistently supported by browsers (credit Brendan Eich). Does the foo scoped into the body of the function refer to all foos created at that point, or just the instance (so a true named lambda)? Using arguments.callee does exactly what I want, however: a handle for the closure accessible within the body of the function. Modifying fields of the closure instance later in code works as the desired communication channel. As a neat extra (as shown in an earlier constructor vs function post), the source code of the script should also be available via a string coercion of arguments.callee.

The below adds the disabled attribute to a closure in a pretty way. A function is returned with logging on by default, and per-instance disable-able. Doing so via prototypes may get the intensional semantics across better, so the example is mostly demonstrative.

<Correction> After Brendan's comment, I reexamined it a bit more closely. Named anonymous functions should be in JS1.5+ according to the ES3 specification, but they just aren't universally implemented (quick scan online says Safari recently fixed them but IE is still quirky). x = function foo () { ... f ... } should act the same as x = function () { ... arguments.callee ... }, so the closure instance is being bound in both cases ("hurray!"). The named lambda holds through function boundaries, which can be duplicated via function () { return (function (f) { ... f ... })(arguments.callee); }, except, again, for the weirdness of constructors vs functions.
</Correction>

Fun point 1: I gave my last talk today, this time on points-to analysis. I never fully followed details for implementing a Prolog analyzer (exponential blowup) or the CYK parsing approach before, so it was neat going over it. The real "Aha!" moment is when we look at a Prolog inference rule for fields (a little longer than necessary, but illustrative):

%% put/assign/get facts are generated from this program
%% var x = {} %% r1 = o1 %% introduce object + aliasing
%% x.f = x %% r1.f1 = r1 %% field takes a value
%% var y = x %% r2 = r1 %% arbitrary aliasing
%% var z = y.f %% r3 = r2.f %% field value flows out
%% var q = z %% r4 = r3 %% arbitrary aliasing

%% Assume Alias and other FlowsTo fact inference rules exists
%% Final inference rule to show how values flow through fields:
FlowsTo(o, a) :- FlowsTo(o, b), Put_f(b, c),
Alias(c, d), Get_f(d, e), Alias(e, h)

The corresponding CFG parsing rule would be:

FlowsTo := FlowsTo Put_f Alias Get_f Alias

Every atom corresponds to a node in the program flow graph. Every edge between two adjacent atom is a ground fact (Put, Get, Assign, New). The inference rules build up the intermediate nodes.

But what are the (atom1, atom2) bits doing?

They actually exist in normal context-free grammars, except we drop them! Normally, they'd be numbers. The root node would be (o, strLen): a relation holds if there is a parse from the beginning atom to the end atom. When we have multiple nonterminals together on the right hand side, we implicitly have the following form: rel1(a, b) rel2(b, c) rel3(c, d). The matching of atoms between relations means the parse is over contiguous text. In Points-To analysis, we want to make sure we're talking about the same objects between relations, and the same inductive argument holds.

The other initial subtlety in the analysis for me was that we're doing CYK parsing on graphs, not linear strings. However, that makes sense. For strings, we first parse piece-wise (adjacent edges). Then, we parse all 3-edges, etc to strLen-edges. To parse a tree, we mean parse all possible paths in the tree. We just do the same thing: all 2-edges, 3-edges, ... etc. How about a cyclic graph? Once we add one edge in the cycle, the edge holds for all loops through the cycle, and we don't need to keep going. Thus, flows-to (and the equivalent inverse, points-to) analysis can be thought of a CFL-reachability on a graph between the producer and the receiver.

Fun point 2:

Kuang bought a XBox 360 Halo Edition for the DB lab. Apparently it didn't come with Halo 3, but the basic idea is there.

Thursday, December 6, 2007

ErlyWeb storms the Bay (FP)

Yariv presented his work on a very scalable web framework, ErlyWeb, at tonight's meeting of BayFP.

It was interesting to see an Erlanger's take on this CRUDy world. Some personal highlights:

He practiced the talk on his girlfriend, forcing us to "man-up" as Kurtis would put it
Calls are handled through a pipeline (like HApps commands?)
Controller functions with corresponding view functions (one-to-many)
Component system for server-intensive components (think Amazon.com)
Hot swapping support
He simulated running 80,000 processes - how many can Apache do?
He doesn't directly use Wadler's type annotation/checking tools
Yariv doesn't really unit test

One thing he focused on in the talk was comet support, which is a Big Thing (tm) for AJAX/RIA folks. While some browsers explicitly support it and FRP has a story for how to code the client using it (with or without browser hand holding), it's a little ambiguous on how to write good server code to use it. Here Erlang steps in: lightweight processes with absolutely no shared memory can idle for as long as you want - no harm putting them on to disk, even. This really puts the ball back into the browser's court.

Web frameworks focus on being developer friendly, and while I don't traditionally put Erlang into that camp due to the preponderance of message channels, Yariv's examples impressed me. The presentation further pushed me to wanting to do a little hacking to get a feel for typical code in such systems, and then find message passing patterns that could benefit from sugar like automatic lifting in transparent FRP or how to interface with such a system in the client world.

Additionally, the story on hot-swapping was great. Whenever you write an infinite loop, you do so using recursion. If your loop gets a 'swap' command for the next iteration, instead of doing the normal recursive call to handle it, invoke yourself as a remote call, and voila, hot-swapping. You could write your comet code like this as well.

As a funny point, I'm trying to write up notes for a discussion section tomorrow on points-to analysis. Someone mentioned tonight that method lookups for objects masked by interfaces are not actually constant in Java - bingo, motivation :) [With some glossing over the rules to handle dynamic class loading]. Ras challenged me to show how a dynamic type analysis differs from a points-to analysis, so I'm still considering that one. Usability - perhaps in the guise of the minimum description length principle - sounds lame.

Wednesday, December 5, 2007

Wrapping JavaScript New Constructor with Variable Arguments

I wanted to wrap all of my library functions with a logging utility. However, as JavaScript functions can also be used as constructors, I ran into a problem.

Convention dictates that if a function name begins with a capital letter, then it is a constructor. Fine, so I can determine which functions are constructors. Good. I still want to wrap the constructors, however.

Working through it, I saw that calling new on a function that returns a value will actually return that value. Thus, my wrapper could call new on the enclosed function:

Good.

Now, what to do about the parameters to the constructor Cool?

Luckily, functions have a length attribute. I could write

I don't like that - what happens if the constructor actually took variable arguments? Would anybody really want to use a library as hacky as this? This approach is embarrassing.

The basic hope is that we could use arguments and apply or call (why are they both there again?), except those don't really do anything about the identity of the object. However, that's fine: we can just provide it:

There is one slight subtlety to the above:

h = Hot.apply({}, [50]);
c = Cool.apply({}, [5]);
assertTrue(h)
assertFalse(c)

Cool has no return, so when invoked as a function, it returns undefined, but when invoked as a constructor, returns an object. Hot does have a return in all uses as it cannot really determine what the usage was and we always return an object.

Thus, if you wrap constructor, use it as a function, and count on it returning undefined, you will get the incorrect behaviour and cannot use the above method. The style police should probably revoke your dynamic language usage rights as well.

We can do slightly better, however:

The above sets us up for sending out warnings. If we knew that constructing an object caused no side-effects, in the case of an undefined return on the else case, we may want to emit a warning, etc. We can keep tweaking and optimizing, but ultimately, I think this exposes a slight gap in the semantics of JS1.5 (or reveals what it means to write 'new'). Of course, in AS3, the same problem arises with the Class object. I don't know if that's fixed in ES4.

Now only if I could get blogger to correctly format code...

Edit 12/05/07:

Instead of returning the wrapped function right away, in case some code may write "new Hot(5) instanceof Hot", you should also set "Hot.prototype = f.prototype" (you could do this within wrap).

Tuesday, December 4, 2007

Auto-Lifting Reactive Parameters, Snapshots

Now that our students have implemented a few functional reactive library combinators as well as a compiler to automatically lift functions, we're making them write a simple app. The infrastructure we provided already supports data binding template tags for them.

They must make make a fish-eye menu (think the OSX dock) where images are loaded from Flickr's public feed and rotate every few seconds. They should also display tags associated with the photos as well as some status text indicating that the page is waiting for the next feed between when a request is made and a response is received. My basic implementation of that took roughly 50 lines, including HTML, with maybe another 5-10 lines of that to be very clear about what the variables are, instead of the type of inlining I'd normally do. My new version of the above menu is significantly less hacky and more readable as it wasn't written at 2am this time around. The students need to go a bit further, and instead of using the general random public feed, synthesize a queue of search tags that can be added to, which drives the urls selected for feeds.

Three interesting things came up during this exercise:

Pure FRP

I did not have to go outside of the system to achieve functionality (language mode, not library) - I did not expect this. Their library is a significantly stripped down version of Flapjax, so I had to reimplement 2-3 effectively stdlib combinators, but those were all 1-2 lines of function body.

Trying to define a simple count function failed, linguistically

Consider

function count (n) {
return collect(n, function(acc,v) {return acc + 1;}, 0);
}
count(requests); //count the # of requests in the request stream

Collect is like fold (reduce), except instead of acting on a list of values that entirely exists, it acts on a stream, so as n gets new values, collect emits new values that are a function of that new value and the last previously computed value.

Count is seemingly written correctly: it gets an event stream, and passes it through collect. Collect will not be lifted because it is already defined to work in the world of reactive values. However, the function application "count(requests)" will be, being transformed into "lift(count, requests)". Whenever requests gets a new value, the new value event will be sent to count - meaning the event stream isn't passed. Instead, there are repeated invocations of count on individual request values. Thus, we must annotate the function "count" as being defined to know about reactive values (more formally, specify that the first parameter of count is defined in the 1-arity reactive domain):

function count (n) {
return collect(n, function(acc,v) {return acc + 1;}, 0);
}
count.__doNotLift = true;
count(requests); //count the # of requests in the request stream

However, we're language designers, not function decorating hackers. What are we really trying to declare? The function count is defined to work not just on some 'n', but n such that they are reactive values. Thus, if we really must annotate, let's do it right:

function count ({n}) {
return collect(n, function(acc,v) {return acc + 1;}, 0);
}
count(requests); //count the # of requests in the request stream

The interesting thing here is a function like delay, that takes a value (say the mouse position), and the amount of time to delay any change to it (for example, 500ms). It would be natural to define it as

function delay({val}, amt) {
//something with setInterval
}
var slowMouse = delay(mouse, 5);

However, what happens when the amount we want to delay is also variable? That's when the beauty of the parameter decoration comes in to play:

function delay({val}, amt) {
//something with setInterval
}
var slowMouse = delay(mouse, seconds);

Delay is defined to work on a time varying value and a number. It is easy to lift this to work on two time varying parameters:

function delay2({val}, {amt}) {
lift(function(t){ return delay(val, t); }, amt).switch();
}
var slowMouse = delay(mouse, seconds);

We use lift to unbox the amount, so we could pass it to delay. However, that means we're getting a stream of streams, leading us to switch, which lowers it back to the desired stream. It works, but that's ugly and annoying to write.

The really cool thing here is that, if we use the decorators, our compiler can automatically generate delay2 and you never write it nor think about it - you only have to write as much as necessary for the interesting functionality, and the rest can be lifted automatically. If you annotate all your variables with their time-varying-ness, all of the conversions would be statically done, but if you only annotate function parameters to define on what domain the function works on wrt those individual parameters, the compiler will conservatively insert a version of lift that could do any extra parameter-based lifting as needed. Similarly, you could imagine, instead of writing "__doNotLift", specify "__arity=[1,0]". If you want to be really dorky, imagine what happens when you have arity 2, 3, ... n.

That, in a nutshell, was my summer at Adobe research this year.

Showing the loading status was impressively declarative

Again, this is in the students' toy language, but neat nevertheless. Given a request stream to flickr and a response stream of photo feeds to be rendered, the students must show a loading bar while a request is sent but has not yet been heard from. I first made my status variable as follows:

var loading =
(count(requests) != count(responses));

As new request and response values go through the streams, we can see that a response is being loaded because the counts don't match up. Except... what about dropped requests? This is AJAX, so no error has occurred - the request has not yet timed out, but we're no longer really waiting for it anyway. If we make a new request and get a response back in the interim, we want to ignore that the first one ever happened, though if it did return, that'd be a plus.

The solution is a little longer, but made me happy:

var numResponses = count(responses);
var numResponsesAtLastRequest =
snapshot(requests, numResponses);
var loading =
(numResponses == numResponsesAtLastRequest);

Snapshot, on a trigger event (a request occurring), takes on the value of the second argument at that time (the number of responses we had at that point). Thus, we're loading when the number of responses hasn't changed since we made the last request.

Once they turn their homework in, I'll post my source (executable on our online compiler).

Monday, December 3, 2007

Programming Languages and Cognitive Dissonance

I saw a paper that attempts to formalize a thought that we talked a bunch about back at Brown (hard to turn a corner and not run into a cognitive scientist there - I even brought one with me to Berkeley and he won't leave my apartment!). What is a conceptually simple, or usable, language?

There are several historic takes. A natural language interface is one (code search engines are an interesting new take on that one), and visual programming another (Conal Elliot's tangible programming is wonderful, as is all his work). Mira Dontcheva seems to be somewhere in the semantic web camp, focusing on user-oriented programming, and Rob Ennal's current take with MashMaker seems to really live up to Mira's general goal: inferring the program by demonstration. Traditional PBD failed because the domain was too large, but these two seem to be able to find sweet spots in more limited domains. Conversations with Shriram suggest he sees merits to NLP/ML based approaches either now, or soon.

However, no matter what we do, there will still be code at some level. Maybe there will indeed be a meta-circular PBD system in the future, but I'm not dropping out of grad school after a semester out of fear and taking up plan B with Matt (starting a punk band in Japan). So, how do we make a language more usable?

This paper takes another common tack: coding theory. While it doesn't precisely make sense (imagine taking all programs of the same length, and permuting the assignment from syntax to semantics), the intuition is clear: you need to be able to say what you want without struggling to encode the necessary language. There are a lot of cognitive simplicity principles out there, so I was unconvinced due to the lack of experimental validation in the paper about the particular choice and left feeling the proposal was rather arbitrary. In communication, too concise is bad: repetition helps in case a word was missed. On the other hand, there's the experiment that showed we can have roughly 7 ideas going on at the same time. At this point, compare Haskell to Java to some configuration language - things aren't so simple. Tack on abstraction capabilities, and of what form, and we lose even more correlation with length. Additionally, there's the problem of learning: a DSL can be great, but the learning curve for edge cases in its semantics can be killer, which is a popular argument against macros (which give new control constructs, as opposed to functions sans crazy approaches like Flapjax).

Rather than a mathematical theory, I'm curious about axes, particularly with how orthogonal they are to each other and how much of an impact they have. One breakdown could be:

Data abstraction
Control abstraction
Functional abstraction
Syntactic abstraction
Size (APIs)
Learnability (language levels, similarity to previous systems)
'Smart' defaults (I'm not even sure if this is always good)
IDE (code completion, version control, code generation making writing quick and reading long)
Tangibility (REPLs would be a weak form of this)
Static analyses
Dynamic analyses (mostly as a jab at the static camp)
Search (smart as in Prospector or just google)

Unfortunately, very little PL and SE is done with regards to structured evaluation over actual users. The HCI guys focus on it a lot more, but they're typically interested in the end-user, not the poor developer, and we prefer our formulas and generally don't have the resources nor time to do anything substantial.

JSONOID hacking

I hid myself away this weekend (modulo a late night Halo session with Rob, Ben, Kurtis, Andy, and Kwaaang) for some Rhino hacking. Now I have a JavaScript environment that can execute batch files or act as a shell, and most importantly, let you snapshot objects of interest:

$cat test.js
group1 = JSONOID.makeKey(); //recording group for associated items
function foo (x) { JSONOID.record(group1, x); }; //record something
foo(1); foo(2); foo(['a', 'b', 'c']);

$java -jar JSONOID.jar test.js
%%location 1
XP
v 1 "number"
%%location 1
XP
v 1 "number"
%%location 1
XP
v 1 "array"
v 2 "string"
v 3 "string"
v 4 "string"
e 1 2 "0"
e 1 3 "1"
e 1 4 "2"

I handle cyclic references and flatten prototype chains (so you see inherited edges, but they aren't labeled as being such). I haven't tested function wrapping support yet, but that's library level mostly.

Next step is to do some machine learning awesomeness and integrate it with Flapjax. The former just got a bit annoying because the package I wanted to use crashes when I try to use recursive features (which is pretty key to what I want to do). The latter part is interesting: Rhino is pure ~JS1.7, while a significant portion of Flapjax is about using the DOM safely and cleanly. Luckily, John Resig hacked up a browser environment emulator - in JavaScript - that I can just load in! Problem solved.

Resig's blog post in which I found the library was in relation to an interesting topic: JavaScript on the server. I predicted a few things a couple of years ago with Lee and Jono: a gesture based mp3 player phone with no buttons (too bad that doesn't get me a free iPhone!), and JS on the server. The latter was in multiple contexts so I don't feel as neat about it: security (emulate / check client code before running it), server scripting, and integrated languages. The last part doesn't make as much sense just yet, but next semester will be spent focused on an efficiency part of that equation, and it looks like Steve Yegge's NBL is ES4 and he's approaching the usability side of things with a Rails solution for it. I think that's a short term solution for building big, performant, analyzable apps, but I agree that it's probably the fastest way to go for now.

One piece of sadness: apparently CouchDB is really, really slow. Michael did an afternoon of hacking and outperformed it by a magnitude or two with a very ad-hoc solution using BerkeleyDB. However, the basic concept stands. We'll see what I want to hook Flapjax up to.