Thursday, May 8, 2008

JavaScript Rant

1. Try putting this into your URL bar:

javascript:alert(true + true + true)

Why does this still happen in modern languages? On a somewhat related note, Taras and Dave Mandelin came by to talk about their static analysis (and code transformation?) framework specialized for the multimillion line Mozilla codebase. While many of the problems they are trying to solve only exist because Mozilla is based on an unsafe language, two parts of their approach were still neat: reusing the compiler framework to get around the ugliness of the language, which 3d party tools will mess up in practice, and exposing data through a higher level language. In their case, they use JavaScript, but, as an analogy, DTrace provides a SQL-like language for handling probe data. Most queries are variations of data flow analyses with varying sensitivities, so figuring out a DSL would be interesting. For starters, tree matching ASTs would help a lot. Getting back on topic, what scared me (beyond the insanity of C++) from their talk was a description of one bug they were looking for: the simple use of booleans as numbers has required multiple security fixes.

2. Try putting this into your URL bar:

javascript:alert(function(){return 42;})

This, at first, sounds great. At a minimum, you can use it for better code fixing, but also to violate typical interface checks for some simple hot-fixes. Source-to-source translation for secure code has potential too: you can check whether the function you want to execute has the correct form.

There's a price for this typically trivially used and effectively non-critical feature: I can't guarantee that any of the tools I'm building to automatically make other people's programs better actually create a program that does the right thing. For example, as a first step in a wacky project, I'm exploring all possible interesting states of a JavaScript program. Instead of randomly picking new paths through the program every time I rerun it, I'm working on something(s) smarter. This requires me to compile the program down into a more analyzable form for ease of implementation and to add logging code so my analysis can get the information it needs. However, even adding simple log statements potentially changes the behavior of the final program with respect to the original beyond generating a log. If the original program converts a function to a string, inspects the result, and then acts based upon it, my added lines of code may have shifted around information the original needed. It *might* be possible to detect that some code is trying to do the introspection and interject the correct function text, but, realistically, that ain't happening.

While dynamic languages give a lot of freedom to developers, they also restrict them: it has hard to ensure an invariant, so we must be conservative with our funkier code. This is not just an expressibility issue: performance suffers significantly. Simply removing eval and introspection might be enough to enable significant automated code redistribution. Rich Internet applications can load faster using automated analysis and repacking tools than when we try to figure out script slicing and dynamic load ordering by hand. I found a couple of very nice cartesian product type inference projects for Python (meaning significant code speed increase) last semester, but, to my consternation, the authors concluded that certain dynamic code loading properties of the language destroyed any potential for them.
Post a Comment