The Good Soldier LMeyerov: May 2009

Wednesday, May 27, 2009

Exciting times!

Thought I'd post a quick status update. I will not actually be here this summer!

1. Browser stuff. Over the next month, I'll be bouncing around and hopefully finishing the initial version of my parallel web page layout algorithms. In the fall, I want to make sure it's all stitched together and then might switch into thinking about adaptivity or, even more generally, parallel scripting.

2. Webpage model extraction / exploration stuff. After the browser work reaches a good state (PPoPP?), I'll be rewriting and scaling out our blackbox analyzer and will make it directed. If collaboration works out, there'll be some interesting twists (either a new type of analysis or integrating and expanding some earlier whitebox ideas).

3. Summer! Something mysterious at Microsoft Research about browser security. I'm guessing/hoping a principled clean-slate approach or some program analysis.

One of many flights starts tomorrow.

Monday, May 25, 2009

Mixing thread-aware and thread-agnostic code

For almost all of the algorithms I've been playing with for the parallel browser, Cilk-style parallelism is a good match. My development pattern is to do a sequential version, do a Cilk++ sketch, and then, for final tweaking, convert to TBB (... and a lot of iteration involving hawkish monitoring of KCacheGrind statistics). However, something invariably goes wrong.

This week, it's using task-parallelism with a multi-threaded library. Task parallelism gets you away from the notion of a thread: whenever you have a unit of work, you just spawn it off, and thus may have many tasks for only a few processors. With threads, assuming you're CPU bound, you have as many threads as processors. FreeType2 is written for threaded use: each thread gets its own Library. However, task-parallel usage (I'm rendering a bunch of glyphs: turning one character into pixels can be thought of as a task) doesn't map nicely -- if I were to create a Library per task, I'd have to create thousands of Libraries instead of, say, 8.

The naive solution is to set up a resource pool: a task asks the pool for a Library when it starts, and returns it when it finishes. If there is no Library available, one gets created. If tasks are really small (e.g., individual characters, as opposed to, say, words), there'll be a lot of chatter when trying to get these Libraries (and, even if not, locks waste cycles: a fixed cost per task, so the relative penalty grows as tasks shrink).
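A minimal sketch of that pool, just to make the locking cost visible. `Library` here is a hypothetical stand-in for a FreeType library handle, and `LibraryPool`, `acquire`, and `release` are names made up for illustration, not anything from FreeType or TBB.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Stand-in for a per-thread FreeType library handle (hypothetical type;
// a real version would own an FT_Library and initialize it on creation).
struct Library {
    int id;
};

// Naive pool: a task borrows a Library when it starts and hands it back
// when it finishes. Every borrow and return takes the same lock.
class LibraryPool {
public:
    // Take an idle Library, or create a fresh one if none is idle.
    std::unique_ptr<Library> acquire() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (idle_.empty()) {
            return std::unique_ptr<Library>(new Library{created_++});
        }
        std::unique_ptr<Library> lib = std::move(idle_.back());
        idle_.pop_back();
        return lib;
    }

    // Return a Library so that a later task can reuse it.
    void release(std::unique_ptr<Library> lib) {
        assert(lib != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        idle_.push_back(std::move(lib));
    }

    // Number of Libraries created so far.
    int created() {
        std::lock_guard<std::mutex> guard(mutex_);
        return created_;
    }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<Library>> idle_;
    int created_ = 0;
};
```

A task would call `acquire()` first thing and `release(std::move(lib))` last thing, so each glyph pays for two lock round trips no matter how little work it does in between.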

TBB, because it is a library level solution, has actual task objects (Cilk should just put some sort of continuation mark on the stack) and therefore faces the same problem all the time. It provides conveniences for reusing task objects (think of it like a manual TCO or trampoline). When reusing a task, a good habit can be to reuse data within it. In this case, when a task completes, it passes off its Library object to the next one that gets/becomes the reified task object.
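To show the shape of the idea without leaning on TBB's actual recycling API, here is a sequential sketch in plain C++: one task object is re-armed for the next glyph instead of being reallocated, and its Library rides along. `Library`, `GlyphTask`, and `render_all` are hypothetical names, and the "rendering" is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a FreeType library handle (hypothetical).
struct Library {
    int glyphs_rendered = 0;
};

// A reified task object that is recycled instead of reallocated.
struct GlyphTask {
    Library lib;            // stays with the task object across recycles
    std::size_t glyph = 0;  // which glyph this incarnation handles

    // Do one unit of work. Returns true if the task re-armed itself for
    // another glyph, false once there is nothing left to do.
    bool run(const std::vector<int>& glyphs, std::vector<int>& out) {
        assert(out.size() == glyphs.size());
        if (glyph >= glyphs.size()) return false;
        out[glyph] = glyphs[glyph] * 2;  // placeholder for real rasterization
        ++lib.glyphs_rendered;
        ++glyph;                         // "recycle": same object, next glyph
        return true;
    }
};

// Trampoline: keep re-running the same task object until it declines.
// Returns how many glyphs the single Library ended up serving.
inline int render_all(const std::vector<int>& glyphs, std::vector<int>& out) {
    GlyphTask task;
    while (task.run(glyphs, out)) {}
    return task.lib.glyphs_rendered;
}
```

The point is only that the Library is created once per task object rather than once per glyph, with no pool lookup in between; a parallel version would run one such trampoline per worker.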

Unfortunately, I don't think the code will work out that well. In reality, there's a hierarchy of resources (Library -> {Font}, Font -> {Glyph}). It'll work, but the impedance mismatch will cause some slowdowns.

Wednesday, May 20, 2009

collaborative security

Was watching a video of Aza Raskin and, around 18:00, I got excited. Can we treat security as a people problem?

I've been mulling this over both in my work on overcoming data silos and in extracting models of applications. In the former, the user might want to add extra security to an app like Google Calendar, say by setting special permissions for a particular event or even encrypting data before Google sees it, and, in the model extraction, I'd like users to pool their models together to collaboratively get bigger ones -- but I don't want stuff like bank account info to leak over. This latter problem occurs slightly differently in some of my work in mashup security: can we trust an extension to translate a webpage, but not, say, leak a bank account number?

Everyone, including Aza, bashed on the UAC: we can't just pepper users with dialog boxes. We really want things like blacklists That Just Work. Aza asks, just as we might trust a smart nephew to buy us a computer, might we trust one to figure out security for us? In the absence of a smart nephew, can we learn the security policy? What do cautious people normally say to a dialog box? Is there a bit of information on a page that users generally mark as privileged?

In three of my projects so far, I've found cases where I didn't think the application writer could a priori determine the appropriate action, yet doubt that the casual web user can either. What would it mean to build a browser or application extension that outsources security?

The magical incantation

//height in font units => height in pixels
int height = ((*(cachedFont->face))->height / ( (int) (*(cachedFont->face))->units_per_EM)) * fsizev * (RESOLUTION / 72);
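One caveat, assuming the operands above are all integers: `height / units_per_EM` and `RESOLUTION / 72` would each truncate before the multiply. A floating-point sketch of the same conversion follows. The function and parameter names are illustrative, not FreeType's; `point_size` plays the role of `fsizev` and `dpi` the role of `RESOLUTION`.

```cpp
#include <cassert>
#include <cmath>

// Convert a height in font design units to pixels.
// Names are illustrative only, not part of FreeType.
inline double font_units_to_pixels(int height_units, int units_per_em,
                                   double point_size, double dpi) {
    assert(units_per_em > 0);
    // Pixels per em: one point is 1/72 of an inch.
    const double pixels_per_em = point_size * dpi / 72.0;
    // Scale design units by (pixels per em) / (units per em),
    // keeping everything in floating point so nothing truncates.
    return height_units * pixels_per_em / units_per_em;
}
```

For example, a 2048-unit height in a 2048-unit em at 12pt and 96 dpi comes out to 16 pixels; round or `std::ceil` the result at the very end if an integer pixel count is needed.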

Sunday, May 17, 2009

extension idea

Turn any html table into a spreadsheet.

I wanted to average some columns in a web page I was looking at, but Excel no longer copies/pastes correctly :(

Tuesday, May 12, 2009

Android != Web

I get asked three questions pretty frequently when I mention I'm trying to parallelize web browsers as a way to make phones faster.

First, folks ask about Chrome. No, not like Chrome -- parallel processes might as well be concurrent; the point there was OS/hardware enforced address and maybe time/resource scheduling separation to provide security guarantees.*

Second, what about V8, Tamarin, Tracemonkey, etc.? These are awesome and I wish I had skills like that. However, two caveats. First, most of the time in a browser is not spent in the JavaScript runtime. Therefore, despite being a language geek, that's not what I'm working on speeding up. Second, Proebsting's Law tells us that compiler speedups give us 4% speedup a year while new hardware gives us 60%. Now that JavaScript is getting serious compiler attention (e.g., not being interpreted), I wouldn't be surprised at maybe another 2-3x speedup over the next couple of years. However, then it'll reach a similar state to Java and Proebsting's Law will apply. If you consider that we can effectively get an extra core of performance every year or two, perhaps we should listen to Proebsting and take advantage of the hardware.
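To put rough numbers on that gap, here is a small sketch that compounds the two annual rates quoted above. The helper name `compounded_speedup` is made up for illustration, and it assumes the rates simply compound year over year.

```cpp
#include <cassert>
#include <cmath>

// Cumulative speedup after `years` of compounding at `annual_rate`
// (e.g. 0.04 for 4% per year). Assumes a constant rate every year.
inline double compounded_speedup(double annual_rate, int years) {
    assert(years >= 0);
    return std::pow(1.0 + annual_rate, years);
}
```

Under that assumption, five years at 4% gives about 1.22x, while five years at 60% gives about 10.5x.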

Finally, folks wonder about the iPhone, Android, and their relation to the web. A surprising thing here is that, despite Google and Apple both pushing the web as a platform (viz. Chrome and Safari, respectively), they are also torpedoing it. Contrary to mass perception, my technology-driven understanding is that Android and the iPhone SDK are anti-web. Currently, they're a necessary evil for performance reasons, but they are also distinctly outside of the web ecosystem. I am working to make high-level domain specific web languages (e.g., CSS) fast enough to avoid the need for a return to such lower-level systems.**

Back to work...

*Interestingly enough, the Chrome security model isn't good for, say, mashups or extensions, and there are faster ways to achieve what it is currently being used for (also, in part, researched by Google!). I think pragmatism, such as for time-to-market concerns, had a sad impact here.

**Android is interesting and important for many other reasons, such as opening up phone functionality and a push towards rethinking the integration of a browser into an operating system.

Sunday, May 10, 2009

Hurray, a paper acceptance!

In the words of an esteemed collaborator, "DisneyWorld, whoooo!". [OOPSLA, 2009]

2 out of 4 so far this year (hopefully, after a lot more polishing, it'll be 4/5 by the end of it!):

This paper presents Flapjax, a language designed for contemporary Web applications. These applications communicate with servers and have rich, interactive interfaces. Flapjax provides two key features that simplify writing these applications. First, it provides event streams, a uniform abstraction for communication within a program as well as with external Web services. Second, the language itself is reactive: it automatically tracks data dependencies and propagates updates along those dataflows. This allows developers to write reactive interfaces in a declarative and compositional style.

Flapjax is built on top of JavaScript. It runs on unmodified browsers and readily interoperates with existing JavaScript code. It is usable as either a programming language (that is compiled to JavaScript) or as a JavaScript library, and is designed for both uses. This paper presents the language, its design decisions, and illustrative examples drawn from several working Flapjax applications.

I'll be keeping quiet on what my next thoughts on it are for a while (three biggies on my mind, however: sharing, search, and parallelism) -- hopefully the next public things will be sometime next winter. Edit: one last biggie. Live programming.

Wednesday, May 6, 2009

software licenses

Finally, a reason to use types.

Tuesday, May 5, 2009

Hurray for SPJ

Just one tidbit from SPJ on writing papers:

Every review is gold dust

Be (truly) grateful for criticism as well as praise

This is really, really, really hard

But it’s really, really, really, really, really, really important

Whenever I prepare a new talk or am writing a paper, I always check back to his hints. Dave Patterson also has some useful advice on how to have a bad career.

On a more fun note.. starting to architect the world's fastest font rendering engine.

Sunday, May 3, 2009

information flow security

This has been bugging me for a while: while I find the research questions surrounding information flow fascinating and important, when did it transform from a program analysis idea into a usable, deployable security one? It seems just like STM: there's been a huge emphasis on it in the community, but surprisingly little analysis of wide-scale deployments -- apparently even the browser guys at MS & Mozilla have sunk their teeth into information flow now.

I get it for low-level systems: if you can run an analysis to detect leaks just like you do for buffer overflows, great! However, when I start thinking about something like mashup security, its lossy/conservative nature seems like an awkward fit, in which case we're back at square one. Worse, when I type "information flow usability" into Google, the result is about type inference.

Information flow analysis is a fundamental program analysis question. Building an information flow type system / language / etc. is a principled research approach. While I'm still enthusiastic about adding something like gradual types to a scripting language (for both performance and correctness -- and I view it as a fairly conservative extension), I'm worried about adding qualifiers for information flow: relying on this is much deeper and I'm not convinced expressiveness and usability have been adequately investigated (though the Jif guys do have some interesting case studies there). Again, if you can check a property using information flow analysis, that's great, but I'm surprised by the emphasis on static/type support for it, and have no clue as to how far it gets us. Are we over-optimizing something tiny or does it hit some sweet spots?

Maybe I'm the only one. Next week will be interesting..