Saturday, September 14, 2013

Sneak Peek for My Strange Loop Talk on Parallel Program Synthesis

I'm excited for Strange Loop (programmable bio!) and Emerging Languages Camp next week. My talk will geek out on the language design idea in Superconductor that makes parallel programming in it such a breeze: program synthesis!

To give a taste, this post takes a synthesis approach to programming regular expressions. The trigger for it was one of the most important issues for programming languages in the 21st century: increasing literacy. My geneticist friend was struggling to transform values in a huge spreadsheet, and telling her to learn regular expressions is a non-starter.

Uphill Both Ways.

Imagine a "quick" sed script to rewrite "Alonzo Church" into "C., A.":

sed 's#^\([a-zA-Z]\).* \([a-zA-Z]\).*#\2., \1.#g'

... if you were sitting over here, you'd be watching my eyes bleed right now. Also, I'm pretty sure that the code is buggy and might even erase my hard drive.

A New Hope.

Emina Torlak and I spent the last couple of days tweaking a synthesizer that can generate the above Sed scripts from just two examples:

   '(("Leo Meyerovich" ("L" "M") "M., L.")
     ("Emina Torlak" ("E" "T") "T., E."))
   `(? "., " ? "."))

Each example specifies a name, what text to pull out of it, and the rewritten result. The last line defines a result template where the "?" symbols should get replaced by some of the input text.**

The "syn-sed" function will generate code that works like the Sed script above but without harming any animals. The relationship is strong: in theory, if you let your computer spin long enough, it will recreate any sed formula. As the tool outputs the desired formula, it also serves as a gentler onboarding to learning and using regular expressions. Going back to my biologist friend, that's a big deal.

Automating Test-Driven Development.

Regular expressions are tricky; we constantly have to test them on examples and tweak them until they work. The synthesis approach asks you to write down the input/output tests (specifications are good, right?), and in return, automates the crazy tweaking for you. If you ever find an example that doesn't work, instead of reverse engineering the regular expression and tweaking it, just add the new input/output test specification to the list of examples.

The regular expressions are just the beginning of what  I'll show program synthesis can do, so see you at the talk!

Note: L* and other algorithms provide inference techniques specific to characterizing a regular language; the Sed example's use of a rewrite rule shows that we want synthesis for all the other DSLs too. For an amazing version of the regular expression scenario, check out Excel's new FlashFill!

**: I bet we can eliminate the result template with some more hacking.

No comments: