Wednesday, January 14, 2009

Tonight's Hilarity

Now that we have our two core parallel web page layout algorithms, I'm rewriting the Python prototype in C++, and then I'll expand the Cilk++ reflow engine with more elements, fonts, images, and painting (there goes the next month...). To start, I wanted to write a quick, roughly CSS2-level parser with lex/yacc.

The CSS specification throws implementors a bone by providing a partial lex/yacc file. Cool, except for two slight surprises that somewhat defeat the purpose:

1. Partial files (tokens and token types were missing): no biggie, but an unnecessary hassle.

2. The provided grammar is ambiguous, and yacc reports the conflicts outright. I didn't feel like debugging this, and instead found an explanation for CSS2 (I had been working from CSS1, but will probably just switch to CSS2). Essentially, nested productions that can match the empty string can yield ambiguity.


I didn't really care about the first issue; it just requires learning a bit more about lex/yacc (I've always used s-expressions or ANTLR). The second hiccup, however, was odd.

At first I was going to brush it off as unimportant to the spec: in undergraduate courses, you are taught to think about languages as sets, and yacc-like notations are always presented as a way to define a recognizer. However, once you move up to worrying about language semantics, one of the most convenient implementation approaches is syntax-directed, which also ties nicely into semantic layers. Making this leap, especially for 'real' languages, benefits strongly from a clean term structure: clear grammars are useful not only for implementing frontends, but also as a safety check when mapping from syntactic domains to richer ones. If you're going to put something into a language spec...
