Monday, September 12, 2011

What to Implement

A big chunk of the coding side of my thesis is writing a declarative spec of CSS and then automatically generating a fast/correct/instrumentable/etc. implementation from it. However, how much do I really need to implement? More generally, if you're making a tool to compute over webpage style, how much do you really care about? E.g., if you make a proxy-based layout optimizer, which features are important? Full compliance, ACID tests, etc. are often overkill.

To help pick a subset to focus on, I ran a bunch of popular pages through my parallel browser skeleton and counted dynamic occurrences of CSS properties. For example, if a browser or user style sheet says that images inside paragraphs get a blue border ("p img { border: 1px solid blue; }"), and a gallery has 2 such pictures, that counts as 2 hits on the feature "border".
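A minimal sketch of this kind of dynamic counting, assuming a toy DOM and selectors made only of bare tag names joined by the descendant combinator. The names `Node`, `Rule`, `matches`, and `count_hits` are hypothetical illustrations, not the API of my browser skeleton. Each element contributes one hit per property that applies to it, so the two gallery images below yield 2 hits on "border":

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus child nodes."""
    tag: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class Rule:
    """Hypothetical rule: a descendant selector like 'p img' and its declarations."""
    selector: str
    declarations: dict[str, str]


def matches(ancestors: list[str], tag: str, selector: str) -> bool:
    """Match a space-separated descendant selector of bare tag names."""
    *outer, last = selector.split()
    if tag != last:
        return False
    # The outer tags must appear, in order, somewhere among the ancestors
    # (listed root first), which is what the descendant combinator requires.
    i = 0
    for name in ancestors:
        if i < len(outer) and name == outer[i]:
            i += 1
    return i == len(outer)


def count_hits(root: Node, rules: list[Rule]) -> Counter:
    """Count one hit per (element, property) pair that some rule sets."""
    hits: Counter = Counter()

    def walk(node: Node, ancestors: list[str]) -> None:
        applied: set[str] = set()  # each property counts once per element
        for rule in rules:
            if matches(ancestors, node.tag, rule.selector):
                applied.update(rule.declarations)
        hits.update(applied)
        for child in node.children:
            walk(child, ancestors + [node.tag])

    walk(root, [])
    return hits


# Two images inside a paragraph match "p img"; the bare image does not.
gallery = Node("body", [Node("p", [Node("img"), Node("img")]), Node("img")])
rules = [Rule("p img", {"border": "1px solid blue"})]
print(count_hits(gallery, rules))  # Counter({'border': 2})
```

Counting after collecting a per-element set approximates counting post-cascade styles: if two rules both set "border" on one image, it still counts once for that image.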

For many features, the options (e.g., blue vs. red, 10" vs. 20%) are easy to handle. For others, which case you hit matters a lot: e.g., a table ("display: table") vs. a word-wrapping block box ("display: block"). So, for some of the more common features, I also broke down the cases.
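A small sketch of that two-level tally, assuming each element's computed style is already available as a dict from property to value. The `BREAKDOWN` set and the `tally` function are hypothetical choices for illustration: only properties whose values select a different layout algorithm get a per-value breakdown, while cheap-to-handle ones like "color" are counted only as features.

```python
from collections import Counter, defaultdict

# Hypothetical selection: properties whose values change which layout
# algorithm runs, so their individual cases are worth counting.
BREAKDOWN = {"display", "position", "float"}


def tally(elements):
    """elements: iterable of dicts mapping property name -> computed value."""
    features = Counter()            # hits per property
    cases = defaultdict(Counter)    # hits per value, for BREAKDOWN properties
    for style in elements:
        for prop, value in style.items():
            features[prop] += 1
            if prop in BREAKDOWN:
                cases[prop][value] += 1
    return features, cases


styles = [
    {"display": "block", "color": "red"},
    {"display": "table"},
    {"display": "block", "color": "blue"},
]
features, cases = tally(styles)
print(features)          # Counter({'display': 3, 'color': 2})
print(cases["display"])  # Counter({'block': 2, 'table': 1})
```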

The diagrams below show first the features and then, for a couple of them, the cases. It might be interesting to also do a log plot (power law, anyone?). Basically, you can get legible (if wonky) sites without implementing much. Getting pixel-perfect rendering... not so much.

Note: this isn't terribly scientific. The sample size is small and drawn from popular, professionally engineered sites. Likewise, many of the features are 'default' features set by the browser, and some may even amount to doing nothing. For a larger-scale survey, check out Opera's MAMA analysis.
