Saturday, January 26, 2013

Adoption-Oriented Languages

Language design decisions should not just increase adoption but also exploit adoption. For example, I just measured how popular languages can rely upon Stackoverflow as a code assistant. The chart below shows that increased language popularity on SourceForge (x axis) leads to faster answer times for questions asked on Stackoverflow (colored bins on the y axis):

Stackoverflow Time-to-Answer vs. Language Popularity. Raw data comes from a Stackoverflow query that modifies G. Winker's; language rankings come from our previous analysis.

Top 10 languages have questions answered within 5 hours over 80% of the time, but for languages ranked below the Top 50, that shrinks to only 50% of the time. For a simple blocker at work, that's a big deal: you can reliably ask a question, go work on something else, and come back to the answer!
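The comparison above boils down to bucketing questions by language rank and counting how many got an answer within the cutoff. Here is a minimal sketch of that calculation. The tier boundaries, the function names, and the sample data are all made up for illustration; this is not the actual query or dataset behind the chart.

```python
from collections import defaultdict

def rank_bucket(rank):
    """Map a popularity rank (1 = most popular) to a coarse tier (hypothetical tiers)."""
    if rank <= 10:
        return "top 10"
    if rank <= 50:
        return "11-50"
    return "below 50"

def share_answered_within(questions, hours=5.0):
    """Fraction of questions per tier that were answered within `hours`.

    `questions` is an iterable of (language_rank, hours_to_first_answer),
    where hours_to_first_answer is None for unanswered questions.
    """
    totals = defaultdict(int)
    fast = defaultdict(int)
    for rank, wait in questions:
        bucket = rank_bucket(rank)
        totals[bucket] += 1
        if wait is not None and wait <= hours:
            fast[bucket] += 1
    return {bucket: fast[bucket] / totals[bucket] for bucket in totals}

# Made-up sample: (language rank, hours until first answer or None).
sample = [
    (1, 0.2), (3, 1.5), (7, 12.0), (9, 0.5),
    (60, 30.0), (75, 2.0), (120, None), (55, 4.0),
]
shares = share_answered_within(sample)
for bucket, share in shares.items():
    print(f"{bucket}: {share:.0%} answered within 5 hours")
```

Counting unanswered questions as "not answered in time" is a choice in this sketch; dropping them instead would make the less popular tiers look better than they are.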

Of course, measuring response time vs. popularity is open to interpretation: it could be that as adoption increases, people ask easier questions. However, I suspect the chart is actually undercounting the adoption benefit: the more questions are answered, the more people don't need to wait for answers because they can just search for old ones! As one measure, we see that a language's popularity relates to its answer corpus size:

Stackoverflow Answer Corpus Size vs. Language Popularity. Same dataset as above.

I had to switch to a log-scale axis for answers! For popular languages, search becomes a valuable tool because Stackoverflow alone provides a FAQ of 100,000 solutions. An enterprising individual might want to further examine the duplicate rate, which I suspect would provide an insightful indicator for the likelihood of a query already being answered by a previous response.

Open source library repos are another case of a feature that improves with adoption: the library of builtins is ever expanding. Likewise, one of my favorite research projects is cooperative bug isolation, which exploits the adoption of a program to help narrow down causes of bugs. (AFAICT, most software companies do some form of tracking error and performance dumps.) A/B testing for feature design is also increasingly common. As another example, in academia, code completion tools are increasingly incorporating corpus-based solutions. A corpus can provide more value than type- and namespace-based solutions because, for example, synthesized code completions will resemble what people would actually write.
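To make the corpus-based completion idea concrete, here is a toy sketch: count which token follows which across a corpus of existing code, then suggest the most frequent followers. The bigram model, the function names, and the three-snippet corpus are all assumptions for illustration; real research tools use far richer models.

```python
from collections import Counter, defaultdict

def build_model(corpus):
    """Count which token follows each token across a corpus of token lists."""
    following = defaultdict(Counter)
    for tokens in corpus:
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following

def complete(model, prev_token, k=3):
    """Suggest up to k tokens most often seen right after prev_token."""
    counts = model.get(prev_token, Counter())
    return [token for token, _ in counts.most_common(k)]

# Made-up corpus: each entry is one tokenized snippet that someone wrote.
corpus = [
    ["f", ".", "read", "(", ")"],
    ["f", ".", "read", "(", ")"],
    ["f", ".", "close", "(", ")"],
]
model = build_model(corpus)
suggestions = complete(model, ".")
print(suggestions)
```

A type-based completer would list every method on `f` without preference; the counts are what push the commonly written call to the top, and they only get better as more code is written.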

I started sketching out the notion of adoption-oriented language design in a position paper last week. I struggled to think of features that couldn't be designed to strengthen with adoption. For example, verification, testing, security, program synthesis, and optimization could all be designed to automatically exploit adoption of either the language, the library, or a program. What strikes me is how little adoption is exploited today. Outside of these cases, am I missing any that are commonly practiced, or should be?
