Sunday, April 8, 2012

Socio-PLT: Principles for Programming Language Adoption

Ari and I are getting ready to send out our first paper on socio-PLT, the sociology of programming languages. We've been performing qualitative and quantitative analysis of language adoption, and I've been thinking about how to apply sociological principles to the design of my own languages. This paper 1) overviews some of my favorite cases that I gleaned from reviewing relevant sociology research over the past few years and 2) connects them to programming language design, both in the scientific design of existing features and the principled and creative design of new ones. Basically, it shows how understanding results from sociology can influence your approach to language design :)

I just put up a draft and would love comments. I suspect the paper may be controversial, but I would like it to be so for the right reasons :) Did we misrepresent anyone or leave out any awesome case studies? Was anything belabored and boring, or terse and tantalizingly vague? This is still in the draft state, so we have a lot to do and would appreciate feedback :)

Socio-PLT: Principles for Programming Language Adoption

Why does one programming language fail and another succeed? What does this tell us about programming language design, implementation, and principles? To help answer these and other questions, we argue for examining the sociological basis of programming language theory: socio-PLT.

This paper presents a survey of programming language adoption principles drawn from various sociological fields. For example, many programming language features provide benefits that programmers cannot directly and immediately observe and therefore may not find compelling. From clean water to safe sex, the health community has long examined how to surmount similar observability barriers. We discuss how principles and techniques drawn from social sciences such as economics, public health, and historical linguistics relate to programming languages. Finally, we examine implications of our approach, such as for the design space of language features and even the expectations of scientific research into programming languages.

Looking forward to your thoughts! (draft pdf)


gasche said...

I liked the paper overall; it is going in an interesting direction. It's a bit frustrating at times because it goes into detail in a zone that is uncomfortable to the academic community (the apparent non-adoption of statically-typed functional programming), and confronts would-be language designers with things that are too difficult and painful to be handled as a hobby (tooling, user studies...).

On the content itself, there are two things that I found myself disagreeing with:

- I don't like the name "sexy types". I had never heard of it before, and I don't like it. It is not a good name because it does not suggest the meaning that you seem to use it for (a "sexy" type would be beautiful, attractive, while you seem to use it for "precise", the fact of more accurately describing the data's invariant), and because some people may consider it offensive.

- I was not impressed by your remark that "closures and object are dual" and the corresponding paragraph (before Hypothesis 1). Closures and objects are related and equally expressive, but I don't think they are "dual" in any relevant sense of the term. There is a clear duality in OO-vs-FP: it is the eliminating-sum-of-data vs. constructing-products-of-functions duality, related to the Expression Problem. That's what I think of when I hear "duality" in this context, and your use of the word is confusing. In fact the whole paragraph is pretty bad because there is no clear purpose to what is said. I have no idea what you mean to say by comparing "nominal typing and inheritance" and "structural typing and type classes" -- or why you say that functional languages tend to use structural typing, whereas both generative algebraic datatypes and type classes have a distinctly nominal flavour. I'm under the impression that what you are trying to say is that "mainstream languages tend to use OOP so applying design ideas to OOP settings will have more practical impact or ease adoption of said ideas", and this should be said more clearly.

I'm sorry my precise comments come out as so negative. As I basically don't know anything about social sciences I can't comment on the general ideas of the paper, but I found it interesting and thought-provoking. Hypothesis 13 made me think about a "social network" of people uploading aggregated data about the programming errors they make, with the individual incentive of reflecting over one's programming practice and improving it, and the global benefit of allowing language designers to see which errors their designs encourage and should maybe be handled more specifically.

Leo Meyerovich said...

Thanks Gasche! Those were exactly the sort of things I was worried about.

I lifted 'sexy types' from Simon Peyton Jones ("Wearing the Hair Shirt") and a few other Haskellers (Kiselyov). I'll clarify what was meant there -- I liked the name because it shows the enthusiasm :)

You're right about the duality being vague. There's both the simple yet practical level (an object as a recursive function with some dispatch, etc.) and the contentious yet more theoretical issues (at the heart of the technical side of the debate between William Cook and Bob Harper). At the same time, a lot of baggage typically comes with whatever approach you take -- this is part of why Scala and BitC are interesting. A lot of problems and solutions depend on what side you're on, and some even disappear. I didn't want to get into advocacy etc. of any particular idea, so it sounds like I stayed too high-level :)

Agreed about hypothesis 13 / section 4.3. It's an exciting opportunity :)

Chris said...

I haven't looked at this in detail yet, but I've forwarded this to some colleagues.

This looks quite ambitious -- you might be interested in how we looked at similar questions with adoption of a single language feature: Java generics.

We have a journal version with more data that you might be interested in.

Good luck,
Chris Parnin

Leo Meyerovich said...

Great paper Chris, thanks! Christian and I were talking about this (and the quantitative followup we've been doing) early on the generics project in ~2010 -- I hadn't realized it came out. I especially liked the finding that most people don't create their own abstractions and many uses of generics are simple (e.g., List).

From the perspective of our work, the demographics question for generics is particularly fascinating. Major refactorings (e.g., for introducing generics) might be explained by only having 1-2 people being responsible, but how did they come about? Where did they learn generics, how did they argue for them? If they are the only ones to use 'advanced' features of generics, why?

Btw, we read your earlier neuro/cogsci paper. I suspect it'd be useful to have a paper similar to ours to clarify what some of the opportunities are. E.g., I found the Big Book of Concepts to be inspiring on the cog sci front, and my time in neuroscience makes me think it'll be more interesting as an HCI question (brain computer interfaces) than a language design principles one (esp. relative to cog sci).

Chris said...

Yes. I would look forward to the explanatory principles that would be generated from this work. The empirical work we did needs to be followed with more formative and qualitative studies.

I did a follow up of the cog neuro paper, "Programmer Information Needs After Memory Failure", that starts to lay out a cognitive framework for building programming tools. Right now it's aimed more at interfaces in the IDE.

Good luck!

Blackheart said...

2.2.1 I think you overestimate the sociological influence on the adoption of first-class continuations. There are significant technical drawbacks as well. For example, making continuations first-class mandates evaluation order. Also, implementations that support them suffer a performance loss in all code, even code that doesn't use them. This is, I think, the primary reason that OCaml hasn't adopted them, even though OCaml's implementation strategy is perfectly suited to supporting first-class continuations. Supporting first-class continuations in a controlled way, such as via monads, allows a "pay-as-you-go" policy.

2.2.2 Again, I think the technical reasons point-free programming hasn't been adopted outweigh the sociological ones. In point-free programming the user is forced to specify a particular way to project out what amount to free variables. Pointful programming abstracts over the particular projections and lets the compiler do the work of inferring them. This is analogous and related to the distinction between Forth-like languages and Algol or Fortran-like languages: Forth forces the user to explicitly allocate and organize activation frames; block-structured languages abstract over the structure of activation frames.
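
(Illustration, not from the paper or the original comment: a minimal Python sketch of the pointful vs. point-free contrast described in 2.2.2. The helper names compose, fork, fst, and snd are hypothetical and chosen only for this example.)

```python
import math
from operator import add, itemgetter


def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))


def fork(combine, f, g):
    """Return x -> combine(f(x), g(x)); feeds one input to two functions."""
    return lambda x: combine(f(x), g(x))


def square(n):
    return n * n


# Pointful: parameters are named, and the language resolves each use of x and y.
def hypot_pointful(x, y):
    return math.sqrt(square(x) + square(y))


# Point-free: there are no named parameters, so the argument pair has to be
# taken apart with explicit projections (fst, snd) and routed by combinators.
fst = itemgetter(0)
snd = itemgetter(1)
hypot_pointfree = compose(
    math.sqrt,
    fork(add, compose(square, fst), compose(square, snd)),
)

print(hypot_pointful(3, 4))
print(hypot_pointfree((3, 4)))
```

Both definitions compute the same function; the difference is that the point-free version has to spell out which component of the input each sub-expression consumes, which the pointful version leaves to name resolution.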

3.1 Hypothesis 5. I found this intriguing and I think I agree with it.

3.3 Diffusion of Information. Hypothesis 8. Being "aware" of FP is not the same as being competent at it. In fact, I think many researchers would agree that the benefits of FP are not as important as the benefits of "thinking like a functional programmer". Also, unlike the subjects of the HIV study who agreed that HIV mortality is high, I don't think that most programmers would agree that FP is superior to OO or imperative paradigms.

I might have more comments later as I'm only a couple pages in.

Anonymous said...

"Question 20. To what extent to distinct language communities
have distinct values?"


Vivek Haldar said...

Overall this is a great paper, and I particularly enjoyed the deep citations.

I have some further thoughts here:

Leo Meyerovich said...

Thanks for the comments, will definitely include in the camera-ready. (What a quaint concept :))

Vivek: totally agree with you about environmental and organizational factors. There are many cool studies out there on these, such as the importance and subtleties of inside support for pushing policies across companies and public bodies (congresses, school boards, ...). Originally the paper was a giant Section 3 but we ended up trimming a lot and adding Sections 2 and 4 to make it more friendly and provocative. Now that it's out there, presenting our full literature survey may be more palatable and useful reading :)

Btw, as part of our recent quantitative analysis, we've indeed been asking about what influenced a recent language selection, such as management vs. team decision, legacy code, and client platform. Some fun stuff is popping out. Stay tuned :)