Saturday, May 1, 2010

Back from WWW and the significance of social network research

Came back from my first WWW conference. Given its broad scope, I wasn't sure what to expect. Below are some notes on the good, the bad, and a bit of an insight I've had about all the social network research going on.

First, the good. As a researcher, seeing a new problem is often better than seeing a new solution.

  • Gregory Conti talked about "Malicious Interface Design: Exploiting the User" -- how advertisers and complicit content providers work against users. Is there a social, economic, or technical defense against marketing? I don't even know where to begin, but at least awareness is rising.
  • Azarias Reda presented his work on "Distributing Private Data in Challenged Network Environments". Privacy seemed to be more a way of emphasizing the problem -- his talk was really about deploying an SMS-based interface for prefetching in (African) internet kiosks that have very limited bandwidth and thus can't handle synchronous (same browsing session) content requests.

    I've been talking to people about poor connectivity for a couple of years now (... my housemate actively works in asynchronous long-range wireless etc.). My basic thought is that it's a multi-tiered problem. E.g., the low-bandwidth kiosk model doesn't seem architecturally challenged for most content: aggressive caching should eliminate most transfer (and most common threat models), and new data isn't that heavy. A bigger issue is how to do so given that developers don't really follow pedagogy (they don't separate or label cacheable content) and that, in many regions, there is no connectivity or only occasional connectivity (think cell phones). I'd love to work on making an asynchronous and occasionally connected web -- imagine having a village-local cache of the entire web that gets updated whenever somebody drives through the town yet still supports AJAX through optimistic/predictive interfaces. So little time :(

Second, the bad. Given the shift from desktops to the web for modern applications, WWW is a sensible focal point for research in browser and web app software. Browser security was fairly well represented and of good quality (... and I generally found these talks to be the more rigorous ones in their sessions) -- WWW seems to be a strong home for them. Given the optimization challenges in browsers and the ubiquity of mobile but slow hardware, I wish there were a bigger emphasis on performance issues (Zhu Bin gave a good talk about incremental/cached computation across pages; it was good to see someone do it -- looking forward to reading about the details). Finally, language and framework driven approaches were barely on the radar: OOPSLA, ICSE, CHI, OSDI etc. seem to suck in much of the talent in this space.

Overall, while search, semantic web and now social nets seem to be strong points for WWW, basic client & application systems are on a slippery slope. Security seems to be teetering -- NDSS is becoming more webby, so it'll be interesting to see how that plays out for the non-Oakland/CCS/Usenix papers. There were a few points when there just weren't any technical papers about improving general web apps, whether in the application, protocol, or browser layer, which was surprising and should be easy to fix. There's also a new Usenix Web Apps conference, so it's becoming less and less clear what the appropriate venues are.

Finally, what's the deal with social network research? One data mining researcher I talked to simply dubbed it the next trendy thing (the previous being search). I disagree. At face value, I'm not particularly interested in extracting particular facts from the Facebook or Flickr social graph. That is almost just recasting the idea of "WebDB".

An insight comes from the reason we're seeing a lot of incremental papers, namely, that social network data is available (we can redub most of the research as "social network research of online communities"). I've been reading "Diffusion of Innovations", a high-level book surveying research in the field named in its title, and its historical descriptions make the significance of this proliferation clear. The studies outlined in the book were hard to do, as they required researchers to do arduous tasks like interviewing hundreds of farmers about why they did or did not adopt some new strain of miracle crop, finding out their circle of acquaintances, and only then beginning statistical analysis. Now, you can just download it all. For example, my spider just finished downloading records about 250,000 users and their interactions on some website over a span of 10 years -- for a course project! The iterative scientific process has just been accelerated in a big way.

The point is that I think the field of sociology itself is undergoing a revolution. The 20th century provided a mathematical foundation (information theory, Bayesian statistics, etc.) and the 21st century is bringing the data. It's already obvious with cognitive science -- the availability of statistical models and now data samples means 'soft' science is becoming a misnomer.
