Thursday, February 7, 2008

Web Search and Targeted Ads Considered Harmful

1. Knowing the last 5-10 web pages you visited is typically enough to distinguish you from everyone else
2. A related principle applies to search terms

Someone from Yahoo! Research gave a talk tonight about a security dilemma they're facing: they want to release user query records to researchers so we can come up with better methods, but they must also somehow scrub identities out of the data. Records in Netflix's released ratings data set have already been correlated with IMDb user accounts, and the same thing has happened with search data. The dimensionality of query records is much higher than that of movie ratings, and combined with sparseness and a little common sense, that makes them dangerous. For example, something like half the people in their database have performed 'vanity' searches for their own name. Add in some local restaurant or business names, and voila. Other search terms reveal sensitive information: did you look at an AIDS clinic website recently? Or search for some medical symptoms? Search companies can't release that sort of information. One example of malicious use is blackmail: "I'll tell your spouse that you looked for an adult club."
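To make the narrowing effect concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the user ids, the queries, and the helper names `build_index`, `candidates`, and `narrowing` are all hypothetical, and a real query log would be vastly larger. It only shows how intersecting the sets of users who issued each known query shrinks the pool of candidates.

```python
from collections import defaultdict

# Hypothetical, made-up query log: (pseudonymous user id, query string).
QUERY_LOG = [
    ("u1", "jane doe"),
    ("u1", "luigi's pizza springfield"),
    ("u1", "flu symptoms"),
    ("u2", "jane doe"),
    ("u2", "cheap flights"),
    ("u3", "luigi's pizza springfield"),
    ("u3", "cheap flights"),
    ("u4", "flu symptoms"),
]


def build_index(log):
    """Map each query string to the set of user ids that issued it."""
    index = defaultdict(set)
    for user, query in log:
        index[query].add(user)
    return index


def candidates(index, known_queries):
    """Return the users consistent with every query in known_queries."""
    result = None
    for query in known_queries:
        users = index.get(query, set())
        result = set(users) if result is None else result & users
    return result if result is not None else set()


def narrowing(index, known_queries):
    """Candidate-set size after each successive known query is added."""
    return [
        len(candidates(index, known_queries[: i + 1]))
        for i in range(len(known_queries))
    ]


if __name__ == "__main__":
    index = build_index(QUERY_LOG)
    # A vanity search plus one local restaurant name.
    print(narrowing(index, ["jane doe", "luigi's pizza springfield"]))
```

In this toy log the vanity search alone leaves two candidates, and adding a single local restaurant query leaves one. Scrubbing the user id does not help, because the queries themselves act as the identifier.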

I felt good that they were thinking about this, but then I thought: wait a minute, aren't they already exposing some of this data? In particular, they allow advertisers to display clickable ads for particular terms. That is almost a full two-way channel! It doesn't reveal all of your data, but it reveals enough to identify you in a potentially incriminating or otherwise undesirable manner, and the same channel lets the advertiser communicate that fact back to you. For example, if both store Good and store Bad are in your town and use Google AdSense, a malicious user can place Flash ads for both and thus record the IP addresses of visitors. They can build up a 'hit list' of IP addresses that match, and then blackmail you on some other term: next time you see a targeted ad by these guys (for some other local search term), they show a window that says "IP, we know you searched for store Bad and we're gonna tell your wife..." except more imaginative.
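The matching step is nothing more than a set intersection, which is part of why the channel worries me. A toy sketch, assuming each ad simply yields a list of visitor IPs; the function name `matched_visitors` and the logs are hypothetical, and the addresses come from the ranges reserved for documentation:

```python
def matched_visitors(log_a, log_b):
    """Return the IPs present in both ad logs, sorted for stable output."""
    return sorted(set(log_a) & set(log_b))


# Hypothetical logs; repeated visits show up as duplicate entries.
GOOD_AD_LOG = ["192.0.2.10", "192.0.2.11", "198.51.100.7", "192.0.2.10"]
BAD_AD_LOG = ["198.51.100.7", "203.0.113.5"]

if __name__ == "__main__":
    print(matched_visitors(GOOD_AD_LOG, BAD_AD_LOG))
```

Nothing here needs access to the search company's own records: the ad placements alone supply both logs.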

The costs involved today are still somewhat steep, but with more thought, I suspect a better related ploy is possible. More importantly, this stresses that these companies must tread very softly in how they interface with advertisers. The problem is a fundamental one, however: proving an interface like this secure is a challenge. Stephen McCamant has some neat work on tracking quantitative information flow that helps set the PL/SE mood before you switch into game theory or anything fancier.