Monday, December 2, 2013

Scaling the Data Wall: Exporting Your List of Kindle Books from Amazon

I'm waiting to make a couple of cool announcements about our big data visualization startup. Meanwhile, I wanted to relate a stupid hack I needed before I could analyze my reading preferences: a way to download my list of Kindle books from Amazon. Amazon lets you export your list of physical purchases, but it tries to show only a few digital purchases at a time. I have hundreds of books, so that's a no-go. Searching on Google showed that others got stuck on Amazon and Goodreads as well. After some dead ends (examining receipt pages, digging around files in their native app, ...), I found a winner.

Long story short, here's how to extract your reading list from Amazon (~2 min) and, optionally, transform it into a CSV file (~2 min):

Extract the List of Books

  1. Log in to your Amazon account
  2. Click on "Your Account" up top
  3. Click on "Your Collection" as part of the "Digital Content" box

    You'll get to a page that's close but no cigar: books show up as you scroll, but at the same time, older ones get swapped out, so the full list is never on the page at once.

  4. Hit the print button and wait. And wait. And wait...
  5. At some point, everything will load. If a printer dialog pops up, cancel it.
  6. Voila!

You can copy/paste the page into a text editor like Notepad and you'll get a list of records like:

Your Collection

Title: Diffusion of Innovations, 5th Edition
Author: Everett M. Rogers
Date Acquired: January 26, 2010
Rating:

Title: Guns, Germs, and Steel: The Fates of Human Societies
Author: Jared Diamond
Date Acquired: January 24, 2010
Rating:

Don't use something like Word because it'll probably copy the HTML structure behind the page.
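
If you'd rather script the cleanup than do it in an editor, the record format above is simple to parse. Here is a minimal Python sketch, assuming the pasted text looks exactly like the sample (one "Field: value" pair per line, with each book starting at a Title line). The name parse_records is a hypothetical helper of mine, not anything Amazon provides.

```python
import re

def parse_records(text):
    """Turn pasted 'Field: value' lines into a list of dicts, one per book."""
    records = []
    current = None
    for line in text.splitlines():
        match = re.match(r"(Title|Author|Date Acquired|Rating):\s*(.*)", line.strip())
        if not match:
            continue  # skips blank lines and the "Your Collection" header
        field, value = match.groups()
        if field == "Title":
            current = {}  # a Title line starts a new book (assumed from the sample)
            records.append(current)
        if current is not None:
            current[field] = value
    return records

sample = """Your Collection

Title: Diffusion of Innovations, 5th Edition
Author: Everett M. Rogers
Date Acquired: January 26, 2010
Rating:

Title: Guns, Germs, and Steel: The Fates of Human Societies
Author: Jared Diamond
Date Acquired: January 24, 2010
Rating:
"""

books = parse_records(sample)
for book in books:
    print(book["Title"], "|", book["Author"])
```

Only the first colon on a line is treated as the field separator, so a title with its own colon (like the Jared Diamond one) stays intact.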

Transform Into CSV

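Most text editors support regular expressions for fixing up content. In whatever search/replace dialog your editor has, turn on regular expressions and do the following three steps to turn the list into a CSV. The patterns use \r for the line break; depending on your editor and platform, you may need \n or \r\n instead.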

  1. Clean your data by removing commas
    Find: ,
    Replace: (leave empty)
  2. Order your data by making each entry one comma-delimited line
    Find: \r(Title|Author|Date Acquired|Rating):
    Replace: ,
  3. Clean your data by removing all those "Your Collection" lines
    Find: \rYour Collection
    Replace: (leave empty)
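
The same three steps can also be run as a script. This is a rough Python sketch of them, not a polished tool: to_csv_lines is a made-up name, and I'm assuming the text was pasted as shown earlier. It normalizes line endings to \n first, so the patterns use \n where the editor recipe uses \r.

```python
import re

def to_csv_lines(text):
    """Apply the three search/replace steps and return the non-empty lines."""
    # Normalize line endings so one pattern works on any platform.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 1: remove commas so they can't be mistaken for column breaks.
    text = text.replace(",", "")
    # Step 2: fold each field label (and the line break before it) into a comma.
    text = re.sub(r"\n(Title|Author|Date Acquired|Rating):", ",", text)
    # Step 3: drop "Your Collection" headers (also when one is the very first line).
    text = re.sub(r"(^|\n)Your Collection", "", text)
    return [line for line in text.split("\n") if line.strip()]

sample = """Your Collection

Title: Diffusion of Innovations, 5th Edition
Author: Everett M. Rogers
Date Acquired: January 26, 2010
Rating:

Title: Guns, Germs, and Steel: The Fates of Human Societies
Author: Jared Diamond
Date Acquired: January 24, 2010
Rating:
"""

lines = to_csv_lines(sample)
for line in lines:
    print(line)
```

Step 2 only matches a label that directly follows a line break, which is why the "Steel:" inside the second title is left alone.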

The rewrites will turn the above entries into just two lines:

, Diffusion of Innovations 5th Edition, Everett M. Rogers, January 26 2010,
, Guns Germs and Steel: The Fates of Human Societies, Jared Diamond, January 24 2010,

Each line starts with an empty column (where "Title:" used to be) and ends with the still-empty rating.
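
Step 1 strips commas because a bare comma inside a title or date would otherwise be read as a column break. If you're scripting anyway, an alternative (my substitution, not part of the editor recipe) is to let Python's csv module quote those fields so the commas survive. A small sketch, with books_to_csv as a hypothetical helper and the two sample books typed in by hand:

```python
import csv
import io

FIELDNAMES = ["Title", "Author", "Date Acquired", "Rating"]

books = [
    {"Title": "Diffusion of Innovations, 5th Edition",
     "Author": "Everett M. Rogers",
     "Date Acquired": "January 26, 2010",
     "Rating": ""},
    {"Title": "Guns, Germs, and Steel: The Fates of Human Societies",
     "Author": "Jared Diamond",
     "Date Acquired": "January 24, 2010",
     "Rating": ""},
]

def books_to_csv(rows):
    """Serialize book dicts to CSV text; fields containing commas get quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = books_to_csv(books)
print(csv_text)
```

Spreadsheet programs generally understand the quoted form, so the titles and dates should come through with their punctuation intact.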

Success! You have scaled the data wall and can plug in your personal preferences to Excel, Goodreads, or your own homebrew algorithm. Let's see how long it takes before Amazon plugs this hole!


