NoSQL Summer Reading List

July 31, 2010

For those of you who aren’t as much into reading up on different types of databases, there’s an interesting summer reading list going on right now over at A NoSQL Summer. Unfortunately, I’m not lucky enough to live in a town with a NoSQL Summer group (not that I know of, at least), and I’ve had too much on my plate to start one up. But I still wanted to read all of the papers. What’s a poor guy to do? Instead of navigating a bunch of web pages and downloading some PDFs by hand, I decided to automate the process and write a tiny program to do it for me. I turned to my favorite rapid-fire language, Ruby, and fired off a quick script to parse the web pages and get me the content that I was looking for.

```
#!/usr/bin/ruby

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'net/http'

# path to the target directory, you'll probably want to change this...
# unless your account is named 'jeremiah'
base_folder = "/Users/jeremiah/Desktop/NoSQL"

# open up the list of papers
doc = open('http://nosqlsummer.org/papers') { |f| Hpricot(f) }

# find all of the links to each paper and loop through them
doc.search("//div[@class='o-papers on']/a").each do |link|
  # ignore the closing tags.
  # there's probably a better way to do this,
  # but I wrote this in 15 minutes at 11:30 at night
  next unless link.is_a? Hpricot::Elem

  paper_doc = open("http://nosqlsummer.org/#{link.attributes['href']}") { |f| Hpricot(f) }

  # get the necessary elements to build our document name for saving
  difficulty = paper_doc.at("h4[@class*='difficulty']")['class'][-1,1]
  title = (paper_doc/"div[@class='o-paper on']/h1").inner_text
  download_link = paper_doc.at("a[@class='download']")['href']

  begin
    # try to save
    puts "Attempting to download #{title} from #{download_link}..."
    write_out = open("#{base_folder}/#{difficulty}_#{title}.pdf", "wb")
    write_out.write(open(download_link).read)
    write_out.close
  rescue Exception
    puts " *** v^v^v^ error ^v^v^v ***"
  end
end
```
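
One thing the script above doesn't guard against is a paper title containing a slash, a colon, or a stray newline, any of which could make the `open(..., "wb")` call fail or write somewhere unexpected. Here's a minimal sketch of one way to clean the name up first. `safe_filename` is a hypothetical helper that is not part of the original script, and the title in the example is made up for illustration.

```ruby
# Hypothetical helper (an assumption, not from the original script):
# builds the "<difficulty>_<title>.pdf" name used above, replacing
# characters that commonly cause trouble in file names.
def safe_filename(difficulty, title)
  cleaned = title.strip
                 .gsub(%r{[/\\:*?"<>|]}, '-') # path separators and reserved characters
                 .gsub(/\s+/, ' ')             # collapse whitespace runs, including newlines
  "#{difficulty}_#{cleaned}.pdf"
end

# Made-up title, for illustration only
puts safe_filename('2', 'Some Paper: Part 1/2')
# prints: 2_Some Paper- Part 1-2.pdf
```

If you wanted to use it, the save line would become something like `open("#{base_folder}/#{safe_filename(difficulty, title)}", "wb")`.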