Make It Better, Stupid

Whenever I write code, I churn out garbage. I do it this way so I can get the general idea down and mess around with the problem. I don’t need a good solution to my problem while I’m in development; I need a working solution. Once I get a working solution, I go back, optimize the code, and eventually write something that doesn’t look like a baby panda threw up on my keyboard.

Today I optimized this:

 tags.each do |tag|
   tag.strip!
   tag_object = Tag.find_by_name(tag)

   tag_object = Tag.create!(:name => tag) if tag_object.nil?

   tag_list << tag_object
 end

into this:

tags.each { |tag| tag_list << Tag.first_or_create(:name => tag.strip) }

The actual refactoring I performed isn’t as important to me as the process. I write a lot of junk. I think it’s okay to write a lot of junk as long as you’re getting working ideas out in the wild, messing around, and making them work. Once you get those ideas working, it’s important to go back and clean them up so that you can maintain your ideas, build on them, and grow them into something bigger and better.

NoSQL Summer Reading List

For those of you who aren’t as much into reading up on different types of databases, there’s an interesting summer reading list going on right now over at A NoSQL Summer. Unfortunately, I’m not lucky enough to live in a town with a NoSQL Summer group (not that I know of, at least) and I’ve had too much on my plate to start one up. But I still wanted to read all of the papers. What’s a poor guy to do?

Instead of navigating a bunch of web pages and downloading some PDFs by hand, I decided to automate the process and write a tiny program to do it for me. I turned to my favorite rapid-fire language, Ruby, and fired off a quick script to parse the web pages and get me the content that I was looking for.


#!/usr/bin/ruby

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'net/http'

# path to the target directory, you'll probably want to change this...
# unless your account is named 'jeremiah'
base_folder = "/Users/jeremiah/Desktop/NoSQL"

# open up the list of papers
doc = open('http://nosqlsummer.org/papers') { |f| Hpricot(f) }

# find all of the links to each paper and loop through them
doc.search("//div[@class='o-papers on']/a").each do |link|
  # ignore the closing tags.
  # there's probably a better way to do this,
  # but I wrote this in 15 minutes at 11:30 at night
  next unless link.is_a? Hpricot::Elem

  paper_doc = open("http://nosqlsummer.org/#{link.attributes['href']}") { |f| Hpricot(f) }

  # get the necessary elements to build our document name for saving
  difficulty = paper_doc.at("h4[@class*='difficulty']")['class'][-1,1]
  title = (paper_doc/"div[@class='o-paper on']/h1").inner_text
  download_link = paper_doc.at("a[@class='download']")['href']

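  begin
    # try to save
    puts "Attempting to download #{title} from #{download_link}..."
    # download first so a failed request doesn't leave an empty file behind
    pdf = open(download_link).read
    # the block form closes the file even if the write fails
    File.open("#{base_folder}/#{difficulty}_#{title}.pdf", "wb") { |f| f.write(pdf) }
  rescue StandardError => e
    # StandardError (unlike Exception) still lets Ctrl-C stop the script
    puts "  *** v^v^v^ error ^v^v^v *** #{e.message}"
  end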
end

This script very neatly downloads everything to the directory of your choosing (change the directory name). It also thoughtfully names the files with their difficulty rating as the first character so you can sort them ASCII-betically and make a halfway decent list to help you learn your way into NoSQL nerdery.
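One thing the naming scheme doesn't guard against is a paper title that contains a slash or a colon, which would produce a bad path. Here is a minimal sketch of how the name could be built more defensively. The helper `paper_filename` is hypothetical (it is not in the script above), and the difficulty digits and the second title are made up for illustration.

```ruby
# Hypothetical helper, not part of the original script: builds the
# "<difficulty>_<title>.pdf" name and swaps characters that are awkward
# in file names (slashes, colons, and so on) for a dash.
def paper_filename(difficulty, title)
  safe_title = title.strip.gsub(%r{[/\\:*?"<>|]}, '-').gsub(/\s+/, ' ')
  "#{difficulty}_#{safe_title}.pdf"
end

# Made-up difficulty ratings, purely for illustration.
names = [
  paper_filename('3', 'Some Key/Value Paper: Part 1'),
  paper_filename('1', 'The Graph Traversal Pattern')
]

# A plain sort compares strings byte by byte, so the leading
# difficulty digit decides the order.
puts names.sort
```

Inside the loop, the same idea would amount to building the path from `paper_filename(difficulty, title)` instead of interpolating the raw title.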

There’s only one problem. One of the papers, the graph traversal paper, won’t download for some reason. The ACM server returns an HTTP access denied error code. To get around this you can either download it with your browser, or you can go ahead and use the copy that I’ve provided – The Graph Traversal Pattern.
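If you'd rather see which HTTP status a server sent back instead of a generic error banner, open-uri raises `OpenURI::HTTPError` and exposes the response status on the exception. The following is a rough sketch, not what the script above does; `fetch_with_status` is a hypothetical helper name.

```ruby
require 'open-uri'

# Hypothetical helper, not in the original script: runs the download block
# and returns [data, nil] on success, or [nil, status_text] when the server
# answers with an HTTP error.
def fetch_with_status
  [yield, nil]
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, reason] pair taken from the server's response
  [nil, e.io.status.join(' ')]
end
```

In the download loop this could be called as `pdf, error = fetch_with_status { open(download_link).read }` (or `URI.open` on newer Rubies), printing `error` and skipping the file whenever it is not nil.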

Enjoy!
