Perspectives

Gems

If you have built a Rails application, there is a good chance you used Ruby gems to quickly add new functionality. In this article, we're going to go over some best practices to follow when using gems within a Rails application. I will demonstrate how to specify the gems you need in your environment and unpack a copy into your application, and then we will do a little gem hacking without breaking them.

The application we are about to build won't really be an application: we won't be creating any views or controllers, or even touching the database. We will be using the Gruff gem by Geoffrey Grosenbach to generate some simple graphs and then we will enhance Gruff with some new options.

Installing and Freezing

Let's get started by creating a new Rails application and installing the Gruff gem.

rails gem_hacks
sudo gem install gruff

Add the following in the Rails::Initializer.run block in your config/environment.rb:

config.gem "gruff"

By adding config.gem "gruff" to our environment, the application will look for the gruff gem on startup. If the gem isn't installed, it will throw an exception. This is much better than having the application running while the portions that depend on the gem error out.

Let's make sure Gruff appears in the application's gem list and then copy it to the vendor/gems directory of the application with the unpack task.

rake gems
rake gems:unpack:dependencies GEM=gruff

By including Gruff within the application, we won't have to install the gem on every box the application finds itself on. Also, every developer working on the project will not have to worry about having the same gem version installed.

Making It Our Own

Next let's create a BarGraph class that requires 'gruff'. The BarGraph class will serve as a wrapper for generating the common bar graphs found on the site.

require 'gruff'
class BarGraph
  
end

Ruby calls a class's initialize method whenever a new instance is created. Add an initialize method inside the BarGraph class filled with Gruff code that will generate a static bar graph for now. Then create the bar_graphs directory inside of public/images.

def initialize
  g = Gruff::Bar.new("400x300")
  g.data :years, [185, 155, 110, 90, 135]
  g.labels = { 0 => "2004", 1 => "2005", 2 => "2006", 3 => "2007", 4 => "2008" }
  g.write('public/images/bar_graphs/test.jpg')
end

Next, fire up your console and try BarGraph.new. Then go into your public/images/bar_graphs directory and open up test.jpg to see the generated bar graph.

Making Improvements

Notice how the bar of the lowest value (90) is barely visible? By default, the minimum value will be the lowest value and the maximum will be the highest value. We can customize this by setting more attribute values of the instance. Add the following before the g.write line.

g.minimum_value = 0
g.maximum_value = 200

Let's once again regenerate our graph through the console; now the bar with the value of 90 should be clearly visible.

Great, our graph starts at 0 and we've added some headroom by specifying a higher maximum value. Let's make our BarGraph class more dynamic by adding some arguments that can be passed through the initialize method and refactoring the code within.

def initialize(values, labels, save_path, options={})
  g = Gruff::Bar.new("400x300")
  g.data :years, values
  g.labels = labels
  g.minimum_value = 0
  g.maximum_value = 200
  g.write(save_path)
end

Now whenever we call BarGraph.new, arguments for values, labels and save_path will be needed. Reload your console and give the following a try:

BarGraph.new( [185, 155, 110, 90, 135], { 0 => "2004", 1 => "2005", 2 => "2006", 3 => "2007", 4 => "2008" }, "public/images/bar_graphs/test_with_arguments.jpg" )

This will generate the same graph as before since we are passing in the same data. If we passed in an array with values higher than 200, however, the bars would get cut off due to the static maximum value in our initialize method. We can remove the maximum value portion from our initialize and Gruff will default to the highest value in our array. Another option is to add code that will take that maximum value and round it up.

Hacking the Gruff Gem

The auto-rounding of the maximum value would be a good feature to have in other Gruff graphs as well. Since we have the gem source in the vendor/gems of our projects, adding it there might sound like a good idea at first. Later in the project lifecycle, however, the gem might be updated and those custom additions would be lost.

The ideal solution would be to create a new module that would contain our hacks which will be included wherever needed. Whenever the gem is updated and our hacks break, we can easily fix them since they are all located in one place. With that being said let's create our GruffHacks module and save it as gruff_hacks.rb in lib.

module GruffHacks
  class Gruff::Base
  
  end
end

Our GruffHacks module reopens the Gruff::Base class, which is found in vendor/gems/gruff/lib/gruff/base.rb. Take a few minutes to browse through the Base class as we will be overriding methods from it next. Near the top, all the attribute accessors for Gruff are being set. These attributes are shared between all graphs since they are part of Base. Let's add an attribute accessor for rounding maximum values called round_maximum_value into Gruff::Base of our GruffHacks module.

Remember, we are not editing any code in the gem source; we are only going to use it for reference and copy the methods we will be overriding from it. Next, let's copy the entire initialize_ivars method from the gem source into Gruff::Base of our GruffHacks module. At the bottom, after @norm_data = nil, let's add @round_maximum_value = false. The initialize_ivars method is responsible for setting default values for Gruff attributes; here we are setting @round_maximum_value to false so that the rounding functionality we will add soon stays off by default. The goal with this module is to enhance Gruff, but to have it function normally if our attribute hacks are not being set.
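If copying the whole method feels heavy, one alternative (a different technique from the copy approach described above) is to chain to the original method with alias_method. The sketch below is only an illustration under stated assumptions: Chart::Base and ChartHacks are hypothetical stand-ins so the example runs without the gem, and the real Gruff::Base sets many more instance variables.

```ruby
# Stand-in for the gem's base class (hypothetical; not the real Gruff::Base).
module Chart
  class Base
    attr_accessor :maximum_value

    def initialize
      initialize_ivars
    end

    # The "gem" method that sets default values.
    def initialize_ivars
      @maximum_value = nil
      @norm_data = nil
    end
  end
end

# Our hacks live in one place, separate from the gem source.
module ChartHacks
  # Naming the fully qualified constant reopens the existing class
  # instead of defining a new ChartHacks::Base.
  class Chart::Base
    attr_accessor :round_maximum_value

    # Keep a handle on the original method, then extend it.
    alias_method :initialize_ivars_without_hacks, :initialize_ivars

    def initialize_ivars
      initialize_ivars_without_hacks
      @round_maximum_value = false # new default, off unless requested
    end
  end
end

chart = Chart::Base.new
p chart.round_maximum_value # prints false
p chart.maximum_value       # prints nil
```

The trade-off is that the override no longer needs updating when the gem adds new defaults, but it does depend on the original method keeping its name.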

With the round_maximum_value attribute accessor and its default value added, it is time to set the new maximum value when round_maximum_value is true. Add the following method into Gruff::Base of our GruffHacks module:

def round_maximum_value=(value)
  # Remember the setting so the round_maximum_value reader reflects it
  @round_maximum_value = value
  if value == true
    rounded_maximum_values = []

    # Loop through each bar group, grab its highest value, round it up
    # and add it to the rounded_maximum_values array
    @data.each do |row|
      row_highest = row[1].max
      round_to = 10 ** (row_highest.to_s.length - 1)
      rounded_maximum_values << row_highest.roundup(round_to)
    end
    # Set maximum value based on highest rounded value
    @maximum_value = rounded_maximum_values.max
  end
end

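Add the following at the bottom of gruff_hacks.rb, after the closing end of the GruffHacks module, so that it reopens Ruby's core Numeric class rather than defining a new GruffHacks::Numeric. These two helpful methods were written by Charlie from PullMonkey for rounding numbers up or down. Our round_maximum_value= method uses the roundup method.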

class Numeric
  def roundup(nearest=10)
    self % nearest == 0 ? self : self + nearest - (self % nearest)
  end
  def rounddown(nearest=10)
    self % nearest == 0 ? self : self - (self % nearest)
  end
end

Before testing our new hack, let's include the GruffHacks module into our BarGraph class by adding include GruffHacks after the Gruff require statement. In our initialize method, remove the g.maximum_value line and add in g.round_maximum_value = true. If you try regenerating the same graph as before, the maximum value will now be 200 since 185 was rounded up.
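The arithmetic behind that 200 can be checked outside of Rails. This is a minimal sketch, assuming positive integer values; rounded_maximum is a hypothetical standalone helper written for this illustration and is not part of Gruff or the GruffHacks module.

```ruby
# Hypothetical helper mirroring the arithmetic in round_maximum_value=.
# Assumes positive integers.
def rounded_maximum(values)
  highest   = values.max
  round_to  = 10 ** (highest.to_s.length - 1) # 185 -> 100, 90 -> 10
  remainder = highest % round_to
  remainder.zero? ? highest : highest + round_to - remainder
end

puts rounded_maximum([185, 155, 110, 90, 135]) # prints 200
puts rounded_maximum([1250, 300])              # prints 2000
puts rounded_maximum([90, 42])                 # prints 90
```

Note the last case: a highest value that already sits on a rounding boundary gets no extra headroom, which may or may not be what you want for your graphs.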

Worth the Effort

By overriding initialize_ivars and adding the round_maximum_value setter into Gruff::Base, we are able to automatically add some space between the highest value and the top of the graph by simply setting a single attribute to true. If there is a need to change this functionality later on, we won't have to dig through the gem source and hunt for modified lines. Instead we open up our GruffHacks module and easily find our overridden methods. It is also much easier to add new methods: maybe more style options are needed for the bars, or the labels need to be positioned in a different way. Whatever it is, keeping these enhancements in a separate module makes sharing and updating hacks between applications easier.

As projects grow, it becomes harder to keep up with all the details. Even if you are the only developer on the project, there is a good chance the inner workings will be forgotten months later. By breaking simple hacks into new modules we reduce the potential for bugs, and any bugs that do exist will be easier to find. Spending a few extra minutes to better organize your code and using the object-oriented nature of a language such as Ruby will save time for you and your colleagues.



Map

DataMapper is an object-relational mapper for Ruby with an interface somewhat similar to ActiveRecord's. More than sprinkles on top of a generic SQL adapter, DataMapper is a design pattern for defining repositories and the models that love them. DataMapper differs from ActiveRecord and ActiveResource in that models are encapsulated from repositories while queries and collections communicate between them. One of the most significant advantages to this approach lies in the ability to develop models separately from their repository.

Over the next several articles we will use the Twitter API to explore how DataMapper expects custom adapters to work, executes CRUD requests, handles associations and works with multiple repositories. For now we will concentrate on the basics of DataMapper and querying users from the Twitter API service. Those comfortable with Ruby or ActiveRecord should be able to follow along; however, I strongly recommend spending time with DataMapper's fantastic documentation if you have not already.

Modeling Our Models

First things first, we should install DataMapper and define a model to represent a user account from the Twitter API. To do that we need to define a class that includes the DataMapper::Resource module and describe the properties Twitter provides.

gem install datamapper

require 'dm-core'

class User
  include DataMapper::Resource

  property :id, Integer, :field => 'user_id'
  property :name, String
  property :screen_name, String
  property :email, String
  property :location, String
  property :description, Text, :lazy => false
  property :profile_image_url, String
  property :url, String
  property :protected, Boolean
  property :followers_count, Integer
end

DataMapper differs from the ActiveRecord family in that fields are defined in your model rather than being created from the repository's schema. This allows models to be built without an adapter making a connection and avoids the headaches of ActiveRecord-style migrations. Additionally you may specify the :field name to be used for each property, allowing antiquated or confusing field names to be user friendly. Defining a model's properties also allows DataMapper to intelligently calculate by type which fields should be lazily loaded, which may also be customized by passing the :lazy option a true or false value. In our User model we have set the lazy option to false as Twitter provides a user's description by default, and there is no use in wasting an API hit if we do not need to (Twitter limits API calls by the hour). A more in-depth description of DataMapper's lazy fields can be found in the documentation.

The Base Adapter

For the foundation of our Twitter adapter we need to catch any authentication options passed to the initialization as well as write a method for communicating with Twitter.

require 'cgi'
require 'open-uri'
require 'rubygems'
require 'dm-core'
require 'xmlsimple'

module DataMapper
  module Adapters
    class TwitterAdapter < AbstractAdapter

      # Clients can provide DataMapper with a URI string or hash of options when
      # initializing an adapter. We can store these values and use them for each
      # request to the Twitter service if the client provides them. Depending on
      # your repository you may wish to verify authentication here rather than 
      # waiting for the initial request.
      #
      # name:: Name of the adapter
      # uri_or_options:: A uri string, or hash of options used to initialize the adapter
      #
      def initialize(name, uri_or_options)
        # don't forget to phone home!
        super(name, uri_or_options)

        case uri_or_options
        when Hash
          user = uri_or_options[:user] || ''
          pass = uri_or_options[:pass] || ''
          @auth = user.blank? || pass.blank? ? nil : [user, pass]
        end
      end

      private

      # Requests a resource from the Twitter API. If the adapter was initialized
      # with the :user and :pass options, they will be used to authenticate the request.
      #
      # method:: Path to follow the base URI 'http://twitter.com'
      # params:: Hash of key/value pairs to be used as the query string
      # returns:: XmlSimple representation of the response from Twitter
      #
      def request(method, params = {})
        uri = "http://twitter.com/#{method}"
        options = {:http_basic_authentication => @auth}

        unless params.blank?
          query = params.map { |k,v| "%s=%s" % [CGI.escape(k.to_s), CGI.escape(v.to_s)] }
          uri << "?#{query.join('&')}"
        end

        result = open(uri, options)
        return XmlSimple.xml_in(result.read, {'ForceArray' => false})
      end

    end
  end
end
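The query-string step of #request can be tried on its own, without touching the network. This is a minimal sketch using only the standard library; build_uri is a hypothetical helper name introduced for this illustration, and the email address is made up.

```ruby
require 'cgi'

# Hypothetical helper isolating the URI-building part of #request.
def build_uri(path, params = {})
  uri = "http://twitter.com/#{path}"
  return uri if params.empty?

  # Escape keys and values so spaces, '@' and '&' survive the trip.
  query = params.map { |k, v| "#{CGI.escape(k.to_s)}=#{CGI.escape(v.to_s)}" }
  "#{uri}?#{query.join('&')}"
end

puts build_uri("users/show.xml", :screen_name => "kscollective")
# prints http://twitter.com/users/show.xml?screen_name=kscollective

puts build_uri("users/show.xml", "email" => "john doe@example.com")
# prints http://twitter.com/users/show.xml?email=john+doe%40example.com
```

Calling to_s before escaping means symbol keys and integer ids can be passed in without raising an error.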

Now we can go ahead and see if we are on the right path. We should get a NotImplementedError when we try to perform any action with our repository.

DataMapper.setup(:default, {
  :adapter => 'twitter',
  :user => 'kscollective',
  :pass => 'snark snark'
})

User.first(:screen_name => 'kscollective') # => NotImplementedError

Fetching Heffalumps And Woozles

In order to continue building our adapter, we have to understand the queries and collections DataMapper uses to mediate between the models and adapters. DataMapper::Adapters::AbstractAdapter defines #read_one and #read_many, both of which accept the query as their single parameter. The query object allows us to determine which model and fields to query along with any possible conditions to limit our results by. Queries also tell our adapter about any possible offsets, limits or ordering, but we will come back to that another day.

Query#model

Each query belongs to a model which we can use to load and return instances fetched from our repository, and may also be used to customize repository requests by type. We can even use the model to DRY up our adapter and work with a single #read method.

# Reopen the adapter to add instance methods
class DataMapper::Adapters::TwitterAdapter

  def read_one(query)
    read(query, query.model, false)
  end

  def read_many(query)
    DataMapper::Collection.new(query) do |set|
      read(query, set, true)
    end
  end

  private

  # Each read has a query and returns a set; #read_one and #read_many should provide
  # the set to load the results into. When called by #read_many nothing needs to be returned
  # as the collection is filling itself; however, we must return whatever object #read_one
  # should return back to the client code.
  #
  def read(query, set, many = true)
    raise NotImplementedError # to be filled in later
  end

end

Query#fields

Each query contains a subset of the model's properties specifying which fields to query as well as the order of our values when instantiating each result. Twitter does not provide a means to filter out fields so we will use #fields only to order the values we pass to the model builders.

class DataMapper::Adapters::TwitterAdapter
  
  # Map the values from item into an array of the same size and order as Query#fields
  # [id, name, something_else, title] => [1, 'Hello World', nil, 'Cats!']
  #
  def parse_user_values(query, item)
    return query.fields.map { |f| item[f.field.to_s] }
  end
end
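To see the ordering at work without a live query, here is a small sketch. Field is a hypothetical stand-in for a DataMapper property (only #field is used), and the item hash is made-up data shaped like a parsed XML response with string keys.

```ruby
# Stand-in for a DataMapper property; only #field matters here.
Field = Struct.new(:field)

# The order in which the (hypothetical) query asked for its fields
fields = [
  Field.new(:user_id),
  Field.new(:name),
  Field.new(:homepage),   # not present in the response below
  Field.new(:screen_name)
]

# Made-up response data, keyed by strings in arbitrary order
item = { 'screen_name' => 'kscollective', 'user_id' => '42', 'name' => 'KSC' }

# Same mapping as parse_user_values: one value per field, in field order
values = fields.map { |f| item[f.field.to_s] }
p values # prints ["42", "KSC", nil, "kscollective"]
```

Fields missing from the response come back as nil, so the array always lines up with the query's field list.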

Query#conditions

The meat of most queries, conditions is an array of tuples containing the operator, property and value to be considered when executing the request. Each operator may be any of the standard 'SQL' operators (:eql, :in, :gt[e], :lt[e]) and should be used to match the property with one or more values. Because Twitter only provides a method to query individual users by id, email or screen name, we can write a method that reduces the conditions to an array of key/value pairs, one per request.

class DataMapper::Adapters::TwitterAdapter

  def generate_users_query(query)
    result = Array.new
    fields = ['user_id', 'email', 'screen_name']

    # keep only equality conditions on fields Twitter can look users up by
    conditions = query.conditions.select do |condition|
      condition[0] == :eql and fields.include?(condition[1].field.to_s)
    end

    # each item in conditions is a [operator, property, value] tuple
    for operator, property, value in conditions
      # if an array, each value must be queried individually
      [value].flatten.each { |v| result << [property.field.to_s, v] }
    end

    return result
  end

end
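The filtering can be exercised without DataMapper by feeding it hand-built tuples. This is only a sketch under stated assumptions: Property is a hypothetical stand-in exposing just #field, users_query_pairs is an illustrative name, and the conditions are invented sample data.

```ruby
# Stand-in for DataMapper::Property; only #field is needed here.
Property = Struct.new(:field)

# Fields Twitter can look a single user up by, per the text above
SUPPORTED_FIELDS = ['user_id', 'email', 'screen_name']

# Reduce [operator, property, value] tuples to [field, value] pairs,
# one pair per request that would be sent.
def users_query_pairs(conditions)
  supported = conditions.select do |operator, property, _value|
    operator == :eql && SUPPORTED_FIELDS.include?(property.field.to_s)
  end

  supported.flat_map do |_operator, property, value|
    Array(value).map { |v| [property.field.to_s, v] }
  end
end

conditions = [
  [:eql, Property.new('screen_name'), ['kscollective', 'twitter']],
  [:gt,  Property.new('followers_count'), 100],   # unsupported operator
  [:eql, Property.new('location'), 'Chicago']     # unsupported field
]

p users_query_pairs(conditions)
# prints [["screen_name", "kscollective"], ["screen_name", "twitter"]]
```

Unsupported conditions are silently dropped here, which matches the behaviour described above; an adapter could just as reasonably raise an error for them.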

Mind The Gap!

What used to be the hardest part of using third-party resources has now become a matter of building the request, parsing the response and loading the model.

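def read(query, set, many = true)
  queries = generate_users_query(query)

  for key, value in queries
    twitter_user = request("users/show.xml", {key => value})
    next if twitter_user.blank? or twitter_user['screen_name'].blank?
    user_values = parse_user_values(query, twitter_user)
    # #read_one: hand the first loaded resource straight back to the caller
    return set.load(user_values, query) unless many
    # #read_many: keep filling the collection
    set.load(user_values)
  end

  # nothing to return for #read_many, and no match found for #read_one
  nil
end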

Recess!

At this point you should be able to query users from Twitter without parsing a single query string or XML response.

DataMapper.setup(:default, {:adapter => 'twitter'})

user = User.first(:screen_name => 'KSCollective')
puts user.url # => http://www.killswitchcollective.com

As you can see, DataMapper can make working with third-party repositories as automagical as ActiveRecord. DataMapper itself comes with adapters for the major databases, with additional adapters available for CouchDB, Google Video and many others. In a few weeks we will continue building our Twitter adapter, leveraging the power of associations. Until then I highly recommend reviewing DataMapper's documentation, the adapters included with the dm-core gem and the base adapter we have just built.



Spring

Get FlexibleCSV from GitHub

A Challenge in Flexibility

As part of a contact management system we are building for a client, I encountered a unique challenge with allowing users to upload and import their contacts from CSV files. Usually this would not be a problem, except that in this case there was no standardization to what the header names would be or what order the columns were in. Because the FasterCSV gem relies on using the header names as access keys, this process was suddenly quite complicated.

One solution would be to create a user interface that displays our database fields alongside the columns from the user's CSV file and lets the user pair them up. For example, my database column is 'email' but their CSV column is 'Email Address', so they could mark those as equivalent. What would I do, however, for the users who have a "Full Name" column when I use 'first_name' and 'last_name' database columns? Suddenly the user interface could get very complicated and confusing.

Introducing FlexibleCSV

Instead, I developed FlexibleCSV, a gem that allows you to parse through a CSV file without knowing exactly what the headers are named. By providing a list of possible header names, you can access all the CSV columns with a uniform interface.

require 'flexible_csv'

# Arbitrary CSV data
csv_data1 = %Q{Full Name, Email Address\nJohn Doe, john@doe.com}
csv_data2 = %Q{Email, Name\njohn@doe.com, John Doe}

parser = FlexibleCsv.new do |csv|
  csv.column :full_name, "Name", "Full Name", "Client Name"
  csv.column :email, "Email", "Email Address"
end

parser.parse(csv_data1).each do |row|
  puts row.full_name #=> 'John Doe'
  puts row.email     #=> 'john@doe.com'
end

parser.parse(csv_data2).each do |row|
  puts row.full_name #=> 'John Doe'
  puts row.email     #=> 'john@doe.com'
end

Both data sets can now be accessed using the uniform #full_name and #email accessors.
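For readers curious how this kind of header matching can work, here is a minimal sketch built on Ruby's standard csv library. It is not FlexibleCSV's actual implementation: ALIASES and parse_with_aliases are hypothetical names, and the sketch assumes the first line is always the header row.

```ruby
require 'csv'

# Hypothetical mapping from uniform keys to the header names they may use.
ALIASES = {
  :full_name => ["Name", "Full Name", "Client Name"],
  :email     => ["Email", "Email Address"]
}

# Returns one hash per data row, keyed by the uniform names.
def parse_with_aliases(csv_text, aliases = ALIASES)
  rows = CSV.parse(csv_text)
  headers = rows.shift.map { |h| h.to_s.strip }

  # Work out which column index each uniform key lives in (nil if absent)
  index_for = {}
  aliases.each do |key, names|
    index_for[key] = headers.index { |h| names.include?(h) }
  end

  rows.map do |row|
    index_for.each_with_object({}) do |(key, idx), record|
      record[key] = idx && row[idx].to_s.strip
    end
  end
end

data1 = "Full Name, Email Address\nJohn Doe, john@doe.com"
data2 = "Email, Name\njohn@doe.com, John Doe"

[data1, data2].each do |data|
  record = parse_with_aliases(data).first
  puts record[:full_name] # prints John Doe
  puts record[:email]     # prints john@doe.com
end
```

Resolving the column indexes once per file, rather than once per row, keeps the per-row work to simple array lookups.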

Handling Complexity with Adapters

Going back to my original example, how would we handle CSV files that separate first and last names when my database uses the full name? Or vice versa? Though I considered adding this kind of functionality to the FlexibleCSV gem, ultimately I thought it best to keep that kind of logic in a separate adapter class. For example:

require 'flexible_csv'

# Arbitrary CSV data
csv_data1 = %Q{Full Name\nJohn Doe}
csv_data2 = %Q{First Name, Last Name\nJohn,Doe}

parser = FlexibleCsv.new do |csv|
  csv.column :full_name, "Name", "Full Name", "Client Name"
  csv.column :first_name, "First Name", "First"
  csv.column :last_name, "Last Name", "Last", "Surname"
end

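class CsvAdapter
  # Expose the wrapped row; without this reader, the bare `row` calls below
  # would fall through to method_missing and recurse forever.
  attr_reader :row

  def initialize(row)
    @row = row
  end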

  def full_name
    row.full_name || "#{row.first_name} #{row.last_name}"
  end

  def last_name
    row.last_name || row.full_name.split(' ').last
  end

  def first_name
    row.first_name || row.full_name.split(' ').first
  end

  def method_missing(method_name, *args)
    row.send(method_name, *args)
  end
end

parser.parse(csv_data1).each do |row|
  ad_row = CsvAdapter.new(row)
  puts ad_row.full_name  #=> 'John Doe'
  puts ad_row.first_name #=> 'John'
  puts ad_row.last_name  #=> 'Doe'
end

parser.parse(csv_data2).each do |row|
  ad_row = CsvAdapter.new(row)
  puts ad_row.full_name  #=> 'John Doe'
  puts ad_row.first_name #=> 'John'
  puts ad_row.last_name  #=> 'Doe'
end

Using the adapter class, we can once again access each row of data from any CSV file with a uniform interface.

Go Get It!

To use the FlexibleCSV gem, you can follow or fork the project on GitHub or just install the gem:

sudo gem install chrisjpowers-flexible_csv



