A Journey On Rails

Cleaner fixture generation and maintenance

Posted in Rails, Testing, Uncategorized by Vikram Venkatesan on February 9, 2011

Anyone who has used fixtures in Rails knows how handy they are when writing tests: they let us concentrate on the functionality without worrying about the test data. But fixtures are not without their own problems.

  • They do not go through validations. ActiveRecord validations are not run while creating fixtures, so there is no guarantee that the fixtures form valid AR objects. I’ve found many invalid fixtures sitting in my projects, with the tests passing happily on the inconsistent data. This might very well be the case with any Rails project. When was the last time you ran .valid? on your fixture records?
  • Records created through callbacks have to be created manually. For instance, if UserObserver callbacks create some records ‘on create’, those records must be added to the fixtures by hand. If they are not, the fixture data set differs from the one the real application produces in production/development, and our tests might miss data and cases that occur in a production setup.
  • Less motivation to create fixture records and hence decreased test coverage. Managing YML files by hand is no fun. The harder a process is, the lower the motivation to follow it.
  • Harder to maintain. Some records are very hard to create manually, such as those in join tables. For instance, it took me way too long to hand-write all the intermediate table records (for a habtm or has_many :through), which Rails is supposed to generate automatically. Let’s say we have User has_many :cars, :through => :user_cars. Why should I create the user_car records by hand? Aren’t they supposed to be invisible to the application layer, and hence to the developer?
  • Hard to get a complete picture of the test data from fixtures, especially when join tables, STI, etc. are involved.
  • Timestamps are absolute. The time values in the fixture data set are usually hand-crafted when the fixture is created. In some cases, the time has to be close to the time of test execution. Needless to say, it’s hard to update the time values of all the fixture records when the need arises.

More than all these, fixtures do not appear to be the cleanest technique: developers create the data at the data layer, whereas its real use lies in the application layer. We could just as well write SQL statements to insert the test data, but that is tedious and error prone. Fixtures are not much different, apart from some support from Rails for naming records, associations, etc. They are still far from being application layer functionality.

How about a mechanism that lets us write simple Rails code to create the fixture data? The Fixture Generator plugin is just for that.

Installation

ruby script/plugin install git@github.com:venkatev/fixture_generator.git

Usage

class MyProjectFixtureGenerator < FixtureGenerator
  # Implement the 'populate' method in your populator.
  def populate
    # Create your records here and add them to the fixture set by calling
    # 'add_record(record_name, record_object)'. Please see lib/fixture_generator.rb for more details.
  end
end

# Simply invoke the following line in the test environment to generate the fixture YAML files.
#
MyProjectFixtureGenerator.generate

# OR, even simpler, run the following rake task. You may have to fix the fixture generator class name in the rake task file,
# tasks/fixture_generator_tasks.rake
#
rake fixtures:generate

Modifying fixture data is now a lot easier. Just go to your fixture generator Ruby file, modify the record creation statements, and run rake fixtures:generate. The new YML files are right there, ready to be used in your tests! With this approach, we may not even need to keep the fixtures under revision control. The generator script is the single source of fixture data; developers can run the rake task and generate the fixtures locally. Give it a try and let me know your feedback.
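To give a feel for the idea, here is a minimal plain-Ruby sketch of a generator that collects named records and renders one YAML document per table. This is not the plugin's implementation: SketchFixtureGenerator, the Car struct, the naive table naming and build_sample_fixtures are all hypothetical names made up for illustration, and the sketch returns the YAML as strings instead of writing files.

```ruby
require "yaml"

# Hypothetical stand-in for an ActiveRecord model; not part of the plugin.
Car = Struct.new(:id, :name, :created_at)

# Hypothetical, simplified generator: collects named records per table
# and renders one YAML document per table.
class SketchFixtureGenerator
  def initialize
    @tables = Hash.new { |hash, table| hash[table] = {} }
  end

  # Mirrors the add_record(record_name, record_object) idea from the post.
  def add_record(record_name, record)
    table = "#{record.class.name.downcase}s" # naive pluralization, sketch only
    @tables[table][record_name.to_s] = record.to_h.transform_keys(&:to_s)
  end

  # Returns { "cars.yml" => "<yaml text>" } instead of writing to disk.
  def to_yaml_files
    @tables.each_with_object({}) do |(table, rows), files|
      files["#{table}.yml"] = rows.to_yaml
    end
  end
end

def build_sample_fixtures(now)
  generator = SketchFixtureGenerator.new
  # The timestamp is computed relative to generation time, not hand-written.
  created_at = (now - 3600).utc.strftime("%Y-%m-%d %H:%M:%S")
  generator.add_record(:beetle, Car.new(1, "Beetle", created_at))
  generator.to_yaml_files
end

puts build_sample_fixtures(Time.now)["cars.yml"]
```

Because the records are built in Ruby, a relative timestamp like the one above is refreshed every time the fixtures are regenerated.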

 
Happy testing!!!

ActiveRecord :joins and :include

Posted in Rails, Uncategorized by Vikram Venkatesan on February 4, 2010

Yesterday, while fixing a bug related to eager loading in our project, I learned a few new things about :joins and :include.

Eager loading using :include with conditions

Consider two models, User and Experience(job_title: string, user_id: integer,…), where the User model has the following named scopes.

class User
 # With 'professor' title.
 named_scope :professors,
             :include    => :experiences,
             :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:include    => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

Say User.professors returns [u1, u2]. When we call u1.experiences, we get only the professor experience of u1, even if u1 has other experiences. That’s because the experiences association is eager loaded with the condition applied, so u1.experiences does not trigger a SQL query; it just returns what was already fetched. Hence, the records that were filtered out are not returned.

Solution: Use the :joins key, which is the right way to join with an association for filtering. :include should be used only when the intent is to eager load the association. For some reason, we used :include instead of :joins in our project, and that paid off well in the form of bugs.
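The difference can be simulated in plain Ruby with in-memory rows. This is only a sketch of the row sets involved, not of how ActiveRecord works internally; the sample data and the helper names joined_rows, professors_with_include and professors_with_joins are made up for illustration.

```ruby
USERS = [
  { id: 1, name: "u1" },
  { id: 2, name: "u2" },
  { id: 3, name: "u3" }
].freeze

EXPERIENCES = [
  { user_id: 1, job_title: "professor" },
  { user_id: 1, job_title: "consultant" },
  { user_id: 2, job_title: "professor" },
  { user_id: 3, job_title: "engineer" }
].freeze

# All user/experience pairs, as a JOIN on user_id would produce them.
def joined_rows
  USERS.flat_map do |user|
    EXPERIENCES.select { |exp| exp[:user_id] == user[:id] }
               .map { |exp| { user: user, experience: exp } }
  end
end

# :include plus a condition on the included table: the association is built
# from the filtered rows, so non-matching experiences are never loaded.
def professors_with_include
  rows = joined_rows.select { |row| row[:experience][:job_title] == "professor" }
  rows.group_by { |row| row[:user] }.map do |user, user_rows|
    user.merge(experiences: user_rows.map { |row| row[:experience] })
  end
end

# :joins plus a condition: the join only filters the users; the association
# is loaded later, unfiltered, when it is first accessed.
def professors_with_joins
  rows = joined_rows.select { |row| row[:experience][:job_title] == "professor" }
  rows.map do |row|
    user = row[:user]
    user.merge(experiences: EXPERIENCES.select { |exp| exp[:user_id] == user[:id] })
  end
end

u1_included = professors_with_include.first
u1_joined   = professors_with_joins.first
puts u1_included[:experiences].map { |exp| exp[:job_title] }.inspect
puts u1_joined[:experiences].map { |exp| exp[:job_title] }.inspect
```

In the simulation, u1 keeps only the professor experience on the :include path, while the :joins path still sees both of u1's experiences.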

Another interesting learning related to joins and include was around the kind of queries that will be fired by them. I found this article really helpful in understanding that.

Beware of duplicates when using :joins

Replacing :include with :joins solved the eager loading issue. But it created another problem: duplicate records were returned by the named scope. Say we call User.with_title([‘professor’, ‘lecturer’]) to get all professors and lecturers. Since :joins always generates an INNER JOIN in a single SQL query, a user who has both a professor and a lecturer experience matches two joined rows and is returned twice. On the other hand, :include works by firing separate SQL queries to load the associations to be eager loaded (this behaviour is since Rails 2.0; earlier, :include used to fire a single query similar to :joins). It first makes a query to fetch the main table records (here, User). Then, it uses the ids of the fetched records to make another query for each association to be loaded, thus giving rise to 1 + n SQL queries, where n is the number of associations to be loaded. Following are some sample SQL queries (for the models, say, Tree and Branch, where Tree has_many Branches and the named scope is present in the Tree model) that would be fired by the :include call.

SELECT trees.* FROM trees WHERE trees.name = 'Banyan'

# Say, the above query returns trees with ids [5, 7, 10]
SELECT branches.* FROM branches WHERE tree_id IN (5, 7, 10)

In the above Tree and Branch example, the :conditions did not reference any of the included tables. If your named scope (or any ActiveRecord finder statement) references the included tables’ columns in either the conditions or the order, then :include uses a single query with LEFT OUTER JOINs to load all the included associations, thus working almost the same as :joins. The reason I said almost is that, though both of them use a single query with JOINs, :include ensures duplicate records are not returned (in our case, the User records).
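To make the duplicate rows concrete, here is a small plain-Ruby simulation of the with_title case. It only mimics the shape of the INNER JOIN result; the sample rows and the helper user_ids_via_join are hypothetical.

```ruby
EXPERIENCE_ROWS = [
  { user_id: 1, job_title: "professor" },
  { user_id: 1, job_title: "lecturer" },
  { user_id: 2, job_title: "professor" }
].freeze

# One result row per matching experience, as an INNER JOIN would return.
def user_ids_via_join(titles)
  EXPERIENCE_ROWS.select { |row| titles.include?(row[:job_title]) }
                 .map { |row| row[:user_id] }
end

ids = user_ids_via_join(%w[professor lecturer])
puts ids.inspect      # [1, 1, 2] - user 1 appears once per matching experience
puts ids.uniq.inspect # [1, 2]    - the de-duplicated set of users
```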

:joins and :include … when to use what?

Here are a few rules that I myself follow when using :joins and :include

  • Use :joins if you just want to use the association in conditions or ordering.
  • Use :include if you want to eager load the association using fewer queries thus avoiding the 1 + N problem.
  • Do not use the included associations in the conditions or order clause.

As per these rules, we should be using :joins. So, let’s rewrite our example.

class User
 # With 'professor' title.
 named_scope :professors,
             :joins      => :experiences,
             :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:joins      => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

How to solve the duplicate problem?

OK, we know :joins may result in duplicate records and :include doesn’t. But that alone is no reason to use :include, which was the mistake I made earlier. One way to fix it is to tell Rails to fetch only DISTINCT records using the :select key.

class User
 # With 'professor' title.
 named_scope :professors,
             :select     => "DISTINCT users.*",
             :joins      => :experiences,
             :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:select     => "DISTINCT users.*",
    :joins      => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

User.with_title([‘professor’, ‘lecturer’]) will now fetch only distinct user records (say, 2 records)! But User.with_title([‘professor’, ‘lecturer’]).count will return 3. That’s because, whenever we override the :select fragment, ActiveRecord simply replaces it with its own SELECT COUNT(...) to construct the count SQL. That is the same reason why, whenever we use :finder_sql, we must also specify :counter_sql (the Rails docs warn about this too). In our case, the DISTINCT in SELECT DISTINCT users.* is lost in the count query, so duplicates are counted too. The same goes for empty? and any? calls. Even if a :counter_sql-like option existed here, it wouldn’t be the right choice, since we would have to specify the complete SQL, which may not be possible. One solution that worked for me was overriding the count method on the named scope’s collection, as shown below.

class User
  # With 'professor' title.
  named_scope(:professors,
              :select     => "DISTINCT users.*",
              :joins      => :experiences,
              :conditions => "experiences.job_title = 'professor'") do
    # Delegate to length
    def count; length; end
  end

  # With the given title
  named_scope(:with_title, lambda{|title|
     {:select     => "DISTINCT users.*",
      :joins      => :experiences,
      :conditions => ["experiences.job_title IN (?)", title]
     }
  }) do
    # Delegate to length
    def count; length; end
  end

end
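The mismatch between count and length, and the effect of the override, can be sketched without Rails. SketchScope and CountViaLength below are hypothetical stand-ins: count imitates a separate COUNT query over the joined rows, while length imitates loading the DISTINCT records.

```ruby
# Hypothetical stand-in for a named scope built with :joins and DISTINCT.
class SketchScope
  def initialize(joined_user_ids)
    @joined_user_ids = joined_user_ids
  end

  # Loading the records applies DISTINCT.
  def to_a
    @joined_user_ids.uniq
  end

  def length
    to_a.length
  end

  # The count query ignores DISTINCT and counts every joined row.
  def count
    @joined_user_ids.length
  end
end

# Same idea as the block passed to named_scope above: delegate to length.
module CountViaLength
  def count
    length
  end
end

scope = SketchScope.new([1, 1, 2])
puts scope.count  # prints 3
scope.extend(CountViaLength)
puts scope.count  # prints 2
```

One trade-off worth keeping in mind: length has to load the records in order to count them, so this fits scopes with small result sets better than large ones.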

Readonly records returned by :joins

When you pass a SQL fragment to :joins, the resulting records will be readonly (the ‘find’ rdoc at http://api.rubyonrails.org/classes/ActiveRecord/Base.html mentions this too). The same applies to any finder, named scopes included. This is because attributes from other tables are also fetched by the JOIN query, and when you try to save such an object, Rails won’t know how to save those extra attributes. You can pass :readonly => false to bypass this behaviour.

What about eager loading?

If you noticed, we lost the eager loading benefits by using :joins. Using both :joins and :include in the same finder results in a table aliasing error, because ActiveRecord joins the same table twice. If someone knows a way to do this, please do share.

Testing Transactions

Posted in Uncategorized by Vikram Venkatesan on January 8, 2010

If you ever ran into a situation where you were not able to test the transaction logic in your code, the reason could be that Rails runs each test case inside a transaction of its own, so you never see the actual effect of your transaction’s rollback. This happens when you use transactional fixtures: the fixture records are created only once for the lifetime of the test suite, and each test case runs in a transaction that is rolled back at the end, undoing any DML modifications and leaving the fixtures intact. The default test_helper.rb generated by Rails has the following lines of code

# The only drawback to using transactional fixtures is when you actually
# need to test transactions.  Since your test is bracketed by a transaction,
# any transactions started in your code will be automatically rolled back.
self.use_transactional_fixtures = true

You can switch off transactional fixtures for all tests by setting self.use_transactional_fixtures = false in your test_helper, or for a single test class by setting it in that class. But switching it off everywhere makes the tests way too slow. How about disabling it for one specific test case alone? From what I know, there isn’t a way to do so.

What you can do instead is write a separate test class for such transactional tests and set self.use_transactional_fixtures = false in it. Things will then work as you expect. One word of caution though! Now that you have disabled transactions, it’s your duty to clean up any data created by the test case(s), so that the new test class can co-exist with other tests using transactional fixtures. You can either do this manually at the end of the test method(s), or override the teardown method and add the cleanup code to it.

def teardown
 # Your code for undoing the data changes goes here....
end
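One way to keep that cleanup manageable is to track every record a test creates and destroy them all in teardown. The following is a plain-Ruby sketch of that pattern; FakeRecord is a hypothetical stand-in for an ActiveRecord object and CreatedRecords is a made-up helper, not something Rails provides.

```ruby
# Hypothetical stand-in for an ActiveRecord object with a destroy method.
class FakeRecord
  STORE = [] # plays the role of the database table

  attr_reader :name

  def initialize(name)
    @name = name
    STORE << self
  end

  def destroy
    STORE.delete(self)
  end
end

# Remembers what a test created so that teardown can undo it.
class CreatedRecords
  def initialize
    @records = []
  end

  def track(record)
    @records << record
    record
  end

  # Destroy in reverse creation order so dependent records go first.
  def cleanup!
    @records.reverse_each(&:destroy)
    @records.clear
  end
end

created = CreatedRecords.new                 # would live in setup
created.track(FakeRecord.new("temp user"))   # inside the test method
created.cleanup!                             # would live in teardown
puts FakeRecord::STORE.size                  # prints 0
```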

Happy testing!!!