A Journey On Rails

Cleaner fixture generation and maintenance

Posted in Rails, Testing, Uncategorized by Vikram Venkatesan on February 9, 2011

Those who have used fixtures in Rails know how handy they are when writing tests: they let us concentrate on the functionality without worrying about the test data. But fixtures are not without their own problems.

  • They do not go through validations. ActiveRecord validations are not run while creating fixtures, so there is no guarantee that the fixtures form valid AR objects. I’ve found many invalid fixtures sitting in my projects, with the tests passing happily on the inconsistent data. This might very well be the case with any Rails project. When was the last time you ran .valid? on your fixture records?
  • Records created through callbacks will have to be created manually. For instance, if there are UserObserver callbacks that create some records ‘on create’, those records have to be created by hand. If that is not done, it leaves a hole in the data set, which then differs from the one the real application uses in production/development. Hence, our tests might miss some data and cases that occur in a production setup.
  • Less motivation to create fixture records manually and hence decreased test coverage. Managing YML files manually is no fun. The harder a process is, the lower the motivation to follow it.
  • Harder to maintain. It’s very hard to create records manually in some cases, such as for join tables. For instance, it took way too long for me to manually create all the intermediate table records (in a habtm or has_many :through), which are supposed to be auto-generated by Rails. Let’s say we have User has_many :cars, :through => :user_cars. Why should I create the user_car records by hand? Aren’t they supposed to be invisible to the application layer and hence to the developer?
  • Hard to get a complete picture of the test data from fixtures, especially when there are several join tables, STI, etc.
  • Timestamps are absolute. The time values in the fixture data set are usually set to some hand-crafted time when the fixture is created. In some cases, the time may have to be close to the time of test execution. Needless to say, it’s hard to update the time values for all the records in the fixtures when the need arises.

More than all this, it does not seem clean for developers to create test data at the data layer when its real use lies in the application layer. We could just as well write SQL statements to insert fixtures, but that is tedious and error-prone. Fixtures are no different, apart from some support from Rails for naming records, associations, etc. They are still far from being application-layer functionality.

How about having a mechanism where we can write simple Rails code to create the fixture data? The Fixture Generator plugin is meant for just that.

Installation

ruby script/plugin install git@github.com:venkatev/fixture_generator.git

Usage

class MyProjectFixtureGenerator < FixtureGenerator
 # Implement the 'populate' method in your populator.
 def populate
 # Create your records here and add them to the fixture set by calling
 # 'add_record(record_name, record_object)'. Please see lib/fixture_generator.rb for more details.
 end
end

# Simply invoke the following line in the test environment to generate the fixture YAML files.
#
MyProjectFixtureGenerator.generate

# OR, even simpler, run the following rake task. You may have to fix the fixture generator class name in the rake task
# tasks/fixture_generator_tasks.rake
#
rake fixtures:generate

Modifying fixture data is now a lot easier. Just go to your fixture generator Ruby file, modify the record creation statements, and run rake fixtures:generate. The new YML files are right there, ready to be used in your tests! With this approach, we may not even need to keep the fixtures under revision control: the generator script is the single source of fixture data, and developers can run the rake task to generate the fixtures locally. Give it a try and let me know your feedback.
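To make the idea concrete, here is a minimal plain-Ruby sketch of what such a generator boils down to. It is not the plugin's implementation: SketchFixtureGenerator, to_yaml_by_table and the Struct-based SketchUser are hypothetical stand-ins for the real FixtureGenerator class and for ActiveRecord objects. Only the populate/add_record shape is taken from the usage above.

```ruby
require "yaml"

# Hypothetical stand-in for an ActiveRecord model (assumption, not Rails code).
SketchUser = Struct.new(:id, :name, :created_at) do
  def self.table_name
    "users"
  end

  def attributes
    { "id" => id, "name" => name, "created_at" => created_at }
  end
end

# Hypothetical sketch of a generator; the real plugin class may work differently.
class SketchFixtureGenerator
  def initialize
    @records = Hash.new { |hash, table| hash[table] = {} }
  end

  # Registers a record under a fixture name, grouped by its table.
  def add_record(record_name, record)
    @records[record.class.table_name][record_name.to_s] = record.attributes
  end

  # Subclasses create their records here.
  def populate
    raise NotImplementedError
  end

  # Returns a hash of table name => YAML text, one entry per fixture file.
  def to_yaml_by_table
    populate
    @records.each_with_object({}) do |(table, rows), files|
      files[table] = rows.to_yaml
    end
  end
end

class MyProjectSketchGenerator < SketchFixtureGenerator
  def populate
    # The timestamp is computed at generation time, so it stays recent.
    one_hour_ago = (Time.now - 3600).utc.strftime("%Y-%m-%d %H:%M:%S")
    add_record(:alice, SketchUser.new(1, "Alice", one_hour_ago))
  end
end

puts MyProjectSketchGenerator.new.to_yaml_by_table["users"]
```

Because the records are built by ordinary Ruby code, anything that code does (validations, callbacks, relative timestamps) would be reflected in the generated YAML, which is the point of the approach.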

 
Happy testing!!!


ActiveRecord :joins and :include

Posted in Rails, Uncategorized by Vikram Venkatesan on February 4, 2010

Yesterday, while fixing a bug related to eager loading in our project, I learned a few new things about :joins and :include.

Eager loading using :include with conditions

Consider two models, User and Experience(job_title: string, user_id: integer, …), where the User model has the following named scopes.

class User
 # With 'professor' title.
 named_scope :professors,
 :include => :experiences,
 :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:include    => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

Say, User.professors returns [u1, u2]. When we call u1.experiences, we will get only the professor experience of u1, even if there are other experiences. That’s because the experiences association is eager loaded. So, u1.experiences does not trigger a SQL query. It just returns what was already fetched. Hence, those records that were filtered out will not be returned.
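The effect can be imitated without Rails. The following is only a plain-Ruby simulation under simplifying assumptions: eager_load_with_condition is a hypothetical helper, and hashes stand in for table rows. It joins, filters, and then builds each user's association from whatever rows survived the filter.

```ruby
# Plain-Ruby simulation of the behaviour described above; no Rails or database.
def eager_load_with_condition(users, experiences, title)
  # Rows of "users LEFT OUTER JOIN experiences": one row per (user, experience) pair.
  joined_rows = users.flat_map do |user|
    own = experiences.select { |exp| exp[:user_id] == user[:id] }
    own.empty? ? [[user, nil]] : own.map { |exp| [user, exp] }
  end

  # WHERE experiences.job_title = <title>; rows without a match are dropped.
  kept_rows = joined_rows.select { |_user, exp| exp && exp[:job_title] == title }

  # Each user's association is built from the surviving rows only.
  kept_rows.each_with_object({}) do |(user, exp), loaded|
    (loaded[user[:name]] ||= []) << exp[:job_title]
  end
end

users = [{ id: 1, name: "u1" }, { id: 2, name: "u2" }]
experiences = [
  { user_id: 1, job_title: "professor" },
  { user_id: 1, job_title: "lecturer" },
  { user_id: 2, job_title: "professor" }
]

# u1 also has a "lecturer" experience, but it never reaches the loaded association.
p eager_load_with_condition(users, experiences, "professor")
```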

Solution: Use the :joins key, which is the right way to join with an association for filtering. :include must be used only when the intent is to eager load the collections. For some reason, we used :include instead of :joins in our project, and that paid off well in the form of bugs.

Another interesting learning related to joins and include was around the kind of queries that will be fired by them. I found this article really helpful in understanding that.

Beware of duplicates when using :joins

Replacing :include with :joins solved the eager loading issue. But it created another problem: duplicate records were returned by the named scope. Say, we call User.with_title([‘professor’, ‘lecturer’]) to get all professors and lecturers. Since :joins always generates an INNER JOIN in a single SQL query, it may result in duplicate records being returned. On the other hand, :include works by firing separate SQL queries to load the associations to be eager loaded (this behaviour is since Rails 2.0; earlier, :include used to fire a single query similar to :joins). It first makes a query to fetch the main table records (here, User). Then, it uses the ids of the fetched records to make another query for each association to be loaded, thus giving rise to n + 1 SQL queries, where n is the number of associations to be loaded. Following are some sample SQL queries (for the models, say Tree and Branch, where Tree has_many Branches and the named scope is present in the Tree model) that would be fired by the :include call.

SELECT trees.* FROM trees WHERE trees.name = 'Banyan'

# Say, the above query returns trees with ids [5, 7, 10]
SELECT branches.* FROM branches WHERE tree_id IN (5, 7, 10)

In the above Tree and Branch example, the :conditions did not reference any of the included tables. If your named scope (or any ActiveRecord finder statement) references the included tables’ columns in either conditions or order, then :include uses a single query with LEFT OUTER JOINs to load all the included associations, thus working almost the same as :joins. The reason I said almost is that, though both of them use a single query using JOINs, :include ensures duplicate records are not returned (in our case, the User records).
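Why an INNER JOIN hands back the same user more than once can be seen in a small plain-Ruby simulation. This is a sketch, not what ActiveRecord executes: inner_join_users is a hypothetical helper and hashes stand in for table rows.

```ruby
# Plain-Ruby simulation of an INNER JOIN; no Rails or database involved.
def inner_join_users(users, experiences, titles)
  # One result row per matching (user, experience) pair, like
  # "users INNER JOIN experiences ... WHERE job_title IN (...)".
  users.flat_map do |user|
    experiences
      .select { |exp| exp[:user_id] == user[:id] && titles.include?(exp[:job_title]) }
      .map { user }
  end
end

users = [{ id: 1, name: "u1" }, { id: 2, name: "u2" }]
experiences = [
  { user_id: 1, job_title: "professor" },
  { user_id: 1, job_title: "lecturer" },
  { user_id: 2, job_title: "professor" }
]

rows = inner_join_users(users, experiences, %w[professor lecturer])
p rows.map { |user| user[:name] }       # u1 appears once per matching experience
p rows.uniq.map { |user| user[:name] }  # what a DISTINCT select would leave
```

Here u1 matches on two experiences, so the join produces three rows for two distinct users.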

:joins and :include … when to use what?

Here are a few rules that I myself follow when using :joins and :include

  • Use :joins if you just want to use the association in conditions or ordering.
  • Use :include if you want to eager load the association using fewer queries thus avoiding the 1 + N problem.
  • Do not use the included associations in the conditions or order clause.

As per the rules, we must be using :joins. So, let’s rewrite our example.

class User
 # With 'professor' title.
 named_scope :professors,
             :joins      => :experiences,
             :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:joins      => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

How to solve the duplicate problem?

OK, we know :joins may result in duplicate records and :include doesn’t. But we shouldn’t use :include just for that, which was the mistake I made earlier. One way to fix it is by telling Rails to fetch only DISTINCT records using the :select key.

class User
 # With 'professor' title.
 named_scope :professors,
             :select     => "DISTINCT users.*",
             :joins      => :experiences,
             :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:select     => "DISTINCT users.*",
    :joins      => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

User.with_title([‘professor’, ‘lecturer’]) will now fetch only distinct user records (say, 2 records)! But User.with_title([‘professor’, ‘lecturer’]).count will return 3. That’s because, whenever we override the :select fragment, ActiveRecord simply replaces it with SELECT COUNT(table.*) to construct the count SQL. That is the reason why, whenever we use :finder_sql, we must specify :counter_sql too (the Rails docs also warn about this). In our case, the DISTINCT users.* select is dropped when the count query is built, so duplicates are counted too. The same is the case with empty? and any? calls. Even if a :counter_sql-like option were available, it wouldn’t be the right choice, since we would have to specify the complete SQL, which may not be possible. One solution that worked for me was overriding the count method on the named scope’s collection, as shown below.

class User
  # With 'professor' title.
  named_scope(:professors,
              :select     => "DISTINCT users.*",
              :joins      => :experiences,
              :conditions => "experiences.job_title = 'professor'") do
    # Delegate to length
    def count; length; end
  end

  # With the given title
  named_scope(:with_title, lambda{|title|
     {:select     => "DISTINCT users.*",
      :joins      => :experiences,
      :conditions => ["experiences.job_title IN (?)", title]
     }
  }) do
    # Delegate to length
    def count; length; end
  end

end
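How a block passed to a scope can override count on just that one collection can be sketched in plain Ruby. SketchScope below is a hypothetical stand-in, not Rails code; it assumes the block is turned into an anonymous module that extends the scope object, and it fakes the "count ignores DISTINCT" behaviour with a raw row list.

```ruby
# Hypothetical stand-in for a named scope's collection proxy; not Rails code.
class SketchScope
  def initialize(rows, &extension)
    @rows = rows
    # Turn the block into an anonymous module and mix it into this object only.
    extend Module.new(&extension) if extension
  end

  # Loaded, de-duplicated records (what SELECT DISTINCT would return).
  def to_a
    @rows.uniq
  end

  def length
    to_a.length
  end

  # Mimics a COUNT query that ignores the DISTINCT select.
  def count
    @rows.length
  end
end

joined_rows = %w[u1 u1 u2]

plain   = SketchScope.new(joined_rows)
patched = SketchScope.new(joined_rows) do
  # Delegate to length, as in the named scopes above.
  def count
    length
  end
end

puts plain.count
puts patched.count
```

One trade-off worth keeping in mind: delegating count to length means the records are loaded just to be counted, which may matter for large result sets.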

Readonly records returned by :joins

When you pass a SQL fragment to :joins, the resulting records will be readonly (http://api.rubyonrails.org/classes/ActiveRecord/Base.html – the ‘find’ rdoc mentions that too). The same applies to any finder, named scopes included. This is because other tables’ attributes are also fetched by the JOIN query, and when you try to save that object, Rails won’t know how to save those extra attributes. You can pass :readonly => false to bypass this behaviour.

What about eager loading?

If you noticed, we lost the eager loading benefits by using :joins. Using both :joins and :include in the same finder will result in table aliasing error where the same table is joined twice by active record. If someone knows a way to do this, please do share.


Namespaced models and controllers

Posted in code organization, Rails by Vikram Venkatesan on January 27, 2010

Namespacing Ruby classes using modules is one of the powerful features of Ruby that is underused in Rails. The problem is with using them in models and controllers, since Rails’ default conventions and configurations do not provide much support for namespaces. For instance, if you want a model for storing user settings separate from the basic user information (say, the User model), you may create a model called UserSetting with an underlying table called user_settings. Here, the prefix User is nothing but the context in which the model works. A better approach in my opinion is to put the Profile and Setting models inside a namespace, say User, like User::Profile and User::Setting. That way, the redundancy in naming is removed. Now, coming to how Rails supports this usage, the major areas where you may face problems are:

Table naming

Rails defaults to the table name inferred from the class name of the model, excluding the namespace. So, it will assume settings to be the table name. You will have to call set_table_name to make it use user_settings, if required.

set_table_name 'user_settings'
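The inference described above can be approximated in a few lines of plain Ruby. This is only a rough sketch: sketch_table_name is a hypothetical helper with naive pluralization, whereas the real logic lives in the Rails inflector and handles many more cases.

```ruby
# Rough approximation of the table-name inference described above.
def sketch_table_name(class_name)
  base  = class_name.split("::").last                      # drop the namespace
  snake = base.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase  # CamelCase -> snake_case
  "#{snake}s"                                              # naive pluralization
end

puts sketch_table_name("User::Setting")  # namespace is ignored
puts sketch_table_name("UserSetting")    # prefix is part of the class name
```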

Associations

The association class names and foreign keys will have to be specified explicitly.

belongs_to :user_setting,
           :class_name => 'User::Setting',
           :foreign_key => 'user_settings_id'

Fixtures

Name-based fixture association references won’t work for namespaced models. Say, there is a belongs_to :user_profile in the User::Setting model. In user_settings.yml you cannot directly use the fixture name as follows.

setting_one:
 user_profile: profile_one

You may have to use hard-coded ids with the foreign key column name, like user_profile_id: 5. I struggled the most to get around this limitation, since I could not change all my old fixtures to use hard-coded ids. Honestly, I do not know any better way to make this work and am still searching for one.

Update: Call set_fixture_class :user_settings => User::Setting to enable using named fixtures as given above.

There are many articles and blogs discussing the merits and demerits of using namespaces in Rails to keep the code structured and clean. A few even recommend not using namespaces due to the lack of good support for them.

But, from my experience, the advantages you get by using them outweigh their demerits. It’s about time Rails started supporting namespaces. You may find it hard to get it working the first time. Once it’s in practice, you will not regret the choice.

Testing Transactions

Posted in Uncategorized by Vikram Venkatesan on January 8, 2010

If you have ever been unable to test the transaction logic in your code, the reason could be that Rails runs each test case inside a transaction of its own, so you don’t see the actual effect of your transaction being rolled back. This happens when you use transactional fixtures: the fixture records are created only once for the whole test suite run, and each test case runs as a transaction that is rolled back at the end, undoing any DML modifications and leaving the fixtures intact. The default test_helper.rb generated by Rails has the following lines of code

# The only drawback to using transactional fixtures is when you actually
# need to test transactions.  Since your test is bracketed by a transaction,
# any transactions started in your code will be automatically rolled back.
self.use_transactional_fixtures = true

You can switch off transactional fixtures for all tests by setting self.use_transactional_fixtures = false in your test_helper or in your test class. But that will make the tests far too slow. How about disabling it for a specific test case alone? From what I know, there isn’t a way to do so.

What you can do instead is write a separate test class for such transactional tests, setting self.use_transactional_fixtures = false. Things will then work as you expect. One word of caution though! Now that you have disabled transactions, it’s your duty to clean up any data created by the test case(s), so that the new test class can co-exist with other tests using transactional fixtures. You can either do this manually at the end of the test method(s) or override the teardown method and add the cleanup code to it.

def teardown
 # Your code for undoing the data changes goes here....
end

Happy testing!!!

Testing Rails exception handling

Posted in Rails, Testing by Vikram Venkatesan on December 23, 2009

I recently found a good article by Neeraj on rescue_action_in_public that saved a good amount of time for me while testing/modifying the rescue behaviour in ApplicationController.

But testing it was not so trivial. Typically, test.rb configures all test requests to be considered local through config.action_controller.consider_all_requests_local = true

Now, we need to override this setting only for a few test cases. Also, local_request? must be made to return false.
The former is quite straightforward, done by adding the following line of code to your test case.

@controller.consider_all_requests_local = false

For local_request? to return false, we must set @request.remote_addr to something other than 0.0.0.0 for the test framework to consider the request to be not local.
Rails provides a cleaner way to do this. Calling rescue_action_in_public! will do the job for us. So, putting the pieces together, here is a sample test code to get started

class DummyController < ApplicationController
  # Add actions for triggering different error cases (404 - Not Found, 403 - Forbidden, etc.). Or have a single action and stub it.

  # Some dummy action throwing an exception
  def some_action
    raise "Some error"
  end
end

class ErrorHandlingTest < ActionController::TestCase
  tests DummyController

  def setup
    super
    @controller.consider_all_requests_local = false
    rescue_action_in_public!
  end

  def test_rescue_exception
    get :some_action
    # Assert the expected behaviour as follows
    #   assert_template "#{RAILS_ROOT}/app/views/common/500.html"
  end
end

Performance problems with AssociationCollection

Posted in Rails by Vikram Venkatesan on December 21, 2009

While debugging a performance problem recently, I narrowed it down to a piece of code that iterated over a has_many association using the each method. To check whether the block passed to ‘each’ was the culprit, I emptied the block; that didn’t help much. Finally I changed the call to AssociationCollection#to_a.each, and groovy!, it reduced the time taken by that whole block by more than 10 times. Since that piece of code was called thousands of times in the request I was trying to optimize, it was an optimization in the order of seconds for that request. What more could I ask for?!

I took that chance to read the AssociationProxy and AssociationCollection code to find what exactly was causing the delay. I learnt that AssociationCollection delegates all unknown calls to @target, the actual collection Array, through method_missing. So, the each call was also going through the method_missing path, creating so much delay. Following is a piece of code that demonstrates the above behaviour.

t = Time.now

# Replace with your real object and association
some_object.some_has_many_assoc.each { }  # empty block
puts "#{(Time.now - t) * 1000000} microseconds"

To validate my understanding, I tried some_object.some_has_many_assoc.send(:load_target).each{} , so as to call the ‘each’ method directly on the collection; that ran many times faster than the previous version.

I knew method_missing can lead to performance problems, but didn’t expect it to happen at such a magnitude!
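The delegation pattern itself can be reproduced outside Rails. The sketch below uses a hypothetical SketchProxy class (not the Rails AssociationCollection) that forwards unknown calls to a wrapped Array through method_missing, and times each on the proxy against each on the Array directly. The numbers depend heavily on the Ruby version and machine, so treat the output as indicative only; the gap may be far smaller than the one I saw in the Rails code.

```ruby
require "benchmark"

# Hypothetical proxy that forwards every unknown call to a wrapped array.
class SketchProxy
  def initialize(target)
    @target = target
  end

  # 'each' is not defined on this class, so it is resolved here on every call.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

# Calls 'each' with an empty block the given number of times and returns seconds taken.
def time_each_calls(receiver, iterations)
  Benchmark.realtime { iterations.times { receiver.each { } } }
end

records = (1..10).to_a
proxy   = SketchProxy.new(records)

puts "via method_missing: #{time_each_calls(proxy, 50_000)} seconds"
puts "direct on Array:    #{time_each_calls(records, 50_000)} seconds"
```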

DRYing up rails tests

Posted in Rails, Testing by Vikram Venkatesan on November 12, 2009

The rails tests are probably the most repetitive and unstructured pieces of code that we write, in my opinion. We write tests for all scenarios and cases, with each one repeating the same preparation code before the actual test begins. Also, it’s always hard to find whether there is a test for a given scenario/case, no matter how disciplined we are in naming and structuring the tests. This is true for both unit and functional tests.

Cucumber is definitely a sigh of relief. But it is a tool built specifically to aid BDD, used to write tests at the highest level of abstraction, integration testing. It can’t be used to DRY up the unit or functional tests. I personally feel it’s not so productive, since the developers write the complete English-to-Ruby conversion code and there is no reuse or structuring of scenarios. If there are two scenarios differing only in the last step, you will be duplicating most of the statements.

Since any test case is nothing but a combination of many scenarios (be it unit or functional), what would be helpful is a way to nest the tests based on multiple scenarios. I have tried to come up with a test library that can aid in writing such tests with minimal code and in a more structured way.

The library is available as a plugin @ http://github.com/venkatev/nested_scopes