A Journey On Rails

ActiveRecord :joins and :include

Posted in Rails, Uncategorized by Vikram Venkatesan on February 4, 2010

Yesterday, while fixing a bug related to eager loading in our project, I learned a few new things about :joins and :include.

Eager loading using :include with conditions

Consider there are two models, User and Experience(title: string, :user_id: integer,…) and User model has the following named scopes.

class User
 # With 'professor' title.
 named_scope :professors,
 :include => :experiences,
 :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:include    => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

Say, User.professors returns [u1, u2]. When we call u1.experiences, we will get only the professor experience of u1, even if there are other experiences. That’s because experiences association is eager loaded. So, u1.experiences does not trigger a SQL. It just returns what it fetched. Hence, those records that were filtered out will not be returned.

Solution: Use :joins key, which is the right way to join with an association for filtering. :include must be used only when the intent is to eager load the collections. For some reasons, we used :include instead of :joins in our project, and that paid of well in the form of bugs.

Another interesting learning related to joins and include was around the kind of queries that will be fired by them. I found this article really helpful in understanding that.

Beware of duplicates when using :joins

Replacing :include with :joins solved the eager loading issue. But, it created another problem; duplicate records were returned by the named scope. Say, we call User.with_title([‘professor’, ‘lecturer’]) to get all professors and lecturers. Since :joins always generate an INNER JOIN in a single SQL query, it may result in duplicate records being returned. On the other hand, :include works by firing separate SQL’s to load the associations to be eager loaded (this behaviour is since Rails 2.0; earlier, :include used to fire single query similar to :joins). It first makes a query to fetch the main table records (here, User). Then, it uses the id’s of the fetched records to make another query to fetch the other associations to be loaded, thus giving raise to n + 1 sql queries, where n is the number of associations to be loaded. Following are some sample SQL queries (for the models, say Tree and Branch, where Tree has_many Branches and the named scope is present in Tree model) that would be fired by the :include call.

SELECT trees.* FROM branches WHERE trees.name = 'Banyan'

# Say, the above query returns trees with ids [5, 7, 10]
SELECT branches.* FROM branches WHERE tree_id IN (5, 7, 10)

In the above Tree and Branch exmaple, the :conditions did not include any of the included tables. If your named scope (or any ActiveRecord finder statement) references the included tables’ columns in either conditions or order, then :include uses a single query with LEFT OUTER JOINs to load all the included associations, thus working almost the same as :joins. The reason I said almost is that, though both of them use a single query using JOINs, :include ensures duplicate records are not returned (in our case, the User records).

:joins and :include … when to use what?

Here are a few rules that I myself follow when using :joins and :include

  • Use :joins if you just want to use the association in conditions or ordering.
  • Use :include if you want to eager load the association using fewer queries thus avoiding the 1 + N problem.
  • Do not use the included associations in the conditions or order clause.

As per the rules, we must be using :joins. So, lets rewrite our sample example.

class User
 # With 'professor' title.
 named_scope :professors,
             :joins      => :experiences,
             :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:joins      => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

How to solve the duplicate problem?

Ok, we know :joins may result in duplicate records and :include doesn’t. But, just for that, we shouldn’t use :include, which was the mistake I did earlier. One way to fix it is by telling rails to fetch only DISTINCT records using the :select key.

class User
 # With 'professor' title.
 named_scope :professors,
             :select     => "DISTINCT users.*"
             :joins      => :experiences,
             :conditions => "experiences.job_title = 'professor'"

 # With the given title
 named_scope :with_title, lambda{|title|
   {:select     => "DISTINCT users.*",
    :joins      => :experiences,
    :conditions => ["experiences.job_title IN (?)", title]
   }
 }
end

User.with_title([‘professor’, ‘lecturer’]) will now fetch only distinct user records (say, 2 records)! But, User.with_title([‘professor’, ‘lecturer’]).count will return 3. That’s because, whenever we override the :select fragment, ActiveRecord simple replaces it with SELECT COUNT(table.*) to construct the COUNTER sql. That is the reason, whenever we use :finder_sql, we must also specify :counter_sql too (rails doc also warns about this). In our case, SELELCT DISTINCT users.* will be replaced with SELECT users.*, thus counting duplicates too. Same will be the case with empty? and any? calls too.  A :counter_sql like option even if there, wouldn’t be the right choice since we may have to specify the complete sql which may not be possible. One solution that worked for me was by overriding the count method on the named scope’s collection, as shown below

class User
  # With 'professor' title.
  named_scope(:professors,
              :select     => "DISTINCT users.*"
              :joins      => :experiences,
              :conditions => "experiences.job_title = 'professor'") do
    # Delegate to length
    def count; length; end
  end

  # With the given title
  named_scope(:with_title, lambda{|title|
     {:select     => "DISTINCT users.*",
      :joins      => :experiences,
      :conditions => ["experiences.job_title IN (?)", title]
     }
  } do
    # Delegate to length
    def count; length; end
  end

end

Readonly records returned by :joins

When you pass a SQL fragment to :joins, the resulting records will be readonly (http://api.rubyonrails.org/classes/ActiveRecord/Base.html – ‘find’ rdoc mentions about that too). Same applies to any finder; named scopes too. This is because, other tables’ attributes are also fetched using a JOIN query, and when you try to save that object, rails won’t know how to save those extra attributes. You can pass :readonly => false to bypass the behaviour.

What about eager loading?

If you noticed, we lost the eager loading benefits by using :joins. Using both :joins and :include in the same finder will result in table aliasing error where the same table is joined twice by active record. If someone knows a way to do this, please do share.

<br />
class User<br />
 # With 'professor' title.<br />
 named_scope :professors,<br />
 :include =&gt; :experiences,<br />
 :conditions =&gt; &quot;experiences.job_title = 'professor'&quot;</p>
<p> # With the given title<br />
 named_scope :with_title, lambda{|title|<br />
 {:include    =&gt; :experiences,<br />
 :conditions =&gt; [&quot;experiences.job_title IN (?)&quot;, title]<br />
 }<br />
 }<br />
end<br />