Eager loading

I’ll start with something simple, as I promised before. Eager loading is one of the basic features of Rails, but when it’s missing or misused it can lead to huge performance hits.

For those who don’t know what eager loading is, start by reading up on lazy loading. Once you’ve figured that out, eager loading is the opposite - it lets you fetch the same data using fewer queries to the database. The classic example is a blog which has posts, and these posts have comments. If you have this in your Post model:

has_many :comments

And this in your Comment model:

belongs_to :post

And this in your controller:

@posts = Post.find(:all)

And finally this in your view:

<% @posts.each do |post| %>
  <%= post.content %>
  <% post.comments.each do |comment| %>      
    <%= comment.content %>
  <% end %>  
<% end %>

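You’d be making a lot of SQL queries to display this view, since Rails would query the database every time you call post.comments to fetch the comments of that specific post. If you had 10 posts with 10 comments each, you’d be making 11 queries: one for the posts and one more per post for its comments. The count grows with every post you add, which is why this is known as the N+1 queries problem.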
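To see where the queries come from, here is a rough plain-Ruby simulation of the view above. It does not use Rails at all: FakeDB, all_posts and comments_for are hypothetical names made up for this sketch, and each method call simply stands in for one SQL query.

```ruby
# FakeDB is a hypothetical stand-in for the database, not part of Rails.
# Every method call counts as one SQL query.
class FakeDB
  attr_reader :query_count

  def initialize(posts, comments)
    @posts = posts
    @comments = comments
    @query_count = 0
  end

  # Stands in for: SELECT * FROM posts
  def all_posts
    @query_count += 1
    @posts
  end

  # Stands in for: SELECT * FROM comments WHERE post_id = ?
  def comments_for(post_id)
    @query_count += 1
    @comments.select { |comment| comment[:post_id] == post_id }
  end
end

# 10 posts with 10 comments each, as in the example above.
posts = (1..10).map { |id| { id: id, content: "Post #{id}" } }
comments = posts.flat_map do |post|
  (1..10).map { |n| { post_id: post[:id], content: "Comment #{n}" } }
end

db = FakeDB.new(posts, comments)

# Mirrors the view: the comments are fetched once per post, and reading
# each comment's content afterwards costs no extra query.
rendered = []
db.all_posts.each do |post|
  db.comments_for(post[:id]).each do |comment|
    rendered << comment[:content]
  end
end

puts "#{rendered.size} comments rendered with #{db.query_count} queries"
```

In this sketch the cost is driven by the number of posts, not by the number of comments: one query for the posts plus one per post.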

This is where Eager Loading comes in! If you change your controller to this:

@posts = Post.find(:all, :include => :comments)

Rails would only have to make 2 queries: one for the posts and one for all of their comments.
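The idea behind those 2 queries can be sketched in plain Ruby. This is only a simulation under stated assumptions, not how Rails is implemented: FakeDB, all_posts and comments_for_posts are hypothetical names, and each method call stands in for one SQL query.

```ruby
# FakeDB is a hypothetical stand-in for the database, not part of Rails.
# Every method call counts as one SQL query.
class FakeDB
  attr_reader :query_count

  def initialize(posts, comments)
    @posts = posts
    @comments = comments
    @query_count = 0
  end

  # Stands in for: SELECT * FROM posts
  def all_posts
    @query_count += 1
    @posts
  end

  # Stands in for: SELECT * FROM comments WHERE post_id IN (...)
  def comments_for_posts(post_ids)
    @query_count += 1
    @comments.select { |comment| post_ids.include?(comment[:post_id]) }
  end
end

# 10 posts with 10 comments each.
posts = (1..10).map { |id| { id: id, content: "Post #{id}" } }
comments = posts.flat_map do |post|
  (1..10).map { |n| { post_id: post[:id], content: "Comment #{n}" } }
end

db = FakeDB.new(posts, comments)

# Query 1: all posts. Query 2: every comment for those posts at once.
loaded_posts = db.all_posts
post_ids = loaded_posts.map { |post| post[:id] }
comments_by_post = db.comments_for_posts(post_ids).group_by { |c| c[:post_id] }

# Rendering now only reads from memory, so no further queries are made.
rendered = []
loaded_posts.each do |post|
  comments_by_post.fetch(post[:id], []).each do |comment|
    rendered << comment[:content]
  end
end

puts "#{rendered.size} comments rendered with #{db.query_count} queries"
```

The grouping step is what lets each post find its comments in memory, so the query count stays at 2 no matter how many posts there are.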

Watch out for :conditions and :order on eager loaded data

Rails recently changed the way it fetches data using eager loading. The most recent versions use the strategy I stated above, but previous versions did all the work in one query using SQL’s LEFT JOIN. This changed because a couple of smaller queries were much faster than one big query using left joins. Be careful when coding, though: you shouldn’t reference eager loaded tables in conditions or when ordering, because Rails will then behave differently. The following snippet of code:

@posts = Post.find(:all, :include => :comments, :order => "comments.id")

Would use only one query with a LEFT JOIN, since it needs to order all the data (posts + comments) using information from the eager loaded table. However, the best solution is to apply ordering or conditions in the association itself. You can change the model to this:

has_many :ordered_by_id_comments, :class_name => 'Comment', :order => "id"

And then include ordered_by_id_comments, which would return the expected result without using left joins. Also, don’t forget to watch out for unintended consequences when applying a condition like:

@posts = Post.find(:all, :include => :comments, :conditions => ["comments.content LIKE ?", "%first%"])

Since this query would use a LEFT JOIN, it would also discard all posts that don’t have the word “first” in any of their comments. Be careful when doing this, since Rails won’t behave the way you expect it to.
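The effect of filtering on a joined table can be sketched in plain Ruby. This is a simulation with made-up sample data, not the SQL Rails generates: the LEFT JOIN is modelled as one row per post and comment pair, and the condition is then applied to those rows.

```ruby
# Made-up sample data for illustration only.
posts = [
  { id: 1, content: "Hello" },
  { id: 2, content: "World" },
  { id: 3, content: "No comments yet" }
]
comments = [
  { id: 1, post_id: 1, content: "first!" },
  { id: 2, post_id: 1, content: "nice post" },
  { id: 3, post_id: 2, content: "thanks" }
]

# LEFT JOIN: one row per (post, comment) pair, plus a row with a nil
# comment for every post that has no comments at all.
joined = posts.flat_map do |post|
  matches = comments.select { |comment| comment[:post_id] == post[:id] }
  matches = [nil] if matches.empty?
  matches.map { |comment| { post: post, comment: comment } }
end

# The condition filters joined rows, not just comments, so every row
# without a matching comment disappears together with its post.
filtered = joined.select do |row|
  row[:comment] && row[:comment][:content].include?("first")
end

surviving_post_ids = filtered.map { |row| row[:post][:id] }.uniq
surviving_comments = filtered.map { |row| row[:comment][:content] }

puts "Posts returned: #{surviving_post_ids.inspect}"
puts "Comments loaded: #{surviving_comments.inspect}"
```

Posts 2 and 3 vanish from the result. Note also that only the matching comment survives for post 1, which suggests the post’s other comments would be missing from the eager loaded association as well.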

Eager loading of deeper associations

You can eager load deep hierarchies of associations using hashes. In our earlier example, imagine comments had authors and you wanted to display their names accordingly. Now imagine that these authors had a referring website, in another table, and you wanted to display it too:

<% @posts.each do |post| %>
  <%= post.content %>
  <% post.comments.each do |comment| %>
    <%= comment.content %>
    <%= comment.author.name %>
    <%= comment.author.referrer.url %>    
  <% end %>
<% end %>

Even if you had eager loaded the post’s comments, you’d still be making a query for each comment.author and another one for each comment.author.referrer, since that information lives in 2 separate tables. There’s an easy fix:

@posts = Post.find(:all, :include => {:comments => {:author => :referrer}})

This would fetch all information in 4 queries: one each for posts, comments, authors and referrers.

Which is much better than making a lot of queries for each individual comment, author and referrer.
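A minimal plain-Ruby sketch of why a nested include needs one query per level rather than one per record. Everything here is an assumption made for illustration: FakeDB, all and where_in are hypothetical names, the sample rows are invented, and the referrers table is assumed to carry an author_id column.

```ruby
# FakeDB is a hypothetical stand-in for the database, not part of Rails.
# Every method call counts as one SQL query.
class FakeDB
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  # Stands in for: SELECT * FROM <table>
  def all(table)
    @query_count += 1
    @tables.fetch(table)
  end

  # Stands in for: SELECT * FROM <table> WHERE <column> IN (...)
  def where_in(table, column, values)
    @query_count += 1
    @tables.fetch(table).select { |row| values.include?(row[column]) }
  end
end

db = FakeDB.new(
  posts: [{ id: 1, content: "Hello" }, { id: 2, content: "World" }],
  comments: [
    { id: 1, post_id: 1, author_id: 1, content: "first!" },
    { id: 2, post_id: 1, author_id: 2, content: "nice" },
    { id: 3, post_id: 2, author_id: 1, content: "thanks" }
  ],
  authors: [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }],
  referrers: [
    { id: 1, author_id: 1, url: "http://example.com/a" },
    { id: 2, author_id: 2, url: "http://example.com/b" }
  ]
)

# One query per level of the hierarchy, each fed by the ids of the level above.
posts = db.all(:posts)
comments = db.where_in(:comments, :post_id, posts.map { |p| p[:id] })
authors = db.where_in(:authors, :id, comments.map { |c| c[:author_id] }.uniq)
referrers = db.where_in(:referrers, :author_id, authors.map { |a| a[:id] })

# Index the loaded rows so the view can look everything up in memory.
comments_by_post = comments.group_by { |c| c[:post_id] }
authors_by_id = authors.map { |a| [a[:id], a] }.to_h
referrer_by_author = referrers.map { |r| [r[:author_id], r] }.to_h

lines = posts.flat_map do |post|
  comments_by_post.fetch(post[:id], []).map do |comment|
    author = authors_by_id[comment[:author_id]]
    referrer = referrer_by_author[author[:id]]
    "#{comment[:content]} by #{author[:name]} (#{referrer[:url]})"
  end
end

puts lines
puts "Queries: #{db.query_count}"
```

Adding more comments or authors to the sample data leaves the query count at 4, since each level is fetched in a single query.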

On a side note, you should always try to limit the data you fetch from the database, since this can really speed up your queries. If you only need the Post content on that view, your find statement should look like this (keep the id in the select, since eager loading needs it to match the comments to their posts):

@posts = Post.find(:all, :select => "id, content", :include => {:comments => {:author => :referrer}})

This would speed up the query since it would only fetch the needed data.

Time to go pragmatic

I’ve created a Rails application for testing purposes with posts and comments and the associations I mentioned. I created 554 posts and 165180 comments with random content. I also associated each comment with a random post. I’ll measure the time it takes to fetch the data from the database.

First, let’s display all posts and their comments, without using eager loading.

Comment Load (442.5ms)   SELECT * FROM `comments` WHERE (`comments`.post_id = 552)
Comment Load (452.4ms)   SELECT * FROM `comments` WHERE (`comments`.post_id = 553)
Comment Load (446.0ms)   SELECT * FROM `comments` WHERE (`comments`.post_id = 554)
Completed in 283061ms

283 seconds to load all the information? No one can afford to waste that much time. This is where eager loading comes in. Let’s use it on comments and watch the result:

Completed in 21169ms (View: 7274, DB: 1509) | 200 OK [http://localhost/posts/]

Less than 10% of the time. Now let’s try to only fetch the information we’re interested in. In this example it means discarding the created_at and updated_at fields, only loading the id (needed because of eager loading) and content of each post - we’re still bringing in the largest field. The results are:

Completed in 18488ms (View: 5776, DB: 1489) | 200 OK [http://localhost/posts]

We’re 3 seconds down. Now let’s see what happens if we only fetch the content of each post, discarding its id, created_at and updated_at fields.

Completed in 13306ms (View: 2470, DB: 424) | 200 OK [http://localhost/posts]

Down to 13 seconds. It’s currently taking less than 5% of the time it took with a normal find. Now imagine this in real-life examples, eager loading lots of tables and selecting a few fields from tables with 15+ columns. In that case, we could be talking about an even greater improvement.
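For reference, the percentages can be checked with a few lines of Ruby, using only the "Completed in" timings from the logs above. The labels are just descriptions chosen for this sketch.

```ruby
# Total request times in milliseconds, copied from the logs above.
baseline_ms = 283_061 # no eager loading

timings_ms = {
  "eager loading"                 => 21_169,
  "eager loading, id and content" => 18_488,
  "eager loading, content only"   => 13_306
}

# Express each run as a percentage of the baseline, rounded to one decimal.
percentages = timings_ms.map do |label, ms|
  [label, (ms * 100.0 / baseline_ms).round(1)]
end.to_h

percentages.each do |label, pct|
  puts "#{label}: #{pct}% of the baseline"
end
```

This works out to roughly 7.5%, 6.5% and 4.7% of the original time.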

Side notes and additional resources

Eager loading is a powerful tool, but it should be used carefully. You don’t want to be loading too much information, since that can slow down your application.

There’s an awesome gem that helps you out with that - Bullet. It’ll warn you when you have an N+1 case (you’re loading information without eager loading) and it’ll also warn you when you’re eager loading data that isn’t going to be used. It brings an extra gift - a counter cache warning whenever one is missing and should be present. Don’t rely only on this gem, since it can be wrong or work unexpectedly in some cases - always remember to recheck your code.

Posted 2 years ago • Comments