Eager loading: preload vs. eager_load

Ruby on Rails has a long history of so called eager loading, which is meant to address the N+1 queries problem.

Eager loading

The Ruby on Rails Guide explains exactly what is meant by Eager Loading:

Active Record lets you specify in advance all the associations that are going to be loaded. This is possible by specifying the includes method of the Model.find call. With includes, Active Record ensures that all of the specified associations are loaded using the minimum possible number of queries.

There are 2 approaches resulting in 2 different kind of SQL.

The example

The example is based on the associations:

class Order < ApplicationRecord

class User < ApplicationRecord
  has_many :orders

Eager loading is relevant whenever all User and their orders have to be loaded:

User.all.each { |user| user.orders.map(&:total) }

This naive approach generates following SQL:

SELECT "users".* FROM "users";
SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = 1;
SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = 2;
SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = 3;

The example describes the N + 1 problem: one additional database query to the orders table is made per user.
Rails provides solutions.

1.) eager_load

Querying with ActiveRecord::QueryMethods#eager_load:


generates a LEFT JOIN with the associated table (orders):

SELECT "users"."id" AS t0_r0, "users"."email" AS t0_r1, "users"."created_at" AS t0_r2, "users"."updated_at" AS t0_r3, "orders"."id" AS t1_r0, "orders"."user_id" AS t1_r1, "orders"."product_id" AS t1_r2, "orders"."created_at" AS t1_r3, "orders"."updated_at" AS t1_r4
FROM "users"
  ON "orders"."user_id" = "users"."id";

It results in one single huge statement, instead of the original N + 1 database queries. ActiveRecord iterates over the result set and builds the corresponding User and Order objects.
This approach is robust, especially in case of references in the selection (WHERE condition) to the joined tables.
Rails uses the target table for referencing in the *WHERE`condition, if no table was defined:

User.eager_load(:orders).where(created_at: Time.current)


SELECT "users"."id" AS t0_r0, "users"."email" AS t0_r1, "users"."created_at" AS t0_r2, "users"."updated_at" AS t0_r3, "orders"."id" AS t1_r0, "orders"."user_id" AS t1_r1, "orders"."product_id" AS t1_r2, "orders"."created_at" AS t1_r3, "orders"."updated_at" AS t1_r4
  FROM "users"
  LEFT OUTER JOIN "orders"
    ON "orders"."user_id" = "users"."id"
  WHERE "users"."created_at" = '2017-02-27 22:23:34';

2.) preload

The other approach with ActiveRecord::QueryMethods#preload:


can be an alternative. It generates SQL, which includes an additional database query per association:

SELECT "users".* FROM "users";
SELECT "orders".* FROM "orders" WHERE "orders"."user_id" IN (1, 2, 3);

Afterwards ActiveRecord iterates over the result sets again and creates appropriate objects.
However, the selection can not contain a reference to associated tables. Because otherwise it ends up in SQL errors:

User.preload(:orders).where(user_id: 1)
# => ActiveRecord::StatementInvalid: SQLite3::SQLException: no such column: users.user_id

On the other hand, preload clearly is an optimization. Unnecessary database queries can be avoided.
So, for example, if no user is found:

User.where(created_at: 1.day.from_now).preload(:orders)
# => []

There obviously can not be any related orders. Then the query for orders is skipped:

SELECT "users".* FROM "users" WHERE "users"."created_at" = '2017-02-27 22:33:59';

Convenience: includes

These two approaches are the foundation for ActiveRecord::QueryMethods#includes.
Since Ruby on Rails 4, the includes must be explicitly defined with ActiveRecord::QueryMethods#references, if a selection refers to a table. Depending on whether it refers to the target table or a joined table, then a preload or an eager_load is performed.
In case of referring to the target table:

    .where(created_at: 1.day.ago..Time.current)

The resulting SQL:

SELECT "users".* 
  FROM "users"
  WHERE ("users"."created_at" BETWEEN '2017-02-26 22:38:18' AND '2017-02-27 22:38:18');
SELECT "orders".*
  FROM "orders"
  WHERE "orders"."user_id" IN (1, 2, 3);

In case of referencing a JOIN table:

    .where("paid_at < ?", 1.year.ago)

The resulting SQL:

SELECT "users"."id" AS t0_r0, "users"."email" AS t0_r1, "users"."created_at" AS t0_r2, "users"."updated_at" AS t0_r3, "users"."paid_at" AS t0_r4, "orders"."id" AS t1_r0, "orders"."user_id" AS t1_r1, "orders"."product_id" AS t1_r2, "orders"."created_at" AS t1_r3, "orders"."updated_at" AS t1_r4
  FROM "users"
  LEFT OUTER JOIN "orders"
    ON "orders"."user_id" = "users"."id"
  WHERE "paid_at" = '2016-02-27 22:42:25';

With includes, one also has to consider table is referenced in the WHERE condition. Whenever the selection refers to a joined table, references is mandatory. Otherwise, there is either a SQLException (no suchColumn) or even worse, it returns an erroneous result set.
If there is no references defined, Rails assumes the selection to refer to the target table (like with eager_load). Basically, it always makes sense to explicitly reference the columns table name. Otherwise, there might be a SQL exception (ambiguous column name).