In Ruby on Rails there are two fundamental approaches to N:M associations: has_and_belongs_to_many and has_many :through. Both have advantages and disadvantages, so both remain relevant.
Still, the belief persists that has_many :through is generally to be preferred because of its flexibility. It is indeed more flexible in some respects: the join model of a has_many :through association can be given additional attributes without changing the association itself. However, that flexibility is not needed in every case. has_and_belongs_to_many is essentially more lightweight and leaves a smaller footprint.
Worrying about a possibly necessary structural change should never be the reason for a decision in favour of has_many :through. Angst is always a bad adviser.
In any case, migrating from a has_and_belongs_to_many to a has_many :through association is easy and natural, as the following example shows.
The starting point: users have many addresses, and each address can have many users:
class User < ActiveRecord::Base
  has_and_belongs_to_many :addresses
end
class Address < ActiveRecord::Base
  has_and_belongs_to_many :users
end
That relationship is set up quickly, since it only requires the two columns user_id and address_id in the intermediate table, which is named addresses_users by default:
$ rails g migration CreateAddressesUsersJoinTable users addresses && rake db:migrate
Done.
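The default table name follows from the lexical order of the two table names. A minimal plain-Ruby sketch of that convention is shown here; default_join_table_name is a hypothetical helper for illustration only, and Rails' own derivation handles further edge cases that this simplification ignores:

```ruby
# Simplified sketch of the HABTM naming convention: both table names
# in lexical order, joined by an underscore.
# NOTE: default_join_table_name is a hypothetical helper, not a Rails API.
def default_join_table_name(table_a, table_b)
  [table_a.to_s, table_b.to_s].sort.join("_")
end

puts default_join_table_name(:users, :addresses) # prints addresses_users
```

The order of the arguments does not matter, which is why the table is called addresses_users rather than users_addresses.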
Migrating the existing relationship to a has_many :through association is pretty easy, should it become necessary.
1. The intermediate model
The first step is the model for the intermediate table user_addresses, which gains additional timestamp attributes:
$ rails g model UserAddress user:belongs_to address:belongs_to && rake db:migrate
This creates the UserAddress model (with belongs_to :user and belongs_to :address) and generates and executes the following migration:
class CreateUserAddresses < ActiveRecord::Migration
  def change
    create_table :user_addresses do |t|
      t.belongs_to :user, index: true, foreign_key: true
      t.belongs_to :address, index: true, foreign_key: true
      t.timestamps null: false
    end
  end
end
2. Set the has_many :through association
The old HABTM association has to be renamed temporarily so that the has_many :through association can be declared unambiguously under the original name (doing this on one end of the relationship is sufficient). This prepares the model for copying the existing data (the association links):
class User < ActiveRecord::Base
  # Temporary name for the old HABTM association; class_name is given as a string.
  has_and_belongs_to_many :deprecated_addresses,
    join_table: :addresses_users, class_name: 'Address',
    association_foreign_key: :address_id
  has_many :user_addresses
  has_many :addresses, through: :user_addresses
end
3. Migrating the existing data
This step is necessary to copy the existing association links. The migration file:
$ rails g migration MigrateAddressesUsers
creates UserAddress objects:
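class MigrateAddressesUsers < ActiveRecord::Migration
  # Data-only migration, so it defines `up` instead of `change`.
  def up
    User.transaction do
      # find_each loads the users in batches.
      User.find_each do |user|
        # Assigning the collection creates one UserAddress per existing link.
        user.addresses = user.deprecated_addresses
      end
    end
  end
end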
Migrating the association table data in batches makes sense. By default find_each loads records in batches of 1000 (this can be changed with the batch_size option), but the appropriate batch size depends on the specific use case.
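To make the batching idea concrete without a database, here is a small plain-Ruby simulation. It is only a sketch: habtm_links is a hypothetical in-memory stand-in for the rows of addresses_users, and copy_links is an illustrative helper, not part of Rails:

```ruby
# Hypothetical in-memory stand-in for the rows of the addresses_users
# join table: [user_id, address_id] pairs.
habtm_links = [[1, 10], [1, 11], [2, 10], [3, 12], [3, 13]]

# Copies the links batch by batch and stamps every new row, mirroring
# the created_at / updated_at columns of the user_addresses table.
def copy_links(links, batch_size:, now: Time.now)
  copied = []
  links.each_slice(batch_size) do |batch|
    batch.each do |user_id, address_id|
      copied << { user_id: user_id, address_id: address_id,
                  created_at: now, updated_at: now }
    end
  end
  copied
end

user_addresses = copy_links(habtm_links, batch_size: 2)
puts user_addresses.size # prints 5
```

A smaller batch size keeps fewer records in memory at once, at the cost of more round trips in a real database setting.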
The migration so far can be deployed as it is. Removing the deprecated HABTM relationship should happen in a second deploy.
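Before that second deploy it can be worth checking that every old link has a counterpart in the new table. The following plain-Ruby sketch shows the comparison on in-memory data; missing_links and the sample data are hypothetical, and in a real application the pairs would come from the two tables:

```ruby
require "set"

# Returns the [user_id, address_id] pairs of the old join table that
# have no matching row in the new one.
# NOTE: hypothetical helper working on plain arrays and hashes.
def missing_links(old_links, new_rows)
  migrated = new_rows.map { |row| [row[:user_id], row[:address_id]] }.to_set
  old_links.reject { |pair| migrated.include?(pair) }
end

old_links = [[1, 10], [2, 11]]
new_rows  = [{ user_id: 1, address_id: 10 }]

p missing_links(old_links, new_rows) # prints [[2, 11]]
```

An empty result means that all links were copied and the old table is no longer needed.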
4. Dropping the HABTM table and relationship
The migration that drops the obsolete HABTM table should be run in a second deploy:
$ rails g migration DropTableAddressesUsers
It should look like:
class DropTableAddressesUsers < ActiveRecord::Migration
  def change
    drop_table :addresses_users
  end
end
Finally, run it:
$ rake db:migrate
Both models, now without the HABTM associations:
class User < ActiveRecord::Base
  has_many :user_addresses
  has_many :addresses, through: :user_addresses
end
class Address < ActiveRecord::Base
  has_many :user_addresses
  has_many :users, through: :user_addresses
end
Conclusion
The process of migrating a has_and_belongs_to_many to a has_many :through relationship looks more extensive than it actually is. Some of the steps would have been necessary anyway if has_many :through had been chosen in the first place. The only real additional effort is migrating the links (the association data). Because the existing links have to be copied, however, the migration path has to be split into two separate deploys.
In short, HABTM is the association of choice unless there is a reasonable chance that the relationship will need additional attributes. A dedicated HABTM relationship is the more lightweight, better-performing option, and a migration to a has_many :through relationship is unproblematic if it ever becomes necessary at all.