Row-based Replication

Row-based Replication was introduced in MySQL 5.1. It’s not the default (yet), but I really like it. I wanted to tell you what limitations you may face with statement-based, and why you should join me in making the switch.

Write scalability of certain statements

Statement based binary logging requires additional locking to ensure slave-consistency. To illustrate, here is a naive example:

CREATE TABLE users (
 user_id INT NOT NULL primary key,
 user_name varchar(60) NOT NULL,
 is_deleted TINYINT DEFAULT 0,
 UNIQUE KEY(user_name)
);
INSERT INTO users (user_id, user_name, is_deleted) VALUES
 (1, 'ted', 0),
 (2, 'bill', 0),
 (3, 'fred', 0),
 (4, 'mike', 1),
 (5, 'barry', 0);
session1> START TRANSACTION;
session1> DELETE FROM users WHERE is_deleted = 1; # 1 row matches (user_id 4), deleted.
session2> START TRANSACTION;
session2> UPDATE users SET is_deleted = 1 WHERE user_id = 5; # 1 row matches.
session2> COMMIT;
session1> COMMIT;

Because statements are only written to the binary log once committed, on the slave session #2 would apply first, and result in two rows being updated by session #1, leading to incorrect results!
InnoDB prevents against this race condition by setting additional locking. After session #1 is modified no other sessions will be updated/insert into the range of is_deleted=1 until session #1 commits.
Actually.. it gets even worse than that, because it will lock the rows based on whatever index can be used. In my case, there’s no index so I will lock every rows in the table (eek).
Even with good indexing, statements that have the pattern WHERE col BEWEEN x AND y or WHERE id > N will require gap locking to prevent data being inserted into the range, again to prevent the same inconsistency on the slave.
InnoDB has always had a setting to disable this locking, called innodb_locks_unsafe_for_binlog. As the name suggests, this is totally unsupported and you will likely get replication drift.
However, astute readers will notice that since row-based replication sends the actual row images of changes to the slaves, this race condition problem no longer exists. And indeed, once you switch to row-based replication binlog-format=ROW, and change the isolation level to transaction-isolation=READ-COMMITTED, these additional locks are no longer set!

Batched insert scalability

When you enable row-based replication, it is safe to set innodb_autoinc_lock_mode = 2. This improves auto-increment scalability with batch INSERT statements.
With the default of innodb_autoinc_lock_mode = 1, the per-table auto_increment generator needs to be locked for the duration of batch insert statements. This is because as rows are written to the binary log, only the first INSERT_ID is seeded to the slave, and locking gurantees deterministic slave replay. i.e.

# Binary logged multi-insert
SET INSERT_ID=1;
INSERT into my_table (id, value) VALUES
(NULL, 'Row 1'),
(NULL, 'Row 2'),
(NULL, 'Row 3');

Can you see the race-condition? In between each row we need to protect the auto-increment value so nobody could grab one of the IDs, leading to a conflict on the slave as it tries to just use the next number.
This locking is batched-insert only. It previously also applied to single-row inserts, but that was fixed in MySQL 5.1. You can enable the 5.0 and below behaviour by setting innodb_autoinc_lock_mode = 0, but I can’t think of a good reason to do this.

Reasons to not use Row-based replication

I think the two biggest complaints have always been:

  1. Row-based binary logging produces much larger binary log files. This is largely addressed in 5.6 by setting binlog_row_image=minimal.

  2. Row-based events are harder to debug, since you could not previously see the actual statement that generated the row events. This is addressed in MySQL 5.6 with
    binlog-rows-query-log-events. MySQL 5.7 will also make this easier because the need to hand-audit events is reduced with idempotent binary log replay.

Summary

Statement-based replication was an okay choice in the past where we had single-core machines, but today scaling requires us to lock less, and perform more work in parallel. It’s all about concurrency, concurrency, concurrency.
Have you made the switch? Something else holding you back? Let me know your thoughts in the comments.