Row-based Replication

Row-based Replication was introduced in MySQL 5.1. It’s not the default (yet), but I really like it. I wanted to tell you what limitations you may face with statement-based, and why you should join me in making the switch.

Write scalability of certain statements

Statement based binary logging requires additional locking to ensure slave-consistency. To illustrate, here is a naive example:

CREATE TABLE users (
 user_id INT NOT NULL primary key,
 user_name varchar(60) NOT NULL,
 is_deleted TINYINT DEFAULT 0,
 UNIQUE KEY(user_name)
);

INSERT INTO users (user_id, user_name, is_deleted) VALUES
 (1, 'ted', 0),
 (2, 'bill', 0),
 (3, 'fred', 0),
 (4, 'mike', 1),
 (5, 'barry', 0);

session1> START TRANSACTION;
session1> DELETE FROM users WHERE is_deleted = 1; # 1 row matches (user_id 4), deleted.
session2> START TRANSACTION;
session2> UPDATE users SET is_deleted = 1 WHERE user_id = 5; # 1 row matches.
session2> COMMIT;
session1> COMMIT;

Because statements are only written to the binary log once committed, on the slave session #2 would apply first, and result in two rows being updated by session #1, leading to incorrect results!

InnoDB prevents against this race condition by setting additional locking. After session #1 is modified no other sessions will be updated/insert into the range of is_deleted=1 until session #1 commits.

Actually.. it gets even worse than that, because it will lock the rows based on whatever index can be used. In my case, there’s no index so I will lock every rows in the table (eek).

Even with good indexing, statements that have the pattern WHERE col BEWEEN x AND y or WHERE id > N will require gap locking to prevent data being inserted into the range, again to prevent the same inconsistency on the slave.

InnoDB has always had a setting to disable this locking, called innodb_locks_unsafe_for_binlog. As the name suggests, this is totally unsupported and you will likely get replication drift.

However, astute readers will notice that since row-based replication sends the actual row images of changes to the slaves, this race condition problem no longer exists. And indeed, once you switch to row-based replication binlog-format=ROW, and change the isolation level to transaction-isolation=READ-COMMITTED, these additional locks are no longer set!

Batched insert scalability

When you enable row-based replication, it is safe to set innodb_autoinc_lock_mode = 2. This improves auto-increment scalability with batch INSERT statements.

With the default of innodb_autoinc_lock_mode = 1, the per-table auto_increment generator needs to be locked for the duration of batch insert statements. This is because as rows are written to the binary log, only the first INSERT_ID is seeded to the slave, and locking gurantees deterministic slave replay. i.e.

# Binary logged multi-insert
SET INSERT_ID=1;
INSERT into my_table (id, value) VALUES
(NULL, 'Row 1'),
(NULL, 'Row 2'),
(NULL, 'Row 3');

Can you see the race-condition? In between each row we need to protect the auto-increment value so nobody could grab one of the IDs, leading to a conflict on the slave as it tries to just use the next number.

This locking is batched-insert only. It previously also applied to single-row inserts, but that was fixed in MySQL 5.1. You can enable the 5.0 and below behaviour by setting innodb_autoinc_lock_mode = 0, but I can’t think of a good reason to do this.

Reasons to not use Row-based replication

I think the two biggest complaints have always been:

  1. Row-based binary logging produces much larger binary log files. This is largely addressed in 5.6 by setting binlog_row_image=minimal.

  2. Row-based events are harder to debug, since you could not previously see the actual statement that generated the row events. This is addressed in MySQL 5.6 with
    binlog-rows-query-log-events. MySQL 5.7 will also make this easier because the need to hand-audit events is reduced with idempotent binary log replay.

Summary

Statement-based replication was an okay choice in the past where we had single-core machines, but today scaling requires us to lock less, and perform more work in parallel. It’s all about concurrency, concurrency, concurrency.

Have you made the switch? Something else holding you back? Let me know your thoughts in the comments.

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I’ve also worked at Percona and InPowered.

  • Shlomi Noach

    Hi Morgan,
    (and congrats on your new position!)

    Like you I made the switch long ago – with the immediate result of eliminating replication drift — no more weird data inconsistencies between master and slaves (that I could find).

    However recently I came back upon the advantages in using SBR; namely the fact that you can create triggers on slaves (without creating them on master) and that they would actually work. With RBR there is no INSERT command on the slave; with SBR there is.

    Two recent use cases:

    1. Doing an online schema change (oak-online-alter-table / pt-tonline-schema-change) on a slave: migrating a slave from InnoDB to TokuDB while still running replication. Both tools rely on triggers.

    2. Doing table auditing (recording changes made to tables by INSERT/DELETE/UPDATE operations) – again via triggers, and without overloading the master.

    3. (Did I say two?) A new tool developed just a couple days ago by my colleague and myself — to be announced shortly. Very interesting functionality, again relying on slave triggers.

    Cheers,
    Shlomi

    • Shlomi – thanks for commenting! In the case of pt-online-schema-change, I know that it changes back to SBR for the session it runs in (which should be enough). But this is a great point to include.

      • Justin Swanhart

        Morgan – it is a problem if you set ROW on a relay slave because it will transliterate from statement->row in the middle (that is, it will write row events instead of statement events) because the SE captures RBR (–log-slave-updates), not the slave thread.

        I recommend that you leave all slaves default to STATEMENT and set the format to ROW (under a FTWRL) when you promote the master.

        Some other things:
        binlog_cache_size should be tuned for RBR otherwise your disk writing will go up by a lot for batches

        max_binlog_cache_size = 4G by default, which means that a transactions can’t change more than 4G of rows without getting an error.

        In versions prior to 5.6 you can diagnose row based replication problems with mysqlbinlog -v. In 5.6 query events are optional. They kind of bloat the log big time though. Remember that UUID events now take up extra space and checksum events too.

        • I try to not tier my slaves when I can.. but your points are valid. The default for binlog_cache_size is unchanged at 32K even with MySQL 5.7. It probably should be larger.

        • Shlomi Noach

          Justin,
          I actually never did get the full picture: is it possible to do RBR to one slave, and have it relay as SBR to another? How about the opposite direction?

          • Justin Swanhart

            You can do SBR -> RBR because it is actually the SE that records the binlog entries. Thus, if you force BINLOG_FORMAT=ROW on the slave, it will re-write any SBR into RBR. This will however, lose the original SBR context and any downstream machines get RBR entries from that point on.

            It is not possible to “upconvert” from RBR -> SBR. This is because a 1 million row write_rows event could come from a LOAD DATA INFILE, a CREATE TABLE .. AS, or an INSERT..SELECT, or even an INSERT .. VALUES statement.

            Basically once an event is RBR it will always be RBR, and SBR can be converted to RBR by a relay slave. Personally I think that either this is a bug, or that there should be an option:
            –log-slave-updates-preserve-binlog-format

          • Shlomi Noach

            Thanks, Justin.
            I think it is important to have ABR convertible to RBR – otherwise how would one migrate her SBR entire topology to RBR?

  • Justin Swanhart

    Keep in mind that switching to READ-COMMITTED has some other side effects. Gaps and next-key locks are indeed no longer taken. Extra locks from unindexed or not-well-indexed scans, however, are released too. That is, those locks that did not actually match the WHERE clause but were scanned (and thus locked) are unlocked. This can get you higher overall concurrency but there is a higher chance of deadlock.

    http://www.mysqlperformanceblog.com/2012/08/28/differences-between-read-committed-and-repeatable-read-transaction-isolation-levels/

  • Great post Morgan. Avoiding drift on slaves is a huge reason to switch to row-based replication. Also the idempotent binary log replay mode I didn’t know was coming. That will probably put the nail in the coffin of statement based and bring MySQL replication into the big leagues.

  • Some additional reasons to use Row-based replication by Ike Walker: http://mechanics.flite.com/blog/2013/11/18/an-advantage-of-mysql-row-based-replication/

  • This is also a valid reason not to use the third option for binlog_format: mixed.

  • Robby

    Can anyone tell me what the process would be to switch to rbr from an existing system using sbr? Is it as simple as changing the mysql settings or is it best to resync all data after making the change?

  • Pingback: database mysql 5.6初始配置调优 | 极客521()

  • Zain Syed

    I have enabled Row based replication, but create user is being sent to the slave as well. Why is this happening? Mysql version 5.6