Proposal to change MySQL replication defaults

In the MySQL team @ Oracle, we've been discussing changing the default values for MySQL replication so that they are safer to use.

Specifically:

Setting Current Value Proposed New Value
sync_binlog 0 1
master-info-repository FILE TABLE
relay-log-info-repository FILE TABLE

I have written about all of these settings before in my post on deciding whether or not to make MySQL durable. The short version is that:

  • By default, if MySQL crashes you could lose data.
  • You don't get to chose what you lose, it could be a $10 order, or a $10M order.
  • Independent of losing orders, DBA time is expensive. Having to perform post-failure investigation to handle discrepancies has a high cost.
  • The cost of durability is much lower with SSDs. In a comment, Daniël van Eeden also correctly said that the cost is much lower in 5.6 with group commit (thanks Daniël!).
  • I recommend most users should be running with MySQL as fully durable.

Now to explain each of these proposed changes:

Sync_binlog

By setting sync_binlog=1 you are ensuring that any changes written to the binary log will not be lost in the event of a crash. This is a problem because with the current default of sync_binlog=0, a replication master crashing may lose events and require all slaves to be re-imaged with a fresh copy of data to be consistent.

It's important to note that changing this setting has a downside: syncing the binary log causes a performance impact. As I mentioned above, this was reduced significantly in MySQL 5.6 with group commit, but it will still be evident in some workloads. The most unfortunate may be in single-threaded performance on systems with hard-drives and no RAID controller with a battery backed write cache.

Master-info-repository/relay-log-info-repository

These two options were first introduced in MySQL 5.6 as part of a feature called transactional replication. When both set to TABLE, they store the internal replication coordinates in an InnoDB table, and COMMIT changes as part of the same transaction as the replication events being applied.

What this means in practical terms, is that slaves are in a consistent state when they crash. Replication can just resume processing events from the point of failure, and data does not need to be re-imaged from the master or another slave.

Conclusion

As with previous proposals, we are seeking feedback on how these changes will impact you.

  • Do you currently use sync_binlog and transactional replication?
  • Have you considered these features, but decided not to switch? (This is a case where we'd really love to hear from you).
  • Do you agree that these changes will make MySQL safer, and easier to administer?

Please leave a comment, or get in touch!

  • Kristian Köhntopp

    When enabling TABLE for master relay info repository, you are introducing InnoDB tables to mysql.*

    I welcome the change, but request that you

    - do that when you turn mysql.* into InnoDB anyway
    - get rid of the log tables in mysql.* as well, as they have no place in a config schema anyway

    It is useful to limit the number of engines in your config schema in order to simplify administration. It is also useful to limit the size of your config schema in order to enable fast restore. Log tables in a config schema work against that.

    • http://www.tocker.ca/ Morgan Tocker

      I have to agree with you here - I don't see a good fit got the log tables moving forward: http://www.tocker.ca/2013/11/21/commentary-on-mysql-slow-query-collection-sources.html

      Would you agree that deprecation + removal (rather than relocating to elsewhere) is the correct approach?

      • Kristian Köhntopp

        I agree that the config schema is not the correct place for this.

        I can see why having that information in a table instead of a log file is sexy - it makes stuff searchable.

        I agree that the information logged at the moment is incomplete (when looking just at that log). And I know about other ways of getting that information out of the server, i.e. P_S and tooling for that.

        So

        - if you are keeping the log files around, and if you want to keep them around as tables
        -- these tables cannot be MyISAM, ever, or anything else that table locks, because otherwise you are again limiting server performance with one giant lock. That's something you have been working very hard to get rid of.
        -- these tables cannot reside in the mysql.* config schema.
        -- these tables or logs need to include more complete information that presently is available

        Actually, I'd like to see TABLE_CATALOG put to use and have a sys namespace that includes current mysql.* as sys.config. There might be a sys.log, the ps_helpers and friends as sys.tools and p_s and i_s might also go there.

        That would be completely separated from any user catalog and the schemas and tables in there. Much cleaner that the current setup.

        On the other hand there little point in reinventing something that pumps p_s data into fluentd, elasticsearch and kibana (or graphite and grafana). Plenty of these things exist home cooked or ready made, and they are proven to scale.

        Poor windows guys, I don't know how much of that works on their platform.

  • Kristian Köhntopp

    I have had several HP machines running a very extremely heavy write load on SSD for over a year:

    MySQL 5.6 on machines that hold materialized views of Booking.com's hotel room availability data. These boxes see a lot of writes and need to run on SSD for write performance.

    After initial problems with SSD reliability these machine got the magic firmware update and since then show pretty stable performance and no failures. Their SSD lifetime prediction is also not alarming.

    A redo log will usually generate write load only on a single hot point on the SSD and their internal wear leveling can deal with that just fine.

    It's the graphite boxes with thousands of *.wsp files that are concurrently being written to, that eat through SSD in weeks, despite memcache write caching and stuff.