MySQL 5.6.14 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, testcases and patches have helped create a better MySQL 5.6.14.

In particular:

  • Thanks to Justin Swanhart for pointing out an issue with statistics on partitioned tables – bug #69179. While we marked Justin’s bug as a duplicate, he continued to provide commentary on the earlier bug, and we appreciate his feedback.
  • Justin Swanhart also noticed a change in behaviour in MySQL 5.6 with an assertion raised in InnoDB – bug #70007. We’ve since reverted to the previous behaviour.
  • Thanks to Zhai Weixiang, who noticed that InnoDB code intended to only run in debug versions of MySQL was unintentionally running in all versions of MySQL. Bug #69724.
  • Thanks to both Robert Lineweaver and Ryan Kenney for reporting bugs #69693 and #69707, as well as providing excellent test cases and research into versions affected. Both problems related to InnoDB mishandling certain foreign key names.
  • Thanks to Roel Van de Paar, who reported an assertion when innodb_log_group_home_dir does not exist. Bug #69000.
  • Thanks to Elena Stepanova for reporting bug #65146. From Elena’s contribution, we both updated the documentation and now produce a warning when START TRANSACTION WITH CONSISTENT SNAPSHOT is used in isolation levels other than REPEATABLE READ.
  • Thanks to Mark Callaghan for reporting an issue with InnoDB’s background thread and shutting down the MySQL server. Bug #63276.
  • Thanks to Alexey Kopytov for detecting a situation where an infinite loop could occur in handling InnoDB’s compressed pages. Alexey’s bug report contained detailed code analysis and was very helpful. Bug #61132.
  • Thanks to Shahriyar Rzayev for reporting the original bug, and uploading detailed data to assist in reproduction. It was from these contributions that we were later able to devise a testcase and reproduce the bug internally! Bug #69898.
  • Thanks to Justin Swanhart and Tsubasa Tanaka who both identified a bug in MASTER_DELAY not type checking input. Bugs #69249 and #69469.
  • Thanks to Hartmut Holzgraefe for reporting an issue with Row-based-replication and decimal data type between MySQL versions. Bug #57166.
  • Thanks to Mark Callaghan for reporting a performance regression in 5.6. Bug #68825.
  • Thanks to Jean Weisbuch for identifying an issue with information_schema. Bug #68076.
  • Thanks to Simon Mudd for discovering an issue where the event scheduler did not correctly report when it could not create a new thread. Bug #67191.
  • Thanks to Sergey Petrunya who reported bug #67507, as well as the community of users who provided input on the bug – mysqldump is often run from cron, so changes in behaviour risk backups not being performed.
  • Thanks to a private bug reporter who demonstrated that MySQL inadequately protects against stack overflow, and provided an example patch. Bug #35019.
  • Thanks to “Dude Letme BE” who reported symbols were missing from libmysql.dll. Bugs #69204, #62394.
  • Thanks to Chito Angeles for reporting a bug in InnoDB fulltext search, and reducing it to a simplified testcase. Bug #69932.

Thank you again to everyone listed above. In particular, I would like to call out the two names that appear more than once: Justin Swanhart (3), Mark Callaghan (2).

- Morgan

How do you use the Query Cache?

We are looking for community feedback on the use-cases for the Query Cache in MySQL.

As astute followers will notice, the query cache is now off by default in 5.6, but realistically it has always been effectively disabled: the previous default was ON, but with zero memory allocated to it.

The reason for disabling it is that the query cache does not scale with high-throughput workloads on multi-core machines. This is due to an internal global lock, which can often be seen as a hotspot in performance_schema.
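
If you want to check whether the query cache is enabled on your own server, and whether its lock shows up as a hotspot, here is a minimal sketch (assuming the relevant performance_schema wait instruments are enabled; the event-name pattern is my assumption and may vary by version):

SHOW GLOBAL VARIABLES LIKE 'query_cache%';

-- Look for query cache mutex waits in performance_schema:
SELECT event_name, count_star, sum_timer_wait
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE event_name LIKE '%Query_cache%'
ORDER BY sum_timer_wait DESC;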

The ideal scenario for the query cache tends to be largely read-only, where there are a number of very expensive queries which examine millions of rows only to return a few. A hypothetical example might be a complex query to build a list of values for a drop-down list that always appears on a webpage form. In a situation like this, the query cache can mask performance problems caused by missing indexes, which makes it helpful for novice users.

My opinion however (and it does not necessarily reflect that of the MySQL team), is that this scenario is becoming less frequent:

  • Many users are building queries via ORMs, which can often offer caching directly.
  • Tooling is getting better, and novice users can discover missing indexes via things like MySQL Workbench’s visual explain.
  • Any updates can invalidate the cache, and good performance practice teaches us that we should not pay attention to the average, but try to tighten the variance between user requests.

Have I missed anything? Please leave a comment, or get in touch. I would love to hear how you are using the Query Cache.

MySQL Connect Highlights

I’m now back home in Toronto, recovering from a very busy weekend at MySQL Connect.

This was my first MySQL Connect, and I really enjoyed it. It was great to be able to meet old friends from my MySQL AB days, but also meet some of the new faces that work on MySQL at Oracle. As Edward Screven mentioned in his keynote, the MySQL team is twice the size under Oracle that it was at Sun.

Some additional highlights for me:

  • Saturday Morning’s Keynote (now online).
    The video includes both Edward Screven & Tomas Ulin’s keynotes as well as Mark Leith demonstrating the new MySQL Enterprise Monitor 3.0 with agentless / cloud friendly architecture and using Performance Schema.
  • The announcement of MySQL Fabric.
  • Hearing Facebook, Twitter & Mozilla’s positive reinforcement – they love the new features in MySQL 5.6 and plan to upgrade.
  • Replication changes in 5.7 – Intra-Schema parallel slaves and labs demo of multi-source. Improved semi-sync performance.
  • Speaking to users about replication in general. A lot of people are planning to switch to GTIDs to simplify administration.
  • Speaking to users at the booth about our new Workbench 6.0 GA – everyone is excited.
  • Participating in our Storage Engine Partner Meeting, and talking about some of the new optimizer enhancements in 5.6 – MRR, BKA and ICP.
  • InnoDB changes in 5.7 – it was nice to read Sunny’s blog and note that we no longer need to explicitly start read-only transactions.
  • 5.6.14 and 5.5.34 releases.

Locking and Concurrency Control (Video)

A couple of months ago, Freshbooks commissioned me to build a presentation on Locking and Concurrency Control. You can see the result here:

With Freshbooks being a Cloud Accounting platform, they wanted to provide training to new developers, to make sure good practices were followed.

My thanks to them for letting me share the content publicly!

What to tune in MySQL 5.6 after installation

As a result of a number of improvements to default values, MySQL 5.6 requires far less configuration than previous versions of MySQL. Having said that, I wanted to write about the settings that you may need to change.

InnoDB Settings

  • innodb_buffer_pool_size – Defaults to 128M. This is the main setting you want to change, since it sets how much memory InnoDB will use for caching data+indexes. For a dedicated MySQL server, the recommended size is 50-80% of installed memory. So for example, a server with 64G of RAM should have around a 50G buffer pool.

    The danger of setting this value too high is that there will be no memory left for the operating system and some MySQL-subsystems that rely on filesystem cache such as binary logs, and InnoDB’s transaction logs.
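
    As a rough sanity check that the buffer pool is large enough, you can compare logical read requests against reads that had to come from disk (a quick heuristic, not a hard rule):

    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
    -- A high ratio of read_requests to reads suggests the working set fits in memory.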

  • innodb_log_file_size – defaults to 48M. High write throughput systems may want to increase this to allow background checkpoint activities to smooth out over longer periods of time, leading to improved performance. Values up to 4G are quite safe. Historically, the downside of operating with a large log file was increased crash recovery times, but this has vastly improved in 5.5 and 5.6.
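
    A common rule of thumb (a heuristic, not from this post’s original advice) is to size the log files to hold roughly an hour of redo writes. You can measure the write rate like this:

    -- Take two samples of the redo bytes written, 60 seconds apart:
    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
    SELECT SLEEP(60);
    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
    -- (sample2 - sample1) * 60 ~ redo written per hour; compare this against
    -- innodb_log_file_size * innodb_log_files_in_group.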

  • innodb_flush_method – defaults to fdatasync. If you are using a hardware RAID controller, you may want to set this to O_DIRECT. This prevents the “double buffering” effect: when reading InnoDB buffer pool pages, there is otherwise both a copy in InnoDB and a copy in the operating system’s filesystem cache.

    If you do not have a hardware RAID controller, or with some SANs, O_DIRECT may lead to worse performance. The manual and Bug #54306 clarify this in more detail.

  • innodb_flush_neighbors – defaults to 1. This setting should be set to 0 (disabled) on SSDs, which see no performance gain from sequential IO. In some hardware configurations it may also be beneficial to disable it with RAID, because logically sequential blocks are not guaranteed to be physically sequential.

  • innodb_io_capacity and innodb_io_capacity_max – these settings influence how much background work InnoDB will do per second. In a previous post I described how most write IO (other than writing InnoDB’s log) is backgrounded. If you have knowledge of your hardware (and how many operations per second it can perform) it makes sense to consume this capacity rather than let it remain idle.

    The analogy I like to use here is like a flight taking off with empty seats – sometimes it is better to fill them with people who were destined for later flights just in case there is bad weather ahead. i.e. it is better to get rid of all background work at the earliest convenience to reduce liabilities.

    Some simple math: 200 IOPS per disk that can write, so a RAID10 controller with 10 disks = (10/2) * 200 = 1000. I say “simple math”, because a RAID controller will often be able to provide additional merging and effectively increase the number of IOPS you are capable of. In the case of SSDs, the IOPS count will likely be in the thousands.

    The risk of setting this too high is that you do not want background work to be so overwhelming that it competes with foreground work for disk capacity. Historically there was some risk that setting this too high would cause InnoDB to hold an internal lock, leading to decreased performance (my understanding is that this is greatly improved in 5.6).

  • innodb_lru_scan_depth – defaults to 1024. This is a new setting introduced in MySQL 5.6. Mark Callaghan has written some advice on configuring it. The simple version: if you raise innodb_io_capacity, you should raise innodb_lru_scan_depth too.

Replication

Let’s assume that this server will be used for replication, or point-in-time recovery, in which case we want:
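
The following sketch mirrors the “Binary log/replication” block from the sample configuration at the end of this post:

log-bin
sync_binlog=1
sync_relay_log=1
relay-log-info-repository=TABLE
master-info-repository=TABLE
expire_logs_days=10
binlog_format=ROW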

Misc

  • timezone=GMT – Changes the timezone to GMT. A growing number of sysadmins suggest keeping all servers in GMT. I am personally a big fan of this, since almost all businesses are global these days. The locale you first started operating in seems kind of arbitrary.

  • character-set-server=utf8mb4 and collation-server=utf8mb4_general_ci – As I wrote in my previous post, utf8 is a much better default for new applications. You may optionally also want to set skip-character-set-client-handshake to ignore applications trying to set the character-set otherwise.

  • sql-mode – MySQL defaults to being very forgiving, and will silently truncate data. In my previous post, I said that for new applications it is better to set this to: STRICT_TRANS_TABLES, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_AUTO_VALUE_ON_ZERO, NO_ENGINE_SUBSTITUTION, NO_ZERO_DATE, NO_ZERO_IN_DATE, ONLY_FULL_GROUP_BY.

  • skip-name-resolve – Disables reverse name lookups on incoming connections. DNS resolution can be a little slow/flaky on some systems, so if you don’t require hostnames as part of privileges, I recommend avoiding it.

  • max_connect_errors – As Todd Farmer writes: “[this feature] provides no meaningful protection against brute force access attacks”. And indeed max_connect_errors doesn’t even apply when skip-name-resolve is set (as recommended just above).

    A firewall is better suited here, and usually what I will do is block port 3306 both publicly and internally so that only applications intended to have access can connect to MySQL. I will usually then set max_connect_errors to 100000 just so I can avoid any “double configuration” and guarantee it won’t get in the way.

  • max-connections – The default is 151. I see a lot of users with this set to a higher value in the 300-1000 range.

    Usually setting this higher is unavoidable, but it makes me a little nervous, since a 16-core machine only has the opportunity to be oversubscribed by maybe 2x-10x while blocking on IO. What you are hoping for here is that many of the open connections are sleeping idle. If they all become active, you can thread-thrash.

    The way I would ideally like to see this problem solved is with application connection pools configured to service the database rather than keeping large numbers of connections open, or with non-pooled applications connecting, executing work, and disconnecting as quickly as possible. Another way to solve it from 5.5 onwards (and one of the few differences between MySQL Community Edition and Enterprise Edition) is the thread pool plugin.

Conclusion

So let’s assume that we’re installing MySQL on a server with:

  • 64G RAM
  • Hardware RAID Controller (let’s assume it is capable of at least 2000 IOPS)
  • Requires Replication
  • It’s a new application
  • We have a firewall
  • We don’t require hostnames for privileges
  • We have a global application and don’t want to be tied to one timezone.
  • We want the application to be durable.

Here might be our sample configuration:

# InnoDB settings
innodb_buffer_pool_size=50G
innodb_log_file_size=2G
innodb_flush_method=O_DIRECT
innodb_io_capacity=2000
innodb_io_capacity_max=6000
innodb_lru_scan_depth=2000

# Binary log/replication
log-bin
sync_binlog=1
sync_relay_log=1
relay-log-info-repository=TABLE
master-info-repository=TABLE
expire_logs_days=10
binlog_format=ROW
transaction-isolation=READ-COMMITTED
innodb_autoinc_lock_mode = 2

# Other
timezone=GMT
character-set-server=utf8
collation-server=utf8_general_ci
sql-mode="STRICT_TRANS_TABLES,
 ERROR_FOR_DIVISION_BY_ZERO,
 NO_AUTO_CREATE_USER,
 NO_AUTO_VALUE_ON_ZERO,
 NO_ENGINE_SUBSTITUTION,
 NO_ZERO_DATE,
 NO_ZERO_IN_DATE,
 ONLY_FULL_GROUP_BY"
skip-name-resolve
max-connect-errors=100000
max-connections=500

# Unique to this machine
server-id=123

Hopefully that addresses the major issues. Let me know if you have any other safe advice, and I’ll add to the list!

Improving MySQL’s default configuration

One of MySQL’s continued values is to be easy to use. A key part of this is making sure the default configuration is kept up to date and safe for everyone.

Internally, these configuration changes are made in close cooperation between our engineering and support teams, who are in the best position to provide insight into customer impact.

For MySQL 5.6, hats off to James Day who led the project internally. Taken directly from James’ blog, here is the list of changes made:

  • back_log: 50 → 50 + (max_connections / 5), capped at 900.
  • binlog_checksum: NONE → CRC32. New variable in 5.6; 5.5 doesn’t accept the checksums. Use NONE if you want to read the binlog with 5.5, or on a master that has replication slaves using 5.5.
  • binlog-row-event-max-size: 1k → 8k. There is no binlog_row_event_max_size variable, just the option form.
  • flush_time: 1800 → 0 on Windows (it was already 0 on other platforms).
  • host_cache_size: 128 → 128 + 1 for each of the first 500 max_connections + 1 for every 20 max_connections over 500, capped at 2000. New variable in 5.6.
  • innodb_autoextend_increment: 8 → 64 (the value is in megabytes).
  • innodb_buffer_pool_instances: 0 → 8. On 32-bit Windows only, if innodb_buffer_pool_size is greater than 1300M, the default is innodb_buffer_pool_size / 128M.
  • innodb_concurrency_tickets: 500 → 5000.
  • innodb_file_per_table: 0 → 1.
  • innodb_log_file_size: 5M → 48M. InnoDB will change the file size to match the my.cnf value. Also see innodb_log_compressed_pages and binlog_row_image.
  • innodb_old_blocks_time: 0 → 1000 (1 second).
  • innodb_open_files: 300 → 300; if innodb_file_per_table is ON, the higher of table_open_cache or 300.
  • innodb_purge_batch_size: 20 → 300.
  • innodb_purge_threads: 0 → 1.
  • innodb_stats_on_metadata: ON → OFF.
  • join_buffer_size: 128k → 256k.
  • max_allowed_packet: 1M → 4M.
  • max_connect_errors: 10 → 100.
  • open_files_limit: 0 → 5000. See Note 1.
  • query_cache_size: 0 → 1M.
  • query_cache_type: ON/1 → OFF/0.
  • sort_buffer_size: 2M → 256k.
  • sql_mode: none → NO_ENGINE_SUBSTITUTION. See the later post about a default my.cnf with STRICT_TRANS_TABLES.
  • sync_master_info: 0 → 10000. Recommend: master_info_repository=TABLE.
  • sync_relay_log: 0 → 10000.
  • sync_relay_log_info: 0 → 10000. Recommend: relay_log_info_repository=TABLE. Also see Replication Relay and Status Logs.
  • table_definition_cache: 400 → 400 + table_open_cache / 2, capped at 2000.
  • table_open_cache: 400 → 2000. Also see table_open_cache_instances.
  • thread_cache_size: 0 → 8 + max_connections/100, capped at 100.

Note 1: In 5.5 there were already rules to ask the OS for the highest of (10 + max_connections + table_cache_size * 2) or (max_connections * 5) or the specified/default open_files_limit. The default is now calculated but the other rules are still used. If the OS refuses to allow as many as requested, max_connections and table_cache_size are reduced and you will see a “Changed limits” warning in the error log, as before.

Additional recommendations for new applications

Part of continuing to be easy to use means that, as well as having a good set of defaults, users also have an expectation of backwards compatibility. So a tradeoff sometimes needs to be made to ensure that novice users can upgrade their legacy applications with minimal issues.

This tradeoff doesn’t apply to new applications, in which case I recommend the following additional configuration changes:

[mysqld]

# Force UTF-8 character-set:
character-set-server=utf8
collation-server=utf8_general_ci

# Force MySQL to be more strict and not allow legacy
# defaults which truncate data, accept invalid dates, etc.
# Shown vertically to be a little easier to read.

sql-mode="STRICT_TRANS_TABLES,
  ERROR_FOR_DIVISION_BY_ZERO,
  NO_AUTO_CREATE_USER,
  NO_AUTO_VALUE_ON_ZERO,
  NO_ENGINE_SUBSTITUTION,
  NO_ZERO_DATE,
  NO_ZERO_IN_DATE,
  ONLY_FULL_GROUP_BY"

UTF-8 is a much smarter default than latin1 today. A simple Google search shows that converting is an issue for a number of users. I would even go so far as to encourage setting skip-character-set-client-handshake to ignore applications trying to set the character-set otherwise.
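
To verify what a given session has actually negotiated, a quick check:

SHOW SESSION VARIABLES LIKE 'character_set%';
SHOW SESSION VARIABLES LIKE 'collation%';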

MySQL (arguably) became popular for not being strict about SQL and allowing you to insert almost any value. This may have made sense when a poor application did not perform error checking, and the alternative was not storing any data at all. I think times have changed now – many users rely on frameworks and ORMs, and are happy to catch errors up front, rather than later find their data is corrupt.
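
To make the difference concrete, here is a minimal sketch (the table and values are hypothetical):

CREATE TABLE t (c VARCHAR(3));

-- With the forgiving defaults, the value is silently truncated to 'abc':
SET SESSION sql_mode = '';
INSERT INTO t VALUES ('abcdef');   -- Query OK, 1 warning (1265: Data truncated)

-- With strict mode, the same statement is rejected up front:
SET SESSION sql_mode = 'STRICT_TRANS_TABLES';
INSERT INTO t VALUES ('abcdef');   -- ERROR 1406: Data too long for column 'c'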

Moving forward

We are open to feedback on how we introduce configuration changes. You can leave a comment here, or feel free to contact me.

To get the discussion rolling, what do you think about bundling a small program (mysql-upgrade-defaults) with packages that can optionally be run after installation?

Ubuntu Charm Championship

The folks at Ubuntu are running a contest to see who can write the best juju charm!

Juju is Ubuntu’s cloud deployment/service orchestration tool. Charms are the scripts used by juju to do software deployment and management.

Currently MySQL has ~8800 downloads from the Charm Store, making it the most popular service deployed with Juju.

The contest runs until October 22, and has over $60K in prizes to be won. For more information see:
https://github.com/juju/charm-championship.

Spring Cleaning – Deprecation of NULL synonym \N

In the MySQL team, we’ve been discussing removing the non-standard \N synonym for NULL. For example:

mysql> SELECT NULL is \N;
+------------+
| NULL is \N |
+------------+
|          1 |
+------------+
1 row in set (0.00 sec)

mysql> INSERT INTO tablea VALUES (3, \N);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM tablea WHERE b IS \N;
+---+------+
| a | b    |
+---+------+
| 3 | NULL |
+---+------+
1 row in set (0.00 sec)

The use of “\N” in files loaded with LOAD DATA INFILE or created with SELECT INTO OUTFILE will be unaffected by this change and continue to be supported.

The two options we’ve laid out are:

Option #1:
– Add deprecation warnings in MySQL 5.6 and MySQL 5.7
– Remove in MySQL 5.8

Option #2:
– Add deprecation warnings in MySQL 5.6.
– Remove in MySQL 5.7

Do you rely on this behavior in your application?
Will option #2 affect you negatively?

Please feel free to leave a comment, or contact me.

Thanks!

Migrating from PostgreSQL to MySQL

A couple of weeks before I joined Oracle, I was working as a freelance consultant. One of my customers decided to migrate from PostgreSQL to MySQL.

The backstory:

  • My client initially decided to build their application with PostgreSQL, even though the founding team was already very experienced with MySQL. Their primary reason: schema changes on live MySQL deployments were difficult.

  • MySQL 5.6 introduced online DDL, addressing a large number of their issues.

  • As they experienced significant traction in their private beta, they decided to switch to MySQL 5.6, taking comfort in the size of the support & consulting ecosystem behind it.

I should point out in advance that I think migrating from one database to another is rarely done, because while SQL is a standard, the performance characteristics of each technology are wildly different.

In our case, straight after migrating, database performance dropped enough that the application was effectively offline! We did manage to get all of this performance back, however, and the application is now running smoothly. This is our war-story, so to speak, and hopefully it proves useful to the next person.

What did we discover?

From a functionality perspective, there were a couple of mangled character-set problems, but the client managed to sort those out themselves and reached feature parity very quickly.

Most of the heavy lifting was in adding specific optimizations and workarounds. These were my top 4, in the order I applied them:

Sort optimization workaround

One of the core features of the application was a continually running list of items, sorted similarly to a Facebook feed. That is, there were some search parameters to choose whether an item was applicable to a particular user, and then everything was ordered by a global sort order (roughly reverse chronological), typically limiting to the top few records (LIMIT 0,100).

If all of the filter parameters had been fixed-value references, I could have used a composite index with the sort key as the last entry in the index: a nice little trick, since B-tree indexes are pre-sorted. No such luck in this case, since some parameters were ranged:

SELECT items.*
FROM
 items
WHERE
  items.community_id = N
 AND items.ranged_key1 BETWEEN x AND y
 AND items.ranged_key2 BETWEEN x AND y
ORDER BY sort_order DESC LIMIT 10;
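
For contrast, had both ranged keys been equality filters, a hypothetical index like the following would have both filtered and pre-sorted the rows, avoiding a sort entirely:

ALTER TABLE items
 ADD INDEX (community_id, ranged_key1, ranged_key2, sort_order);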

What we came up with was a solution that creates a table containing the sort order (and the ref filters), then joins to the main table to apply the additional ranged filters:

CREATE TABLE items (
 item_id INT NOT NULL PRIMARY KEY auto_increment,
 community_id INT NOT NULL,
 ranged_key1 INT NOT NULL,
 ranged_key2 INT NOT NULL,
 is_deleted INT NULL,
 published_at INT NULL,
 created_at INT NULL
);
CREATE TABLE item_sort_order (
 item_id INT NOT NULL PRIMARY KEY,
 community_id INT NOT NULL,
 sort_order INT NOT NULL,
 INDEX (community_id, sort_order) /* InnoDB: Includes item_id as clustered index */
);

SELECT STRAIGHT_JOIN items.*
FROM
item_sort_order INNER JOIN items USING (item_id)
WHERE
  items.community_id = N
 AND item_sort_order.community_id = N
 AND items.ranged_key1 BETWEEN x AND y
 AND items.ranged_key2 BETWEEN x AND y
ORDER BY item_sort_order.sort_order DESC LIMIT 100;

A smarter optimizer may be able to apply some sort of index merge using just a single table without the workaround shown here. This solution was inspired by the optimizer feature request in MySQL BUG #25402.

Composite Indexes

While I think it is probably universally true that it is better to extend an existing index into a composite index, PostgreSQL seemed to be more forgiving of having many single-column indexes and applying index merge.

While MySQL also supports index merge (since 5.0), wherever I could spot the use-case for a composite index, performance improved when I added one.
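
A hypothetical sketch of that kind of change (the index and column names are invented for illustration):

-- Before: two single-column indexes that rely on index merge.
-- After: one composite index that covers the common access path.
ALTER TABLE items
 DROP INDEX idx_community_id,
 DROP INDEX idx_ranged_key1,
 ADD INDEX idx_community_ranged (community_id, ranged_key1);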

Optimizer Hints

MySQL picked a few wrong execution plans, and I had to override them with optimizer hints. Some examples:

  1. SELECT * FROM mytable WHERE pkey IN (1,2,3,4,5) AND filter1=ABC AND filter2=ABC;
  2. SELECT * FROM items WHERE .. conditions .. AND is_deleted IS NULL;

For (1) I found MySQL would occasionally pick an index on filter1 or filter2, but it almost always made sense to pick the PRIMARY KEY. So we added a FORCE INDEX (PRIMARY) hint.
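
Applied to the first query, the hinted form looks like this:

SELECT * FROM mytable FORCE INDEX (PRIMARY)
WHERE pkey IN (1,2,3,4,5) AND filter1='ABC' AND filter2='ABC';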

For (2), because we had so many indexes, MySQL was using its new optimizer feature of estimating from index cardinality statistics rather than index dives (in an effort to reduce the time taken to pick a plan). The problem is, the is_deleted index (which stores a timestamp of when an item was deleted) has very good cardinality, and we’re always searching for is_deleted IS NULL!

Facebook’s advice to set eq_range_index_dive_limit to a much higher value was very useful, although the eventual solution was to remove the index on the is_deleted column, eliminating it as a candidate.
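
A sketch of both approaches (the value and the index name are illustrative):

-- Raise the threshold so index dives are still used for longer IN() lists:
SET GLOBAL eq_range_index_dive_limit = 200;

-- Or remove the misleading candidate entirely (our eventual fix):
ALTER TABLE items DROP INDEX idx_is_deleted;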

Cleanup

Once I had added composite indexes and applied the necessary optimizer hints, performance was restored to an acceptable level. My next step was to install Mark Leith’s ps_helper and delete all unused indexes, as well as redundant indexes that could now be satisfied by one of my composite indexes.
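
The check itself is a one-liner (the view name comes from ps_helper; the schema filter is hypothetical):

-- Indexes with no reads since server startup:
SELECT * FROM ps_helper.unused_indexes
WHERE object_schema = 'myapp';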

What was particularly valuable about performing this step is that when I went back to many of the queries I had added optimizer hints for, the hints were no longer required! I think that many people consider indexes to hurt only write performance, but there is a cost associated with choosing the correct index, and the cost clearly goes up when there are multiple index candidates, or when the multiple choices force a trade-off to pick an index that is not the best, but simply “good enough”.

Conclusion

It wasn’t out of the box, but we did get most of the performance back within a few hours – with all issues taking a couple of days to address in full.

MySQL 5.6 is a great release. I really enjoyed being able to add indexes so quickly without much impact at all, and reading the impact of my changes through performance_schema (using ps_helper). The unused_indexes view is really useful!

Row-based Replication

Row-based Replication was introduced in MySQL 5.1. It’s not the default (yet), but I really like it. I wanted to tell you about the limitations you may face with statement-based replication, and why you should join me in making the switch.

Write scalability of certain statements

Statement-based binary logging requires additional locking to ensure slave consistency. To illustrate, here is a naive example:

CREATE TABLE users (
 user_id INT NOT NULL primary key,
 user_name varchar(60) NOT NULL,
 is_deleted TINYINT DEFAULT 0,
 UNIQUE KEY(user_name)
);

INSERT INTO users (user_id, user_name, is_deleted) VALUES
 (1, 'ted', 0),
 (2, 'bill', 0),
 (3, 'fred', 0),
 (4, 'mike', 1),
 (5, 'barry', 0);

session1> START TRANSACTION;
session1> DELETE FROM users WHERE is_deleted = 1; # 1 row matches (user_id 4), deleted.
session2> START TRANSACTION;
session2> UPDATE users SET is_deleted = 1 WHERE user_id = 5; # 1 row matches.
session2> COMMIT;
session1> COMMIT;

Because statements are only written to the binary log once committed, session #2’s UPDATE would apply first on the slave, resulting in two rows being deleted by session #1’s DELETE, leading to incorrect results!

InnoDB protects against this race condition by taking additional locks. After session #1’s DELETE, no other session will be able to update or insert into the range of rows matching is_deleted=1 until session #1 commits.

Actually, it gets even worse than that, because InnoDB will lock rows based on whatever index can be used. In my case there’s no index, so the DELETE will lock every row in the table (eek!).
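
You can observe this while session #1 is still open (these information_schema tables exist in 5.6; INNODB_LOCKS only shows locks once another session is actually blocked):

-- How many rows each open transaction has locked:
SELECT trx_id, trx_rows_locked FROM information_schema.INNODB_TRX;

-- Conflicting lock details, once session #2 blocks:
SELECT * FROM information_schema.INNODB_LOCKS;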

Even with good indexing, statements that follow the pattern WHERE col BETWEEN x AND y or WHERE id > N will require gap locking to prevent data from being inserted into the range, again preventing the same inconsistency on the slave.

InnoDB has always had a setting to disable this locking, called innodb_locks_unsafe_for_binlog. As the name suggests, this is totally unsupported and you will likely get replication drift.

However, astute readers will notice that since row-based replication sends the actual row images of changes to the slaves, this race condition no longer exists. And indeed, once you switch to row-based replication (binlog-format=ROW) and change the isolation level to READ-COMMITTED, these additional locks are no longer set!
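
In my.cnf form, that switch is just:

binlog-format=ROW
transaction-isolation=READ-COMMITTED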

Batched insert scalability

When you enable row-based replication, it is safe to set innodb_autoinc_lock_mode = 2. This improves auto-increment scalability with batch INSERT statements.

With the default of innodb_autoinc_lock_mode = 1, the per-table auto-increment generator needs to be locked for the duration of batch insert statements. This is because as rows are written to the binary log, only the first INSERT_ID is seeded to the slave, and locking guarantees deterministic slave replay. i.e.

# Binary logged multi-insert
SET INSERT_ID=1;
INSERT into my_table (id, value) VALUES
(NULL, 'Row 1'),
(NULL, 'Row 2'),
(NULL, 'Row 3');

Can you see the race condition? In between each row, we need to protect the auto-increment value so nobody can grab one of the IDs, which would lead to a conflict on the slave as it tries to just use the next number.

This locking applies to batched inserts only. It previously also applied to single-row inserts, but that was fixed in MySQL 5.1. You can restore the 5.0-and-below behaviour by setting innodb_autoinc_lock_mode = 0, but I can’t think of a good reason to do this.

Reasons not to use Row-based replication

I think the two biggest complaints have always been:

  1. Row-based binary logging produces much larger binary log files. This is largely addressed in 5.6 by setting binlog_row_image=minimal.

  2. Row-based events are harder to debug, since you could not previously see the actual statement that generated the row events. This is addressed in MySQL 5.6 with binlog-rows-query-log-events (see the sketch below). MySQL 5.7 will also make this easier, because the need to hand-audit events is reduced with idempotent binary log replay.
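
In my.cnf form, the two 5.6 options mentioned above:

binlog_row_image=minimal
binlog_rows_query_log_events=ON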

Summary

Statement-based replication was an okay choice in the past, when we had single-core machines, but today scaling requires us to lock less and perform more work in parallel. It’s all about concurrency, concurrency, concurrency.

Have you made the switch? Something else holding you back? Let me know your thoughts in the comments.