June 2014 – Master MySQL

June 9, 2014

MySQL 5.6.19 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, testcases and patches have helped create a better MySQL 5.6.19.

In particular:

Thank you to Daniël van Eeden for reporting that the utility charset2html was unmaintained and obsolete. We followed Daniël’s suggestion and removed it from MySQL distributions. Bug #71897.
Thank you to Dario Kampkaspar for reporting that upgrading from 5.6.10 to a more recent 5.6 release would fail. Bug #72079.
Thank you to Fangxin Flou for reporting that InnoDB would call memset too often. Fangxin also provided a suggested patch to fix the issue. Bug #71014.
Thank you to Daniël van Eeden for reporting an assertion when enabling the innodb_table_monitor. Bug #69641.
Thank you to Domas Mituzas for reporting an off by one error in the handling of innodb_max_dirty_pages_pct. Thanks also to Valeriy Kravchuk for suggesting that the value be a float, and Mark Callaghan for providing a sample patch. Bug #62534.
Thank you to Guangpu Feng for reporting an issue where an aborted TRUNCATE TABLE command could still be written to the binary log. Guangpu also provided detailed code-analysis, which helped with our investigation. Bug #71070.
Thank you to Justin Swanhart for reporting an issue where the server did not always correctly handle file permissions on the auto.cnf file. Bug #70891.
Thank you to Frédéric Condette for reporting an issue when using replication with GTIDs and replicate-ignore-db. Bug #71376.
Thank you to Michael Gmelin for reporting compilation errors when building in C++11 mode. Michael also provided a patch with a suggested fix. Bug #66803.
Thank you to Hartmut Holzgraefe for reporting that CMake produced not-useful warnings. Bug #71089.
Thank you to John Curtusan for reporting that on Windows REPAIR TABLE and OPTIMIZE TABLE failed on MyISAM tables with .MYD files larger than 4GB. Bug #69683.
Thank you to Elena Stepanova for reporting an issue that could break row-based replication. Elena also managed to reduce the issue down to a simple test case. Bug #71179.
Thank you to Yu Hongyu for reporting that an UPDATE statement could fail to update all applicable rows. Bug #69684.
Thank you to Tom Lane for his comments on why the test “outfile_loaddata” could be failing. His suggestion proved to be correct, and helped tremendously in tracking down the root cause. Bug #46895.

In Addition:

We thank the MySQL Community for their feedback on our proposal to deprecate unused command line programs. The mysqlbug, mysql_waitpid, and mysql_zap utilities are now deprecated in 5.6.19, and will be removed in 5.7.

Thank you again to all the community contributors listed above. If I missed a name here, please let me know!

Please also note:
There was no MySQL Community Server 5.6.18. That version number was used for an out-of-schedule release of the Enterprise Edition to address the OpenSSL “Heartbleed” issue. This issue did not affect the Community Edition because it uses yaSSL, not OpenSSL, so a new release of the Community Server was not needed, and 5.6.17 is followed by 5.6.19.

– Morgan

June 5, 2014

Semi-sync replication is not slow!

If you read Yoshinori’s post about Semi-sync at Facebook, he lists the objective of using semi-sync as an alternative to running full durability on a master. That is to say that once you can guarantee writes have safely been shipped across the network, you may not strictly need to guarantee that they are safe locally.

This is something that I have been wanting to benchmark for a long time, and reading Jay’s post about Semi-sync replication performance in MySQL 5.7 DMR4 and a conversation in last week’s #dbhangops inspired me to explore this in more detail. For my tests, I will be using Master-Slave replication and three alternative definitions of durability:

Local Durability. By ensuring that sync_binlog=1 and innodb_flush_log_at_trx_commit=1 any changes that are committed locally will be crash-safe (provided that underlying hardware does not lie). A slave will be connected, but using asynchronous replication.
Network Durability. By enabling semi-sync replication, each change will be guaranteed on more than one machine. However, each machine will take a relaxed view to how safe the changes are locally (sync_binlog=0, innodb_flush_log_at_trx_commit=0). This is the MySQL Cluster definition of durability, and also some NoSQL systems. It requires that each replica has physical isolation (in the case of AWS: an availability zone).
Local + Network Durability. This is a strict definition where both semi-sync and network durability is employed. While it might seem excessively strict, what I like about this option, is that it has the potential to reduce administrative costs during failure events.

Testing

Basic setup details:

AWS EC2 m3.medium instances
Master in us-east-1b, slave in us-east-1d
Latest Amazon Linux AMI
MySQL 5.7 DMR4
Datadir stored on 40GB Provisioned IOPS EBS volume (900 IOPS)
sysbench 0.5 update_index.lua test

I measured both the network latency (ping) and disk latency (dd bs=512 count=1000 if=/dev/zero of=testfile oflag=dsync) prior to starting the test. Interestingly, both were the same at about 2ms.

My first test was to compare single threaded performance:

I would say that the results are somewhat as expected:

Performance when not durable is awesome! (but probably not useful)
Local durability is quite expensive in single-threaded workloads. There is not much opportunity for group-commit, and the 2ms latency I have for an fsync hurts to be able to safely write the InnoDB redo logs + the binary logs.
Network durability actually performs better than local durability.
The combination of Local+Network durability is worse than network only or local only.

Now to repeat the same test, but in 32-threads:

What is interesting to note is that:

No durability still performs the best, but by a much smaller margin.
The improved group-commit in MySQL 5.6+ is really helping the local durability throughput increase.
Network durability is marginally ahead of local durability.
Network+Local is about 15% worse than just local durability.

Comparing Local versus Network Durability

So network durability scored better than local in both tests!?

I think at this point it’s important to explain the difference between local and network durability in the event of a crash, using our current set of tools that are available. If we are using asynchronous replication with a durable local master, and crash safe slaves, then resuming replication is actually possible in the event of a crash. This is not true in the case of network durability, as writes could be lost on the failing machine. This is an important loss of functionality that is difficult to justify.

I would also point out that I think it is easier to hardware-accelerate local durability than it is network durability. The local fsync speed can be improved with an SSD or a RAID Controller with a battery-backed write cache. Network latency can be lowered too, but it might be harder to do so while keeping physical isolation.

Conclusion

For the reasons I listed, I still think local durability is still more useful when choosing just one option for durability. But at the same time, semi-sync is no slouch and it’s great to see new developments in this area in MySQL 5.7.

What I did not test, is configuring MySQL to be mostly-durable locally, and fully network durable. There is a lot of potential to meet somewhere in the middle and very selectively disable features such as the InnoDB doublewrite buffer.

I do not think it is fair to label semi-sync as “slow”, and I am hoping to convince more users that they should take a look 🙂

June 4, 2014

Build a MySQL Fabric Farm in one step using AWS CloudFormation

I have been building a CloudFormation template for MySQL Fabric as an experiment to kick the tyres and further my understanding of how it all works.

For those not familiar, CloudFormation is a way of representing a collection of Amazon Cloud resources (a “stack”) into a static json file (a “template”). In my previous company, we managed our AWS account almost exclusively via CloudFormation, and there are two key advantages I see with managing resources via templates:

It allows for all environments to be synchronized (dev/qa/production).
It allows for very fast disaster recovery (you can very quickly pick up and restore in another region).

CloudFormation is also a great-fit for MySQL Fabric, since it can remove many of the essential bootstrap steps that come with building a distributed system. It took a bit of work, but I managed to build a template to bring Fabric installation down to three simple questions:

Here are some of the interesting parts:

I am using the latest Amazon Linux AMI + the official MySQL yum repositories.
The fabric HA group GLOBAL1 is created in an AutoScalingGroup.
When the instance in the AutoScalingGroup starts up, it talks back to the fabric backing store to add itself to the HA group.
If the AutoScalingGroup is expanded to 2, a new instance will talk to the fabric backing store, find a peer in the same group and create a backup. Once the backup is restored on the new server, it will add itself to the Fabric HA group.

The less interesting part of course is that this is a proof-of-concept. It needs a bit of work to improve error checking etc. You can find the resulting code in my mysql-fabric-scripts project. The file stack.json can be uploaded directly into the Amazon Web Console.