Semi-sync replication is not slow!

If you read Yoshinori's post about Semi-sync at Facebook, he lists the objective of using semi-sync as an alternative to running full durability on a master. That is to say that once you can guarantee writes have safely been shipped across the network, you may not strictly need to guarantee that they are safe locally.

This is something that I have been wanting to benchmark for a long time, and reading Jay's post about Semi-sync replication performance in MySQL 5.7 DMR4 and a conversation in last week's #dbhangops inspired me to explore this in more detail. For my tests, I will be using Master-Slave replication and three alternative definitions of durability:

  • Local Durability. By ensuring that sync_binlog=1 and innodb_flush_log_at_trx_commit=1 any changes that are committed locally will be crash-safe (provided that underlying hardware does not lie). A slave will be connected, but using asynchronous replication.
  • Network Durability. By enabling semi-sync replication, each change will be guaranteed on more than one machine. However, each machine will take a relaxed view to how safe the changes are locally (sync_binlog=0, innodb_flush_log_at_trx_commit=0). This is the MySQL Cluster definition of durability, and also some NoSQL systems. It requires that each replica has physical isolation (in the case of AWS: an availability zone).
  • Local + Network Durability. This is a strict definition where both semi-sync and network durability is employed. While it might seem excessively strict, what I like about this option, is that it has the potential to reduce administrative costs during failure events.

Testing

Basic setup details:

  • AWS EC2 m3.medium instances
  • Master in us-east-1b, slave in us-east-1d
  • Latest Amazon Linux AMI
  • MySQL 5.7 DMR4
  • Datadir stored on 40GB Provisioned IOPS EBS volume (900 IOPS)
  • sysbench 0.5 update_index.lua test

I measured both the network latency (ping) and disk latency (dd bs=512 count=1000 if=/dev/zero of=testfile oflag=dsync) prior to starting the test. Interestingly, both were the same at about 2ms.

My first test was to compare single threaded performance:

singlethreaded

I would say that the results are somewhat as expected:

  • Performance when not durable is awesome! (but probably not useful)
  • Local durability is quite expensive in single-threaded workloads. There is not much opportunity for group-commit, and the 2ms latency I have for an fsync hurts to be able to safely write the InnoDB redo logs + the binary logs.
  • Network durability actually performs better than local durability.
  • The combination of Local+Network durability is worse than network only or local only.

Now to repeat the same test, but in 32-threads:

32threads

What is interesting to note is that:

  • No durability still performs the best, but by a much smaller margin.
  • The improved group-commit in MySQL 5.6+ is really helping the local durability throughput increase.
  • Network durability is marginally ahead of local durability.
  • Network+Local is about 15% worse than just local durability.

Comparing Local versus Network Durability

So network durability scored better than local in both tests!?

I think at this point it's important to explain the difference between local and network durability in the event of a crash, using our current set of tools that are available. If we are using asynchronous replication with a durable local master, and crash safe slaves, then resuming replication is actually possible in the event of a crash. This is not true in the case of network durability, as writes could be lost on the failing machine. This is an important loss of functionality that is difficult to justify.

I would also point out that I think it is easier to hardware-accelerate local durability than it is network durability. The local fsync speed can be improved with an SSD or a RAID Controller with a battery-backed write cache. Network latency can be lowered too, but it might be harder to do so while keeping physical isolation.

Conclusion

For the reasons I listed, I still think local durability is still more useful when choosing just one option for durability. But at the same time, semi-sync is no slouch and it's great to see new developments in this area in MySQL 5.7.

What I did not test, is configuring MySQL to be mostly-durable locally, and fully network durable. There is a lot of potential to meet somewhere in the middle and very selectively disable features such as the InnoDB doublewrite buffer.

I do not think it is fair to label semi-sync as "slow", and I am hoping to convince more users that they should take a look :)

Benchmarking InnoDB page compression performance

InnoDB page compression is a feature that really interests me. I wrote this about it recently when describing how to improve the performance of large tables in MySQL:

"Use innodb page compression. For some workloads (particularly those with lots of char/varchar/text data types) compression will allow the data to be more compact, stretching out that performance curve for longer. It may also allow you to more easily justify SSDs which are typically smaller in capacity. InnoDB page compression was improved a lot in MySQL 5.6, courtesy of Facebook providing a series of patches."

After writing that, I decided to setup an experiment.

The Experiment

I wanted to find data that was typical to be stored in a database, but would also compress well. There is a huge potential for skew here, since if I used dummy data such as 'AAAAAAAAAAAA' it will compress very well. Likewise, jpeg images stored in blobs would unlikely compress any more than they already are. So I arrived at using the English version of Wikipedia, which is hopefully representative of "real data".

So after downloading the data set and importing it from its XML format into MySQL, I ran the following script:

DROP TABLE IF EXISTS text_16K_compressed;
DROP TABLE IF EXISTS text_8K_compressed;
DROP TABLE IF EXISTS text_4K_compressed;
DROP TABLE IF EXISTS text_16K_uncompressed;

CREATE TABLE `text_16K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;

CREATE TABLE `text_8K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

CREATE TABLE `text_4K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

CREATE TABLE `text_16K_uncompressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB;

select now();
INSERT INTO text_16K_compressed SELECT * FROM text;
select now();
INSERT INTO text_8K_compressed SELECT * FROM text;
select now();
INSERT INTO text_4K_compressed SELECT * FROM text;
select now();
INSERT INTO text_16K_uncompressed SELECT * FROM text;
select now();

With the results, there's really two aspects to measure compression by here - footprint and import time:

So for 4K and 8K pages, there is a >50% saving in disk space, since the table compresses down from 51G to 21G. In addition, it's pretty clear that (at least on my hardware) the compression doesn't have any impact on the import time: the reduced footprint actually helped the 8K compressed pages take less time to insert than the uncompressed.

So I believe I did manage to prove that compression is very useful. However I also acknowledge that there is some skew in my test:

  • INSERT SELECT is single threaded. Ideally a test should reflect a real-life requirement.
  • My single 5400 RPM hard drive does not reflect common configuration in production.
  • I also disabled the doublewrite buffer and set innodb-flush-log-at-trx-commit=2. It would be nice to demonstrate if this impacts the test.
  • I really should have waited for all dirty pages to flush between all tests. This was an oversight on my part, and I think this advantaged 16K uncompressed insert (which is just a little better than the others).

Full Disclosure

Some details about my setup:

  • Linux Mint 15 (Ubuntu 13.04)
  • MySQL 5.7.2
  • Intel i5-2400 4 core CPU @ 3.10GHz
  • Single 5400 RPM 1TB Hard Drive (WDC WD10EADS-00L5B1)
  • 32G RAM

MySQL Configuration changes from default:

innodb-buffer-pool-size=16G
innodb-log-file-size=4G
innodb-flush-log-at-trx-commit=2
innodb-doublewrite=0
innodb-file-format=barracuda

Raw output from tests:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
now()
2013-10-28 16:12:04
now()             
2013-10-29 01:34:48
now()
2013-10-29 11:20:04
now()
2013-10-29 21:13:27
now()
2013-10-30 07:37:48

morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text*
-rw-rw---- 1 morgo morgo 63472402432 Oct 28 14:35 text.ibd
-rw-rw---- 1 morgo morgo 53741617152 Oct 30 07:40 text_16K_uncompressed.ibd
-rw-rw---- 1 morgo morgo 38176555008 Oct 29 01:51 text_16K_compressed.ibd
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 21:32 text_4K_compressed.ibd <-- See below
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 11:40 text_8K_compressed.ibd
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_uncompressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_4K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_8K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 27 19:22 text.frm

An earlier version of this post featured incorrect results for 4K compressed pages (credit: @Skunnyk). Here are the results for 4K pages re-processed:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
..
now()
2013-10-31 11:31:24
now()
2013-10-31 22:13:32

morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text_4*ibd
-rw-rw---- 1 morgo morgo 20858273792 Oct 31 22:24 text_4K_compressed.ibd