Benchmarking InnoDB page compression performance

InnoDB page compression is a feature that really interests me. I wrote this about it recently when describing how to improve the performance of large tables in MySQL:

"Use innodb page compression. For some workloads (particularly those with lots of char/varchar/text data types) compression will allow the data to be more compact, stretching out that performance curve for longer. It may also allow you to more easily justify SSDs which are typically smaller in capacity. InnoDB page compression was improved a lot in MySQL 5.6, courtesy of Facebook providing a series of patches."

After writing that, I decided to setup an experiment.

The Experiment

I wanted to find data that was typical to be stored in a database, but would also compress well. There is a huge potential for skew here, since if I used dummy data such as 'AAAAAAAAAAAA' it will compress very well. Likewise, jpeg images stored in blobs would unlikely compress any more than they already are. So I arrived at using the English version of Wikipedia, which is hopefully representative of "real data".

So after downloading the data set and importing it from its XML format into MySQL, I ran the following script:

DROP TABLE IF EXISTS text_16K_compressed;
DROP TABLE IF EXISTS text_8K_compressed;
DROP TABLE IF EXISTS text_4K_compressed;
DROP TABLE IF EXISTS text_16K_uncompressed;

CREATE TABLE `text_16K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;

CREATE TABLE `text_8K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

CREATE TABLE `text_4K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

CREATE TABLE `text_16K_uncompressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB;

select now();
INSERT INTO text_16K_compressed SELECT * FROM text;
select now();
INSERT INTO text_8K_compressed SELECT * FROM text;
select now();
INSERT INTO text_4K_compressed SELECT * FROM text;
select now();
INSERT INTO text_16K_uncompressed SELECT * FROM text;
select now();

With the results, there's really two aspects to measure compression by here - footprint and import time:

So for 4K and 8K pages, there is a >50% saving in disk space, since the table compresses down from 51G to 21G. In addition, it's pretty clear that (at least on my hardware) the compression doesn't have any impact on the import time: the reduced footprint actually helped the 8K compressed pages take less time to insert than the uncompressed.

So I believe I did manage to prove that compression is very useful. However I also acknowledge that there is some skew in my test:

  • INSERT SELECT is single threaded. Ideally a test should reflect a real-life requirement.
  • My single 5400 RPM hard drive does not reflect common configuration in production.
  • I also disabled the doublewrite buffer and set innodb-flush-log-at-trx-commit=2. It would be nice to demonstrate if this impacts the test.
  • I really should have waited for all dirty pages to flush between all tests. This was an oversight on my part, and I think this advantaged 16K uncompressed insert (which is just a little better than the others).

Full Disclosure

Some details about my setup:

  • Linux Mint 15 (Ubuntu 13.04)
  • MySQL 5.7.2
  • Intel i5-2400 4 core CPU @ 3.10GHz
  • Single 5400 RPM 1TB Hard Drive (WDC WD10EADS-00L5B1)
  • 32G RAM

MySQL Configuration changes from default:

innodb-buffer-pool-size=16G
innodb-log-file-size=4G
innodb-flush-log-at-trx-commit=2
innodb-doublewrite=0
innodb-file-format=barracuda

Raw output from tests:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
now()
2013-10-28 16:12:04
now()             
2013-10-29 01:34:48
now()
2013-10-29 11:20:04
now()
2013-10-29 21:13:27
now()
2013-10-30 07:37:48

morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text*
-rw-rw---- 1 morgo morgo 63472402432 Oct 28 14:35 text.ibd
-rw-rw---- 1 morgo morgo 53741617152 Oct 30 07:40 text_16K_uncompressed.ibd
-rw-rw---- 1 morgo morgo 38176555008 Oct 29 01:51 text_16K_compressed.ibd
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 21:32 text_4K_compressed.ibd <-- See below
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 11:40 text_8K_compressed.ibd
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_uncompressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_4K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_8K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 27 19:22 text.frm

An earlier version of this post featured incorrect results for 4K compressed pages (credit: @Skunnyk). Here are the results for 4K pages re-processed:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
..
now()
2013-10-31 11:31:24
now()
2013-10-31 22:13:32

morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text_4*ibd
-rw-rw---- 1 morgo morgo 20858273792 Oct 31 22:24 text_4K_compressed.ibd