Benchmarking InnoDB page compression performance

InnoDB page compression is a feature that really interests me. I wrote this about it recently when describing how to improve the performance of large tables in MySQL:

"Use innodb page compression. For some workloads (particularly those with lots of char/varchar/text data types) compression will allow the data to be more compact, stretching out that performance curve for longer. It may also allow you to more easily justify SSDs which are typically smaller in capacity. InnoDB page compression was improved a lot in MySQL 5.6, courtesy of Facebook providing a series of patches."

After writing that, I decided to set up an experiment.

The Experiment

I wanted to find data that would be typical of what is stored in a database, but would also compress well. There is huge potential for skew here: dummy data such as 'AAAAAAAAAAAA' would compress extremely well, while JPEG images stored in BLOBs would be unlikely to compress much further than they already are. So I arrived at using the English version of Wikipedia, which is hopefully representative of "real" data.

So after downloading the data set and importing it from its XML format into MySQL, I ran the following script:

DROP TABLE IF EXISTS text_16K_compressed;
DROP TABLE IF EXISTS text_8K_compressed;
DROP TABLE IF EXISTS text_4K_compressed;
DROP TABLE IF EXISTS text_16K_uncompressed;

CREATE TABLE `text_16K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;

CREATE TABLE `text_8K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

CREATE TABLE `text_4K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

CREATE TABLE `text_16K_uncompressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB;

select now();
INSERT INTO text_16K_compressed SELECT * FROM text;
select now();
INSERT INTO text_8K_compressed SELECT * FROM text;
select now();
INSERT INTO text_4K_compressed SELECT * FROM text;
select now();
INSERT INTO text_16K_uncompressed SELECT * FROM text;
select now();
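As a way of sanity-checking runs like this, MySQL exposes per-page-size compression counters in INFORMATION_SCHEMA.INNODB_CMP (available since 5.5). This wasn't part of my original script, but a query along these lines, run after the inserts, shows how often pages compressed successfully; a low hit ratio means lots of expensive page splits and re-compressions:

```sql
-- page_size is the KEY_BLOCK_SIZE in bytes (4096, 8192, 16384).
-- A compress_ops_ok/compress_ops ratio close to 1.0 means few
-- "compression misses" (page split + re-compress cycles).
SELECT page_size,
       compress_ops,
       compress_ops_ok,
       ROUND(compress_ops_ok / compress_ops, 2) AS hit_ratio,
       compress_time,
       uncompress_ops
  FROM INFORMATION_SCHEMA.INNODB_CMP;
```

Note the counters are server-wide per page size, not per table, so for a clean reading you would reset them (or snapshot before/after) between runs.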

With the results in, there are really two aspects to measure compression by here: footprint and import time. Summarizing the raw output below (4K figures taken from the corrected re-run):

  Table                    Import time    .ibd size
  text_16K_uncompressed    10h 24m        50.0 GiB
  text_16K_compressed       9h 23m        35.6 GiB
  text_8K_compressed        9h 45m        20.3 GiB
  text_4K_compressed       10h 42m        19.4 GiB

So for 4K and 8K pages there is a >50% saving in disk space, since the table compresses down from ~50G to ~20G. It is also pretty clear that (at least on my hardware) the compression has little impact on import time: the reduced footprint actually helped the 8K compressed pages take less time to insert than the uncompressed table.

So I believe I did manage to show that compression can be very useful. However, I also acknowledge that there is some skew in my test:

  • INSERT … SELECT is single-threaded. Ideally a test should reflect a real-life workload.
  • My single 5400 RPM hard drive does not reflect a common production configuration.
  • I also disabled the doublewrite buffer and set innodb-flush-log-at-trx-commit=2. It would be nice to demonstrate whether this impacts the results.
  • I really should have waited for all dirty pages to flush between tests. This was an oversight on my part, and I think it advantaged the 16K uncompressed insert (which came out just a little better than it otherwise would have).
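That last point is straightforward to control for in a re-run. A sketch (the variable and status names are standard MySQL; 75 is the default I would restore for innodb_max_dirty_pages_pct afterwards):

```sql
-- Encourage InnoDB to flush all dirty pages before the next timed run.
SET GLOBAL innodb_max_dirty_pages_pct = 0;

-- Poll this until the count reaches (or nears) zero:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';

-- Restore the default before starting the next insert.
SET GLOBAL innodb_max_dirty_pages_pct = 75;
```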

Full Disclosure

Some details about my setup:

  • Linux Mint 15 (Ubuntu 13.04)
  • MySQL 5.7.2
  • Intel i5-2400 4 core CPU @ 3.10GHz
  • Single 5400 RPM 1TB Hard Drive (WDC WD10EADS-00L5B1)
  • 32G RAM

MySQL Configuration changes from default:

innodb-buffer-pool-size=16G
innodb-log-file-size=4G
innodb-flush-log-at-trx-commit=2
innodb-doublewrite=0
innodb-file-format=barracuda

Raw output from tests:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
now()
2013-10-28 16:12:04
now()             
2013-10-29 01:34:48
now()
2013-10-29 11:20:04
now()
2013-10-29 21:13:27
now()
2013-10-30 07:37:48

morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text*
-rw-rw---- 1 morgo morgo 63472402432 Oct 28 14:35 text.ibd
-rw-rw---- 1 morgo morgo 53741617152 Oct 30 07:40 text_16K_uncompressed.ibd
-rw-rw---- 1 morgo morgo 38176555008 Oct 29 01:51 text_16K_compressed.ibd
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 21:32 text_4K_compressed.ibd <-- See below
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 11:40 text_8K_compressed.ibd
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_uncompressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_4K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_8K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 27 19:22 text.frm

An earlier version of this post featured incorrect results for 4K compressed pages (credit: @Skunnyk). Here are the results for 4K pages re-processed:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
..
now()
2013-10-31 11:31:24
now()
2013-10-31 22:13:32

morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text_4*ibd
-rw-rw---- 1 morgo morgo 20858273792 Oct 31 22:24 text_4K_compressed.ibd

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I've also worked at Percona and InPowered.

  • Partha Dutta

    What is the performance like in MySQL 5.6?

    • http://www.tocker.ca/ Morgan Tocker

      The 5.7 performance enhancements have mostly focused on other workloads. I would expect compression performance here to be very similar in 5.6.

  • Tim Callaghan

Morgan, while the test you ran shows impressive compression without a performance penalty, I suspect that isn't what happens on most workloads. I blogged about iiBench performance and compression at http://www.tokutek.com/2011/09/compression-benchmarking-size-vs-speed-i-want-both. When an InnoDB page is compressed and does not get small enough (the key block size), it must be split and re-compressed. These "compression misses" are very expensive.

I've always been curious why InnoDB compression isn't enabled by more users; to be honest, I rarely ever find anyone running it. If performance weren't impacted, then why wouldn't a user compress their InnoDB tables?

The work at Facebook to dynamically pad tables in an attempt to avoid these compression misses is interesting, and was implemented in MySQL 5.6: https://blogs.oracle.com/mysqlinnodb/entry/innodb_compression_improvements_in_mysql. While it is certainly helpful in improving InnoDB compression performance, it also means that this "padding" will consume space in the cache and thus reduces the number of rows that can be cached.

    • http://www.tocker.ca/ Morgan Tocker

      Hi Tim, Thank you for the input - I'm sure this is just the comment readers are looking to find.

      I too have been curious why more users don't enable compression. Sometimes I fear users put too much emphasis on compression overhead, and not enough on slow disks :) There has got to be more to it though.

      I was running MySQL 5.7.2, so it should have the dynamic pad. Your comment makes me look at 16K compressed though, as its footprint is much larger than 8K/4K.

      • Tim Callaghan

        I'd be careful using your benchmark as expected behavior for others. If your blocks _always_ compress to be smaller than the key block size then you'll never feel the pain of split/re-compress. And the pain is quite painful.

        • http://www.tocker.ca/ Morgan Tocker

I think it would be fair to illustrate the worst case - planning to constrain outliers is important.

          If I tried to compress 4 million small jpeg thumbnails, do you think that would trigger it?

          • Tim Callaghan

            Sure, or use random [a..z, A..Z, 0..9] for your Text column so you achieve some compression. Also, you may want to add a few other columns and index them so your example is a little more complicated (and interesting) than a key-value workload.

          • http://www.tocker.ca/ Morgan Tocker

            Ack. Let me see what I can do.

The key-value-like workload in this case was the actual schema from MediaWiki.

  • Shlomi Noach

My own experience with InnoDB compression on multiple systems shows that 8k compression typically makes for double the load/insert time compared to 16k compression.

    With 4k compression the load/insert time is in most cases prohibitively higher.

8k compression is as far as I go with InnoDB compression (and I do use it extensively). I'm always aware of the overhead I'm paying.

BTW, just ALTERing a table into COMPRESSED format takes anywhere between x2 and x10 the time it would take to ALTER into uncompressed.
I've witnessed the above on my own servers as well as on multiple customers' servers. I had at least one customer who completely gave up on InnoDB compression after experiencing the load overhead, and one who gave up even while ALTERing their tables into COMPRESSED format.

There is a significant increase in CPU which I must confess I just can't justify for a mere x2 compression ratio.

    • http://www.tocker.ca/ Morgan Tocker

      Shlomi, a few questions:

      1) Was this before 5.6, which had a number of enhancements for compression?

      2) What was the type of data being compressed? Here I'm obviously showing a very good example (wikipedia pages).

      3) What was the data size and IO performance? I don't have the data, but I'm suspecting it might initially be slower in simpler workloads when all in memory.

      • Shlomi Noach

        1) yes, all on 5.5 (and 5.1 of course)
2) Generally mixed. Integers, strings, dates. I don't have a particular table for you right now; many hundreds have passed under my hands. But rarely an all-integers table (though those also went roughly the same way).
3) All sizes; but most interesting of course are the larger-than-memory cases, so hundreds of GB uncompressed. Though in-memory cases (a couple of GB, or even down to the MB scale) also suffer from the long conversion time and high CPU.

        • http://www.tocker.ca/ Morgan Tocker

          I think what would be interesting to answer some of your questions, is to run my same experiment, but on 5.5.

  • Shane Bester

your text_4K_compressed table uses an 8k key block size?

    • http://www.tocker.ca/ Morgan Tocker

      Yep. See update under "the experiment". Re-running it now, still a few hours away.