OPTIMIZE/CHECK/REPAIR/ANALYZE TABLE InnoDB Edition

I find a good interview question for a MySQL DBA position is to ask what the following commands actually do in InnoDB, which has been the default storage engine since MySQL 5.5. From my perspective there is a lot of miss-understanding what still applies.

ANALYZE TABLE

From the MySQL manual:

ANALYZE TABLE analyzes and stores the key distribution for a table. During the analysis, the table is locked with a read lock for InnoDB and MyISAM.

What this means is, as part of query optimization MySQL will often have to decide which is the best index if there are multiple candidates, which indexes should be avoided, and what order should tables be joined in. Indexes need to eliminate work - so if for example you were trying to index a column called "Country" in a table full of all people in the USA, then it would be faster to avoid that index.

What is also important to note, is that InnoDB will often update these statistics internally without you needing to do so, and that the 'read lock' that the manual describes is kind of weird (shameless plug for a bug I filed back in 2007).

Up until 5.6, statistics are in memory only, and very coarse - sampling just a small amount of data. With 5.6, there is now the new Persistent Optimizer Statistics enhancement as well as innodb_stats_auto_recalc is also a variable, and STATS_PERSISTENT, STATS_AUTO_RECALC, and STATS_SAMPLE_PAGES are configurable per table as CREATE TABLE options. It's a shame features like this don't get enough press.

I still find myself using ANALYZE TABLE as part of debugging slow queries to confirm that it is not a stale causing a problem, but I don't find any routine operational use case for it.

Pro-tip: My explaination above with a "Country" table being filled with all "USA" was a bit of a simplification, since not all optimizer decisions will use these pre-computed statistics. I believe in many cases if we are just talking about a single table (and not trying to determine join order), the optimizer will just ask InnoDB to estimate how many records there are in a given range. I am hoping these sort of behaviors will be more exposed in future MySQL versions. I was very happy to see EXPLAIN for UPDATE and DELETE statements in MySQL 5.6 - and in some cases the new JSON EXPLAIN format seems to show more information.

CHECK TABLE

CHECK TABLE to me is a MyISAM-ism. As noted in the manual:

If CHECK TABLE finds a problem for an InnoDB table, the server shuts down to prevent error propagation. Details of the error will be written to the error log.

InnoDB is an ACID compliant (durable) data store, so it doesn't have the same inconsistency situations that MyISAM does. The behavior that the manual is describing here is the same behavior that happens if the server detects corruption through standard operation - which is detected via CRC checksums on each page it stores.

As Oli writes check table probably won't work on very large tables (> 200-400 GB), so this command does not really have any practical use for me.

REPAIR TABLE

From the MySQL manual:

REPAIR TABLE only applies to MyISAM, ARCHIVE, and CSV tables. See Section 14.3, "The MyISAM Storage Engine", and Section 14.6, "The ARCHIVE Storage Engine", and Section 14.5, "The CSV Storage Engine"

So no use for this either.

OPTIMIZE TABLE

For InnoDB, the wording in the in the MySQL manual and the error message is very specific on this one:

For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE, which rebuilds the table to update index statistics and free unused space in the clustered index. Beginning with MySQL 5.1.27, this is displayed in the output of OPTIMIZE TABLE when you run it on an InnoDB table, as shown here:

mysql> OPTIMIZE TABLE foo;
+----------+----------+----------+-------------------------------------------------------------------+
| Table    | Op       | Msg_type | Msg_text                                                          |
+----------+----------+----------+-------------------------------------------------------------------+
| test.foo | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.foo | optimize | status   | OK                                                                |
+----------+----------+----------+-------------------------------------------------------------------+

So what this means, is that internally we create a new table - much like the existing table, then we trickle load the data into it one row at a time. For the clustered index (aka primary key) this may result in a significant space saving if data was inserted out of order, or if there have been modifications which have caused there to be some gaps which have affected fill-factor. It's important to note while explaining this, that some gaps are expected - as InnoDB only fills pages 15/16ths full.

For the secondary key indexes they will be trickle loaded one row at a time in the order of the clustered index - which may result in them going straight back to being fragmented anyway. InnoDB's implementation of MVCC does have multiple versions in the secondary indexes however, so it is possible that gaps may be reclaimed here.

In MySQL 5.5, InnoDB introduced a feature called "fast index create" that could create these secondary indexes more optimally by presorting the data first and then creating the index. However, this feature is not tied into OPTIMIZE TABLE (yet) in official Oracle MySQL releases. See Bug #57583.

Due to it's massively high cost, I find I run OPTIMIZE a lot less than other people (read: almost never), and Baron Schwartz has even has a humorous way of describing it:

.. like something you'd hear from a naive Windows user who buys a $99 piece of software to make his PC "boot faster" or "fix his registry" or something.

Again, this post being related to InnoDB - OPTIMIZE was important for MyISAM Dynamic tables. Read the manual for more.