With databases, choices of algorithms influence performance characteristics significantly. It is often very easy to find a situation where one database will perform much worse than others, which is why you will hear database engine developers cry out that benchmarks are flawed.
The best benchmark is one that closely matches what your application actually does, which is why you see the TPC create benchmarks to match hypothetical situations – like a warehouse that has inventory arriving and being shipped out all the time. These situations in turn create “workloads” on the database. To give some context, a workload may perform differently on a different database engine because of how many concurrent modifications are required, the ratio of reads/writes, how much data is modified in a transaction, and where in the data set the reads and writes are (are there hot records or hot tables?). These are just examples – in reality there are too many factors to list.
Which gets me to my main point – what workloads is MySQL with the InnoDB storage engine really good at? In answering this, I really want to try to focus on core data structures and algorithms more than specific features that may or may not be implemented.
Total data size may exceed memory but working set does not
If you have for example 100G of data, you do not necessarily require 100G of RAM. MySQL will adjust and keep only the most frequently accessed data in memory (aka working set). This behavior is the result of using a B+Tree index. To simplify and compare this to say memcached where the index is a hash – it really requires as much memory as there is data.
(Note: Memcached of course addresses this by allowing you to distribute the index across many nodes, and hash indexes are generally smaller, so please don’t miss the key point I am making here.)
Modifications are small and transactions are short
When you modify a row in InnoDB, the old version of it is relocated to undo space, and the ‘new’ version is written in place (implementing MVCC). When all existing transactions have finished with the old version of the row, along comes a purge process.
Not all databases implement MVCC this way. For example another way of implementing MVCC is to just append/write new rows in the same table-space. The advantage of the relocation method is that if there is sufficient memory to keep the undo pages in memory (and they are not required for very long) is that this relocation may be a logical relocation only. i.e. physical IO relocation doesn’t have to happen. However, in the case that there is not enough memory/the undo pages need to be evicted from the buffer pool, then I could see this potentially taking more effort. I believe rolling back large transactions may be more effort with InnoDB’s design as well – although I have not put much thought into it.
Primary key lookups are fast
InnoDB implements a Clustered Index on the primary key. Clustering in this case refers to a close arrangement/organization on disk, not multiple compute instances working together. This makes primary key lookups in InnoDB particularly fast – although admitedly probably not as fast as databases that impliment a hash index.
Indexed ranged lookups are supported
Some index structures are suitable for fixed primary key lookups, or emulate other types of access by pre-computing a cache of values and packing them in one row/column. Because InnoDB uses a B+Tree index, it is very flexible in being able to support ranged lookups on primary or secondary keys (example: find rows
BETWEEN x AND y). These indexes are also pre-sorted, which (depending on the query) can allow the rows to be efficiently delivered as they are read rather than buffered first.
Query performance is highly uniform
InnoDB is optimized to provide stable response times to user-serving applications, rather than just peak throughput in queries per second. There are a number of features which help deliver this: converting as much IO as possible to be in the background with the transaction log files, adaptive flushing to make sure that background work doesn’t fall behind, even the implimentation of MVCC I described above.
InnoDB also supports covering indexes, which means that for queries that do not
SELECT * they sometimes have a chance to retrieve all data from the secondary index. For some cases where not all data fits in memory covering indexes can be super critical to uniformity of performance because the number of unique pages that will need to be looked at is considerably lower.
Table has only a primary key and non-unique secondary keys
So I mentioned already that total data-size can exceed main memory, but InnoDB also has a novel mechanism to delay INSERT/UPDATE/DELETE modifications to secondary indexes if the pages are not loaded into memory. What it will do is buffer the changes in a secondary structure, and then make the modification the next time the page is loaded into memory, or in the background provided the server is not under load. This feature only works when index is not unique, because otherwise it is not safe to make a modification not knowing if it violates a constraint. Ideally, the primary key also inserts data in order.
(In previous versions of InnoDB, this feature was described as the insert buffer).
What am I missing? Please try and focus on the algorithms not the features, and the positive not the negative!