Migrating from PostgreSQL to MySQL

A couple of weeks before I joined Oracle, I was working as a freelance consultant. One of my customers decided to migrate from PostgreSQL to MySQL.
The backstory:

  • My client initially decided to build their application with PostgreSQL, even though the founding team was already very experienced with MySQL. Their primary reason: schema changes on live MySQL deployments were difficult.

  • MySQL 5.6 introduced online DDL, addressing a large number of their issues.

  • As they experienced significant traction in their private beta, they decided to switch to MySQL 5.6, with which they were more comfortable given the size of the support & consulting ecosystem behind it.

I should point out in advance that I think migrating from one database to another is rarely done, because while SQL is a standard, the performance characteristics of each technology are wildly different.
In our case, straight after migrating, database performance dropped so much that the application was effectively offline! We did, however, manage to get all of that performance back, and the application is now running smoothly. This is our war story, so to speak, and hopefully it proves useful to the next person.

What did we discover?

From a functionality perspective, there were a couple of mangled character-set problems, but the client managed to sort those out themselves and reached feature parity very quickly.
Most of the heavy lifting was in adding specific optimizations and workarounds. These were my top four, in the order I applied them:

Sort optimization workaround

One of the core features of the application was a continually updating list of items, sorted similarly to a Facebook feed. That is, there were some search parameters to choose whether an item was applicable to a particular user, and then everything was ordered by a global sort order (roughly reverse chronological), typically limited to the top few records (LIMIT 0,100).
If all of the filter parameters had been fixed-value references, I could have used a composite index with the sort key as the last entry in the index: a nice little trick, since B-tree indexes are pre-sorted (there is a sketch of it after the query below). No such luck in this case, since some parameters were ranged:

SELECT items.*
FROM
 items
WHERE
 items.community_id = N
 AND items.ranged_key1 BETWEEN x AND y
 AND items.ranged_key2 BETWEEN x AND y
ORDER BY sort_order DESC LIMIT 100;
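For contrast, here is roughly what the equality-only case would have looked like. This is just a sketch: fixed_key1 and fixed_key2 are hypothetical column names, and it assumes the sort column lives on items. With every filter an equality match, an index ending in the sort key lets InnoDB return rows already in the desired order and stop as soon as the LIMIT is satisfied:

/* Hypothetical: only works because every preceding column is an equality match */
ALTER TABLE items ADD INDEX ix_feed (community_id, fixed_key1, fixed_key2, sort_order);

SELECT items.*
FROM
 items
WHERE
 items.community_id = N
 AND items.fixed_key1 = x
 AND items.fixed_key2 = y
ORDER BY sort_order DESC LIMIT 100;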

The solution we came up with was to create a table that contained the sort order (plus the fixed-value filters), and then join to the main table to apply the additional ranged filters:

CREATE TABLE items (
 item_id INT NOT NULL PRIMARY KEY auto_increment,
 community_id INT NOT NULL,
 ranged_key1 INT NOT NULL,
 ranged_key2 INT NOT NULL,
 is_deleted INT NULL,
 published_at INT NULL,
 created_at INT NULL
);
CREATE TABLE item_sort_order (
 item_id INT NOT NULL PRIMARY KEY,
 community_id INT NOT NULL,
 sort_order INT NOT NULL,
 INDEX (community_id, sort_order) /* InnoDB: the clustered PK (item_id) is implicitly appended */
);
/* STRAIGHT_JOIN drives the join from item_sort_order, so rows are read in
   (community_id, sort_order) index order and the LIMIT can stop early */
SELECT STRAIGHT_JOIN items.*
FROM
 item_sort_order INNER JOIN items USING (item_id)
WHERE
 items.community_id = N
 AND item_sort_order.community_id = N
 AND items.ranged_key1 BETWEEN x AND y
 AND items.ranged_key2 BETWEEN x AND y
ORDER BY item_sort_order.sort_order DESC LIMIT 100;

A smarter optimizer may be able to apply some sort of index merge using just a single table without the workaround shown here. This solution was inspired by the optimizer feature request in MySQL BUG #25402.

Composite Indexes

While I think it is probably universally true that it is better to extend an existing index into a composite index than to add another single-column index, PostgreSQL seemed to be more forgiving of having many single-column indexes and relying on index merge.
While MySQL also supports index merge (since 5.0), wherever I could spot the use case for a composite index, performance improved when I added one.
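As an illustration only (the index names here are hypothetical), the typical change looked something like this: replace a single-column index that MySQL would otherwise have to merge with a composite index matching the query’s filter pattern.

/* Assumes an existing single-column index named idx_community on items */
ALTER TABLE items
 DROP INDEX idx_community,
 ADD INDEX idx_community_ranged (community_id, ranged_key1);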

Optimizer Hints

MySQL picked a few wrong execution plans, and I had to override them with optimizer hints. Some examples:

  1. SELECT * FROM mytable WHERE pkey IN (1,2,3,4,5) AND filter1=ABC AND filter2=ABC;
  2. SELECT * FROM items WHERE .. conditions .. AND is_deleted IS NULL;

For (1), I found MySQL would occasionally pick the index on filter1 or filter2, but it almost always made sense to pick the PRIMARY KEY, so we added a FORCE INDEX (PRIMARY) hint.
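Applied to example (1), the hint looks like this (mytable and the filter columns are the placeholder names from above):

SELECT * FROM mytable FORCE INDEX (PRIMARY)
WHERE pkey IN (1,2,3,4,5) AND filter1=ABC AND filter2=ABC;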
For (2), because we had so many indexes, MySQL was using its new optimizer feature of estimating from index statistics (cardinality) rather than performing index dives, in an effort to keep plan selection fast. The problem is, the is_deleted index (which holds a timestamp of when an item was deleted) has very good cardinality, and we’re always searching for is_deleted IS NULL!
Facebook’s advice to set eq_range_index_dive_limit to a much higher value was very useful, although the eventual solution was to remove the index on the is_deleted column, eliminating it as a candidate.
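A rough sketch of both mitigations follows. The variable is real in MySQL 5.6, but the value shown is illustrative, as is the assumption that the index is named is_deleted:

/* Interim workaround: keep using index dives for larger IN() lists */
SET GLOBAL eq_range_index_dive_limit = 200;
/* Eventual fix: drop the misleading index so it is never a plan candidate */
ALTER TABLE items DROP INDEX is_deleted;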

Cleanup

Once I had added composite indexes and applied the necessary optimizer hints, performance was restored to an acceptable level. My next step was to install Mark Leith’s ps_helper and delete all unused indexes, as well as redundant indexes which could now be satisfied by one of my composite indexes.
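For reference, finding the candidates was mostly a matter of querying the ps_helper views, along these lines (assuming ps_helper was installed into its default ps_helper schema):

/* Indexes with zero reads since the server started */
SELECT * FROM ps_helper.unused_indexes;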
What was particularly valuable about performing this step is that when I went back to many of the queries I had added optimizer hints for, the hints were no longer required! I think many people consider indexes to hurt only write performance, but there is also a cost associated with choosing the correct index: that cost clearly goes up when there are multiple index candidates, or when the multiple choices force a trade-off to pick an index that is not the best, but simply “good enough”.

Conclusion

It wasn’t there out of the box, but we did get most of the performance back within a few hours, with all issues taking a couple of days to address in full.
MySQL 5.6 is a great release. I really enjoyed being able to add indexes so quickly without much impact at all, and being able to read the impact of my changes through performance_schema (using ps_helper). The unused_indexes view is really useful!