Optimizer Enhancements in MySQL 5.7

The MySQL optimizer is getting better. MySQL 5.6 introduced:

  • File sort optimizations with a small LIMIT
  • Index Condition Pushdown
  • Batched Key Access and Multi-Range Read
  • Postponed Materialization
  • Improved subquery execution
  • EXPLAIN for INSERT, UPDATE, and DELETE
  • Optimizer Traces
  • Structured EXPLAIN in JSON format

This was in addition to the InnoDB storage engine now offering improved (persistent) statistics collection, leading to more stable query plans.
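
As a small illustration of the statistics improvement: InnoDB's persistent statistics live in regular tables in the mysql schema, so they survive restarts and can be inspected directly. A minimal sketch (the world schema here is the sample database used later in this post):

-- Inspect the persistent statistics InnoDB keeps for a schema:
SELECT table_name, n_rows, clustered_index_size
FROM mysql.innodb_table_stats
WHERE database_name = 'world';

-- Recompute statistics on demand:
ANALYZE TABLE world.City;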

In Evgeny Potemkin’s session at MySQL Connect titled “MySQL’s EXPLAIN Command New Features”, two new features for 5.7 were announced. They are both incredibly useful, so I wanted to write a little about them.

EXPLAIN FOR CONNECTION

Normally with EXPLAIN, you find the execution plan of a query you intend to run, and then interpret the output as you see fit.

What MySQL 5.7 adds is the ability to see the execution plan of a query currently running in another connection. For example:

EXPLAIN FORMAT=JSON FOR CONNECTION 2;
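
The connection ID used here is the Id column of SHOW PROCESSLIST (also exposed as PROCESSLIST_ID in the performance_schema.threads table). A minimal sketch of finding it:

SHOW PROCESSLIST;

-- Or, with less locking overhead, via performance_schema:
SELECT processlist_id, processlist_info
FROM performance_schema.threads
WHERE processlist_command = 'Query';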

Why it’s useful:
  • Plans can change depending on input parameters. For example, WHERE mydate BETWEEN '2013-01-01' AND '2013-01-02' may use an index, but WHERE mydate BETWEEN '2001-01-01' AND '2013-10-17' may not (sketched after this list).
  • Plans can change as data changes.
  • Plans can also change depending on the context of a transaction, since InnoDB offers multi-version concurrency control.
  • Optimizer statistics can change, and it is quite possible that a statistics change is the reason the executing query is slow. It’s great to have conclusive proof and be able to rule this out.
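
A quick sketch of the first point, using a hypothetical orders table with an index on mydate. The optimizer may resolve a narrow range via the index, but prefer a full table scan once the range covers most of the table:

CREATE TABLE orders (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  mydate DATE NOT NULL,
  KEY (mydate)
);

-- Likely to use the index on mydate (narrow range, few rows):
EXPLAIN SELECT * FROM orders WHERE mydate BETWEEN '2013-01-01' AND '2013-01-02';

-- May switch to a full table scan (wide range, most of the table):
EXPLAIN SELECT * FROM orders WHERE mydate BETWEEN '2001-01-01' AND '2013-10-17';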

Execution cost in EXPLAIN

MySQL uses cost-based optimization to pick the best query execution plan when there are multiple choices available. It is very similar to how a GPS navigator adds up estimated travel times and picks the best route to a destination.

What this feature does is expose the cost as a numeric value when running EXPLAIN FORMAT=JSON. To take an example using the world sample database:

mysql [localhost] {msandbox} (world) > EXPLAIN FORMAT=JSON SELECT City.* 
FROM City INNER JOIN Country ON City.countrycode=Country.code 
ORDER BY City.NAME ASC LIMIT 100\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4786.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2151.00"
      },
      "nested_loop": [
        {
          "table": {
            "table_name": "country",
            "access_type": "index",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "Code"
            ],
            "key_length": "3",
            "rows_examined_per_scan": 239,
            "rows_produced_per_join": 239,
            "filtered": 100,
            "using_index": true,
            "cost_info": {
              "read_cost": "6.00",
              "eval_cost": "47.80",
              "prefix_cost": "53.80",
              "data_read_per_join": "61K"
            },
            "used_columns": [
              "Code"
            ]
          }
        },
        {
          "table": {
            "table_name": "City",
            "access_type": "ref",
            "possible_keys": [
              "CountryCode"
            ],
            "key": "CountryCode",
            "used_key_parts": [
              "CountryCode"
            ],
            "key_length": "3",
            "ref": [
              "world.country.Code"
            ],
            "rows_examined_per_scan": 9,
            "rows_produced_per_join": 2151,
            "filtered": 100,
            "cost_info": {
              "read_cost": "2151.00",
              "eval_cost": "430.20",
              "prefix_cost": "2635.00",
              "data_read_per_join": "151K"
            },
            "used_columns": [
              "ID",
              "Name",
              "CountryCode",
              "District",
              "Population"
            ]
          }
        }
      ]
    }
  }
}

Why it’s useful:

  • This exposes more transparency into optimizer decisions. DBAs can better understand which part of a query is considered expensive, and try to optimize it. I think this is important, because I have heard a lot of DBAs make blanket recommendations like “joins are bad” or “sorting is bad”, when the context of how much data needs to be sorted really matters. It makes us all speak the same language: estimated cost (see the sketch after this list).
  • Cost refinement is an ongoing effort. Alongside the introduction of fast SSD storage, MySQL is introducing new optimizations (such as index condition pushdown). Not all of these optimizations will be the best choice every time, and MySQL should ideally be able to make the right choice in every situation.
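
As a sketch of putting the cost number to work (reusing the world sample database; exact values will vary with your data), you can compare the optimizer's estimate for its chosen plan against a forced alternative:

-- The optimizer's chosen plan (ref access on the CountryCode index):
EXPLAIN FORMAT=JSON SELECT * FROM City WHERE CountryCode = 'NLD'\G

-- The same query with the index ignored; the query_cost in the output
-- will be higher, quantifying why the optimizer avoided the table scan:
EXPLAIN FORMAT=JSON SELECT * FROM City IGNORE INDEX (CountryCode)
WHERE CountryCode = 'NLD'\G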

How do you use the Federated Storage Engine?

We are looking for community feedback on the use-cases for the Federated Storage Engine in MySQL.

Specifically, I would be interested to hear which users find that the introduction of multi-source replication meets (or does not meet) their requirements.

For a bit of background:

The Federated Storage Engine was introduced in MySQL 5.0, but has long been disabled by default. It is useful for ad-hoc queries across MySQL servers, but it misses some of MySQL’s newer optimizations and does not perform as well as we would like. Among its limitations, it also does not support transactions.
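
For background, here is roughly what a Federated setup looks like, as a sketch: the host name and credentials are placeholders, and the local table definition must match the remote one. The engine is disabled by default, so it first needs federated listed under [mysqld] in my.cnf.

-- A local proxy for world.City living on a remote server:
CREATE TABLE City (
  ID int NOT NULL AUTO_INCREMENT,
  Name char(35) NOT NULL DEFAULT '',
  CountryCode char(3) NOT NULL DEFAULT '',
  District char(20) NOT NULL DEFAULT '',
  Population int NOT NULL DEFAULT '0',
  PRIMARY KEY (ID),
  KEY CountryCode (CountryCode)
) ENGINE=FEDERATED
  CONNECTION='mysql://app_user:app_pass@remote-host:3306/world/City';

-- Every query against this table is sent over the wire to remote-host:
SELECT COUNT(*) FROM City;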

By using multi-source replication, many of the downsides mentioned above are negated, since the tables can be made available to query locally. The disadvantage, of course, is the need to provision more storage (although, given the performance limitations of Federated, it is not certain it could fill this use-case either).
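
For comparison, here is a sketch of what a multi-source setup could look like, using the channel-based syntax from the labs preview (host names and credentials are placeholders, and the exact syntax may change before GA):

-- Each master gets its own named replication channel on the slave:
CHANGE MASTER TO MASTER_HOST='master1.example.com', MASTER_USER='repl',
  MASTER_PASSWORD='repl_pass', MASTER_AUTO_POSITION=1
  FOR CHANNEL 'master1';
CHANGE MASTER TO MASTER_HOST='master2.example.com', MASTER_USER='repl',
  MASTER_PASSWORD='repl_pass', MASTER_AUTO_POSITION=1
  FOR CHANNEL 'master2';

START SLAVE FOR CHANNEL 'master1';
START SLAVE FOR CHANNEL 'master2';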

So please leave a comment, or get in touch. I would love to hear how you are using the Federated Engine, and what you plan to use multi-source replication for.

MySQL Certification – Filling gaps in my knowledge

Like many others, I’ll be taking the MySQL 5.6 certification exams at the end of this month. As Todd blogged, the quick reference pages are available online for the Developer Exam and the DBA Exam. I’ve long considered myself more of a DBA, so I know that for the developer exam I need to study up on these topics:

But there are also a few DBA topics I could benefit from a refresh on:

I’ll be writing more on my experiences studying these topics (and which manual pages helped me out) in the coming days.

MySQL 5.6 – Now with fewer gripes

I often find myself in conversations with long-time MySQL users who tell me about little annoyances in MySQL that make the product harder to use. Sometimes these annoyances can stay in a product seemingly forever, either because fixing them requires a change in behaviour (breaking backwards compatibility) or because they are the result of architectural limitations that are not easily fixed.

In either case, I think it’s important to commend the MySQL engineering team at Oracle, because as well as introducing a number of big-ticket features for MySQL 5.6, they actually invested a lot of time into closing out old bugs.

Here is my list of historical gripes, now resolved:

  1. innodb-file-per-table is ON by default, which makes it easier to reclaim space from runaway tables.
  2. When you change innodb_log_file_size in configuration, InnoDB will now resize the logs to match automatically.
  3. Temporal data types can now support microseconds.
  4. Silent storage engine substitution is disabled by default (sql_mode=NO_ENGINE_SUBSTITUTION).
  5. Subqueries are now optimized.
  6. Stored procedures offer a way to get more information on errors.
  7. Timestamps can be maintained on more than one column.
  8. Replication slaves are crash-safe. This means that when a slave crashes you no longer need to re-image its data from the master or another slave; it can resume from where it was.
  9. Replication failover is much easier with GTIDs. Previously it was very difficult when a master with many slaves failed: you could not simply pick a new master from one of the slaves and reposition the other slaves under it.
  10. Indexes can now be added online, which means less downtime for a lot of people.
  11. There’s a standard way to find unused indexes (performance_schema; see the query sketch after this list).
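
On the last point, here is a sketch of the kind of query performance_schema makes possible. Note that the counters reset at server restart, so the server should have seen representative traffic first:

-- Indexes with no recorded reads since startup are candidates for removal:
SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND index_name <> 'PRIMARY'
  AND count_star = 0
ORDER BY object_schema, object_name;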

I should point out that I don’t think MySQL is alone in having these nuances – I think the politically correct name is a “legacy feature” 🙂

In MySQL 5.7, my gripe about Control-C not working in the mysql client has already been fixed. The bug system also now lets you vote for a bug as impacting you, so if you see something annoying, please click ‘Affects me’ to leave feedback.

Did I miss something? Please leave a comment.

MySQL 5.6.14 Community Release Notes

Thank you to the MySQL Community, on behalf of the MySQL team @ Oracle. Your bug reports, testcases and patches have helped create a better MySQL 5.6.14.

In particular:

  • Thanks to Justin Swanhart for pointing out an issue with statistics on partitioned tables – bug #69179. While we marked Justin’s bug as a duplicate, Justin continued to provide commentary on the earlier bug it duplicated, and we appreciate his feedback.
  • Justin Swanhart also noticed a change in behaviour in MySQL 5.6 with an assertion raised in InnoDB – bug #70007. We’ve since reverted to the previous behaviour.
  • Thanks to Zhai Weixiang, who noticed that InnoDB code intended to run only in debug versions of MySQL was unintentionally running in all versions. Bug #69724.
  • Thanks to both Robert Lineweaver and Ryan Kenney for reporting bugs #69693 and #69707, as well as providing excellent test cases and research into the versions affected. Both problems related to InnoDB mishandling some foreign key names.
  • Thanks to Roel Van de Paar, who reported an assertion when innodb_log_group_home_dir does not exist. Bug #69000.
  • Thanks to Elena Stepanova for reporting bug #65146. From Elena’s contribution, we both updated the documentation and now produce a warning when START TRANSACTION WITH CONSISTENT SNAPSHOT is used in isolation levels other than REPEATABLE READ.
  • Thanks to Mark Callaghan for reporting an issue with InnoDB’s background thread and shutting down the mysql server. Bug #63276.
  • Thanks to Alexey Kopytov for detecting a situation where an infinite loop could occur in handling InnoDB’s compressed pages. Alexey’s bug report contained detailed code analysis and was very helpful. Bug #61132.
  • Thanks to Shahriyar Rzayev for reporting the original bug, and for uploading detailed data to assist in reproduction. It was from these contributions that we were later able to devise a testcase and reproduce the bug internally! Bug #69898.
  • Thanks to Justin Swanhart and Tsubasa Tanaka, who both identified a bug in MASTER_DELAY not type-checking its input. Bugs #69249 and #69469.
  • Thanks to Hartmut Holzgraefe for reporting an issue with Row-based-replication and decimal data type between MySQL versions. Bug #57166.
  • Thanks to Mark Callaghan for reporting a performance regression in 5.6. Bug #68825.
  • Thanks to Jean Weisbuch for identifying an issue with information_schema. Bug #68076.
  • Thanks to Simon Mudd for discovering an issue where the event scheduler did not correctly report when it could not create a new thread. Bug #67191.
  • Thanks to Sergey Petrunya who reported bug #67507, as well as the community of users who provided input on the bug – mysqldump is often run from cron, so changes in behaviour risk backups not being performed.
  • Thanks to a private bug reporter who demonstrated that MySQL inadequately protects against stack overflow, and provided an example patch. Bug #35019.
  • Thanks to “Dude Letme BE” who reported symbols were missing from libmysql.dll. Bugs #69204, #62394.
  • Thanks to Chito Angeles for reporting a bug in InnoDB fulltext search, and reducing it to a simplified testcase. Bug #69932.

Thank you again to everyone listed above. In particular, I would like to call out the two names that appear more than once: Justin Swanhart (3) and Mark Callaghan (2).

- Morgan