Five reasons why vertical scalability matters

The latest benchmarks show that MySQL 5.7 is now able to scale to 60 cores, which is quite an incredible feat compared with the 4-8 core scaling of MySQL 5.1 just a few years ago. These improvements are the result of a lot of heavy lifting to reorganize internal locking structures, and I have an earlier blog post, "What is a mutex anyway?", which may serve as an introduction.

While I consider horizontal scaling and projects like MySQL Fabric to be very important, it should be stated that horizontal and vertical scaling are really orthogonal choices. That is to say that a given database technology should ideally support both options, and today I wanted to zoom in on some of the advantages I see with being able to scale vertically:

  1. Having more cores offers more consistent performance. Think of a single CPU as being like a convenience store with one person at the checkout. The experience is very good when there is nobody else in line, but it degrades very quickly when just a couple of shoppers are lined up ahead of you.

    To add to that, even if you know that there is an average of 60 customers per hour, you cannot expect them to arrive evenly spaced at one customer per minute. What usually happens is a more random pattern of arrivals (see Poisson distribution).

    The solution to this problem is to have more staff available at the checkout. The more staff available, the less variance there is in the time it takes to serve a customer. Similarly, having multiple CPUs means that query times degrade much more gracefully when there are small spikes in load.

  2. Simplified debugging and performance characteristics. For some applications, where the lifetime growth requirements can be answered by a single server (or single master, multiple HA slaves), having a single primary server can be beneficial.

    The number of transactions/second that a single server can now handle is also much higher than it used to be. Dimitri's 5.7 Sysbench OLTP_RW shows 15K transactions/second (or over 500K point select queries/second). Prematurely introducing architectural complexity through horizontal scaling may increase the effort required to troubleshoot problems. In some cases it may also artificially prevent desirable features such as strong consistency (ACID).

  3. Good insurance for the unknown. Some applications grow in unexpected ways, and being able to scale up offers a great upgrade path that is less likely to change performance characteristics than horizontal scaling, and requires fewer application changes.

    Often the cost of higher-end commodity hardware is less than that of custom-development time. I once experienced the case where a legacy application that was planned for decommission started having performance problems. Paying for the biggest EC2 instance type was worth it for a few months, and we probably would have paid more if other options were available (they were not at the time, but are now).

  4. Increased efficiency at scale. Even with automation, it is easier to manage 1000 16-core instances than it is to manage 16000 single-core instances.

  5. An alternative consolidation strategy to virtualization. For some organizations, backing up and maintaining many small database servers presents operational complexity, even when the underlying servers lie on virtualized hardware. Having a larger single database instance with many database schemas can offer an alternative that may be easier to manage.

    I concede that there is some functionality missing on the MySQL side to truly realize this potential, since it would be nice to be able to set more fine-grained quotas per application and limit the ability to accidentally cause a denial of service for other applications. However, some headway has been made, with performance_schema now able to instrument things that were previously not possible. Most notably, in MySQL 5.7, memory can now be instrumented per user.
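The checkout analogy from point 1 can be sketched as a small queue simulation. This is only an illustration under stated assumptions: arrivals are Poisson at 60 per hour (as in the text), while the exponential service time with a mean of 0.8 minutes and the function name `simulate_waits` are my own choices, not measurements of any MySQL workload. Note that adding staff here also adds capacity, which is exactly what adding cores does.

```python
import heapq
import random
import statistics


def simulate_waits(num_servers, arrivals, service_times):
    """Return each customer's wait (minutes) in a first-come, first-served queue."""
    # Min-heap of the times at which each checkout (or core) next becomes free.
    free_at = [0.0] * num_servers
    heapq.heapify(free_at)
    waits = []
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, heapq.heappop(free_at))
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits


rng = random.Random(42)  # fixed seed so runs are repeatable
n = 20000

# Poisson arrivals averaging 60 per hour: exponential gaps with a mean of 1 minute.
arrivals = []
clock = 0.0
for _ in range(n):
    clock += rng.expovariate(1.0)
    arrivals.append(clock)

# Assumption: exponential service times with a mean of 0.8 minutes per customer.
service_times = [rng.expovariate(1 / 0.8) for _ in range(n)]

for staff in (1, 2, 4):
    waits = simulate_waits(staff, arrivals, service_times)
    print(staff, "staff: mean wait", round(statistics.mean(waits), 2),
          "min, std dev", round(statistics.pstdev(waits), 2), "min")
```

The same arrivals and service times are replayed for each staffing level, so any difference in the mean and spread of the waits comes only from the number of checkouts.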

Conclusion

I wanted to close with an example of how horizontal and vertical scaling should work together. Let's say that you operate a SaaS application with millions of users, and have the option to either massively shard or use a single larger server. Both options may be sub-optimal:

Vertical Scaling

PRO: Having vertical scalability allows each user to have some burstable performance that can be absorbed by larger hardware.

CON: It is possible that extremely busy users will impact all other users, creating an all-eggs-in-one-basket scenario.

Horizontal Scaling

PRO: Multiple horizontal shards allow some natural fencing, where extreme spikes can be contained to a group of users. This assumes that individual users do not need cross-shard queries; otherwise it hinges on my point above under "Simplified debugging and performance characteristics".

CON: If the application is excessively horizontally scaled, some of the busier users may have a bad experience as their shard becomes overloaded too quickly. Quite often these busy users can be the ones with the most revenue associated with them.

By horizontally scaling across many vertically scalable servers, you can get closer to the best of both worlds.

Testing the UNION ALL Optimization in MySQL 5.7 DMR3

When MySQL 5.7 DMR3 was released, I couldn't wait to try out the new UNION ALL optimization that no longer requires data to be materialized in a temporary table.

This new optimization can be seen via EXPLAIN, but it is also nice to run it through SHOW PROFILES, which breaks down query execution step by step. However, this feature is now deprecated (since it overlaps with performance_schema), and will be removed in a future version.

So today I wanted to show you a combination of:
* What a UNION ALL statement looks like in MySQL 5.6 (EXPLAIN, SHOW PROFILES).
* How it is improved in MySQL 5.7 (EXPLAIN, SHOW PROFILES).
* How you can easily emulate the SHOW PROFILES feature with performance_schema + ps_helper :)

Initial Setup

The UNION ALL query I am going to demonstrate uses the following fake sample data:

use test;
CREATE TABLE `table_a` (
  `col1` int(11) NOT NULL AUTO_INCREMENT,
  `col2` char(255) DEFAULT NULL,
  PRIMARY KEY (`col1`)
) ENGINE=InnoDB;

CREATE TABLE `table_b` (
  `col1` int(11) NOT NULL AUTO_INCREMENT,
  `col2` char(255) DEFAULT NULL,
  PRIMARY KEY (`col1`)
) ENGINE=InnoDB;

INSERT INTO table_a (col2) VALUES ('A'), ('AA'), ('AAA');
INSERT INTO table_b (col2) VALUES ('B'), ('BB'), ('BBB');

The query that I want to use is this one:

mysql> SELECT * FROM table_a UNION ALL SELECT * FROM table_b;
+------+------+
| col1 | col2 |
+------+------+
|    1 | A    |
|    2 | AA   |
|    3 | AAA  |
|    1 | B    |
|    2 | BB   |
|    3 | BBB  |
+------+------+
6 rows in set (0.00 sec)

MySQL 5.6

We can see from EXPLAIN that MySQL 5.6 requires a step to insert the rows into a temporary table, from which the results can be returned:

mysql> EXPLAIN SELECT * FROM table_a UNION ALL SELECT * FROM table_b\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: table_a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: NULL
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: table_b
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: NULL
*************************** 3. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using temporary
3 rows in set (0.00 sec)

Via SHOW PROFILES, we can also see there are multiple stages of Sending data. I believe the best way to describe this state is as 'shipping rows around in the server and between storage engines':

mysql> set profiling=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show profiles;
+----------+------------+-------------------------------------------------------+
| Query_ID | Duration   | Query                                                 |
+----------+------------+-------------------------------------------------------+
|        1 | 0.00039200 | SELECT * FROM table_a UNION ALL SELECT * FROM table_b |
+----------+------------+-------------------------------------------------------+
2 rows in set, 1 warning (0.01 sec)

mysql> show profile for query 1;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000059 |
| checking permissions | 0.000005 |
| checking permissions | 0.000005 |
| Opening tables       | 0.000091 |
| System lock          | 0.000011 |
| optimizing           | 0.000004 |
| statistics           | 0.000012 |
| preparing            | 0.000009 |
| optimizing           | 0.000002 |
| statistics           | 0.000005 |
| preparing            | 0.000004 |
| executing            | 0.000003 |
| Sending data         | 0.000053 |
| executing            | 0.000002 |
| Sending data         | 0.000029 |
| optimizing           | 0.000006 |
| statistics           | 0.000006 |
| preparing            | 0.000004 |
| executing            | 0.000002 |
| Sending data         | 0.000019 |
| removing tmp table   | 0.000007 |
| Sending data         | 0.000002 |
| query end            | 0.000007 |
| closing tables       | 0.000010 |
| freeing items        | 0.000019 |
| cleaning up          | 0.000016 |
+----------------------+----------+
26 rows in set, 1 warning (0.00 sec)

MySQL 5.7

The output from EXPLAIN now shows the query executing in two steps instead of three. The warning here is benign. It is caused by EXPLAIN EXTENDED now being permanently enabled:

mysql> EXPLAIN SELECT * FROM table_a UNION ALL SELECT * FROM table_b\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: table_a
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: table_b
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: NULL
2 rows in set, 1 warning (0.00 sec)

And using SHOW PROFILES we can see that there are now only two Sending data steps:

mysql> set profiling=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show profiles;
+----------+------------+-------------------------------------------------------+
| Query_ID | Duration   | Query                                                 |
+----------+------------+-------------------------------------------------------+
|        1 | 0.00038900 | SELECT * FROM table_a UNION ALL SELECT * FROM table_b |
+----------+------------+-------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> show profile for query 1;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000094 |
| checking permissions | 0.000013 |
| checking permissions | 0.000008 |
| Opening tables       | 0.000068 |
| System lock          | 0.000012 |
| optimizing           | 0.000015 |
| statistics           | 0.000012 |
| preparing            | 0.000010 |
| optimizing           | 0.000003 |
| statistics           | 0.000005 |
| preparing            | 0.000006 |
| executing            | 0.000002 |
| Sending data         | 0.000046 |
| executing            | 0.000003 |
| Sending data         | 0.000014 |
| query end            | 0.000010 |
| removing tmp table   | 0.000012 |
| query end            | 0.000003 |
| closing tables       | 0.000010 |
| freeing items        | 0.000027 |
| cleaning up          | 0.000016 |
+----------------------+----------+
21 rows in set, 1 warning (0.00 sec)

You will notice that there is in fact still a temporary table, denoted by the step removing tmp table. The release notes explain this point very carefully:

"The server no longer uses a temporary table for UNION statements that
meet certain qualifications. Instead, it retains from temporary table
creation only the data structures necessary to perform result column
typecasting. The table is not fully instantiated and no rows are
written to or read from it; rows are sent directly to the client. The
result is reduced memory and disk requirements, and
smaller delay before the first row is sent to the client because the
server need not wait until the last query block is executed. EXPLAIN
and optimizer trace output will change: The UNION RESULT query block
will not be present because that block is the part that reads from
the temporary table."
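To compare the two SHOW PROFILE listings above without counting rows by eye, the stages can be tallied with a few lines of Python. This is a minimal sketch: the strings are excerpts (execution stages only) copied from the 5.6 and 5.7 output above, and `parse_profile` and `stage_counts` are hypothetical helper names, not part of any MySQL tool.

```python
from collections import Counter

# Excerpts of the SHOW PROFILE output shown earlier (execution stages only).
PROFILE_56 = """
| executing            | 0.000003 |
| Sending data         | 0.000053 |
| executing            | 0.000002 |
| Sending data         | 0.000029 |
| optimizing           | 0.000006 |
| statistics           | 0.000006 |
| preparing            | 0.000004 |
| executing            | 0.000002 |
| Sending data         | 0.000019 |
| removing tmp table   | 0.000007 |
| Sending data         | 0.000002 |
"""

PROFILE_57 = """
| executing            | 0.000002 |
| Sending data         | 0.000046 |
| executing            | 0.000003 |
| Sending data         | 0.000014 |
| query end            | 0.000010 |
| removing tmp table   | 0.000012 |
"""


def parse_profile(text):
    """Turn '| Status | Duration |' rows into (status, seconds) tuples."""
    rows = []
    for line in text.strip().splitlines():
        status, duration = [cell.strip() for cell in line.strip("| ").split("|")]
        rows.append((status, float(duration)))
    return rows


def stage_counts(rows):
    """Count how many times each stage name appears."""
    return Counter(status for status, _ in rows)


for label, text in (("5.6", PROFILE_56), ("5.7", PROFILE_57)):
    counts = stage_counts(parse_profile(text))
    print(label, "Sending data stages:", counts["Sending data"])
```

The timings in a test this small are too noisy to compare, so the sketch only counts stages rather than summing durations.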

Example using ps_helper

And now to demonstrate how to replace SHOW PROFILES with performance_schema! To do this, I decided to base my scripts on ps_helper by Mark Leith.

There is no specific reason you need to do this, but I admire the way ps_helper works and it's something I've wanted to try extending for a while. It also includes useful helper functions to intelligently truncate/format SQL, and convert times to a human-readable format.

So the steps are:

Not specifically a limitation of performance_schema, but one small difference with my script is that it is enabled globally as follows:

call ps_helper.enable_profiling();

In any new session, you can then use the profiling feature as follows:

mysql> SELECT * FROM table_a UNION ALL SELECT * FROM table_b;
+------+------+
| col1 | col2 |
+------+------+
|    1 | A    |
|    2 | AA   |
|    3 | AAA  |
|    1 | B    |
|    2 | BB   |
|    3 | BBB  |
+------+------+
6 rows in set (0.00 sec)

mysql> call ps_helper.show_profiles();
+----------+-----------+-------------------------------------------------------+
| Event_ID | Duration  | Query                                                 |
+----------+-----------+-------------------------------------------------------+
|       58 | 58.57 us  | select @@version_comment limit 1                      |
|       70 | 73.19 us  | select USER()                                         |
|       82 | 297.47 us | SELECT * FROM table_a UNION ALL SELECT * FROM table_b |
+----------+-----------+-------------------------------------------------------+
3 rows in set (0.04 sec)

Query OK, 0 rows affected (0.04 sec)

mysql> call ps_helper.show_profile_for_event_id(82);
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| init                 | 55.55 us  |
| checking permissions | 2.15 us   |
| checking permissions | 3.29 us   |
| Opening tables       | 68.30 us  |
| System lock          | 7.43 us   |
| optimizing           | 2.34 us   |
| statistics           | 9.97 us   |
| preparing            | 7.53 us   |
| optimizing           | 606.00 ns |
| statistics           | 2.99 us   |
| preparing            | 2.62 us   |
| executing            | 591.00 ns |
| Sending data         | 73.23 us  |
| executing            | 484.00 ns |
| Sending data         | 23.93 us  |
| query end            | 7.46 us   |
| removing tmp table   | 3.37 us   |
| closing tables       | 7.68 us   |
| freeing items        | 15.27 us  |
| cleaning up          | 804.00 ns |
+----------------------+-----------+
20 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

I renamed the command SHOW PROFILE FOR QUERY x to show_profile_for_event_id, since this better maps to performance_schema naming, but otherwise it behaves pretty much the same.

For some reason, there is a second query end state in the SHOW PROFILES version after removing the temporary table. I'm not sure why that is, or whether omitting it will be an issue.

Conclusion

It's great to see these optimizations introduced into MySQL - I can think of a number of users who can benefit from a better UNION ALL.

I implemented my own SHOW PROFILES feature really as an educational step for myself to make sure I knew performance_schema, and it was actually quite a quick and joyful experience. I plan to work with the maintainer of ps_helper and see if this can be included in future versions.

The MySQL error log in MySQL 5.7

The MySQL error log has received some attention in MySQL 5.7, with a new setting called log_error_verbosity.

There are three possible values, as documented in the manual:

Verbosity Value | Message Types Logged
1               | Errors only
2               | Errors and warnings
3               | Errors, warnings, and notes (default)

As Giuseppe has written about previously, writing notes or "informational events" can create debugging problems because they reduce the signal-to-noise ratio. There is now an easy way to reduce the logging to only warnings and errors!

A closer look at Online DDL in MySQL 5.6

MySQL 5.6 introduced online DDL for InnoDB tables, which means that a number of ALTER TABLE statements no longer block access to data modification while running.

I've written about this feature before: it was enough to make one of my freelancing customers fall in love with MySQL again. Today I wanted to go through what's covered and what's not.

The Manual

The MySQL manual actually does all the work for me, since there is a table under 14.2.12.1. Overview of Online DDL.

The particular column you want to pay attention to is "Allows Concurrent DML?", which means that neither reads nor writes are blocked.

Operation | In-Place? | Copies Table? | Allows Concurrent DML? | Allows Concurrent Query? | Notes
CREATE INDEX, ADD INDEX | Yes* | No* | Yes | Yes | Some restrictions for FULLTEXT index; see next row. Currently, the operation is not in-place (that is, it copies the table) if the same index being created was also dropped by an earlier clause in the same ALTER TABLE statement.
ADD FULLTEXT INDEX | Yes | No* | No | Yes | Creating the first FULLTEXT index for a table involves a table copy, unless there is a user-supplied FTS_DOC_ID column. Subsequent FULLTEXT indexes on the same table can be created in-place.
DROP INDEX | Yes | No | Yes | Yes | Modifies .frm file only, not the data file.
OPTIMIZE TABLE | Yes | Yes | Yes | Yes | Uses ALGORITHM=INPLACE as of MySQL 5.6.17. ALGORITHM=COPY is used if old_alter_table=1 or mysqld --skip-new option is enabled. OPTIMIZE TABLE using online DDL (ALGORITHM=INPLACE) is not supported for tables with FULLTEXT indexes.
Set default value for a column | Yes | No | Yes | Yes | Modifies .frm file only, not the data file.
Change auto-increment value for a column | Yes | No | Yes | Yes | Modifies a value stored in memory, not the data file.
Add a foreign key constraint | Yes* | No* | Yes | Yes | To avoid copying the table, disable foreign_key_checks during constraint creation.
Drop a foreign key constraint | Yes | No | Yes | Yes | The foreign_key_checks option can be enabled or disabled.
Rename a column | Yes* | No* | Yes* | Yes | To allow concurrent DML, keep the same data type and only change the column name.
Add a column | Yes | Yes | Yes* | Yes | Concurrent DML is not allowed when adding an auto-increment column. Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation.
Drop a column | Yes | Yes | Yes | Yes | Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation.
Reorder columns | Yes | Yes | Yes | Yes | Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation.
Change ROW_FORMAT property | Yes | Yes | Yes | Yes | Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation.
Change KEY_BLOCK_SIZE property | Yes | Yes | Yes | Yes | Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation.
Make column NULL | Yes | Yes | Yes | Yes | Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation.
Make column NOT NULL | Yes* | Yes | Yes | Yes | When SQL_MODE includes strict_trans_tables or strict_all_tables, the operation fails if the column contains any nulls. Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation.
Change data type of column | No | Yes | No | Yes |
Add primary key | Yes* | Yes | Yes | Yes | Although ALGORITHM=INPLACE is allowed, the data is reorganized substantially, so it is still an expensive operation. ALGORITHM=INPLACE is not allowed under certain conditions if columns have to be converted to NOT NULL. See Example 14.9, "Creating and Dropping the Primary Key".
Drop primary key and add another | Yes | Yes | Yes | Yes | ALGORITHM=INPLACE is only allowed when you add a new primary key in the same ALTER TABLE; the data is reorganized substantially, so it is still an expensive operation.
Drop primary key | No | Yes | No | Yes | Restrictions apply when you drop a primary key without adding a new one in the same ALTER TABLE statement.
Convert character set | No | Yes | No | Yes | Rebuilds the table if the new character encoding is different.
Specify character set | No | Yes | No | Yes | Rebuilds the table if the new character encoding is different.
Rebuild with FORCE option | Yes | Yes | Yes | Yes | Uses ALGORITHM=INPLACE as of MySQL 5.6.17. ALGORITHM=COPY is used if old_alter_table=1 or mysqld --skip-new option is enabled. Table rebuild using online DDL (ALGORITHM=INPLACE) is not supported for tables with FULLTEXT indexes.
Rebuild with "null" ALTER TABLE ... ENGINE=INNODB | Yes | Yes | Yes | Yes | Uses ALGORITHM=INPLACE as of MySQL 5.6.17. ALGORITHM=COPY is used if old_alter_table=1 or mysqld --skip-new option is enabled. Table rebuild using online DDL (ALGORITHM=INPLACE) is not supported for tables with FULLTEXT indexes.
Set table-level persistent statistics options (STATS_PERSISTENT, STATS_AUTO_RECALC, STATS_SAMPLE_PAGES) | Yes | No | Yes | Yes | Modifies .frm file only, not the data file.

Update: I have updated the table to reflect changes in MySQL 5.6.17.

Find that hard to remember?

What I also like about MySQL 5.6 is that it introduces syntax to force ALTER TABLE statements to fail if they cannot be performed online. Here is an example of using LOCK=NONE (must allow concurrent DML):

mysql> CREATE TABLE a (id INT NOT NULL PRIMARY KEY, a char(100),
 b char(100));
Query OK, 0 rows affected (0.02 sec)

mysql> ALTER TABLE a ADD INDEX idx_a (a), LOCK=NONE;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE a DROP PRIMARY KEY,  LOCK=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. 
 Reason: Dropping a primary key is not allowed without 
 also adding a new primary key. Try LOCK=SHARED.

I recommend using this syntax whenever writing migration scripts so you are never surprised by how much impact they will have.
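One way to make this a habit is to generate the statements in migration scripts through a small helper that always appends the LOCK clause. The sketch below is only an illustration: `online_alter` is a hypothetical function of my own, it does no identifier quoting or escaping, and it assumes the clauses come from a trusted migration file.

```python
def online_alter(table, *changes, lock="NONE"):
    """Build an ALTER TABLE statement that fails fast instead of blocking."""
    # LOCK levels accepted by ALTER TABLE in MySQL 5.6.
    allowed = {"NONE", "SHARED", "EXCLUSIVE", "DEFAULT"}
    if lock.upper() not in allowed:
        raise ValueError(f"unsupported LOCK level: {lock}")
    if not changes:
        raise ValueError("at least one change is required")
    clauses = list(changes) + [f"LOCK={lock.upper()}"]
    return f"ALTER TABLE {table} " + ", ".join(clauses) + ";"


# The first statement matches the ADD INDEX example above.
print(online_alter("a", "ADD INDEX idx_a (a)"))
# Dropping a primary key cannot run with LOCK=NONE, so ask for SHARED explicitly.
print(online_alter("a", "DROP PRIMARY KEY", lock="SHARED"))
```

Defaulting to LOCK=NONE means a migration that would block writes is rejected by the server up front, and relaxing the lock level becomes a visible decision in the script.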

What's upcoming?

In general I would comment that the number of operations which switched to being online in 5.6 is quite substantial. In MySQL 5.7 two additional improvements to online DDL are available, and are already noted in the MySQL 5.7 manual page:

  1. Extending a VARCHAR column is online.
  2. Renaming an index is online.

Have you used Online DDL in 5.6? Please leave a comment and share your experiences.

MySQL New Feature Checklist

I spotted this checklist attached to a Worklog for an upcoming change in MySQL 5.7.3 (not yet released). It is a series of Yes/No questions to be answered by developers, and it is used as part of the QA process.

  1. Any new/modification in syntax introduced
  2. Any change in existing behavior of SQL
  3. Any change in mysql protocol
  4. Any change in result formatting or data types
  5. Any change in INFORMATION_SCHEMA and PERFORMANCE_SCHEMA
  6. Concurrency testing Required
  7. Any lock/deadlock testing required
  8. Any multi user/client testing
  9. Do you expect any data race, Data Corruption when multiple users request
  10. Boundary value testing
  11. Replication testing is needed
  12. Any change in binary log formats
  13. Partitioning Changes
  14. Performance testing
  15. Is the feature claim to have performance improvement (time to first row)
  16. Do you see any implication on performance
  17. Any thing related Index
  18. Any thing touching optimizer
  19. Backup Testing
  20. Upgrade downgrade testing
  21. On disk format changes
  22. Changes in system table
  23. Retire or deprecate any mysqld configuration options
  24. Downgrade supported

The list looks pretty complete to me. It's confidence-inspiring to see the level of detailed consideration that is going into MySQL 5.7. As Tomas mentioned in his keynote, MySQL 5.7 DMR releases are heavily tested and should be considered of release candidate quality. Anitha has also written about the QA process in this blog post.

EXPLAIN PARTITIONS and EXPLAIN EXTENDED deprecation

In MySQL 5.7 we are planning to deprecate the syntax:

EXPLAIN PARTITIONS <insert query here>
EXPLAIN EXTENDED <insert query here>

.. and enable these two options by default.

The rationale is that:

  • Simple and consistent is always better. EXPLAIN FORMAT=JSON already behaves like these two flags are enabled, and if you have a partitioned table for example, it is unlikely that you would not want the PARTITIONS option. Having to remember two more flags makes the product harder to use.

  • The optimizer team has been busy refactoring and improving code quality. These two flags are supported by many if-statements, increasing complexity by more than we would like.

The intended deprecation plan is to automatically turn both flags on in MySQL 5.7 and issue deprecation warnings when using the older syntax. In MySQL 5.8, the syntax will be removed.

What do I expect will break?

That's a good question, since EXPLAIN is really a DBA tool, and is unlikely to affect running applications. I expect it to be:

  • Automated tools that depend on column order of EXPLAIN, or EXPLAIN not producing a warning (as of MySQL 5.7).

  • Automated tools that explicitly use EXPLAIN PARTITIONS or EXPLAIN EXTENDED (as of MySQL 5.8).

I hope both will be fairly simple fixes.
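For tools that depend on column order, the fix is usually to look columns up by name. The sketch below uses the column lists from the 5.6 and 5.7 EXPLAIN output for table_a shown earlier. The row tuples are my own transcription of that output (how a real driver represents NULL or the filtered value is an assumption), and `access_type` is a hypothetical helper name.

```python
# Column headers as shown by EXPLAIN in MySQL 5.6 and in MySQL 5.7.
COLUMNS_56 = ["id", "select_type", "table", "type", "possible_keys",
              "key", "key_len", "ref", "rows", "Extra"]
COLUMNS_57 = ["id", "select_type", "table", "partitions", "type",
              "possible_keys", "key", "key_len", "ref", "rows",
              "filtered", "Extra"]

# First row of the plan for table_a, transcribed from each version's output.
ROW_56 = (1, "PRIMARY", "table_a", "ALL", None, None, None, None, 3, None)
ROW_57 = (1, "PRIMARY", "table_a", None, "ALL", None, None, None, None,
          3, 100.0, None)


def access_type(columns, row):
    """Look up the 'type' column by name rather than by position."""
    return dict(zip(columns, row)).get("type")


# Position 3 is 'type' in 5.6 but 'partitions' in 5.7, so indexing breaks.
print("by position:", ROW_56[3], ROW_57[3])
# Looking up by name gives the same answer on both versions.
print("by name:", access_type(COLUMNS_56, ROW_56), access_type(COLUMNS_57, ROW_57))
```

Most client libraries can return rows keyed by column name, which avoids building the mapping by hand.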

How can I make my tools more resilient?

While we're on the subject of EXPLAIN and automated tools, it is a good segue into EXPLAIN FORMAT=JSON, which is very well suited here. I would expect that with JSON output, applications are more likely to be able to tolerate changes to the information returned, and in MySQL 5.7 the JSON is already much more detailed.

Optimizer Enhancements in MySQL 5.7

The MySQL optimizer is getting better. MySQL 5.6 introduced:

  • File sort optimizations with small limit
  • Index Condition Pushdown
  • Batched Key Access and Multi Range Read
  • Postponed Materialization
  • Improved Subquery execution
  • EXPLAIN for Insert, Update, and Delete
  • Optimizer Traces
  • Structured EXPLAIN in JSON format

This was in addition to the InnoDB storage engine now offering improved statistics collection, leading to more stable query plans.

In Evgeny Potemkin's session at MySQL Connect titled "MySQL's EXPLAIN Command New Features", two new features for 5.7 were announced. They are both incredibly useful, so I wanted to write a little about them.

EXPLAIN FOR CONNECTION

Normally with EXPLAIN, what you would be doing is finding the execution plan of a query you are intending to run, and then interpreting the output how you see fit.

What MySQL 5.7 will do is give you the ability to see the execution plan of a running query in another connection. For example:

EXPLAIN FORMAT=JSON FOR CONNECTION 2;

Why it's useful:
* Plans can change depending on input parameters. For example, WHERE mydate BETWEEN '2013-01-01' and '2013-01-02' may use an index, but WHERE mydate BETWEEN '2001-01-01' and '2013-10-17' may not.
* Plans can change as data changes.
* Plans can also change depending on the context of a transaction, with InnoDB offering multi-version concurrency control.
* Optimizer statistics can change, and it's not impossible that the reason for the executing query being slow has something to do with it. It's great to have conclusive proof and be able to rule this out.

Execution cost in EXPLAIN

MySQL uses cost based optimization to pick the best query execution plan when there are multiple choices available. It is very similar to how a GPS navigator adds up estimated time and picks the best route to a destination.

What this feature does is expose the cost as a numeric value when running EXPLAIN FORMAT=JSON. To take an example using the world sample database:

mysql [localhost] {msandbox} (world) > EXPLAIN FORMAT=JSON SELECT City.* 
FROM City INNER JOIN Country ON City.countrycode=Country.code 
ORDER BY City.NAME ASC LIMIT 100\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4786.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2151.00"
      },
      "nested_loop": [
        {
          "table": {
            "table_name": "country",
            "access_type": "index",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "Code"
            ],
            "key_length": "3",
            "rows_examined_per_scan": 239,
            "rows_produced_per_join": 239,
            "filtered": 100,
            "using_index": true,
            "cost_info": {
              "read_cost": "6.00",
              "eval_cost": "47.80",
              "prefix_cost": "53.80",
              "data_read_per_join": "61K"
            },
            "used_columns": [
              "Code"
            ]
          }
        },
        {
          "table": {
            "table_name": "City",
            "access_type": "ref",
            "possible_keys": [
              "CountryCode"
            ],
            "key": "CountryCode",
            "used_key_parts": [
              "CountryCode"
            ],
            "key_length": "3",
            "ref": [
              "world.country.Code"
            ],
            "rows_examined_per_scan": 9,
            "rows_produced_per_join": 2151,
            "filtered": 100,
            "cost_info": {
              "read_cost": "2151.00",
              "eval_cost": "430.20",
              "prefix_cost": "2635.00",
              "data_read_per_join": "151K"
            },
            "used_columns": [
              "ID",
              "Name",
              "CountryCode",
              "District",
              "Population"
            ]
          }
        }
      ]
    }
  }
}

Why it's useful:

  • This exposes more transparency into optimizer decisions. DBAs can better understand what part of a query is considered expensive, and try to optimize. I think this is important, because I have heard a lot of DBAs make blanket recommendations like "joins are bad" or "sorting is bad", but there needs to be context on how much data needs to be sorted. It makes us all speak the same language: estimated cost.
  • Cost refinement is an ongoing effort. As well as the introduction of new fast SSD storage, MySQL is introducing new optimizations (such as index-condition pushdown). Not all of these optimizations will be the best choice every time, and MySQL should ideally be able to make the right choice in all situations.
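To see where the estimated cost goes, a tool can walk the JSON and pull out every cost_info entry. This is a minimal sketch: the JSON is a trimmed copy of the output above, `collect_costs` is a hypothetical helper of my own, and the path labels it prints are just a convenience, not something MySQL produces.

```python
import json

# Trimmed copy of the EXPLAIN FORMAT=JSON output above; keys not needed
# for the cost breakdown have been dropped.
EXPLAIN_JSON = """
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "4786.00"},
    "ordering_operation": {
      "using_filesort": true,
      "cost_info": {"sort_cost": "2151.00"},
      "nested_loop": [
        {"table": {"table_name": "country",
                   "cost_info": {"read_cost": "6.00", "eval_cost": "47.80",
                                 "prefix_cost": "53.80",
                                 "data_read_per_join": "61K"}}},
        {"table": {"table_name": "City",
                   "cost_info": {"read_cost": "2151.00", "eval_cost": "430.20",
                                 "prefix_cost": "2635.00",
                                 "data_read_per_join": "151K"}}}
      ]
    }
  }
}
"""


def collect_costs(node, path="plan"):
    """Yield (path, cost_name, value) for every numeric cost_info entry."""
    if isinstance(node, dict):
        label = node.get("table_name")
        here = f"{path}[{label}]" if label else path
        for name, value in node.get("cost_info", {}).items():
            try:
                yield here, name, float(value)
            except ValueError:
                pass  # e.g. "data_read_per_join": "61K" is not a plain number
        for key, child in node.items():
            if key != "cost_info":
                yield from collect_costs(child, f"{here}.{key}")
    elif isinstance(node, list):
        for child in node:
            yield from collect_costs(child, path)


costs = list(collect_costs(json.loads(EXPLAIN_JSON)))
for where, name, value in costs:
    print(f"{where}: {name} = {value}")
```

Because the walker looks for cost_info wherever it appears, it should keep working if later versions add new nodes or new cost fields around it.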

Experimenting with MySQL 5.7

I was playing around with MySQL 5.7 this weekend and before having read the changelog, I managed to spot these two little gems.

Duplicate Indexes

"The server now issues a warning if an index is created that duplicates an existing index, or an error in strict SQL mode." Bug #37520

Example Testcase:

mysql> SHOW CREATE TABLE city\G
*************************** 1. row ***************************
       Table: city
Create Table: CREATE TABLE `city` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> ALTER TABLE city add index (countrycode);
ERROR 1831 (HY000): Duplicate index 'CountryCode_2' defined on the table 'world.city'. 
This is deprecated and will be disallowed in a future release.

Pretty cool - I know this previously caught a lot of people out.
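On servers that predate this warning, the same kind of check can be approximated outside the server. The sketch below is not how MySQL implements it: `find_duplicate_indexes` is a hypothetical helper, its input is a hand-built mapping of index name to column list (for example assembled from SHOW INDEX output), and it only catches exact duplicates, ignoring prefix lengths, index types and uniqueness.

```python
from collections import defaultdict


def find_duplicate_indexes(indexes):
    """Group index names that cover exactly the same columns in the same order."""
    by_columns = defaultdict(list)
    for name, columns in indexes.items():
        # Column names are case-insensitive in MySQL, so normalise before comparing.
        by_columns[tuple(c.lower() for c in columns)].append(name)
    return [names for names in by_columns.values() if len(names) > 1]


# Indexes on the city table, plus the one the ALTER TABLE above tried to add.
city_indexes = {
    "PRIMARY": ["ID"],
    "CountryCode": ["CountryCode"],
    "CountryCode_2": ["countrycode"],
}
print(find_duplicate_indexes(city_indexes))
```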

Control-C support in the client

"Previously, Control+C in mysql interrupted the current statement if there was one, or exited mysql if not. Now Control+C interrupts the current statement if there was one, or cancels any partial input line otherwise, but does not exit." Bug #66583

Example Testcase:

mysql> this is a test
    -> test
    -> test
    -> ^C

So if I want to quit, I can now control-C then type "quit". This is much more intuitive.