Benchmarking InnoDB page compression performance

InnoDB page compression is a feature that really interests me. I wrote this about it recently when describing how to improve the performance of large tables in MySQL:

“Use innodb page compression. For some workloads (particularly those with lots of char/varchar/text data types) compression will allow the data to be more compact, stretching out that performance curve for longer. It may also allow you to more easily justify SSDs which are typically smaller in capacity. InnoDB page compression was improved a lot in MySQL 5.6, courtesy of Facebook providing a series of patches.”

After writing that, I decided to set up an experiment.

The Experiment

I wanted to find data that is typical of what gets stored in a database, but that would also compress well. There is a huge potential for skew here: dummy data such as ‘AAAAAAAAAAAA’ would compress extremely well, while JPEG images stored in BLOBs would be unlikely to compress much further than they already are. So I arrived at using the English version of Wikipedia, which is hopefully representative of “real data”.
So after downloading the data set and importing it from its XML format into MySQL, I ran the following script:

DROP TABLE IF EXISTS text_16K_compressed;
DROP TABLE IF EXISTS text_8K_compressed;
DROP TABLE IF EXISTS text_4K_compressed;
DROP TABLE IF EXISTS text_16K_uncompressed;
CREATE TABLE `text_16K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;
CREATE TABLE `text_8K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
CREATE TABLE `text_4K_compressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
CREATE TABLE `text_16K_uncompressed` (
  `old_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `old_text` mediumblob NOT NULL,
  `old_flags` tinyblob NOT NULL,
  PRIMARY KEY (`old_id`)
) ENGINE=InnoDB;
select now();
INSERT INTO text_16K_compressed SELECT * FROM text;
select now();
INSERT INTO text_8K_compressed SELECT * FROM text;
select now();
INSERT INTO text_4K_compressed SELECT * FROM text;
select now();
INSERT INTO text_16K_uncompressed SELECT * FROM text;
select now();

With the results in, there are really two aspects to measure compression by here – footprint and import time:

So for 4K and 8K pages, there is a >50% saving in disk space, since the table compresses down from 51G to 21G. In addition, it’s pretty clear that (at least on my hardware) compression doesn’t hurt the import time: the reduced footprint actually helped the 8K compressed pages take less time to insert than the uncompressed table.
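
As a sanity check on how well compression is behaving (beyond just watching file sizes), MySQL exposes per-page-size counters in INFORMATION_SCHEMA.INNODB_CMP. This is a sketch of the kind of query I could have run between tests – what counts as a “good” ratio is workload-dependent:

-- Cumulative compression counters, one row per KEY_BLOCK_SIZE in use.
-- A low compress_ops_ok relative to compress_ops means many pages failed
-- to compress and had to be split, hinting the block size is too small.
SELECT page_size, compress_ops, compress_ops_ok, compress_time,
       uncompress_ops, uncompress_time
FROM INFORMATION_SCHEMA.INNODB_CMP;
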
So I believe I did manage to prove that compression is very useful. However, I also acknowledge that there is some skew in my test:

  • INSERT SELECT is single threaded. Ideally a test should reflect a real-life requirement.
  • My single 5400 RPM hard drive does not reflect a common production configuration.
  • I also disabled the doublewrite buffer and set innodb-flush-log-at-trx-commit=2. It would be nice to demonstrate if this impacts the test.
  • I really should have waited for all dirty pages to flush between tests. This was an oversight on my part, and I think it advantaged the 16K uncompressed insert (which is just a little better than the others).

Full Disclosure

Some details about my setup:

  • Linux Mint 15 (Ubuntu 13.04)
  • MySQL 5.7.2
  • Intel i5-2400 4 core CPU @ 3.10GHz
  • Single 5400 RPM 1TB Hard Drive (WDC WD10EADS-00L5B1)
  • 32G RAM

MySQL Configuration changes from default:

innodb-buffer-pool-size=16G
innodb-log-file-size=4G
innodb-flush-log-at-trx-commit=2
innodb-doublewrite=0
innodb-file-format=barracuda

Raw output from tests:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
now()
2013-10-28 16:12:04
now()
2013-10-29 01:34:48
now()
2013-10-29 11:20:04
now()
2013-10-29 21:13:27
now()
2013-10-30 07:37:48
morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text*
-rw-rw---- 1 morgo morgo 63472402432 Oct 28 14:35 text.ibd
-rw-rw---- 1 morgo morgo 53741617152 Oct 30 07:40 text_16K_uncompressed.ibd
-rw-rw---- 1 morgo morgo 38176555008 Oct 29 01:51 text_16K_compressed.ibd
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 21:32 text_4K_compressed.ibd <-- See below
-rw-rw---- 1 morgo morgo 21768437760 Oct 29 11:40 text_8K_compressed.ibd
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_16K_uncompressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_4K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 28 16:12 text_8K_compressed.frm
-rw-rw---- 1 morgo morgo        8642 Oct 27 19:22 text.frm

An earlier version of this post featured incorrect results for 4K compressed pages (credit: @Skunnyk). Here are the results for 4K pages re-processed:

morgo@shuttle ~/Downloads $ ~/sandboxes/msb_5_7_2/use mediawiki < create-compressed-tables.sql
..
now()
2013-10-31 11:31:24
now()
2013-10-31 22:13:32
morgo@shuttle ~/sandboxes/msb_5_7_2/data/mediawiki $ ls -lS text_4*ibd
-rw-rw---- 1 morgo morgo 20858273792 Oct 31 22:24 text_4K_compressed.ibd

Installing the latest MySQL 5.6 on Amazon Linux using official repos

In case you missed the announcement on Monday, there are now official yum repositories for Fedora 18, 19 and Enterprise Linux 6 (Red Hat and Oracle Linux).
In my case however, I often use Amazon Web Services, and I find that Amazon Linux is just a little bit friendlier to use, since it has better mirror configuration out of the box.
The good news for me is that the official MySQL yum repositories (unofficially) work just fine. Here is an example installation:

sudo yum localinstall http://repo.mysql.com/mysql-community-release-el6-3.noarch.rpm
sudo yum install mysql-community-server
sudo service mysqld start

Configure mysqld to start on boot:

sudo chkconfig mysqld on
chkconfig --list mysqld
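
To double-check that the server from the new repo is what is actually running, a quick check like this works (assuming the fresh install still has an empty root password, which these packages default to – adjust the login as needed):

# Client version installed from the new repo:
mysql --version
# Ask the running server for its version:
mysql -u root -e "SELECT VERSION();"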

And that’s it, you’re all set up and running. One thing I noticed is that the included configuration file is very clean! This is because MySQL has improved all the defaults in 5.6:

# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

It doesn’t get much simpler than this. I love it!
Norvald has also blogged about some of the other improvements to the packaging (systemd,
tmpfiles.d, SELinux, multilib).

MySQL New Feature Checklist

I spotted this checklist attached to a Worklog for an upcoming change in MySQL 5.7.3 (not yet released). It is a series of Yes/No questions to be answered by developers, and is used as part of the QA process.

  1. Any new/modification in syntax introduced
  2. Any change in existing behavior of SQL
  3. Any change in mysql protocol
  4. Any change in result formatting or data types
  5. Any change in INFORMATION_SCHEMA and PERFORMANCE_SCHEMA
  6. Concurrency testing Required
  7. Any lock/deadlock testing required
  8. Any multi user/client testing
  9. Do you expect any data race, Data Corruption when multiple users request
  10. Boundary value testing
  11. Replication testing is needed
  12. Any change in binary log formats
  13. Partitioning Changes
  14. Performance testing
  15. Is the feature claim to have performance improvement (time to first row)
  16. Do you see any implication on performance
  17. Any thing related Index
  18. Any thing touching optimizer
  19. Backup Testing
  20. Upgrade downgrade testing
  21. On disk format changes
  22. Changes in system table
  23. Retire or deprecate any mysqld configuration options
  24. Downgrade supported

The list looks pretty complete to me. It’s confidence-inspiring to see the level of detailed consideration that is going into MySQL 5.7. As Tomas mentioned in his keynote, MySQL 5.7 DMR releases are heavily tested and should be considered of release candidate quality. Anitha has also written about the QA process in this blog post.

The slow decline of using MySQL read slaves

Once upon a time, it was a very common MySQL choice to have an application split its queries between two connections, sending reads to a slave and writes directly to the master.

I won’t say this technique is no longer used (it certainly is), but I have been observing its popularity decline slowly over the years. Today I wanted to try to provide insight into some of the possible reasons:

  1. InnoDB as default. MyISAM table locks are very high impact for high-throughput systems – worse still, MyISAM does not have MVCC, an algorithm that allows InnoDB reads to be non-blocking. Running with multiple replication slaves was very important to distribute read locks across an array of servers. With InnoDB, for most people this should now be a non-issue.
  2. Single-threaded replication. When hardware has multiple CPUs and multiple disks in RAID (and many SSDs in singular) it often requires multiple concurrent threads to be able to exploit all of the capacity that it can offer. Master servers can exploit this capacity very well, but they then replicate to slaves through a single thread. This creates a situation where the problem is not read scaling (which splitting reads/writes solves very well) but write scaling. The slaves very quickly become the bottleneck as they cannot keep up.
  3. Increased desire for ACID behaviour. MySQL replication slaves are asynchronous[1], which means that if the application does an immediate read-after-write, then there is a good chance stale data might be served from a slave. Actually, it’s worse than that: with single-threaded slaves, if there is a 10 second update on the master, there is at least a 10 second delay on all slaves.

    Since the data is in an inconsistent state, mixing reads and writes between masters and slaves like this is not ACID compliant and introduces what I would call “asynchronous failures”. This leads to more time spent debugging errors.

    MySQL has also been reducing the performance hit to run ACID (including durability) in recent versions. I recommend it for most people.

    [1] Semi-sync introduced in MySQL 5.5 only ensures one slave is up to date.

  4. Increased performance-per-server. From MySQL 5.5 onwards there has been a huge focus on performance. Add to that increased memory in new servers and fast SSDs, and it is possible that for some users their lifetime performance needs can be met by a single server without the need for architectural changes.

Of these points, I suspect 1, 2, & 3 are the most likely, and to some degree 2 (single-threaded replication) makes 3 (consistency) worse because of the single-threaded blips. There has been a lot of progress made to improve this in newer versions:

  • MySQL 5.6 introduced parallel replication slaves across schemas. This increases the write throughput of slaves significantly; you just have to split tables across multiple databases and then set --slave-parallel-workers to take advantage (a minimal configuration sketch follows this list).
  • MySQL 5.7 DMR2 takes this one step further and offers intra-schema parallel replication, providing parallel slaves while keeping all tables in the same schema.
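
As referenced above, here is a minimal sketch of enabling this on a MySQL 5.6 slave; the thread count is arbitrary, and it only helps when writes are spread across several schemas:

# my.cnf on the replication slave (MySQL 5.6)
# Appliers are assigned per schema, so a workload confined to one
# database will still effectively replicate single-threaded.
slave-parallel-workers=4
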

So it is possible that we may actually see a return of read/write split in coming years. But it is also possible that as MySQL Fabric matures we may see users just rely on sharding instead, even though these two solutions should be orthogonal. It is so hard to predict the future.
I should also point out: I still see slaves as immensely valuable for HA/DR, as well as a place to send very heavy reporting queries. This hasn’t changed.
What do you think about read/write split?

MySQL Certification Study – DBA Topics

For the MySQL DBA Exam, I wrote that I needed to brush up on Security and High Availability techniques. So let’s take a look at these two in more detail.

Security


There were quite a lot of enhancements to security introduced in MySQL 5.6, and I want to make sure I’m not caught off guard by new questions. The major changes (with links to manual pages) are:

  • Password expiration support
  • Enforced password policy, such as minimum length or special character requirements.
  • sha256_password for stronger password hashing.
  • secure_auth enabled by default. This means clients using passwords stored in the pre-MySQL 4.1 format won’t be able to connect.
  • mysql_config_editor has been introduced. The intention of this utility is to stop users from entering passwords as command-line arguments (a usage sketch follows this list).
  • Passwords are no longer written to log files.
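
The mysql_config_editor workflow mentioned above looks roughly like this – the login-path name is just an example:

# Prompt for the password and store it (obfuscated) in ~/.mylogin.cnf,
# instead of typing it on the command line:
mysql_config_editor set --login-path=local --host=localhost --user=root --password

# Connect later without the password appearing in `ps` output or shell history:
mysql --login-path=local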

Enterprise Edition:

  • PAM module for providing external authentication such as Unix passwords or an LDAP directory (actually this change was in MySQL 5.5). Native Windows authentication is also supported.
  • Audit log plugin for policy-based monitoring and logging (technically also introduced in MySQL 5.5).

Additionally, it’s worthwhile looking at these manual pages:

  • http://dev.mysql.com/doc/refman/5.6/en/security.html
  • http://dev.mysql.com/doc/refman/5.6/en/general-security-issues.html

High Availability Techniques

I wrote about availability a couple of days ago, but this is only part of the problem. The certification wants you to know about the technologies that have been certified as officially supported:

  • DRBD – Most people describe DRBD as “Network level RAID1 for Linux”. It replicates disk-block changes from one server to another, and is typically configured as two servers: one active, one passive, with a virtual IP address pointing to the current ‘active server’.

    Actually, that’s a bit of a simplification: DRBD only provides the disk-block replication. It needs a cluster manager such as Pacemaker or Corosync to do the virtual IP address and failover part.

  • Windows Failover Clustering – This solution is more analogous to Pacemaker than it is to DRBD. What WFC allows you to do is share a virtual IP address between at least two servers that access mysql data files via shared storage. WFC will ensure that only one server will have the mysql data files open at a time.

  • Solaris Cluster – MySQL provides an agent to work with the Solaris cluster framework.

  • OVM Templates for MySQL Enterprise – What this solution does is allow MySQL to run as a virtual machine. If the underlying physical host fails, the MySQL VM can be migrated to another physical host. Live migrations are also supported if the host hardware requires maintenance. This kind of setup relies on shared storage.

Ten ways to improve the performance of large tables in MySQL

Today I wanted to take a look at improving the performance of tables that cause performance problems based largely on their size. Some of this advice also applies to databases that are large in aggregate over many tables, but I always find the individually large table to be a problematic special case.
What you will normally find is that the speed at which the table can be modified will trend down as its size increases. Here is what I am going to call the typical B+Tree index performance over time:


This graph is taken from a post by MySQL@Facebook. It shows the performance degradation of inserting one billion rows into a table with the insert buffer disabled (not recommended; used for demonstration purposes only). Note that this is in log scale!

The benchmark is called iibench, and was designed by TokuTek.

So we should expect degradation of performance due to the structure of the index, but there are actually some ways that we can try and stretch out the curve, and not degrade as quickly.
Ten potential ways to reduce large table impact:

  1. Make sure to use InnoDB instead of MyISAM. MyISAM can be faster at inserts to the end of a table, but it has table locking (limiting updates and deletes) and uses a single lock to protect the key buffer when loading data to/from disk, resulting in contention. It also does not have the change buffering feature described just below.

  2. InnoDB has change buffering (previously called the insert buffer), which is a feature to delay building secondary indexes that are not unique, and merge writes. It’s further described by Facebook here. It’s not shown in the graph above, but it can boost insert performance by quite a lot, and it’s enabled by default. It was greatly improved in MySQL 5.5, so it is time to upgrade if you haven’t.

  3. Partitioning may reduce the size of indexes, effectively splitting the table into many smaller tables. It also reduces internal index->lock contention, something that has been greatly improved in the MySQL 5.7.2 DMR.

  4. Use innodb page compression. For some workloads (particularly those with lots of char/varchar/text data types) compression will allow the data to be more compact, stretching out that performance curve for longer. It may also allow you to more easily justify SSDs which are typically smaller in capacity. InnoDB page compression was improved a lot in MySQL 5.6, courtesy of Facebook providing a series of patches.

  5. Sort and bulk load data into tables. Inserting in order will result in fewer page splits (which perform worse on tables not in memory), and bulk loading is not specifically related to the table size, but it will help reduce redo log pressure.

  6. Remove any unnecessary indexes on the table, paying particular attention to UNIQUE indexes as these disable change buffering. Don’t use a UNIQUE index if you have no reason for that constraint; prefer a regular INDEX.

  7. Related to points 5 & 6, the type of primary key also matters. It is much better to use either an INT or BIGINT datatype than, say, a GUID, which will have a curve that degrades much faster. Having no PRIMARY KEY will also affect performance negatively.

  8. If bulk loading a fresh table, delay creating any indexes besides the PRIMARY KEY. If you create them once all data is loaded, then InnoDB is able to apply a pre-sort and bulk-load process which is both faster and typically results in more compact indexes. This optimization was introduced in MySQL 5.5.

  9. More memory can actually help here too. I frequently see people under-spec memory on new database servers compared to what it actually costs these days. Simple advice: if SHOW ENGINE INNODB STATUS shows any reads/s under BUFFER POOL AND MEMORY, and the number of Free buffers (also under BUFFER POOL AND MEMORY) is zero, you could benefit from more memory (assuming you have sized innodb_buffer_pool_size correctly on your server – see here, and the quick check after this list).

  10. As well as memory, SSDs can help too. Much of the performance drop shown on the curve can be attributed to the additional IO that is created as the table gets bigger. While a hard drive can do around 200 operations per second (IOPS), a typical SSD will do 20K+.
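
For point 9, rather than eyeballing SHOW ENGINE INNODB STATUS, roughly the same signals are exposed as status counters – a quick check, with the caveat that interpreting the numbers is workload-dependent:

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that missed the buffer pool and went to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical read requests
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';     -- free pages remaining in the buffer pool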

EXPLAIN PARTITIONS and EXPLAIN EXTENDED deprecation

In MySQL 5.7 we are planning to deprecate the syntax:

EXPLAIN PARTITIONS <insert query here>
EXPLAIN EXTENDED <insert query here>

.. and enable these two options by default.
The rationale is that:

  • Simple and consistent is always better. EXPLAIN FORMAT=JSON already behaves like these two flags are enabled, and if you have a partitioned table for example, it is unlikely that you would not want the PARTITIONS option. Having to remember two more flags makes the product harder to use.

  • The optimizer team has been busy refactoring and improving code quality. These two flags are supported by many if-statements, increasing complexity by more than we would like.

The intended deprecation plan is to automatically turn both flags on in MySQL 5.7 and issue deprecation warnings when using the older syntax. In MySQL 5.8, the syntax will be removed.

What do I expect will break?

That’s a good question, since EXPLAIN is really a DBA tool, and is unlikely to affect running applications. I expect it to be:

  • Automated tools that depend on column order of EXPLAIN, or EXPLAIN not producing a warning (as of MySQL 5.7).

  • Automated tools that explicitly use EXPLAIN PARTITIONS or EXPLAIN EXTENDED (as of MySQL 5.8).

Both I hope should be fairly simple fixes.

How can I make my tools more resilient?

While we’re on the subject of EXPLAIN and automated tools, it is a good segue into EXPLAIN FORMAT=JSON, which is very well suited here. I would expect that with JSON output, applications are more likely to be able to tolerate changes to the information returned, and in MySQL 5.7 the JSON output is already much more detailed.
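
For example, a tool that wants both the partition and extended information without remembering either flag can already do this today (the table and column names below are just placeholders):

EXPLAIN FORMAT=JSON
SELECT * FROM my_partitioned_table WHERE created_at >= '2013-01-01';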

Top 10 advances to availability since MySQL 5.5

Recently I found myself talking to a company that was a big user of MySQL, but just hadn’t followed all of the latest developments so closely. When the conversation went to availability I was actually not as prepared as I would have liked, and forgot about some of the (quite large) improvements that have been made to MySQL recently.
So here is me trying to write it all down for the next time I’m in the same situation. Note that I am using the definition of continuous availability, which covers any service disruption that impacts the business, not just reduced redundancy.

#1 – InnoDB as default

When: MySQL 5.5
Why:
* Row locks instead of table locks mean that there is less contention and fewer sudden stalls as applications become more loaded (i.e. performance degrades much more gracefully).
* InnoDB also features Multiversion concurrency control, which means that queries that read data do not have to set locks – further increasing concurrency and reducing potential stalls.
* If MySQL crashes, InnoDB is able to perform crash recovery very quickly and come back online (it just replays through its log file). For MyISAM crash recovery takes much longer, as the whole table must be examined.

#2 – Crash safe Replication

When: MySQL 5.6
Why:
* Replication slaves are crash safe. This means when a slave crashes you no longer need to re-image its data from the master or another slave. The slave can resume from where it was.
* This results in less time without HA, and reduced operations team overhead.

#3 – Semi-sync replication

When: MySQL 5.5 (will get even better performance in MySQL 5.7).
Why:
* Semi-sync is an option where you can make sure at least one slave has a copy of the data.
* This means less chance of lost transactions if a master fails.
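
Enabling it is a plugin install plus one variable on each side. A minimal sketch for Linux follows (plugin file names differ on other platforms, and the slave’s I/O thread needs to be restarted to pick up the change):

-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- On each slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;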

#4 – Improved Group-commit

When: MySQL 5.6
Why:
* This means that transactions being committed near the same time in InnoDB will merge together and write to transaction logs as one operation.
* This is important because it means that running MySQL in durable mode no longer results in as big a performance drop.
* With the addition of changes like this (and fast SSDs) I recommend durability to a lot of users.

#5 – Replication with GTIDs

When: MySQL 5.6
Why:
* Previously it was very difficult when a master failed with many slaves attached. You could not simply pick a new master from one of the slaves and reposition the other slaves under it.
* With GTIDs, failover and maintenance operations that change the topology are much easier (a minimal configuration sketch follows).
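
A minimal sketch of what enabling GTIDs looks like in MySQL 5.6 – every server in the topology needs these settings, and the host name below is only an example:

# my.cnf on every server in the topology
log-bin
server-id=1                  # must be unique per server
gtid-mode=ON
enforce-gtid-consistency
log-slave-updates

Repointing a slave under a new master then becomes:

CHANGE MASTER TO MASTER_HOST='new-master.example.com', MASTER_AUTO_POSITION=1;
START SLAVE;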

#6 – Online DDL

When: MySQL 5.6
Why:
* A large number of DDL operations (such as adding indexes) now do not block other queries reading or writing to the table. The list of which are blocking writes (‘allows concurrent DML’) is also well documented in the manual.
* Syntax is also provided so that if the operation cannot run without locking, it will refuse to execute. For example: ALTER TABLE my_table ADD INDEX a (a), LOCK=NONE;.
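
To make that concrete, here is a sketch with placeholder table and index names; adding ALGORITHM=INPLACE additionally asks MySQL to refuse rather than fall back to a table copy:

-- Errors out instead of silently blocking writes if the operation
-- cannot be performed online:
ALTER TABLE my_table ADD INDEX idx_a (a), ALGORITHM=INPLACE, LOCK=NONE;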

#7 – Improved InnoDB crash recovery

When: MySQL 5.5
Why:
* It was discovered that InnoDB crash recovery time was taking a long time on new servers with large amounts of RAM.
* The performance was greatly improved with a new crash recovery algorithm in MySQL 5.1 (InnoDB plugin edition only) and MySQL 5.5 (by default).
* This is now a very serious issue if you are running an older version of MySQL on some of the hardware available today.

#8 – Improved adaptive flushing

When: MySQL 5.5 (improved in MySQL 5.6)
Why:
* For performance InnoDB delays writes by first writing to a transaction log in the foreground, and then writing to destined page locations in the background.
* In earlier versions of MySQL it is possible that the log files can ‘fill up’ and have no free space available for new modifications. This can result in sudden stalls as the server quickly frees up and makes log space available.
* In MySQL 5.5 an adaptive flushing algorithm was introduced (default: on) to flush pages more aggressively as free log space runs low. This feature results in more consistent and stable response times for all queries.

#9 – LRU not victim to side workloads

When: Introduced MySQL 5.5, MySQL 5.6 (by default)
Why:
* When MySQL needs to perform sudden IO (such as in the case of a mysqldump) it is possible that the ‘good’ contents of the cache could be unloaded from memory.
* In MySQL 5.5, the cache algorithm was changed from a classic LRU to a young sublist (default: 63%) and old sublist (default: 37%) so that these expensive side-load queries could have a fixed resource limit applied to them.
* The configuration variable innodb_old_blocks_time was introduced to specify the minimum time in milliseconds that a page must be in the old sublist before it can be promoted to the young sublist. In 5.6 it defaults to 1000.

#10 – MySQL Utilities

When: Independent Release
Why:
* MySQL now releases an official set of MySQL Utilities that provide command-line utilities for maintaining and administering MySQL servers.
* Notably mysqlfailover provides a scripted way to failover when using Replication with GTIDs. MySQL Fabric (Not yet GA; Labs Release Only) provides sharding with High-Availability groups.

Heads up – Implicit sorting by GROUP BY is deprecated in MySQL 5.6

For those who were unaware, in MySQL the following statements are currently identical:

SELECT MAX(Population), CountryName FROM City GROUP BY CountryName;
SELECT MAX(Population), CountryName FROM City GROUP BY CountryName ORDER BY CountryName;

That is to say that, regardless of whether or not you asked for it, whenever you use GROUP BY the data will also be sorted and returned in that order.
The problem with this is that it can result in worse-performing queries: sorting either reduces the number of possible execution plans or requires an additional step to sort the data. This is why many DBAs advocate writing GROUP BY queries with ORDER BY NULL, i.e.

SELECT MAX(Population), CountryName FROM City GROUP BY CountryName;

Should be written as:

SELECT MAX(Population), CountryName FROM City GROUP BY CountryName ORDER BY NULL;

But, as Roland Bouman notes in BUG #30477 there is no standard which requires MySQL to order data in this way, and this behaviour is not present in other databases. So in MySQL 5.6, it was decided that relying on this implicit sorting should be deprecated. From
the manual:

“Relying on implicit GROUP BY sorting in MySQL 5.6 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause. GROUP BY sorting is a MySQL extension that may change in a future release; for example, to make it possible for the optimizer to order groupings in whatever manner it deems most efficient and to avoid the sorting overhead.”

So nothing has changed yet, but please make sure you are using GROUP BY CountryName ORDER BY CountryName if your application requires it.
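
If you want to see what the implicit sort is costing, EXPLAIN is a quick check. Assuming CountryName is not indexed, the first plan will typically show a filesort step that the second avoids:

EXPLAIN SELECT MAX(Population), CountryName FROM City GROUP BY CountryName;
-- Extra: Using temporary; Using filesort

EXPLAIN SELECT MAX(Population), CountryName FROM City GROUP BY CountryName ORDER BY NULL;
-- Extra: Using temporary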

MySQL Certification Study – Stored Programs

I previously wrote that I am studying for the MySQL 5.6 exam, and that I’m less
confident in my skills as a database developer. When I went through the list of topics in the exam, one thing I knew I would have to study is stored programs.
So first, some definitions. From the manual:

Stored programs include these objects:

  • Stored routines, that is, stored procedures and functions. A stored procedure is invoked using the CALL statement. A procedure does not have a return value but can modify its parameters for later inspection by the caller. It can also generate result sets to be returned to the client program. A stored function is used much like a built-in function: you invoke it in an expression and it returns a value during expression evaluation.
  • Triggers. A trigger is a named database object that is associated with a table and that is activated when a particular event occurs for the table, such as an insert or update.
  • Events. An event is a task that the server runs according to schedule.

So a stored program is the broader name applied to code that runs in the database. A procedure differs from a function in that it is executed via a CALL statement and can modify its input parameters rather than just returning a value. Check.
The next thing I have to look at is creating and executing stored procedures and functions. That’s covered on this manual page:

mysql> delimiter //
mysql> CREATE PROCEDURE dorepeat(p1 INT)
    -> BEGIN
    ->   SET @x = 0;
    ->   REPEAT SET @x = @x + 1; UNTIL @x > p1 END REPEAT;
    -> END
    -> //
Query OK, 0 rows affected (0.00 sec)
mysql> delimiter ;
mysql> CALL dorepeat(1000);
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @x;
+------+
| @x   |
+------+
| 1001 |
+------+
1 row in set (0.00 sec)
mysql> CREATE FUNCTION hello (s CHAR(20))
mysql> RETURNS CHAR(50) DETERMINISTIC
    -> RETURN CONCAT('Hello, ',s,'!');
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT hello('world');
+----------------+
| hello('world') |
+----------------+
| Hello, world!  |
+----------------+
1 row in set (0.00 sec)

I also need to know how to implement error handling within a stored procedure. This is actually improved a lot in 5.6 with the addition of GET DIAGNOSTICS. Using the example from the manual page:

CREATE PROCEDURE do_insert(value INT)
BEGIN
  -- Declare variables to hold diagnostics area information
  DECLARE code CHAR(5) DEFAULT '00000';
  DECLARE msg TEXT;
  DECLARE rows INT;
  DECLARE result TEXT;
  -- Declare exception handler for failed insert
  DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
    BEGIN
      GET DIAGNOSTICS CONDITION 1
        code = RETURNED_SQLSTATE, msg = MESSAGE_TEXT;
    END;
  -- Perform the insert
  INSERT INTO t1 (int_col) VALUES(value);
  -- Check whether the insert was successful
  IF code = '00000' THEN
    GET DIAGNOSTICS rows = ROW_COUNT;
    SET result = CONCAT('insert succeeded, row count = ',rows);
  ELSE
    SET result = CONCAT('insert failed, error = ',code,', message = ',msg);
  END IF;
  -- Say what happened
  SELECT result;
END;

So we DECLARE a CONTINUE HANDLER (manual pages here and here) and then optionally use GET DIAGNOSTICS for more fine-grained error information. We can also return an error using the SIGNAL functionality (manual page here), sketched below.
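
As a quick refresher for myself, here is a minimal sketch of SIGNAL; the procedure name and the validation rule are made up for illustration:

DELIMITER //
CREATE PROCEDURE check_positive(value INT)
BEGIN
  IF value <= 0 THEN
    -- SQLSTATE '45000' is the generic user-defined exception state
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'value must be positive';
  END IF;
END//
DELIMITER ;

CALL check_positive(-1);  -- ERROR 1644 (45000): value must be positive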