Proposal to enable sql mode ONLY_FULL_GROUP_BY by default

We are considering enabling the SQL mode ONLY_FULL_GROUP_BY by default in MySQL 5.7. Here is a quick example to demonstrate how this change will impact you:

Sample Data:

CREATE TABLE invoice_line_items (id INT NOT NULL PRIMARY KEY auto_increment, 
invoice_id INT NOT NULL,
description varchar(100)
);

INSERT INTO invoice_line_items VALUES 
 (NULL, 1, 'New socks'),
 (NULL, 1, 'A hat'),
 (NULL, 2, 'Shoes'),
 (NULL, 2, 'T shirt'),
 (NULL, 3, 'Tie');

Default behaviour in MySQL 5.6 and below:

mysql> SELECT id, invoice_id, description 
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+
| id | invoice_id | description |
+----+------------+-------------+
|  1 |          1 | New socks   |
|  3 |          2 | Shoes       |
|  5 |          3 | Tie         |
+----+------------+-------------+
3 rows in set (0.00 sec)

Proposed default behaviour in MySQL 5.7:

mysql> SELECT id, invoice_id, description 
FROM invoice_line_items GROUP BY invoice_id;
ERROR 1055 (42000): 'test.invoice_line_items.id' isn't in GROUP BY

That is to say that in 5.7 this statement will produce an error.

Notice that columns id and description are not actually unique? Roland Bouman has a blog post from 2007 that describes what MySQL is doing here, and how it is different in behaviour to other databases.

To summarize: MySQL is picking one of the possible values for id and description and the query executes in a non deterministic way. Here's some more information to prove this:

mysql> SELECT id, invoice_id, description, max(description) 
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+------------------+
| id | invoice_id | description | max(description) |
+----+------------+-------------+------------------+
|  1 |          1 | New socks   | New socks        |
|  3 |          2 | Shoes       | T shirt          |
|  5 |          3 | Tie         | Tie              |
+----+------------+-------------+------------------+
3 rows in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(id), invoice_id, GROUP_CONCAT(description) 
FROM invoice_line_items GROUP BY invoice_id;
+------------------+------------+---------------------------+
| GROUP_CONCAT(id) | invoice_id | GROUP_CONCAT(description) |
+------------------+------------+---------------------------+
| 1,2              |          1 | New socks,A hat           |
| 3,4              |          2 | Shoes,T shirt             |
| 5                |          3 | Tie                       |
+------------------+------------+---------------------------+
3 rows in set (0.00 sec)

What is problematic about this behaviour is that it can hide bugs in application code. To highlight two blog posts:

The Proposal

We are proposing to change ONLY_FULL_GROUP_BY to be enabled by default as of MySQL 5.7. The downside in doing this, is that some users upgrading will have statements return errors. These users will be left with two options:

  1. Set sql_mode in their my.cnf (or my.ini) file explicitly, or as part of their application code since sql_mode is configurable per session.
  2. Modify the offending GROUP BY statements to be deterministic.

On the second point, we are also proposing introducing an ANY_VALUE() SQL function to allow statements that are 'acceptable as non-deterministic' to run easy retrofit in this less permissive configuration. For example:

mysql> SELECT ANY_VALUE(id) AS id, invoice_id, 
ANY_VALUE(description) AS description 
FROM invoice_line_items GROUP BY invoice_id;
+------+------------+-------------+
| id   | invoice_id | description |
+------+------------+-------------+
| 1    |          1 | New socks   |
| 3    |          2 | Shoes       |
| 5    |          3 | Tie         |
+------+------------+-------------+
3 rows in set (0.01 sec)

Why I personally like this proposal, is that I compare it to buying an undercooked burger in a North American restaurant. We are proposing to switch our policy from "we may serve it to you without you having realized you ordered it" to "sign a waiver, then we will serve it".

i.e. We are not removing options, but want you to have to opt-in to what can be an unsafe choice, rather than opt-out.

Fans of standard SQL will rejoice at compatibility with SQL-2003, a standard defined after this behavior was first implemented.

Conclusion

As with other changes that have been proposed for MySQL 5.7, we are soliciting your feedback!

  • Have you accidentally returned incorrect data, and could have benefited from having ONLY_FULL_GROUP_BY enabled?
  • Do you maintain an application that relies on the non deterministic behaviour?
  • Do you agree or disagree that this change is the better default for users?

Please leave a comment, or get in touch!

Update: This proposal has gone ahead. MySQL 5.7 DMR5 has ONLY_FULL_GROUP_BY enabled by default!

  • Pingback: MySQL, GROUP BY, Select lists, and Standard SQL | SQL And Its Sequels()

  • Gary P Smith

    Hi Morgan,
    I ran across your blog and this topic as I've been researching an issue related to this and an open source project called XBMC. In answer to your questions, I believe that FULL_GROUP_BY_ONLY should be the default, and I can't think of any reasons why I wouldn't want it.
    However, I do have a reason why I would like to have the option of turning it off. I would prefer not to need it, but the XBMC team has some code in their application that is not SQL-2003 compliant. I will be creating a ticket in hopes that they can clean up their code (they develop using SQLite3). In the mean time, I would like to configure my MySQL v5.6 instance to accept their problematic code.
    I'm a bit confused as to when the default of FULL_GROUP_BY_ONLY has been established. It looks like it is already the default for version 5.6. My SQL MODE parameters only include the NO_ENGINE_SUBSTITUTION and STRICT_TRANS_TABLES options. It does not include the FULL_GROUP_BY_ONLY option. Yet my database is enforcing the columns be listed in the GROUP BY clauses as though it had been included.
    So, now I'm stuck. I can't change their code (at least not willing to become an open source developer to do it), and I can't change the behavior of MySQL v5.6 to allow the "loose" group by SQL statements to work.
    Is there an option in 5.6 that I'm unaware of that I can use to turn this off?
    I hope you put an option in 5.7 to allow us to turn it off.
    My feedback,
    Gary P Smith

    • http://www.tocker.ca/ Morgan Tocker

      Hi Gary,

      What can happen, is that individual sessions override and change the SQL Mode. I can confirm that ONLY_FULL_GROUP_BY is not set by default in 5.6, but the option to set it dates back to MySQL 5.0 (2005).

      To answer your question: we will retain the ability to change the defaults and not set ONLY_FULL_GROUP_BY if an application is incompatible with it.

      • Gary P Smith

        Thank you Morgan. I did a select @@global.sql_mode and found that the ONLY_FULL_GROUP_BY option has been set, along with a number of other options. Only two of these are specified in my my.cnf file, so it looks like I've got some research to do to find out where this is being set. Thank for the reply.

  • http://karwin.blogspot.com Bill Karwin

    One problem is that MySQL's ONLY_FULL_GROUP_BY is more restrictive than the SQL:2003 standard.

    mysql> select id, description from invoice_line_items group by id;
    ERROR 1055 (42000): 'test.invoice_line_items.description' isn't in GROUP BY

    This should be allowed, because id -> description is a known functional dependency. That is, there's only one possible value for description for each group when grouping by id.

    Similar cases exist when doing joins. Say for example I group by a FK column, and my query joins to the parent referenced by the FK. All columns in the parent table should be fair game, and should not violate the single-value rule.

    select i.product_id, p.description
    from invoice_line_items i
    join products p on i.product_id = p.id
    group by i.product_id;

    • http://www.tocker.ca/ Morgan Tocker
    • http://www.jooq.org Lukas Eder

      I've actually been trying to decipher the SQL standard to see if this is really true, and I personally don't think it should be. Allowing for these corner cases seems like a bad idea in the context of:

      - GROUPING SETs
      - derived tables
      - outer joins

      • http://karwin.blogspot.com Bill Karwin

        Hmm. I'm not able to find a clear rule right now. The only thing I found contradicts what I think should be the rule.

        In Annex F, table 35 for mandatory features, feature ID E051-02 says, "With the restrictions that the shall contain all non-aggregated columns in the and that any column in the shall also appear in the "

        To me, "all non-aggregated columns" means that if I want to include p.description in my example above, I must use "GROUP BY i.product_id, p.description". In other words, no allowance is made for columns that are functionally dependent on the grouping columns.

        However, in 7.9 Group By, General Rule 3a: "If [a reference CR to a column that is functionally dependent on the grouping columns] is a group-invariant column reference, then it is a reference to the common value in that column of the rows in that group."

        Which seems to say that it's okay to include columns that are not grouping columns, as long as they have only one value within the group. It goes on to say that columns that are within-group-varying must be used in the argument to a set function.