Proposal to enable sql mode ONLY_FULL_GROUP_BY by default

We are considering enabling the SQL mode ONLY_FULL_GROUP_BY by default in MySQL 5.7. Here is a quick example to demonstrate how this change will impact you:

Sample Data:

CREATE TABLE invoice_line_items (id INT NOT NULL PRIMARY KEY auto_increment, 
invoice_id INT NOT NULL,
description varchar(100)
);

INSERT INTO invoice_line_items VALUES 
 (NULL, 1, 'New socks'),
 (NULL, 1, 'A hat'),
 (NULL, 2, 'Shoes'),
 (NULL, 2, 'T shirt'),
 (NULL, 3, 'Tie');

Default behaviour in MySQL 5.6 and below:

mysql> SELECT id, invoice_id, description 
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+
| id | invoice_id | description |
+----+------------+-------------+
|  1 |          1 | New socks   |
|  3 |          2 | Shoes       |
|  5 |          3 | Tie         |
+----+------------+-------------+
3 rows in set (0.00 sec)

Proposed default behaviour in MySQL 5.7:

mysql> SELECT id, invoice_id, description 
FROM invoice_line_items GROUP BY invoice_id;
ERROR 1055 (42000): 'test.invoice_line_items.id' isn't in GROUP BY

That is to say that in 5.7 this statement will produce an error.

Notice that columns id and description are not actually unique? Roland Bouman has a blog post from 2007 that describes what MySQL is doing here, and how it is different in behaviour to other databases.

To summarize: MySQL is picking one of the possible values for id and description and the query executes in a non deterministic way. Here’s some more information to prove this:

mysql> SELECT id, invoice_id, description, max(description) 
FROM invoice_line_items GROUP BY invoice_id;
+----+------------+-------------+------------------+
| id | invoice_id | description | max(description) |
+----+------------+-------------+------------------+
|  1 |          1 | New socks   | New socks        |
|  3 |          2 | Shoes       | T shirt          |
|  5 |          3 | Tie         | Tie              |
+----+------------+-------------+------------------+
3 rows in set (0.00 sec)

mysql> SELECT GROUP_CONCAT(id), invoice_id, GROUP_CONCAT(description) 
FROM invoice_line_items GROUP BY invoice_id;
+------------------+------------+---------------------------+
| GROUP_CONCAT(id) | invoice_id | GROUP_CONCAT(description) |
+------------------+------------+---------------------------+
| 1,2              |          1 | New socks,A hat           |
| 3,4              |          2 | Shoes,T shirt             |
| 5                |          3 | Tie                       |
+------------------+------------+---------------------------+
3 rows in set (0.00 sec)

What is problematic about this behaviour is that it can hide bugs in application code. To highlight two blog posts:

The Proposal

We are proposing to change ONLY_FULL_GROUP_BY to be enabled by default as of MySQL 5.7. The downside in doing this, is that some users upgrading will have statements return errors. These users will be left with two options:

  1. Set sql_mode in their my.cnf (or my.ini) file explicitly, or as part of their application code since sql_mode is configurable per session.
  2. Modify the offending GROUP BY statements to be deterministic.

On the second point, we are also proposing introducing an ANY_VALUE() SQL function to allow statements that are ‘acceptable as non-deterministic’ to run easy retrofit in this less permissive configuration. For example:

mysql> SELECT ANY_VALUE(id) AS id, invoice_id, 
ANY_VALUE(description) AS description 
FROM invoice_line_items GROUP BY invoice_id;
+------+------------+-------------+
| id   | invoice_id | description |
+------+------------+-------------+
| 1    |          1 | New socks   |
| 3    |          2 | Shoes       |
| 5    |          3 | Tie         |
+------+------------+-------------+
3 rows in set (0.01 sec)

Why I personally like this proposal, is that I compare it to buying an undercooked burger in a North American restaurant. We are proposing to switch our policy from “we may serve it to you without you having realized you ordered it” to “sign a waiver, then we will serve it”.

i.e. We are not removing options, but want you to have to opt-in to what can be an unsafe choice, rather than opt-out.

Fans of standard SQL will rejoice at compatibility with SQL-2003, a standard defined after this behavior was first implemented.

Conclusion

As with other changes that have been proposed for MySQL 5.7, we are soliciting your feedback!

  • Have you accidentally returned incorrect data, and could have benefited from having ONLY_FULL_GROUP_BY enabled?
  • Do you maintain an application that relies on the non deterministic behaviour?
  • Do you agree or disagree that this change is the better default for users?

Please leave a comment, or get in touch!

Update: This proposal has gone ahead. MySQL 5.7 DMR5 has ONLY_FULL_GROUP_BY enabled by default!

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I’ve also worked at Percona and InPowered.

  • Pingback: MySQL, GROUP BY, Select lists, and Standard SQL | SQL And Its Sequels()

  • Gary P Smith

    Hi Morgan,
    I ran across your blog and this topic as I’ve been researching an issue related to this and an open source project called XBMC. In answer to your questions, I believe that FULL_GROUP_BY_ONLY should be the default, and I can’t think of any reasons why I wouldn’t want it.
    However, I do have a reason why I would like to have the option of turning it off. I would prefer not to need it, but the XBMC team has some code in their application that is not SQL-2003 compliant. I will be creating a ticket in hopes that they can clean up their code (they develop using SQLite3). In the mean time, I would like to configure my MySQL v5.6 instance to accept their problematic code.
    I’m a bit confused as to when the default of FULL_GROUP_BY_ONLY has been established. It looks like it is already the default for version 5.6. My SQL MODE parameters only include the NO_ENGINE_SUBSTITUTION and STRICT_TRANS_TABLES options. It does not include the FULL_GROUP_BY_ONLY option. Yet my database is enforcing the columns be listed in the GROUP BY clauses as though it had been included.
    So, now I’m stuck. I can’t change their code (at least not willing to become an open source developer to do it), and I can’t change the behavior of MySQL v5.6 to allow the “loose” group by SQL statements to work.
    Is there an option in 5.6 that I’m unaware of that I can use to turn this off?
    I hope you put an option in 5.7 to allow us to turn it off.
    My feedback,
    Gary P Smith

    • Hi Gary,

      What can happen, is that individual sessions override and change the SQL Mode. I can confirm that ONLY_FULL_GROUP_BY is not set by default in 5.6, but the option to set it dates back to MySQL 5.0 (2005).

      To answer your question: we will retain the ability to change the defaults and not set ONLY_FULL_GROUP_BY if an application is incompatible with it.

      • Gary P Smith

        Thank you Morgan. I did a select @@global.sql_mode and found that the ONLY_FULL_GROUP_BY option has been set, along with a number of other options. Only two of these are specified in my my.cnf file, so it looks like I’ve got some research to do to find out where this is being set. Thank for the reply.

  • One problem is that MySQL’s ONLY_FULL_GROUP_BY is more restrictive than the SQL:2003 standard.

    mysql> select id, description from invoice_line_items group by id;
    ERROR 1055 (42000): ‘test.invoice_line_items.description’ isn’t in GROUP BY

    This should be allowed, because id -> description is a known functional dependency. That is, there’s only one possible value for description for each group when grouping by id.

    Similar cases exist when doing joins. Say for example I group by a FK column, and my query joins to the parent referenced by the FK. All columns in the parent table should be fair game, and should not violate the single-value rule.

    select i.product_id, p.description
    from invoice_line_items i
    join products p on i.product_id = p.id
    group by i.product_id;

    • I’ve actually been trying to decipher the SQL standard to see if this is really true, and I personally don’t think it should be. Allowing for these corner cases seems like a bad idea in the context of:

      – GROUPING SETs
      – derived tables
      – outer joins

      • Hmm. I’m not able to find a clear rule right now. The only thing I found contradicts what I think should be the rule.

        In Annex F, table 35 for mandatory features, feature ID E051-02 says, “With the restrictions that the shall contain all non-aggregated columns in the and that any column in the shall also appear in the ”

        To me, “all non-aggregated columns” means that if I want to include p.description in my example above, I must use “GROUP BY i.product_id, p.description”. In other words, no allowance is made for columns that are functionally dependent on the grouping columns.

        However, in 7.9 Group By, General Rule 3a: “If [a reference CR to a column that is functionally dependent on the grouping columns] is a group-invariant column reference, then it is a reference to the common value in that column of the rows in that group.”

        Which seems to say that it’s okay to include columns that are not grouping columns, as long as they have only one value within the group. It goes on to say that columns that are within-group-varying must be used in the argument to a set function.

        • Very interesting, thanks for digging that up. I’m intrigued by the sentence that “any column in the group by clause shall also appear in the select list”. That doesn’t seem to make sense…

  • tongshan

    Thanks

  • simon

    With ONLY_FULL_GROUP_BY set to true, will the second example query above (the one with GROUP_CONCAT) also error? We often use GROUP_CONCAT in a similar way — ie to query for a sub-collection of ID’s or other values. I’d very much like something like ONLY_FULL_GROUP_BY set to true to prevent buggy queries from getting into the code but it doesn’t help if that setting complains when doing sub-collection queries.

    Also, as an aside, many of my teammates dislike using GROUP_CONCAT to accomplish this task of querying for collections within collections but it’s the most performant way I’ve found of pulling this data out of MySQL. Do you know of a better or even less hacky way of doing this?

    • simon

      Ah, just checked. It looks like the answer to my first question is NO, the GROUP_CONCAT query is still totally kosher.

      I’d still be interested to hear if you have suggestions for better ways of accomplishing the sub-collection query though!

  • mohamad bathulah

    Do you agree or disagree that this change is the better default for users?
    I don’t agree because a lot of my sql operation rely on the previous default (not ONLY_FULL_GROUP_BY ) setting, additionally a lot of complex sql that i already done using previous default (not ONLY_FULL_GROUP_BY ) setting

  • Harshendra Baghel

    Any one help me to resolve my issue which is generated on production server its working staging MYSQL Server version: 5.5.47 perfect but when site is on production MYSQL Server version 5.7.x version produce an error when we have clicked on menu. some menu working fine but few produce error …. Which is mentioned below.

    PDOException: SQLSTATE[42000]: Syntax error or access violation: 1055 Expression #6 of SELECT list is not in GROUP BY clause and contains nonaggregated column ‘SmithGeiger.a.episode_id’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by: SELECT u.uid AS uid, u.name AS name, u.login AS login, u.mail AS mail, u.name AS u_name, a.episode_id AS episode_id, f.field_first_name_value AS field_first_name_value, l.field_last_name_value AS field_last_name_value, COUNT(DISTINCT(a.episode_id)) AS showcount FROM {users} u LEFT OUTER JOIN {field_data_field_first_name} f ON f.entity_id = u.uid LEFT OUTER JOIN {field_data_field_last_name} l ON l.entity_id = u.uid LEFT OUTER JOIN {a1_orientation} a ON a.user_id = u.uid WHERE (u.uid :db_condition_placeholder_0) AND (name LIKE :db_condition_placeholder_1 ESCAPE ‘\’) GROUP BY u.uid ORDER BY uid ASC LIMIT 10 OFFSET 0; Array ( [:db_condition_placeholder_0] => 0 [:db_condition_placeholder_1] => %% ) in PagerDefault->execute() (line 79 of /Web/SmithGeiger/includes/pager.inc).

    could you please provide solution how to fixed it.

    • Have a look at the other comments. I guess you will have to configure your database changing the sql_mode. There was some changing in the default settings I think. Good luck!

  • jason

    Thank you for your post
    myfreedownloadgames.com