Plan to deprecate PROCEDURE ANALYSE

In the MySQL team, we have been refactoring the SQL parser to be more maintainable. Gleb Shchepa lists the goals of this project in more details on the MySQL Server Team blog.

As part of this, we have identified the feature PROCEDURE ANALYSE as something that we would like to deprecate. For added context, here is a demonstration:

mysql> SELECT * FROM City procedure analyse()\G
*************************** 1. row ***************************
             Field_name: world.city.ID
              Min_value: 1
              Max_value: 4079
             Min_length: 1
             Max_length: 4
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 2040.0000
                    Std: 1177.5058
      Optimal_fieldtype: SMALLINT(4) UNSIGNED NOT NULL
*************************** 2. row ***************************
             Field_name: world.city.Name
              Min_value: A Coruña (La Coruña)
              Max_value: ´s-Hertogenbosch
             Min_length: 3
             Max_length: 33
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 8.5295
                    Std: NULL
      Optimal_fieldtype: VARCHAR(33) NOT NULL
*************************** 3. row ***************************
             Field_name: world.city.CountryCode
              Min_value: ABW
              Max_value: ZWE
             Min_length: 3
             Max_length: 3
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 3.0000
                    Std: NULL
      Optimal_fieldtype: ENUM('ABW','AFG','AGO','AIA','ALB','AND','ANT','ARE','ARG','ARM','ASM','ATG','AUS','AUT','AZE','BDI','BEL','BEN','BFA','BGD','BGR','BHR','BHS','BIH','BLR','BLZ','BMU','BOL','BRA','BRB','BRN','BTN','BWA','CAF','CAN','CCK','CHE','CHL','CHN','CIV','CMR','COD','COG','COK','COL','COM','CPV','CRI','CUB','CXR','CYM','CYP','CZE','DEU','DJI','DMA','DNK','DOM','DZA','ECU','EGY','ERI','ESH','ESP','EST','ETH','FIN','FJI','FLK','FRA','FRO','FSM','GAB','GBR','GEO','GHA','GIB','GIN','GLP','GMB','GNB','GNQ','GRC','GRD','GRL','GTM','GUF','GUM','GUY','HKG','HND','HRV','HTI','HUN','IDN','IND','IRL','IRN','IRQ','ISL','ISR','ITA','JAM','JOR','JPN','KAZ','KEN','KGZ','KHM','KIR','KNA','KOR','KWT','LAO','LBN','LBR','LBY','LCA','LIE','LKA','LSO','LTU','LUX','LVA','MAC','MAR','MCO','MDA','MDG','MDV','MEX','MHL','MKD','MLI','MLT','MMR','MNG','MNP','MOZ','MRT','MSR','MTQ','MUS','MWI','MYS','MYT','NAM','NCL','NER','NFK','NGA','NIC','NIU','NLD','NOR','NPL','NRU','NZL','OMN','PAK','PAN','PCN','PER','PHL','PLW','PNG','POL','PRI','PRK','PRT','PRY','PSE','PYF','QAT','REU','ROM','RUS','RWA','SAU','SDN','SEN','SGP','SHN','SJM','SLB','SLE','SLV','SMR','SOM','SPM','STP','SUR','SVK','SVN','SWE','SWZ','SYC','SYR','TCA','TCD','TGO','THA','TJK','TKL','TKM','TMP','TON','TTO','TUN','TUR','TUV','TWN','TZA','UGA','UKR','URY','USA','UZB','VAT','VCT','VEN','VGB','VIR','VNM','VUT','WLF','WSM','YEM','YUG','ZAF','ZMB','ZWE') NOT NULL
*************************** 4. row ***************************
             Field_name: world.city.District
              Min_value: Abhasia [Aphazeti]
              Max_value: –
             Min_length: 1
             Max_length: 20
       Empties_or_zeros: 4
                  Nulls: 0
Avg_value_or_avg_length: 9.0194
                    Std: NULL
      Optimal_fieldtype: VARCHAR(20) NOT NULL
*************************** 5. row ***************************
             Field_name: world.city.Population
              Min_value: 42
              Max_value: 10500000
             Min_length: 2
             Max_length: 8
       Empties_or_zeros: 0
                  Nulls: 0
Avg_value_or_avg_length: 350468.2236
                    Std: 723686.9870
      Optimal_fieldtype: MEDIUMINT(8) UNSIGNED NOT NULL
5 rows in set (0.01 sec)

ANALYSE() examines the result from a query and returns an analysis of the results that suggests optimal data types for each column that may help reduce table sizes.

Our justification for wanting to deprecate PROCEDURE ANALYSE is as follows:

  • There are no other uses of SELECT * FROM table PROCEDURE. This syntax is used exclusively by ANALYSE, and uses the UK English spelling.
  • The name PROCEDURE predates the addition of stored procedures as a MySQL feature. Ideally this feature would use a different name (CHANNEL?) to avoid confusion in usage. It also exists as an extension to the SQL standard.
  • There are numerous advantages to a feature similar to this being external to the MySQL server. The server must follow a stable release cycle, with core functionality being unchanged once it is declared GA. As an external tool, it is much easier to develop in an agile way, and provide new functionality without having to provide the same level of backward compatibility.

    By "external" I am implying that this could either be a script or as a view or stored procedure in MySQL. Shlomi has a good example of how to show auto_increment column capacity in common_schema!

Our current plan is to deprecate PROCEDURE ANALYSE in MySQL 5.7, for removal as soon as MySQL 5.8. We are inviting feedback from the MySQL Community and would like to hear from you if you use PROCEDURE ANALYSE. Please leave a comment, or get in touch!

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I've also worked at Percona and InPowered.

  • http://karwin.blogspot.com Bill Karwin

    Agreed. This is a feature that virtually no one knows about, and those who do know about it don't use it. In fact, I thought this feature was already deprecated.

    Getting a report of the optimal data types for columns would be useful during schema design on a greenfield project, but this tool doesn't give a useful report until the table is populated with production data. By that time, it's too late for a DBA to recommend restructuring the table to make the columns more compact. Just imagine a DBA requesting a service outage window so he/she can run an ALTER TABLE to make all the VARCHAR(255) columns into VARCHAR(70).

    My understanding is that the PROCEDURE feature was intended to be a more general-purpose query post-filtering feature, for people who wanted to develop filter code in C++ and compile it into the server like a UDF. In theory, one could write a post-filtering routine to translate English text into French, or log access to certain kinds of results, or implement cube/rollup functionality. PROCEDURE ANALYSE was only a sample to demonstrate how to write the code hooks. But very few sites want to get involved with recompiling MySQL server, or developing their own code to run inside it. And the MySQL user culture does prefer to implement extended features outside the engine.

    • http://databaseblog.myname.nl/ Daniël van Eeden

      In my experience the reason for the (relative) low number of UDFs and plugins are:
      * The server crashes if your plugin/UDF crashes. Sharing the same process is ok for plugins which need high performance, but for many plugins this is not needed and they could benefit from having their own process with some IPC and not crashing the server.
      * There is no auto recompile (like DKMS), so this makes upgrading more complicated

  • Shlomi Noach

    Thanks for pointing to auto_increment_columns in common_schema! It actually just so happens that I pull-requested this view into the mysql-sys yesterday, with the intention of pushing more of common_schema into sys schema.

    https://github.com/MarkLeith/mysql-sys/pull/67

    • http://www.tocker.ca/ Morgan Tocker

      Awesome news. Thanks!

  • http://databaseblog.myname.nl/ Daniël van Eeden

    There are enough issues with this which could be solved by replacing this by a script and/or sys.
    1. The advice is not always correct (e.g. ENUM(<list of current values) )
    2. Some advice (pros/cons/reasoning) should be added
    3. This doesn't scale for bigger tables.
    4. It should give you the statements to actually do the changes.

  • Ike Walker

    I would like to see PROCEDURE ANALYSE go away. My biggest gripe with it is how often it suggests an "optimal field type" that ranges from debatable to just plain wrong. In particular it often suggests using the ENUM data type for columns where it is totally inappropriate just because the column has a relatively low cardinality.

  • Michael Widenius

    The purpose of PROCEDURE ANALYSE was not get the best CREATE TABLE definition, but get to understand better what the table contains.
    When I go to a customer and look at there tables to understand what they contain, I often use PROCEDURE ANALYZE to get a better grip of what kind of data they store in a table.
    It does help spot obvious faults in the database design (like storing YES/NO as strings in something that is better as bool field), but that's only part of why this feature is useful.

  • Tibor Sekelj

    So what will be the replacement for this feature if you remove it in mysql 8.0

    its like, you buy a new car and there is no display for gasoline level in fuel tank because someone deprecated it as it wasnt accurate enough.

    while the procedure wasnt accurate, it did show an overview of the data, so why not improving it and renaming it?