Estimating MySQL’s Working Set with information_schema

In an earlier post, I wrote about when MySQL loads data in and out of cache. At the end of this post, I talked about the concept of page churn, and having enough memory for your working set:

“If you do not have enough memory to hold all of your working set in memory, what will happen is that the buffer pool will start juggling as it churns pages out of memory only to load them back in soon after. A small amount of this is probably okay, but it can start to degrade performance. The traditional rule of thumb is called “the 5 minute rule“. This means that if you load a page into memory and are going to need it again in the next five minutes – it should not need to be churned out.”

To be able to calculate this in a precise way we would need to collect:
– The current number of pages in the buffer pool.
– The unique pages loaded in over a 5 minute period.

MySQL doesn’t have a hook to be able to tell when a page is loaded in memory, but as of MySQL 5.6 it does have an information_schema table called INNODB_BUFFER_PAGE. So while we will not be able to get the exacty number, polling this table on a frequent enough interval and storing the unique results (space_id + page_id) to a temporary table should get close.

So I wrote a stored procedure estimate_working_set.sql to do exactly this. It accepts two arguments; sleep time and iterations. Here is an example ~5 minute observation on my laptop:

mysql> call test.estimate_working_set(10, 30);
+----------+
| progress |
+----------+
| 1/30     |
+----------+
1 row in set (10.13 sec)

+----------+
| progress |
+----------+
| 2/30     |
+----------+
1 row in set (21.19 sec)

.. lines ommitted for brevity ..

+----------+
| progress |
+----------+
| 29/30    |
+----------+
1 row in set (5 min 41.97 sec)

+----------+
| progress |
+----------+
| 30/30    |
+----------+
1 row in set (5 min 54.72 sec)

+----------------------+
| pages_in_working_set |
+----------------------+
|               100679 |
+----------------------+
1 row in set (5 min 55.61 sec)

Query OK, 0 rows affected (5 min 55.71 sec)

So in my case my working set is 100679 pages. I am not using compressed pages so each page is 16 KiB, or divide by 64 to convert to MiB. 100679/64 = 1573MB working set. I have a 128M buffer pool – so I desperately need to increase it / add more memory.

Warning! This stored procedure will be expensive on a server with a large buffer pool. Use at your own risk!

Published by

morgo

I joined MySQL AB in 2006, left, and am now back at Oracle working on the MySQL team. I’ve also worked at Percona and InPowered.

  • Peter Laursen

    Why is exactly 5 minutes the ‘holy cow’?

    Wouldn’t that value depend on several factors? If I feel the overall responsiveness of my applications is the same with a pool with data from 1 or 2 minutes, I may prefer a lower setting (in order to get faster crash recovery, for instance).

    • It’s not exactly 5 minutes – this is a rule of thumb first formulated in 1985: http://en.wikipedia.org/wiki/Five-minute_rule

      In response to factors that change it, Mark Callaghan has talked about fast SSDs and how the miss path is now nowhere near as expensive. This means the punishment for not surviving 5 minutes is much lower, and that the 5 minute goal could possibly be lowered.

      Also – if you look at workload factors, advertising servers generally demand 50ms or less. If you are using hard drives where every IO is maybe 5ms, it’s easy to see with a few misses you are well on your way to missing your goal – whether or not you achieved the 5 minute rule.