Faking a slave: Subscribing to mysql row-based-replication changes

In complex systems it’s often useful to be able to receive notification when rows have been modified in MySQL so that you can invalidate various external caches or indexes. For example: memcached, Sphinx, Lucene.

In the case of MySQL’s default statement-based replication this can be quite tricky to do, as it would likely require an SQL parser to determine what was intended to be modified. However, this is made much simpler with Row-based Replication (something I recommend switching to).

A C++ library exists to be able to listen to MySQL replication, but what I want to demonstrate is that it is also very simple to be able to do this with mysqlbinlog:


shell> mysqlbinlog --read-from-remote-server --stop-never --host localhost.localdomain --port 5616 -u msandbox -pmsandbox --verbose mysql_sandbox5616-bin.000004 | grep '^### '

To explain how this command works:

  • --read-from-remote-server tells mysqlbinlog to act like a slave, and fetch remote files rather than local (new option to 5.6).
  • --stop-never makes mysqlbinlog block waiting for a continual stream of updates, rather than exiting. Perfect!
  • --verbose rewrites row-based replication events to be pseudo SQL statements. The pseudo statements are very easy to parse. For example:
    ### DELETE FROM `test2`.`a`
    ### WHERE
    ###   @1=1
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=1
    ### UPDATE `test2`.`a`
    ### WHERE
    ###   @1=10
    ### SET
    ###   @1=20
    

    Noting that a multi-row statement will appear as individual statements via --verbose. Row-based events also default to sending all columns in the row, not just those that are changed.

  • The grep '^### ' statement is just a lazy way of stripping out everything except pseudo SQL statements.

Advanced Usage

You can fairly easily extend the above to track your progress reading through the master’s binary logs. This will make it easier to resume from where you left off if there is a crash. Just remove the grep, and keep track of:

  • Positional markers just before DML events. These are marked in bold here:
    # at 191
    #140526 15:28:27 server id 10  end_log_pos 239 CRC32 0x559a84a8     GTID [commit=yes]
    SET @@SESSION.GTID_NEXT= '9f0ce61c-bb92-11e3-89fd-f056da47d247:17'/*!*/;
    # at 239
    #140526 15:28:27 server id 10  end_log_pos 312 CRC32 0xff074c19     Query   thread_id=1 exec_time=0 error_code=0
    SET TIMESTAMP=1401143307/*!*/;
    BEGIN
    /*!*/;
    # at 312
    #140526 15:28:27 server id 10  end_log_pos 357 CRC32 0x74b1ad7f     Table_map: `test2`.`a` mapped to number 71
    # at 357
    #140526 15:28:27 server id 10  end_log_pos 397 CRC32 0x2c6f8b8d     Write_rows: table id 71 flags: STMT_END_F
    BINLOG '
    C8CDUxMKAAAALQAAAGUBAAAAAEcAAAAAAAEABXRlc3QyAAFhAAEDAAF/rbF0
    C8CDUx4KAAAAKAAAAI0BAAAAAEcAAAAAAAEAAgAB//4KAAAAjYtvLA==
    '/*!*/;
    ### INSERT INTO `test2`.`a`
    ### SET
    ###   @1=10
    
  • Events that show that the binary log file itself is being rotated. For example, the result of FLUSH LOGS is:
    # at 4
    #691231 16:00:00 server id 10  end_log_pos 0 CRC32 0x7800af55   Rotate to mysql_sandbox5616-bin.000006  pos: 4
    

Conclusion

This might not be as robust in all cases as using the C++ API, but it sure beats the alternative. Gone are my days of writing complex sets of triggers to write to an “events” table, which I poll continually from an external script.