This morning a colleague told me that the data on the MySQL master and the slave were inconsistent. I guessed that something had gone wrong in replication, so I logged in to the slave and ran mysql> show slave status\G. Sure enough, an insert statement had violated a primary key constraint on the slave, which caused replication to stop. Now the problem was clear; what remained was how to restore data consistency between master and slave.
The options are as follows:
First, look up the master's latest Position and use it as the slave's starting point for replication.
The idea here is to let bygones be bygones: past inconsistencies are left alone, and the two servers simply keep pace from now on. This seems to defeat the original purpose of restoring consistency between master and slave, but the method is simple and efficient, and it is usable in test environments and other scenarios where historical data matters little.
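As a minimal sketch of this first option, the Python helper below only builds the statements one would run on the slave; it never connects to MySQL. The function name build_repoint_statements is hypothetical, and the file name and position in the usage lines are placeholder values standing in for whatever show master status reports on the master. CHANGE MASTER TO with MASTER_LOG_FILE and MASTER_LOG_POS is the statement that sets the slave's replication coordinates.

```python
from typing import List


def build_repoint_statements(master_log_file: str, master_log_pos: int) -> List[str]:
    """Return the statements that restart replication from the given master coordinates.

    Hypothetical helper: it only formats SQL text and never talks to a server.
    """
    if master_log_pos < 0:
        raise ValueError("binlog position must be non-negative")
    if "'" in master_log_file:
        raise ValueError("unexpected quote in binlog file name")
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{master_log_file}', "
        f"MASTER_LOG_POS={master_log_pos};",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # Placeholder coordinates; use the File and Position columns of SHOW MASTER STATUS.
    for statement in build_repoint_statements("mysql-bin.000001", 4):
        print(statement)
```

Everything the master wrote before the chosen position is simply never applied on the slave, which is why this suits test environments rather than data that must match exactly.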
Second, strictly restore consistency between the master and slave data.
Here there are two approaches:
1. Back up the master's data and restore it on the slave, then turn replication back on once the historical data is consistent. This method is a lot of trouble: you must lock the tables on the master and stop clients from updating them, and with large data volumes the backup itself is time-consuming. In practice it is rarely used in production environments.
2. Skip the relevant error
Strictly speaking that wording is not quite accurate; more precisely, you skip the relevant transaction. In my case today, that means skipping the insert statement that failed because it violated the primary key constraint.
How to skip the relevant transaction
First, stop the slave service (stop slave).
Second, SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
Third, start the slave service (start slave).
This skips a single transaction. You can of course skip more than one at a time, but be careful: you do not know in advance what the skipped transactions contain.
Recommendation: repeat the steps above one transaction at a time, and look carefully at each statement that is preventing the slave from replicating. When many transactions are blocking the slave, this approach becomes rather inefficient.
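The repeat-and-inspect loop recommended above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a finished tool: execute and fetch_status are hypothetical callables the reader would supply (for example thin wrappers around a MySQL client library), and fetch_status is assumed to return one show slave status row as a dict keyed by column name. On a real slave you would also need to pause after start slave before the status reflects the next error; the fake slave in the demo has no such delay.

```python
from typing import Callable, Dict, List


def skip_one_at_a_time(
    execute: Callable[[str], None],
    fetch_status: Callable[[], Dict[str, str]],
    max_skips: int = 5,
) -> List[str]:
    """Skip blocking transactions one by one and return the errors that were skipped.

    execute and fetch_status are hypothetical hooks supplied by the caller:
    execute runs one statement on the slave, and fetch_status returns one
    SHOW SLAVE STATUS row as a dict of column name to value.
    """
    skipped_errors: List[str] = []
    for _ in range(max_skips):
        status = fetch_status()
        if status.get("Slave_SQL_Running") == "Yes":
            break  # the SQL thread is applying events again
        # Record what is about to be skipped so it can be reviewed afterwards.
        skipped_errors.append(status.get("Last_SQL_Error", ""))
        execute("STOP SLAVE")
        execute("SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1")
        execute("START SLAVE")
    return skipped_errors


if __name__ == "__main__":
    # Fake slave for demonstration only: two blocking errors, then healthy.
    pending = ["first blocking error", "second blocking error"]

    def fake_execute(statement: str) -> None:
        if statement.startswith("SET GLOBAL") and pending:
            pending.pop(0)

    def fake_status() -> Dict[str, str]:
        if pending:
            return {"Slave_SQL_Running": "No", "Last_SQL_Error": pending[0]}
        return {"Slave_SQL_Running": "Yes", "Last_SQL_Error": ""}

    print(skip_one_at_a_time(fake_execute, fake_status))
```

Capping the loop with max_skips and returning the skipped error messages keeps the caution from the text: every skipped transaction is recorded so it can be examined later.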
Alternatively, analyze the master's binary log to determine an appropriate value for SQL_SLAVE_SKIP_COUNTER. The steps are as follows:
First, run show slave status\G on the slave and note the two parameters that give the master's log file name and position
Second, using those two values, view on the master the transaction that is currently blocking replication on the slave, along with the transactions after it.
mysql> SHOW BINLOG EVENTS in 'mysql-bin.000217' from 673146776;
This shows every event in log file mysql-bin.000217 from position 673146776 onward.
SHOW BINLOG EVENTS is quite flexible, and can also be used in the following ways.
mysql> SHOW BINLOG EVENTS in 'mysql-bin.000217' from 673146776\G
mysql> SHOW BINLOG EVENTS in 'mysql-bin.000217' from 673146776 limit 10;
On the database host itself, the same information can also be viewed with the mysqlbinlog command
# mysqlbinlog mysql-bin.000217 --start-position=673146776
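To tie these steps together, here is a small Python sketch that reads the vertical output of show slave status and builds the matching SHOW BINLOG EVENTS statement. The text does not name the two parameters, so the code assumes they are Relay_Master_Log_File and Exec_Master_Log_Pos, the show slave status columns that give the master binlog coordinates of the last executed event; treat that choice as an assumption. Both function names are hypothetical, and nothing here connects to a server.

```python
from typing import Dict, Optional


def parse_vertical_status(text: str) -> Dict[str, str]:
    """Parse the 'Name: value' lines of vertical SHOW SLAVE STATUS output into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # row separator such as "*** 1. row ***"
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def build_binlog_events_query(fields: Dict[str, str], limit: Optional[int] = None) -> str:
    """Build SHOW BINLOG EVENTS for the coordinates where the slave stopped.

    Assumption: the relevant columns are Relay_Master_Log_File and
    Exec_Master_Log_Pos; the source text does not name them.
    """
    log_file = fields["Relay_Master_Log_File"]
    position = int(fields["Exec_Master_Log_Pos"])
    query = f"SHOW BINLOG EVENTS IN '{log_file}' FROM {position}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query + ";"


if __name__ == "__main__":
    sample = "\n".join([
        "*************************** 1. row ***************************",
        "        Relay_Master_Log_File: mysql-bin.000217",
        "          Exec_Master_Log_Pos: 673146776",
    ])
    print(build_binlog_events_query(parse_vertical_status(sample), limit=10))
```

The parser splits each line on the first colon only, so values that themselves contain colons, such as error messages, are kept whole.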
How to check what a statement is doing
After skipping the relevant transactions on the slave and restarting replication, Slave_IO_Running and Slave_SQL_Running both showed "Yes", but Seconds_Behind_Master did not fall right away; instead it slowly rose.
I then used show processlist to check the executing threads, and found that the first statement had been running for a very long time, with the "State" column showing "Sending data".
Judging from the meaning of "Sending data", the statement involves a large number of disk reads.
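Picking the long-running statement out of a busy show processlist result can be sketched in Python as below. The rows are assumed to be dicts keyed by the show processlist column names (Id, Command, Time, State, Info), as a client library with a dict cursor would return them; the helper name longest_running and the sample rows are made up for illustration.

```python
from typing import Dict, List, Optional


def longest_running(rows: List[Dict[str, object]]) -> Optional[Dict[str, object]]:
    """Return the processlist row that has been executing a statement the longest.

    Idle connections (Command = Sleep) and threads with no statement text
    (empty Info, such as the slave I/O thread) are ignored.
    """
    active = [
        row for row in rows
        if row.get("Command") != "Sleep" and row.get("Info")
    ]
    if not active:
        return None
    return max(active, key=lambda row: int(row.get("Time") or 0))


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample_rows = [
        {"Id": 1, "Command": "Sleep", "Time": 900, "State": "", "Info": None},
        {"Id": 2, "Command": "Connect", "Time": 120, "State": "Sending data",
         "Info": "insert into t2 select * from t1"},
        {"Id": 3, "Command": "Query", "Time": 0, "State": "init",
         "Info": "show processlist"},
    ]
    worst = longest_running(sample_rows)
    if worst is not None:
        print(worst["Id"], worst["Time"], worst["State"])
```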
To analyze further where the statement spends its time, turn on the profiling variable. Proceed as follows:
First, before running the query, execute set profiling = on;
Second, after the statement finishes, find its Query_ID with show profiles.
Third, run show profile for query Query_ID to see the statement's execution in detail.
In the end I found that the statement spent too long in the Sending data phase.
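Once the show profile rows are in hand, finding the dominant phase is a simple aggregation. The Python sketch below assumes each row is a (Status, Duration) pair, matching the two columns show profile prints; the function name slowest_stages is hypothetical and the durations in the demo are invented purely to exercise the code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slowest_stages(
    profile_rows: Iterable[Tuple[str, float]], top: int = 3
) -> List[Tuple[str, float]]:
    """Sum the duration per stage and return the slowest stages first.

    A stage such as "Sending data" can appear more than once in a profile,
    so durations are added up per stage name before sorting.
    """
    totals: Dict[str, float] = defaultdict(float)
    for status, duration in profile_rows:
        totals[status] += float(duration)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Invented durations, in seconds, for illustration only.
    rows = [
        ("starting", 0.25),
        ("Sending data", 2.0),
        ("Sending data", 1.5),
        ("end", 0.125),
    ]
    for stage, seconds in slowest_stages(rows, top=2):
        print(stage, seconds)
```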
To sum up:
1. When I ran stop slave, the command hung. According to what I found online, this can happen when the Slave SQL thread is running a long statement or is Locked. In that situation, apart from show processlist, it is best not to run show slave status or any further slave commands such as stop slave. So how do you solve it? Either wait for the Slave SQL thread's lock to end, or restart the database. I chose the latter.
2. While restarting the slave database there was a small episode: executing start slave reported the following error: ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository. Much of the advice online recommends reconfiguring master-slave replication, which would take us back to the option selection at the beginning. Strangely, after I shut down the slave and restarted it once more, everything was fine. The only difference between the two startups is that the first used mysqld, while the second used mysqld_safe with an additional --user argument.