MySQL主备复制集群跳过错误方法(GTID) 原创 数据库 2022年6月2日 16:13 夏至未至 952 当前内容 14345 字,在路上,马上到,马上到 ## 目录 [TOC] ## 主备复制分类 主从复制主要分为传统复制,和gtid复制方式。传统复制主要依赖 `binlog` 和 `pos` 来实现,而 `gtid` 则只需要 `gtid` 即可。无论那种复制,一旦发生复制报错,那主从关系就可能异常,我们除了修复这种异常,让集群关系正常外,还需要了解如何跳过错误,让集群关系快速正常。 ### 传统复制 在从节点上执行如下命令,意为跳过两个事务(其中就包含你想要跳过的哪个错误事务): mysql> stop slave; mysql> set global sql_slave_skip_counter=2; mysql> start slave; ### gtid复制 #### 问题复现 重点验证下 `gtid` 复制报错,如何跳过,如下为一个正常 `gtid` 复制的表,在从实例上删除表,然后主实例中对表插入数据,以复现复制报错: 从:192.168.78.189 3306,上删除表 test: MySQL [gtid_test]> show tables; +---------------------+ | Tables_in_gtid_test | +---------------------+ | test | +---------------------+ 1 row in set (0.00 sec) MySQL [gtid_test]> MySQL [gtid_test]> drop table test; Query OK, 0 rows affected (0.01 sec) MySQL [gtid_test]> show tables; Empty set (0.00 sec) MySQL [gtid_test]> 主:192.168.78.188 3308 上对正在复制的表 test 插入 sql: MySQL [(none)]> show databases; +--------------------+ | Database | +--------------------+ | gtid_test | | information_schema | | mysql | | ning | | performance_schema | | sys | +--------------------+ 6 rows in set (0.01 sec) MySQL [(none)]> use gtid_test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed MySQL [gtid_test]> show tables; +---------------------+ | Tables_in_gtid_test | +---------------------+ | test | +---------------------+ 1 row in set (0.00 sec) MySQL [gtid_test]> insert into test values (2); Query OK, 1 row affected (0.00 sec) MySQL [gtid_test]> 然后,在从上查看同步状态: MySQL [gtid_test]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 192.168.78.188 Master_User: gtid_copy Master_Port: 3308 Connect_Retry: 60 Master_Log_File: mysql-bin.000002 Read_Master_Log_Pos: 667 Relay_Log_File: 67bf0b455536-relay-bin.000004 Relay_Log_Pos: 598 Relay_Master_Log_File: mysql-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1146 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Skip_Counter: 0 Exec_Master_Log_Pos: 382 Relay_Log_Space: 1140 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1146 Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Replicate_Ignore_Server_Ids: Master_Server_Id: 188 Master_UUID: 1d9b633d-e1ae-11ec-8ece-0242ac110002 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 220602 07:47:50 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-15 Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14, f19b117a-cf6d-11ec-9fd7-0242ac110002:1 // 此为从节点删除表执行的事务 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec) ERROR: No query specified MySQL [gtid_test]> 可以看出,复制已经报错,而: Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-15 Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14, 意思是,从主接收 事务 1 到 15,但是只成功执行了 1 到 14 ,15 并没有执行成功,如果 15 没有成功的前提下,我们发现,后续的复制,也没法在进行,如下,主节点在新增一张表: MySQL [gtid_test]> create table next_test (a int); Query OK, 0 rows affected (0.02 sec) MySQL [gtid_test]> MySQL [gtid_test]> show tables; +---------------------+ | Tables_in_gtid_test | +---------------------+ | next_test | | test | +---------------------+ 2 rows in set (0.00 sec) MySQL [gtid_test]> 如下,可以看到,由于上一个 15 事务的失败,导致后续的 16 事务同样没有在从实例上执行: MySQL [gtid_test]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 192.168.78.188 Master_User: gtid_copy Master_Port: 3308 Connect_Retry: 60 Master_Log_File: mysql-bin.000002 Read_Master_Log_Pos: 872 Relay_Log_File: 67bf0b455536-relay-bin.000004 Relay_Log_Pos: 598 Relay_Master_Log_File: mysql-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1146 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Skip_Counter: 0 Exec_Master_Log_Pos: 382 Relay_Log_Space: 1345 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1146 Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' at master log mysql-bin.000002, end_log_pos 636. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Replicate_Ignore_Server_Ids: Master_Server_Id: 188 Master_UUID: 1d9b633d-e1ae-11ec-8ece-0242ac110002 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 220602 07:47:50 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16 Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14, f19b117a-cf6d-11ec-9fd7-0242ac110002:1 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.01 sec) ERROR: No query specified MySQL [gtid_test]> MySQL [gtid_test]> MySQL [gtid_test]> show tables; Empty set (0.00 sec) MySQL [gtid_test]> 如下,15 16 号事务并未执行: Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16 Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-14, #### 解决问题 ##### 方法一 我们验证一种常用的方法,在从节点上跳过错误事务: 1. 停止slave进程 mysql> STOP SLAVE; 2. 设置事务号,事务号从 Retrieved_Gtid_Set 获取,在session里设置gtid_next,即跳过这个GTID mysql> SET @@SESSION.GTID_NEXT= '1d9b633d-e1ae-11ec-8ece-0242ac110002:15' 3. 设置空事物 mysql> BEGIN; COMMIT; 4. 恢复自增事物号 mysql> SET SESSION GTID_NEXT = AUTOMATIC; 5. 启动slave进程 mysql> START SLAVE; 如下验证可得,跳过了15 插入的那条 sql 操作,而之后的 16 建表操作也就成功执行了: MySQL [gtid_test]> STOP SLAVE; Query OK, 0 rows affected, 1 warning (0.00 sec) MySQL [gtid_test]> SET @@SESSION.GTID_NEXT= '1d9b633d-e1ae-11ec-8ece-0242ac110002:15'; Query OK, 0 rows affected (0.01 sec) MySQL [gtid_test]> BEGIN; COMMIT; Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) MySQL [gtid_test]> SET SESSION GTID_NEXT = AUTOMATIC; Query OK, 0 rows affected (0.00 sec) MySQL [gtid_test]> START SLAVE; Query OK, 0 rows affected, 1 warning (0.01 sec) MySQL [gtid_test]> MySQL [gtid_test]> MySQL [gtid_test]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 192.168.78.188 Master_User: gtid_copy Master_Port: 3308 Connect_Retry: 60 Master_Log_File: mysql-bin.000002 Read_Master_Log_Pos: 872 Relay_Log_File: 67bf0b455536-relay-bin.000005 Relay_Log_Pos: 460 Relay_Master_Log_File: mysql-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 872 Relay_Log_Space: 1865 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 188 Master_UUID: 1d9b633d-e1ae-11ec-8ece-0242ac110002 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16 Executed_Gtid_Set: 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16, f19b117a-cf6d-11ec-9fd7-0242ac110002:1 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec) ERROR: No query specified MySQL [gtid_test]> show tables; +---------------------+ | Tables_in_gtid_test | +---------------------+ | next_test | +---------------------+ 1 row in set (0.01 sec) MySQL [gtid_test]> ##### 方法二 重置 master 方法跳过错误: mysql> STOP SLAVE; mysql> RESET MASTER; mysql> SET @@GLOBAL.GTID_PURGED ='1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16' mysql> START SLAVE; 上面这些命令的意思是,忽略 1d9b633d-e1ae-11ec-8ece-0242ac110002:1-16 这个 GTID 事务,下一次事务接着从 17 这个GTID开始,即可跳过上述错误。 本文标题: MySQL主备复制集群跳过错误方法(GTID) 本文作者: 夏至未至 发布时间: 2022年6月2日 16:13 最近更新: 2022年7月8日 15:48 原文链接: 许可协议: 署名-非商业性-禁止演绎 4.0 国际(CC BY-NC-ND 4.0) 请按协议转载并保留原文链接及作者 GTID(3) 上一个 JS动态生成二维码 下一个 MySQL以GTID方式搭建主备集群 当前文章评论暂未开放,请移步至留言处留言。