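I have one master and one slave. Replication on the slave is stuck and has been lagging for a long time.
The statement it is stuck on deletes about 100,000 rows from a table with tens of millions of rows. It finished quickly on the master, but the slave hangs on it.
Following advice I found online, I changed these two parameters: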
set global sync_binlog=20 ;
set global  innodb_flush_log_at_trx_commit=2;
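I tried various combinations of values and also restarted; none of it helped. In the end I simply deleted the replication data files on the slave (master.info, relay-log.info, mysql-relay-bin.000xxx, etc.) and set replication up again, but it still gets stuck! I'm out of ideas. Could an expert give me some pointers? Thanks! Here is the relevant output: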
mysql> show slave  status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.3.112
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.001276
          Read_Master_Log_Pos: 449567705
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 279
        Relay_Master_Log_File: mysql-bin.001246
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: mysql
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 332146441
              Relay_Log_Space: 16431867934
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 98412
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 31123306
                  Master_UUID: 29c7258c-bee8-11e7-b50f-00163e0ac3db
             Master_Info_File: /data/data7-209/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Reading event from the relay log
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
 

Solutions »

  1.   

    mysql> show full processlist;
    +----+-------------+-----------+------+---------+-------+----------------------------------+-----------------------+
    | Id | User        | Host      | db   | Command | Time  | State                            | Info                  |
    +----+-------------+-----------+------+---------+-------+----------------------------------+-----------------------+
    |  2 | root        | localhost | test | Query   |     0 | init                             | show full processlist |
    |  8 | system user |           | NULL | Connect |  6361 | Waiting for master to send event | NULL                  |
    |  9 | system user |           | NULL | Connect | 98710 | Reading event from the relay log | NULL                  |
    +----+-------------+-----------+------+---------+-------+----------------------------------+-----------------------+
    No other sessions are doing anything.
      

  2.   

    Contents of the binlog on the master:
    [root@mysql-3-112 log]# /usr/local/mysql/bin/mysqlbinlog /data/data3-112/mysql/log/mysql-bin.001246 --start-position=332146441|more
    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
    /*!40019 SET @@session.max_insert_delayed_threads=0*/;
    /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
    DELIMITER /*!*/;
    # at 4
    #171113 13:54:17 server id 31123306  end_log_pos 120  Start: binlog v 4, server v 5.6.13-log created 171113 13:54:17
    BINLOG '
    iTMJWg9q59oBdAAAAHgAAAAAAAQANS42LjEzLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAEij
    lGk=
    '/*!*/;
    # at 332146441
    #171113 14:32:02 server id 31123306  end_log_pos 332146521  Query thread_id=174103 exec_time=0 error_code=0
    SET TIMESTAMP=1510554722/*!*/;
    SET @@session.pseudo_thread_id=174103/*!*/;
    SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
    SET @@session.sql_mode=524288/*!*/;
    SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
    /*!\C utf8 *//*!*/;
    SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=28/*!*/;
    SET @@session.time_zone='SYSTEM'/*!*/;
    SET @@session.lc_time_names=0/*!*/;
    SET @@session.collation_database=DEFAULT/*!*/;
    BEGIN
    /*!*/;
    # at 332146521
    #171113 14:32:02 server id 31123306  end_log_pos 332146596  Table_map: `shreport`.`stat_report_daydata_shop` mapped to number 923
    # at 332146596
    #171113 14:32:02 server id 31123306  end_log_pos 332154792  Delete_rows: table id 923
    # at 332154792
    #171113 14:32:02 server id 31123306  end_log_pos 332162988  Delete_rows: table id 923
    # at 332162988
    #171113 14:32:02 server id 31123306  end_log_pos 332171184  Delete_rows: table id 923
    # at 332171184
    #171113 14:32:02 server id 31123306  end_log_pos 332179380  Delete_rows: table id 923
    # at 332179380
    #171113 14:32:02 server id 31123306  end_log_pos 332187576  Delete_rows: table id 923
      

  3.   

    Checking the system processes: mysqld is at 100% CPU
    [root@mysql-7-209 data]# top
    top - 18:00:29 up 13 days,  2:45,  3 users,  load average: 1.04, 1.07, 1.05
    Tasks: 139 total,   2 running, 137 sleeping,   0 stopped,   0 zombie
    %Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 65809540 total, 34208864 free, 11423024 used, 20177652 buff/cache
    KiB Swap:        0 total,        0 free,        0 used. 53823636 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    14293 mysql     20   0 28.934g 0.010t   6864 S 100.9 16.5  93:34.50 mysqld                                                                                                                    
        1 root      20   0   43380   3816   2432 S   0.0  0.0   0:10.37 systemd                                                                                                                   
        2 root      20   0       0      0      0 S   0.0  0.0   0:00.16 kthreadd                                                                                                                  
        3 root      20   0       0      0      0 S   0.0  0.0   0:00.47 ksoftirqd/0                                                                                                               
        7 root      rt   0       0      0      0 S   0.0  0.0   0:01.30 migration/0                                                                                                               
        8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                                                                                                                    
        9 root      20   0       0      0      0 S   0.0  0.0   1:22.29 rcu_sched     
      

  4.   

    Read_Master_Log_Pos: 449567705
    Exec_Master_Log_Pos: 332146441
    --------------------------------------------  These two differ by a lot. How did you rebuild replication? Wasn't the slave's data rebuilt from the master?
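One thing worth noting when comparing these two numbers: Read_Master_Log_Pos belongs to Master_Log_File (mysql-bin.001276), while Exec_Master_Log_Pos belongs to Relay_Master_Log_File (mysql-bin.001246), so the two positions sit in different binlog files and cannot simply be subtracted. The following is a minimal Python sketch, not part of the original thread, that parses the vertical output of SHOW SLAVE STATUS and summarises the gap. The function names are made up for illustration, and it assumes binlog file names end in a numeric suffix.

```python
import re


def parse_slave_status(text):
    """Parse the vertical output of SHOW SLAVE STATUS into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Field lines look like "   Key_Name: value"; prompt and banner lines do not match.
        match = re.match(r"\s*(\w+):\s?(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def binlog_index(name):
    """Return the numeric suffix of a binlog name, e.g. 1246 for mysql-bin.001246."""
    return int(name.rsplit(".", 1)[1])


def replication_gap(fields):
    """Summarise how far the SQL thread is behind the I/O thread."""
    return {
        # Number of whole binlog files between what was fetched and what was applied.
        "files_behind": binlog_index(fields["Master_Log_File"])
        - binlog_index(fields["Relay_Master_Log_File"]),
        "read_pos": int(fields["Read_Master_Log_Pos"]),
        "exec_pos": int(fields["Exec_Master_Log_Pos"]),
        "relay_log_bytes": int(fields["Relay_Log_Space"]),
    }


# Values copied from the SHOW SLAVE STATUS output in the question.
status = """\
              Master_Log_File: mysql-bin.001276
          Read_Master_Log_Pos: 449567705
        Relay_Master_Log_File: mysql-bin.001246
          Exec_Master_Log_Pos: 332146441
              Relay_Log_Space: 16431867934
"""
gap = replication_gap(parse_slave_status(status))
print(gap)
```

With the values from the question, the SQL thread is 30 binlog files behind the I/O thread, which matches the roughly 16 GB reported in Relay_Log_Space.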
      

  5.   

    Of course, if you can determine the position (POS) of the delete, you can skip it by adjusting MASTER_LOG_POS, and then delete the data manually on the SLAVE.
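As a rough illustration of this suggestion, the Python sketch below only builds the statements one would run on the slave to restart replication from a chosen position; it does not connect to MySQL. The helper name `build_reposition_statements` is hypothetical, and the position used in the example is a placeholder, not the real end of the delete transaction, which has to be read from the binlog first.

```python
import re


def build_reposition_statements(log_file, log_pos):
    """Return the SQL statements that restart replication at log_file/log_pos."""
    # Keep the file name to plain characters so it is safe inside the quoted literal.
    if not re.fullmatch(r"[\w.-]+", log_file):
        raise ValueError("unexpected characters in binlog file name")
    # Binlog events start at offset 4, right after the file's magic header.
    if not isinstance(log_pos, int) or log_pos < 4:
        raise ValueError("binlog position must be an integer >= 4")
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
    ]


# The file name comes from Relay_Master_Log_File in the question.
# PLACEHOLDER_POS is hypothetical: replace it with the first position after the delete's COMMIT.
PLACEHOLDER_POS = 400000000
for statement in build_reposition_statements("mysql-bin.001246", PLACEHOLDER_POS):
    print(statement)
```

Keep in mind that CHANGE MASTER TO with a new file and position discards the relay logs already downloaded, so the I/O thread fetches everything again from that point, and the skipped rows still have to be deleted on the slave by hand.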
      

  6.   

    There is a whole pile of positions after that one; the next POS is hard to find...
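One way to avoid reading through every Delete_rows event by eye is to scan the mysqlbinlog text output for the event that closes the transaction. The sketch below is a hedged Python example under one assumption: the table is transactional (for example InnoDB), so the transaction ends with an Xid event whose header line carries both `end_log_pos` and `Xid =`. For a non-transactional table the transaction would end with a COMMIT query event instead, which this sketch does not handle. The function name is made up for illustration.

```python
import re

# Header line of an Xid event in mysqlbinlog text output, e.g.
# "#<date> <time> server id <n>  end_log_pos <pos>  Xid = <n>".
# Anything between end_log_pos and Xid (such as a CRC32 field) is tolerated.
XID_EVENT = re.compile(r"^#.*\bend_log_pos (\d+)\b.*\bXid = \d+")


def find_transaction_end(lines):
    """Return end_log_pos of the first Xid event in the given lines, or None."""
    for line in lines:
        match = XID_EVENT.match(line)
        if match:
            # This is the position of the first event after the transaction.
            return int(match.group(1))
    return None


# Intended use: pass in the lines produced by something like
#   mysqlbinlog --start-position=332146441 mysql-bin.001246
# for instance by capturing that output to a file and iterating over it.
```

Because the dump starts at the BEGIN of the delete, the first Xid event found belongs to that same transaction, and its `end_log_pos` is the position where the next transaction starts.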
      

  7.   

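    You can also choose not to look for it: just manually delete, on the slave, the rows that should have been deleted. Replication will then certainly hit an error, and you can get past that error by setting sql_slave_skip_counter.
    Of course, this requires the delete to be in a transaction of its own, because the skip jumps over a whole transaction. If the transaction contains operations other than the delete, those operations will be skipped as well.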
      

  8.   

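    Then go and look for the cause: for example, whether something is blocking, whether the server is underpowered, or whether a manual delete test is just as slow.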