server:Intel E5 48 Cores,256GB memory(system used 180GB before crashed), 8T disk
version:mysql  Ver 14.14 Distrib 5.7.16, for Linux (x86_64)
OS:redhat7.2 机器启用一个多月没有崩溃过,开始只有100GB数据,现在400多个schema占1300GB,配了主从复制,这是主库,关闭主从也会crash。
从监控来看,每当内存占用超过160G(绿色是剩余空间),就crash并自动重启,重启要花1小时,正常重启几分钟。
附日志和配置文件2017-05-24T20:58:06.959318Z 3991095 [Note] Aborted connection 3991095 to db: 'ws2015121700001' user: 'root' host: '10.XXX' (Got timeout writing communication packets)2017-05-25T05:59:45.648070Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4307ms. The settings might not be optimal. (flushed=690 and evicted=0, during the time.)以上两种错误在过去几天频发宕机时有以下日志,两次都是这样05:59:58 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.key_buffer_size=209715200
read_buffer_size=131072
max_used_connections=224
max_threads=1000
thread_count=94
connection_count=94
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 205146143 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xef182b]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7ad8c1]
/lib64/libpthread.so.0(+0xf100)[0x7f80ffa60100]
/usr/sbin/mysqld(_Z22row_merge_drop_indexesP5trx_tP12dict_table_tm+0xa6)[0x100a0d6]
/usr/sbin/mysqld(_Z32dict_table_remove_from_cache_lowP12dict_table_tm+0x4af)[0x114e34f]
/usr/sbin/mysqld(_Z23dict_make_room_in_cachemm+0x17f)[0x114e62f]
/usr/sbin/mysqld[0x105e945]
/usr/sbin/mysqld(srv_master_thread+0xa1f)[0x106374f]
/lib64/libpthread.so.0(+0x7dc5)[0x7f80ffa58dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f80fe5141cd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2017-05-25T06:00:21.917264Z 0 [Warning] Insecure configuration for --secure-file-priv: Location is accessible to all OS users. Consider choosing a different directory.
2017-05-25T06:00:21.917499Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.16-log) starting as process 40549 ...
2017-05-25T06:00:21.953825Z 0 [Note] InnoDB: PUNCH HOLE support available
2017-05-25T06:00:21.953880Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-05-25T06:00:21.953900Z 0 [Note] InnoDB: Uses event mutexes
2017-05-25T06:00:21.953907Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-05-25T06:00:21.953911Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-05-25T06:00:21.953914Z 0 [Note] InnoDB: Using Linux native AIO
2017-05-25T06:00:21.959640Z 0 [Note] InnoDB: Number of pools: 1
2017-05-25T06:00:21.959783Z 0 [Note] InnoDB: Using CPU crc32 instructions
2017-05-25T06:00:21.961859Z 0 [Note] InnoDB: Initializing buffer pool, total size = 100G, instances = 1, chunk size = 128M
2017-05-25T06:00:28.901192Z 0 [Note] InnoDB: Completed initialization of buffer pool
2017-05-25T06:00:30.068043Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-05-25T06:00:30.135765Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2017-05-25T06:00:30.912936Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 18005318784106------------最新一次日志
(此条有大量)2017-05-27T16:31:54.423593Z 1508917 [Warning] InnoDB: Trying to close/delete/truncate tablespace 'ws2016012900002/tb_index_128bb5a8a9e5' but there are 1 pending operations on it.
2017-06-05T18:25:52.517598Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
InnoDB: If you get repeated assertion failures or crashes, even
As this is a crash and something is definitely wrong, the information
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2017-06-05T18:27:02.053447Z 0 [Note] InnoDB: Starting crash recovery.
2017-06-05T19:36:35.518932Z 0 [Note] Recovering after a crash using mysql-bin
 14 15 16 172017-06-05T19:36:37.168930Z 0 [Note] Starting crash recovery...
---------------------------[mysqld]
port = 3306
datadir=/data/mysqldata/3306/data
tmpdir=/data/mysqltmp
socket=/data/mysqldata/3306/mysql/mysql_3306.sock
pid-file=/data/mysqldata/3306/mysql/mysqld_3306.pid
slow_query_log_file = /data/mysqldata/3306/log/mysql-slow.log
log-error=/data/mysqldata/3306/log/mysqld.err
user=mysql
sql_mode=NO_ENGINE_SUBSTITUTION,NO_AUTO_CREATE_USER
character_set_server=utf8
collation-server=utf8_general_ciserver-id=2
read-only=0
log_slave_updates=1explicit_defaults_for_timestamp=true
skip-ssl
skip-name-resolve
#slowquerylog
slow_query_log=1
long_query_time=15
#log-queries-not-using-indexes
secure-file-priv=/data/mysqldata/3306/mysqlinnodb_data_home_dir=/data/mysqldata/3306/data
innodb_data_file_path=ibdata1:2898264064;ibdata2:2898264064:autoextend
innodb_lock_wait_timeout=50
innodb_file_per_table=1
innodb_log_file_size=1024M
innodb_log_files_in_group = 3
innodb_open_files=1024000
innodb_buffer_pool_size=102400M
innodb_buffer_pool_instances=1  改过4,依旧
#innodb_additional_mem_pool_size=20M
innodb_log_buffer_size=8M
join_buffer_size=80M
sort_buffer_size=200M 降为80M后继续崩溃
innodb_checksum_algorithm=NONE
innodb_doublewrite=0
innodb_support_xa=1
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=2
innodb_max_dirty_pages_pct=50
innodb_use_native_aio=1
innodb_stats_persistent = 1
innodb_spin_wait_delay= 6
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_io_capacity = 4000
innodb_purge_threads=1
innodb_adaptive_hash_index=0
#innodb_force_recovery=1lock_wait_timeout=60max_connections=1000log_bin=mysql-bin
binlog_format=mixed
max_binlog_size=1073741824
log_bin_trust_function_creators=1
expire_logs_days=15 是不是过大了? 占了2T
low_priority_updates=1  
key_buffer_size=200M
table_open_cache = 8000
back_log=1500
query_cache_type=0
thread_cache_size = 300
table_open_cache_instances=16
lower_case_table_names=1symbolic-links=0max_allowed_packet=512M[mysql]
socket=/data/mysqldata/3306/mysql/mysql_3306.sock[client]
socket=/data/mysqldata/3306/mysql/mysql_3306.sock

解决方案 »

  1.   

    https://stackoverflow.com/questions/24860111/warning-a-long-semaphore-waitInnodb_adaptive_hash_index=0 and a restart solved the problem.这个参数已经是OFF了-----------------------------------------
    http://blog.csdn.net/wulantian/article/details/37560849
    晚上尝试下
    echo "kernel.sem=250 32000 100 128" >>/etc/sysctl.conf 
    sysctl -p 
    cat /proc/sys/kernel/sem
    systemctl restart mysqld
      

  2.   

    crash时间点没有规律,一次中午,一次下午,两次半夜。
    实际线程数200多,平时数据变化较为频繁,晚上有很多数据同步。