On the 28th our Linux server hung.
Output of free:
             total       used       free     shared    buffers     cached
Mem:       7901184    5132912    2768272          0     223468    3557096
-/+ buffers/cache:    1352348    6548836
Swap:     27647824      94796   27553028
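The -/+ buffers/cache line is derived from the Mem line. Buffers and page cache are reclaimable, so subtracting them from used gives what applications actually hold, and adding them to free gives what is effectively available. A minimal Python sketch of that arithmetic, assuming the older procps free layout shown above (buffers_cache_line is a hypothetical helper name):

```python
def buffers_cache_line(used_kb: int, free_kb: int,
                       buffers_kb: int, cached_kb: int) -> tuple[int, int]:
    """Return the (used, free) pair printed on free's "-/+ buffers/cache" line.

    Buffers and page cache can be reclaimed, so they are subtracted from
    "used" and added to "free". All values are in kB, as printed by free.
    """
    app_used = used_kb - buffers_kb - cached_kb
    effectively_free = free_kb + buffers_kb + cached_kb
    return app_used, effectively_free


# Values taken from the Mem: line of the free output above.
print(buffers_cache_line(used_kb=5132912, free_kb=2768272,
                         buffers_kb=223468, cached_kb=3557096))
# (1352348, 6548836)
```

It reproduces both numbers on the -/+ buffers/cache line: applications were holding only about 1.3 GB of the roughly 7.5 GB of RAM when this snapshot was taken.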
Output of top (captured while the server was running normally, not while the problem was occurring):
top - 10:43:46 up 44 days,  2:35,  2 users,  load average: 0.88, 0.62, 0.42
Tasks: 294 total,  10 running, 283 sleeping,   0 stopped,   1 zombie
Cpu(s):  2.0%us,  0.3%sy,  0.0%ni, 97.5%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7901184k total,  5645808k used,  2255376k free,   224320k buffers
Swap: 27647824k total,    94744k used, 27553080k free,  4045212k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11505 root      15   0 11328 4464 2696 R  8.3  0.1   0:29.83 sshd                                                                                                                 
32109 root      19   0 1794m 459m 8336 S  3.7  6.0  16:21.62 java                                                                                                                 
12480 oracle    15   0 2141m  73m  69m R  2.3  1.0   0:02.18 oracle                                                                                                               
12401 oracle    15   0 2143m  63m  57m R  1.3  0.8   0:03.49 oracle                                                                                                               
12476 oracle    18   0 2143m  46m  41m R  1.3  0.6   0:00.29 oracle                                                                                                               
12395 oracle    23   0 2142m  69m  64m R  0.7  0.9   0:03.46 oracle                                                                                                               
12420 oracle    23   0 2140m  61m  58m S  0.3  0.8   0:01.77 oracle                                                                                                               
12437 oracle    20   0 2140m  62m  59m S  0.3  0.8   0:01.46 oracle                                                                                                               
12486 oracle    24   0 2138m 7728 6620 R  0.3  0.1   0:00.01 oracle                                                                                                               
    1 root      15   0  2176  640  552 S  0.0  0.0   0:10.46 init                                                                                                                 
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.34 migration/0                                                                                                          
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.08 ksoftirqd/0                                                                                                          
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0                                                                                                           
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.46 migration/1                                                                                                          
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.07 ksoftirqd/1                                                                                                          
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1                                                                                                           
    8 root      RT  -5     0    0    0 S  0.0  0.0   0:00.41 migration/2                                                                                                          
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/2                                                                                                          
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2                                                                                                           
   11 root      RT  -5     0    0    0 S  0.0  0.0   0:00.34 migration/3                                                                                                          
   12 root      34  19     0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/3                                                                                                          
   13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/3                                                                                                           
   14 root      RT  -5     0    0    0 S  0.0  0.0   0:00.39 migration/4                                                                                                          
   15 root      34  19     0    0    0 S  0.0  0.0   0:00.04 ksoftirqd/4                                                                                                          
   16 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/4                                                                                                           
   17 root      RT  -5     0    0    0 S  0.0  0.0   0:00.41 migration/5                                                                                                          
   18 root      34  19     0    0    0 S  0.0  0.0   0:00.07 ksoftirqd/5                                                                                                          
   19 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/5                                                                                                           
   20 root      RT  -5     0    0    0 S  0.0  0.0   0:00.41 migration/6                                                                                                          
   21 root      34  19     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/6                                                                                                          
   22 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/6                                                                                                           
   23 root      RT  -5     0    0    0 S  0.0  0.0   0:00.47 migration/7                                                                                                          
   24 root      34  19     0    0    0 S  0.0  0.0   0:00.04 ksoftirqd/7                                                                                                          
   25 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/7                                                                                                           
   26 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 events/0

I checked the system log, /var/log/messages. While the server was healthy it showed nothing unusual; when the problem occurred it contained the following:

Jun 28 06:20:39 fzyz kernel: HighMem: 196420*4kB 66312*8kB 4597*16kB 221*32kB 21*64kB 6*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1399936kB
Jun 28 06:20:39  kernel: 1071026 pagecache pages
Jun 28 06:20:39  kernel: Swap cache: add 2495943, delete 2486311, find 1145896/1379632, race 0+8
Jun 28 06:20:39  kernel: Free swap  = 27325956kB
Jun 28 06:20:39  kernel: Total swap = 27647824kB
Jun 28 06:20:39  kernel: Free swap:       27325956kB
Jun 28 06:20:39  kernel: 1977312 pages of RAM
Jun 28 06:20:39  kernel: 1790946 pages of HIGHMEM
Jun 28 06:20:39  kernel: 35579 reserved pages
Jun 28 06:20:39  kernel: 4351061 pages shared
Jun 28 06:20:39  kernel: 9637 pages swap cached
Jun 28 06:20:39  kernel: 17 pages dirty
Jun 28 06:20:39  kernel: 0 pages writeback
Jun 28 06:20:39  kernel: 421036 pages mapped
Jun 28 06:20:39  kernel: 24437 pages slab
Jun 28 06:20:39  kernel: 121373 pages pagetables
Jun 28 06:20:39  kernel: Out of memory: Killed process 28858, UID 501, (oracle).
Jun 28 06:20:39  kernel: Xorg invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Jun 28 06:20:39  kernel:  [<c0455d5f>] out_of_memory+0x72/0x1a3
Jun 28 06:20:39  kernel:  [<c045737b>] __alloc_pages+0x24e/0x2cf
Jun 28 06:20:39  kernel:  [<c0457421>] __get_free_pages+0x25/0x31
Jun 28 06:20:39  kernel:  [<c048563d>] __pollwait+0x44/0xb2
Jun 28 06:20:39  kernel:  [<c0617d7b>] unix_poll+0x17/0x94
Jun 28 06:20:39  kernel:  [<c05b9d68>] sock_poll+0xc/0xe
Jun 28 06:20:39  kernel:  [<c0484f2e>] do_select+0x227/0x3cb
Jun 28 06:20:39  kernel:  [<c04855f9>] __pollwait+0x0/0xb2
Jun 28 06:20:39  kernel:  [<c041ad22>] default_wake_function+0x0/0xc
Jun 28 06:20:40  last message repeated 19 times
Jun 28 06:20:40  kernel:  [<c048537b>] core_sys_select+0x2a9/0x2ca
Jun 28 06:20:40  kernel:  [<c0431dd3>] autoremove_wake_function+0x0/0x2d
Jun 28 06:20:40  kernel:  [<c0473612>] do_readv_writev+0x22e/0x247
Jun 28 06:20:40  kernel:  [<c0485975>] sys_select+0xd1/0x180
Jun 28 06:20:40  kernel:  [<c0405417>] syscall_call+0x7/0xb
Jun 28 06:20:40  kernel:  =======================

Output of show parameter sga:

SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     FALSE
sga_max_size                         big integer 2G
sga_target                           big integer 2G
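I'm not very experienced with servers or Oracle, so my only guess is that Oracle ran out of memory, but I'm not sure. If I change sga_max_size, what value would suit this server?
I'm not sure the information above is complete; if more is needed I'll post it. Thanks in advance.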
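Two details of the OOM report above can be checked with simple arithmetic. The HighMem: line is the buddy allocator's free list for that zone: each N*SIZEkB term is N free blocks of SIZE kB, and the figure after = is their sum. The "pages ..." lines count pages, which are 4 kB each on x86 (an assumption about this machine's page size). A small Python sketch; buddy_free_kb and pages_to_mib are hypothetical helper names:

```python
import re

# One buddy-allocator term, e.g. "196420*4kB" = 196420 free blocks of 4 kB each.
BUDDY_TERM = re.compile(r"(\d+)\*(\d+)kB")
PAGE_KB = 4  # assumption: 4 kB pages (x86)


def buddy_free_kb(line: str) -> int:
    """Sum every N*SIZEkB term in a kernel free-list line such as 'HighMem: ...'.

    The trailing '= 1399936kB' total has no '*', so the regex skips it.
    """
    return sum(int(count) * int(size) for count, size in BUDDY_TERM.findall(line))


def pages_to_mib(pages: int) -> float:
    """Convert a page count from the OOM report to MiB."""
    return pages * PAGE_KB / 1024


highmem = ("Jun 28 06:20:39 fzyz kernel: HighMem: 196420*4kB 66312*8kB 4597*16kB "
           "221*32kB 21*64kB 6*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1399936kB")
print(buddy_free_kb(highmem))       # 1399936
print(round(pages_to_mib(121373)))  # 474  (from "121373 pages pagetables")
```

So the HighMem zone still had about 1.3 GB free when the OOM killer ran, and page tables alone took about 474 MiB. One possible reading, offered tentatively: gfp_mask=0xd0 corresponds to GFP_KERNEL on 2.6-series kernels, an allocation that cannot be served from HighMem, so the shortage was more likely in low memory than in RAM as a whole.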

Replies:

  1.   

    vmstat 5 10
    iostat 5 10
    Check memory and CPU usage with these.
      

  2.   

    # vmstat 5 10 
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0 111944  49868  87220 6375040    0    0    28    11    0    1 14  0 86  0  0
     0  0 111944  45268  87236 6380244    0    0  3075   104 2886  974  2  0 98  0  0
     0  0 111944  48804  86840 6377224    0    0  3204    47 3448 1190  4  1 95  0  0
     0  0 111944  41252  86880 6383336    0    0  3117   167 3079 1102  2  0 97  1  0
     0  0 111944  47800  86364 6377680    0    0  2666    84 2666 1051  2  0 97  0  0
     0  0 111944  42100  85892 6384088    0    0  3152    52 2928  984  2  0 98  0  0
     1  0 111944  47396  85616 6379440    0    0  3002    48 3164 1280  4  1 94  0  0
     0  0 111948  42112  85320 6385788    0    0  3029   165 3058 1104  4  0 96  0  0
     0  0 111948  46568  84968 6380636    0    0  2798    67 2911 1161  4  0 95  0  0
     0  0 111948  41596  84732 6386044    0    0  3024    74 2822 1077  2  0 97  0  0
    Running iostat 5 10 fails:
    # iostat 5 10
    -bash: iostat: command not found
    The server runs Fedora release 11 (Leonidas). Could this be related to the OS version?
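In vmstat output the si and so columns (memory swapped in from and out to disk per second) are the quickest check for memory pressure, and the first data row is an average since boot rather than a live sample. A small Python sketch that parses the output above into per-sample dicts; parse_vmstat is a hypothetical helper and only the first few rows are reproduced:

```python
def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Turn vmstat output into one dict per sample, keyed by column name."""
    header: list[str] = []
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # column-name line: r b swpd free buff cache si so ...
            header = fields
        elif header and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 111944  49868  87220 6375040    0    0    28    11    0    1 14  0 86  0  0
 0  0 111944  45268  87236 6380244    0    0  3075   104 2886  974  2  0 98  0  0
 0  0 111944  48804  86840 6377224    0    0  3204    47 3448 1190  4  1 95  0  0
 0  0 111944  41252  86880 6383336    0    0  3117   167 3079 1102  2  0 97  1  0
"""

# The first sample is an average since boot, so skip it.
samples = parse_vmstat(SAMPLE)[1:]
print(max(s["si"] + s["so"] for s in samples))  # 0: no swap-in/swap-out
print(min(s["id"] for s in samples))            # 95: CPU at least 95% idle
```

In the captured samples si and so stay at 0 and the CPU is mostly idle, so the machine was neither swapping nor busy while these samples were taken.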
      

  3.   

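    The package that provides iostat (sysstat) isn't installed. Still, memory usage doesn't look high, so that alone shouldn't hang the server, and CPU usage in top is low too.
    top reports "1 zombie", i.e. there is one zombie process. Run top -c to see which Oracle processes are running.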
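A zombie is a process that has exited but whose parent has not yet collected its exit status; it holds almost no resources and disappears once the parent reaps it (or the parent itself exits). To see which process it is and who its parent is, one option on Linux is to read /proc/<pid>/stat, where the third field is the state (Z for zombie) and the fourth is the parent PID. A Python sketch under that assumption; the helper names are hypothetical:

```python
import os


def parse_proc_stat(line: str) -> tuple[int, str, str, int]:
    """Return (pid, command, state, parent pid) from one /proc/<pid>/stat line.

    The command is wrapped in parentheses and may contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid, _, comm = left.partition(" (")
    fields = right.split()
    return int(pid), comm, fields[0], int(fields[1])


def find_zombies(proc_root: str = "/proc") -> list[tuple[int, str, int]]:
    """List (pid, command, parent pid) for every process in state Z."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux /proc filesystem
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                pid, comm, state, ppid = parse_proc_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


for pid, comm, ppid in find_zombies():
    print(f"zombie {pid} ({comm}), parent {ppid}")
```

On the server itself, ps -o pid,ppid,stat,cmd shows the same information; a single zombie is usually harmless unless they keep accumulating.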
      

  4.   


    top - 11:39:08 up 44 days,  3:30,  2 users,  load average: 0.88, 0.62, 0.54
    Tasks: 294 total,   1 running, 292 sleeping,   0 stopped,   1 zombie
    Cpu(s):  1.8%us,  0.0%sy,  0.0%ni, 97.9%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
    Mem:   7901184k total,  7852236k used,    48948k free,    51924k buffers
    Swap: 27647824k total,   364532k used, 27283292k free,  6255196k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    11505 root      15   0 11328 4484 2712 S 13.6  0.1   6:39.73 sshd: root@notty                                                                                                     
    32109 root      19   0 1798m 402m 8336 S  1.7  5.2  17:49.55 /usr/jdk1.6.0_16/bin/java -server -XX:PermSize=256M -XX:MaxPermSize=512M -Djava.endorsed.dirs=/fzyzweb/apache-tomcat-
    13590 root      15   0  2544 1176  824 R  0.3  0.0   0:00.06 top -c                                                                                                               
        1 root      15   0  2176  640  552 S  0.0  0.0   0:10.47 init [5]                                                                                                             
        2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.34 [migration/0]                                                                                                        
        3 root      34  19     0    0    0 S  0.0  0.0   0:00.08 [ksoftirqd/0]                                                                                                        
        4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/0]                                                                                                         
        5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.46 [migration/1]                                                                                                        
        6 root      34  19     0    0    0 S  0.0  0.0   0:00.07 [ksoftirqd/1]                                                                                                        
        7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/1]                                                                                                         
        8 root      RT  -5     0    0    0 S  0.0  0.0   0:00.41 [migration/2]                                                                                                        
        9 root      34  19     0    0    0 S  0.0  0.0   0:00.06 [ksoftirqd/2]                                                                                                        
       10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/2]                                                                                                         
       11 root      RT  -5     0    0    0 S  0.0  0.0   0:00.34 [migration/3]                                                                                                        
       12 root      34  19     0    0    0 S  0.0  0.0   0:00.06 [ksoftirqd/3]                                                                                                        
       13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/3]                                                                                                         
       14 root      RT  -5     0    0    0 S  0.0  0.0   0:00.39 [migration/4]                                                                                                        
       15 root      34  19     0    0    0 S  0.0  0.0   0:00.04 [ksoftirqd/4]                                                                                                        
       16 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/4]                                                                                                         
       17 root      RT  -5     0    0    0 S  0.0  0.0   0:00.41 [migration/5]                                                                                                        
       18 root      36  19     0    0    0 S  0.0  0.0   0:00.08 [ksoftirqd/5]                                                                                                        
       19 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/5]                                                                                                         
       20 root      RT  -5     0    0    0 S  0.0  0.0   0:00.42 [migration/6]                                                                                                        
       21 root      34  19     0    0    0 S  0.0  0.0   0:00.02 [ksoftirqd/6]                                                                                                        
       22 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/6]                                                                                                         
       23 root      RT  -5     0    0    0 S  0.0  0.0   0:00.47 [migration/7]                                                                                                        
       24 root      34  19     0    0    0 S  0.0  0.0   0:00.04 [ksoftirqd/7]                                                                                                        
       25 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 [watchdog/7]                                                                                                         
       26 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 [events/0]                                                                                                           
       27 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 [events/1]                                                                                                           
       28 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 [events/2]                                                                                                           
       29 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 [events/3]                                                                                                           
       30 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 [events/4]                                                                                                           
       31 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 [events/5]                                                                                                           
       32 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 [events/6]    
      

  5.   

    The output above wasn't accurate; here is a better one:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    13762 oracle    15   0 2141m  57m  54m S 27.6  0.7   0:02.02 oracleorcl (LOCAL=NO)                                                                                                
    11505 root      15   0 11328 4496 2712 S 12.9  0.1   8:00.33 sshd: root@notty                                                                                                     
    32109 root      19   0 1798m 455m 8336 S  4.0  5.9  18:07.38 /usr/jdk1.6.0_16/bin/java -server -XX:PermSize=256M -XX:MaxPermSize=512M -Djava.endorsed.dirs=/fzyzweb/apache-tomcat-
    13778 oracle    15   0 2140m  41m  39m S  2.7  0.5   0:00.08 oracleorcl (LOCAL=NO)                                                                                                
    13780 oracle    17   0 2140m  33m  31m S  2.0  0.4   0:00.06 oracleorcl (LOCAL=NO)                                                                                                
    13756 oracle    15   0 2141m  68m  65m S  1.7  0.9   0:03.56 oracleorcl (LOCAL=NO)                                                                                                
    13710 oracle    15   0 2141m  79m  76m S  1.3  1.0   0:06.96 oracleorcl (LOCAL=NO)                                                                                                
    13740 oracle    15   0 2140m  51m  49m S  1.0  0.7   0:00.68 oracleorcl (LOCAL=NO)                                                                                                
    13590 root      15   0  2544 1180  824 R  0.7  0.0   0:01.90 top -c                                                                                                               
    11507 root      16   0  8332 3296 1208 S  0.3  0.0   0:05.91 /usr/libexec/openssh/sftp-server                                                                                     
    13752 oracle    17   0 2140m  57m  54m S  0.3  0.7   0:01.58 oracleorcl (LOCAL=NO)                                                                                                
        1 root      15   0  2176  640  552 S  0.0  0.0   0:10.47 init [5]                                                                                                             
        2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.34 [migration/0]     
      

  6.   

    13778 oracle 15 0 2140m 41m 39m S 2.7 0.5 0:00.08 oracleorcl (LOCAL=NO)   
    13780 oracle 17 0 2140m 33m 31m S 2.0 0.4 0:00.06 oracleorcl (LOCAL=NO)   
    13756 oracle 15 0 2141m 68m 65m S 1.7 0.9 0:03.56 oracleorcl (LOCAL=NO)   
    13710 oracle 15 0 2141m 79m 76m S 1.3 1.0 0:06.96 oracleorcl (LOCAL=NO)   
    13740 oracle 15 0 2140m 51m 49m S 1.0 0.7 0:00.68 oracleorcl (LOCAL=NO)

    These entries keep appearing: sometimes there are several oracleorcl processes, but most of the time there is only one.
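Each Oracle server process maps the shared SGA, and the SGA pages a process has touched show up in both its RES and SHR columns. Adding RES across processes therefore overcounts memory badly; RES minus SHR is a rough, approximate estimate of what each process holds privately. A Python sketch assuming the column order of the top output above; to_kib and private_kib are hypothetical helper names:

```python
# top prints memory as plain KiB, or with an m / g suffix for MiB / GiB.
UNIT_KIB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def to_kib(value: str) -> float:
    """Convert a top memory field such as '41m', '2.5g' or '7728' to KiB."""
    suffix = value[-1].lower()
    if suffix in UNIT_KIB:
        return float(value[:-1]) * UNIT_KIB[suffix]
    return float(value)


def private_kib(top_line: str) -> float:
    """Estimate private resident memory (RES - SHR) for one top process line.

    Assumes the column order PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
    """
    fields = top_line.split()
    return to_kib(fields[5]) - to_kib(fields[6])


rows = [
    "13778 oracle 15 0 2140m 41m 39m S 2.7 0.5 0:00.08 oracleorcl (LOCAL=NO)",
    "13780 oracle 17 0 2140m 33m 31m S 2.0 0.4 0:00.06 oracleorcl (LOCAL=NO)",
    "13756 oracle 15 0 2141m 68m 65m S 1.7 0.9 0:03.56 oracleorcl (LOCAL=NO)",
    "13710 oracle 15 0 2141m 79m 76m S 1.3 1.0 0:06.96 oracleorcl (LOCAL=NO)",
    "13740 oracle 15 0 2140m 51m 49m S 1.0 0.7 0:00.68 oracleorcl (LOCAL=NO)",
]
print(sum(to_kib(r.split()[5]) for r in rows) / 1024)  # 272.0 MiB if RES is simply added up
print(sum(private_kib(r) for r in rows) / 1024)         # 12.0 MiB of private memory
```

For these five sessions RES adds up to 272 MiB while the private part is only about 12 MiB, which suggests the connections themselves are not a large memory consumer.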
      

  7.   

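    These are all client sessions connecting to the database. Check Oracle's alert.log for any errors.
    What are those sessions doing? How large is the PGA? Also, what exactly happens when you say the server hangs, how often does it happen, and what are the OS and database versions?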
      

  8.   

    Is this a dedicated Oracle database server? If so, you could increase the SGA to around 12G, and raise the PGA a bit as well.
      

  9.   

    alert.log contains some deadlock messages, but their timestamps don't match the time of the failure:
    Tue Jun 28 05:25:12 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25093.trc.
    Tue Jun 28 05:33:07 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25061.trc.
    Tue Jun 28 05:33:43 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25093.trc.
    Tue Jun 28 05:33:52 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25061.trc.
    Tue Jun 28 05:39:22 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25093.trc.
    Tue Jun 28 05:41:19 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25061.trc.
    Tue Jun 28 05:44:38 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25093.trc.
    Tue Jun 28 05:47:53 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25061.trc.
    Tue Jun 28 05:50:46 2011
    ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25093.trc.
    Tue Jun 28 09:58:46 2011   (this entry is closer to the time of the failure)
    Thread 1 advanced to log sequence 203
      Current log# 1 seq# 203 mem# 0: /home/oracle/oracle/product/10.2.0/oradata/orcl/redo01.log
    In orcl/udump/orcl_ora_25093.trc I found the statement that caused the deadlock.
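With more than a handful of ORA-00060 entries it helps to count them per trace file, since each user trace file in udump belongs to one server process and contains the deadlock graph with the SQL involved. A Python sketch, assuming alert.log lines in the format shown above (deadlocks_by_trace is a hypothetical helper):

```python
import re
from collections import Counter

# Captures the trace-file path in an alert.log line such as
# "ORA-00060: Deadlock detected. More info in file /path/orcl_ora_25093.trc."
DEADLOCK = re.compile(r"ORA-00060: Deadlock detected\. More info in file (\S+\.trc)")


def deadlocks_by_trace(alert_log: str) -> Counter:
    """Count ORA-00060 messages per trace file (file name only)."""
    return Counter(path.rsplit("/", 1)[-1] for path in DEADLOCK.findall(alert_log))


SAMPLE = """\
Tue Jun 28 05:25:12 2011
ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25093.trc.
Tue Jun 28 05:33:07 2011
ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25061.trc.
Tue Jun 28 05:33:43 2011
ORA-00060: Deadlock detected. More info in file /home/oracle/oracle/product/10.2.0/db_1/admin/orcl/udump/orcl_ora_25093.trc.
"""
print(deadlocks_by_trace(SAMPLE))
# Counter({'orcl_ora_25093.trc': 2, 'orcl_ora_25061.trc': 1})
```

Run over the full excerpt above, it reports 5 deadlocks in orcl_ora_25093.trc and 4 in orcl_ora_25061.trc, all between 05:25 and 05:50, which points to the same two server processes colliding repeatedly.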
      

  10.   


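    Yes, it's a dedicated Oracle database. So far it looks like the SGA is too small, but I don't know what size would be optimal for this server, so I'm reluctant to change it.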
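Rather than picking a fixed number, a rough upper bound for the SGA can be derived from physical memory. The sketch below applies a rule of thumb often quoted from Oracle's performance tuning guidance: leave about 20% of RAM to the operating system and other programs, then split the rest roughly 80/20 between SGA and PGA for an OLTP workload. Treat the percentages as assumptions to verify against the documentation for your release and against the V$SGA_TARGET_ADVICE and V$PGA_TARGET_ADVICE views, not as a tuned answer; oracle_memory_split is a hypothetical helper:

```python
def oracle_memory_split(ram_kb: int, oracle_share: float = 0.8,
                        sga_share: float = 0.8) -> tuple[float, float]:
    """Split physical RAM into rough (SGA, PGA) starting sizes in GiB.

    oracle_share: fraction of RAM given to the instance; the rest is left
                  for the OS and other programs.
    sga_share:    fraction of the instance's memory given to the SGA; the
                  remainder is the PGA target.
    Both defaults are an assumed OLTP rule of thumb, not tuned values.
    """
    instance_gib = ram_kb * oracle_share / (1024 * 1024)
    sga_gib = instance_gib * sga_share
    return round(sga_gib, 1), round(instance_gib - sga_gib, 1)


# 7901184 kB is the "Mem: total" reported by free and top on this server.
print(oracle_memory_split(7901184))  # (4.8, 1.2)
```

For this server that gives roughly 4.8 GiB of SGA and 1.2 GiB of PGA, and it also shows that a 12G SGA would not fit in 7.5 GiB of RAM. The top output additionally shows a Tomcat JVM on the same host, which argues for a smaller Oracle share. Finally, the HighMem zone and the c0xxxxxx kernel addresses in the OOM report suggest a 32-bit kernel; if the Oracle binaries are 32-bit too, the per-process address space caps the SGA well below these figures.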
      

  11.   

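    I know very little about Linux servers. I used the following query to list the Oracle sessions holding table locks:
    -- sessions that currently hold locks, with the locked object and lock mode
    select sess.sid,
           sess.serial#,
           lo.oracle_username,
           lo.os_user_name,
           ao.object_name,
           lo.locked_mode
      from v$locked_object lo,
           dba_objects ao,
           v$session sess
     where ao.object_id = lo.object_id
       and lo.session_id = sess.sid;
    When the "hang" happened, up to 170+ sessions were holding locks. Yikes.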
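    It turned out that viewing a single news item ran several UPDATEs against the news table without committing promptly, which left the table locked and caused the so-called "hang".
    I changed the application so that all database operations are done in stored procedures that commit promptly.
    There has been no table locking since.
    I clearly need to learn a lot more about servers. Many thanks to java3344520 for the help.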
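The fix comes down to transaction scope: a session that has updated a row keeps its lock until it commits or rolls back, and any other session that wants to update the same data has to wait, which is how sessions holding locks on the news table could pile up. A self-contained Python sketch of that effect using the standard-library sqlite3 module as a stand-in for Oracle. This is an assumption for illustration only: SQLite locks the whole database file for writing whereas Oracle locks individual rows, and the news table here is hypothetical:

```python
import os
import sqlite3
import tempfile


def demo_uncommitted_update() -> tuple[bool, int]:
    """Show that an uncommitted UPDATE blocks other writers until COMMIT."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "news.db")  # hypothetical stand-in for the news table

        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, hits INTEGER)")
        setup.execute("INSERT INTO news VALUES (1, 0)")
        setup.commit()
        setup.close()

        # timeout=0: report "database is locked" at once instead of waiting.
        viewer_a = sqlite3.connect(path, timeout=0)
        viewer_b = sqlite3.connect(path, timeout=0)

        # Viewer A updates the row and does not commit yet.
        viewer_a.execute("UPDATE news SET hits = hits + 1 WHERE id = 1")

        try:
            viewer_b.execute("UPDATE news SET hits = hits + 1 WHERE id = 1")
            blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            blocked = True
            viewer_b.rollback()  # abandon this attempt and release B's locks

        viewer_a.commit()  # committing promptly releases A's lock

        viewer_b.execute("UPDATE news SET hits = hits + 1 WHERE id = 1")
        viewer_b.commit()
        hits = viewer_b.execute("SELECT hits FROM news WHERE id = 1").fetchone()[0]

        viewer_a.close()
        viewer_b.close()
        return blocked, hits


print(demo_uncommitted_update())  # (True, 2)
```

Committing (or rolling back) as soon as a unit of work is done keeps the lock window short, which is what moving the updates into stored procedures that commit promptly achieved here.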