由于出现ora-00020超出最大进程数的问题,修改了processes数为1500
之后数据库都正常,但是最近发现在udump的几乎所有的trc文件中,都有5-6个ORA-00020的报错,而在alert文件中没有报错,数据库看着正常,只是似乎感觉有些慢,请各位看看是什么原因?ORA-00020: maximum number of processes 150 exceeded
/export/home/oracle/admin/orcl/udump/orcl_ora_21259.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /export/home/oracle/product/10.0
System name: SunOS
Node name: EEE-DataBase
Release: 5.10
Version: Generic_148888-01
Machine: sun4v
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 21259, image: oracle@EEE-DataBase
SQL> show parameter process; NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 16
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 1500SQL> select count(*) from v$process;
COUNT(*)
----------
256
之后数据库都正常,但是最近发现在udump的几乎所有的trc文件中,都有5-6个ORA-00020的报错,而在alert文件中没有报错,数据库看着正常,只是似乎感觉有些慢,请各位看看是什么原因?ORA-00020: maximum number of processes 150 exceeded
/export/home/oracle/admin/orcl/udump/orcl_ora_21259.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /export/home/oracle/product/10.0
System name: SunOS
Node name: EEE-DataBase
Release: 5.10
Version: Generic_148888-01
Machine: sun4v
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 21259, image: oracle@EEE-DataBase
SQL> show parameter process; NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 16
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 1500SQL> select count(*) from v$process;
COUNT(*)
----------
256
有个人,没病,他去找医生看病,医生说他没病,但是他还是说自己有病,结果会怎样?
A、医生把他撵出来了;
B、医生把它撵出来了,他转头捅死了医生
C、他一直感觉自己有病,但所有医生都说他没病,然后有一天他疯了,真病了……好吧,说正经的:
你说你感觉数据库慢?慢在哪里?慢在某个功能点?还是所有操作上?慢很很多吗?比如原来跑了100毫秒的SQL,现在跑了200毫秒?数据库看着正常?你为什么觉得他正常?感觉它正常?然后感觉它又有点慢?
1、如果你觉得整个库都慢,那么提供:你觉得数据库慢时段数据库的AWR报告;
2、如果你觉得某个功能点慢,那么找出这个功能点跑的慢的SQL,提供这个SQL的执行计划。
至少先有了这些信息再提性能问题,上面虽说是玩笑,但有一点与医生看病是异曲同工的:问题不是猜出来的、感受出来的——谈恋爱的时候问题大概是这么来的 ,而是需要检查,需要有凭有据。BTW:进程数这种东西,即使真的出现与之相关的问题,它也只会是表象——比如进程数爆了,不会是问题的根源。
alert文件:
Fri Nov 17 15:04:16 2017
Thread 1 advanced to log sequence 17463 (LGWR switch)
Current log# 9 seq# 17463 mem# 0: /oradata/orcl/redo09.log
Fri Nov 17 15:09:46 2017
Thread 1 advanced to log sequence 17464 (LGWR switch)
Current log# 10 seq# 17464 mem# 0: /oradata/orcl/redo10.log
Fri Nov 17 15:16:54 2017
Thread 1 advanced to log sequence 17465 (LGWR switch)
Current log# 1 seq# 17465 mem# 0: /oradata/orcl/redo01.log
Fri Nov 17 15:19:11 2017
Thread 1 advanced to log sequence 17466 (LGWR switch)
Current log# 2 seq# 17466 mem# 0: /oradata/orcl/redo02.log
Fri Nov 17 15:19:29 2017
Thread 1 advanced to log sequence 17467 (LGWR switch)
Current log# 3 seq# 17467 mem# 0: /oradata/orcl/redo03.log
Fri Nov 17 15:19:52 2017
Thread 1 advanced to log sequence 17468 (LGWR switch)
Current log# 4 seq# 17468 mem# 0: /oradata/orcl/redo04.log
Fri Nov 17 15:20:39 2017
Thread 1 advanced to log sequence 17469 (LGWR switch)
Current log# 5 seq# 17469 mem# 0: /oradata/orcl/redo05.log
Fri Nov 17 15:22:36 2017
ORA-00060: Deadlock detected. More info in file /export/home/oracle/admin/orcl/udump/orcl
_ora_25025.trc.
Fri Nov 17 15:25:18 2017
Thread 1 advanced to log sequence 17470 (LGWR switch)
Current log# 11 seq# 17470 mem# 0: /oradata/orcl/redo11.log
Fri Nov 17 15:31:22 2017
Thread 1 advanced to log sequence 17471 (LGWR switch)
Current log# 12 seq# 17471 mem# 0: /oradata/orcl/redo12.log
Fri Nov 17 15:38:24 2017
Thread 1 advanced to log sequence 17472 (LGWR switch)
Current log# 6 seq# 17472 mem# 0: /oradata/orcl/redo06.log
Fri Nov 17 15:47:12 2017
Thread 1 advanced to log sequence 17473 (LGWR switch)
Current log# 7 seq# 17473 mem# 0: /oradata/orcl/redo07.log
Fri Nov 17 15:52:40 2017
Thread 1 advanced to log sequence 17474 (LGWR switch)
Current log# 8 seq# 17474 mem# 0: /oradata/orcl/redo08.log
Fri Nov 17 15:54:17 2017
ORA-00060: Deadlock detected. More info in file /export/home/oracle/admin/orcl/udump/orcl
_ora_7809.trc.
Fri Nov 17 15:57:15 2017
Thread 1 advanced to log sequence 17475 (LGWR switch)
Current log# 9 seq# 17475 mem# 0: /oradata/orcl/redo09.log
Fri Nov 17 16:01:30 2017
Thread 1 advanced to log sequence 17476 (LGWR switch)
Current log# 10 seq# 17476 mem# 0: /oradata/orcl/redo10.log报死锁的那个trc:
/export/home/oracle/admin/orcl/udump/orcl_ora_7809.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /export/home/oracle/product/10.0
System name: SunOS
Node name: EAS-DataBase
Release: 5.10
Version: Generic_148888-01
Machine: sun4v
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 7809, image: oracle@EAS-DataBaseORA-00020: maximum number of processes 150 exceeded
/export/home/oracle/admin/orcl/udump/orcl_ora_7809.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /export/home/oracle/product/10.0
System name: SunOS
Node name: EAS-DataBase
Release: 5.10
Version: Generic_148888-01
Machine: sun4v
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 7809, image: oracle@EAS-DataBaseORA-00020: maximum number of processes 150 exceeded
/export/home/oracle/admin/orcl/udump/orcl_ora_7809.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /export/home/oracle/product/10.0
System name: SunOS
Node name: EAS-DataBase
Release: 5.10
Version: Generic_148888-01
Machine: sun4v
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 0
Unix process pid: 7809, image: oracle@EAS-DataBaseORA-00020: maximum number of processes 150 exceeded
/export/home/oracle/admin/orcl/udump/orcl_ora_7809.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
"/export/home/oracle/admin/orcl/udump/orcl_ora_7809.trc" 15049 行,842130 字符
详细的trc文件和awr我上传在这里了,麻烦给看下?
https://cloud.ciscol.com/nextcloud/index.php/s/M4UxYcI6WLgX2ky
这个系统好像有大量的临时表在创建,在删除?插入这些表或者引用到这些表的时候,搞出了很多文本不太一样的SQL来,可能因为设计问题,其他也有一些SQL也不方便使用绑定变量,解析看来是有点压力的,共享池还撑到了18g,比buffer cache大,但是300次/s软解析,30次/s的硬解析,应该还算过得去,不知道为啥解析时间占DB TIME的比重会那么高,高达30%,有没有同时段操作系统的性能监控,CPU的使用率是不是比较高?或者说这服务器上是不是有其他应用在与数据库争抢资源?
还有,数据库1/10的DB TIME是TX锁争用,结合alert日志经常报出ORA-60,看来应用的逻辑是有点问题的,这个包括死锁,你们得自己查程序了,也许优化流程中的TOP SQL可能一定程度上改善锁争用的现象。
仍然对trc文件里的processor耿耿于怀,感觉是不是哪里没设置好?-bash-3.2$ vmstat 2
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s5 s6 sd sd in sy cs us sy id
0 0 0 49316536 22549088 2614 202 20577 1 1 0 0 -0 4 -1 -1 3223 8294 3308 1 0 98
0 0 0 48328040 20626272 237 871 76 4 4 0 0 0 2 0 0 3229 11172 3695 1 0 99
0 0 0 48327272 20624968 19 88 12 0 0 0 0 0 0 0 0 2352 8449 2068 1 0 98
0 0 0 48327944 20625128 19 22 12 0 0 0 0 0 0 0 0 1394 2152 1044 0 0 100
0 0 0 48324840 20621768 187 532 12 0 0 0 0 0 0 0 0 1981 5092 1880 1 0 99oracle参数:
System parameters with non-default values:
processes = 1500
__shared_pool_size = 18622709760
__large_pool_size = 16777216
__java_pool_size = 117440512
__streams_pool_size = 16777216
sga_target = 34359738368
control_files = /oradata/orcl/control01.ctl, /oradata/orcl/control02.ctl, /o
radata/orcl/control03.ctl
db_block_size = 8192
__db_cache_size = 15435038720
compatible = 10.2.0.3.0
log_archive_format = %t_%s_%r.dbf
db_file_multiblock_read_count= 16
db_recovery_file_dest = /oradata/flash_recovery_area
db_recovery_file_dest_size= 1649267441664
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 32400
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB)
job_queue_processes = 10
background_dump_dest = /export/home/oracle/admin/orcl/bdump
user_dump_dest = /export/home/oracle/admin/orcl/udump
core_dump_dest = /export/home/oracle/admin/orcl/cdump
audit_file_dest = /export/home/oracle/admin/orcl/adump
db_name = orcl
open_cursors = 300
pga_aggregate_target = 6830424064
如果优化程序很实现(因为看来这是jindie的ERP系统),也许对象的统计信息可以检查下是不是没有跟上数据的变化?如果SQL优化或者程序优化短时间内没法做,可以考虑增大SGA看看情况。
不过最好根据原来系统稳定运行时候的共享池和buffer cache的情况,给现在的系统中的这两个参数也设置个最小值,也就是shared_pool_size和db_cache_size
对象的统计信息也可以先查一下,因为有些看上去像是统计的SQL执行时间还是比较长的,如果有必要的话要收集对象的统计信息,以得到更好的执行计划。plsql dev如果平常登录也慢的话,考虑收集下数据字典的统计信息,如果平常可以,那么可能是因为高峰期资源被占用的厉害导致它也受累了。