
resolved – ORA-27300 OS system dependent operation:fork failed with status: 11

November 18th, 2014

Today we found all of our databases down, with these errors in the trace file:

Errors in file /u01/database/diag/rdbms/oimdb/OIMDB/trace/OIMDB_psp0_3173.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5

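Searching for ORA-27300, I found this article, which suggested that the user had used up its allowed number of processes, so the system could not spawn a new one at that moment. Status 11 is the Linux errno EAGAIN ("Resource temporarily unavailable", as ORA-27301 shows), which fork() returns when a limit on the number of processes or threads is hit. Since the problem happened at Mon Nov 17 02:08:51 2014, I checked the sysstat history with sar: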
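To see how close a user is to that limit, a quick check is to count its tasks and compare the count with the soft nproc limit. The sketch below assumes the Oracle software owner is named "oracle" (pass another name as the first argument) and should be run as that user, because "ulimit -u" reports the limit of the current shell. The function names are hypothetical helpers for this post, not part of any Oracle tool.

```bash
#!/usr/bin/env bash
# Sketch: how close is a user to its nproc ("max user processes") limit?
# Assumption: the Oracle software owner is "oracle"; pass a different name as $1.
# Run it as that user, because "ulimit -u" reports the limit of the current shell.

# Count tasks (threads included) whose real UID is the given user.
# On Linux, RLIMIT_NPROC is checked against threads as well as processes.
# The count includes this script's own ps/wc, so treat it as approximate.
count_user_tasks() {
    if ! id -u "$1" >/dev/null 2>&1; then
        echo 0   # unknown user: nothing to count
        return
    fi
    ps -L -U "$1" --no-headers | wc -l
}

# Print how many more tasks can start before fork() fails with EAGAIN.
# Arguments: tasks in use, soft limit (a number or "unlimited").
nproc_headroom() {
    local used=$1 limit=$2
    if [ "$limit" = "unlimited" ]; then
        echo "unlimited"
    else
        echo $(( limit - used ))
    fi
}

user=${1:-oracle}
used=$(count_user_tasks "$user")
limit=$(ulimit -u)
echo "$user: $used tasks, soft nproc limit $limit, headroom $(nproc_headroom "$used" "$limit")"
```

A headroom near zero at the time of the failure would support the process-limit explanation; plenty of headroom points toward another resource, such as memory.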

[root@test sa]# sar -f /var/log/sa/sa17 -s 00:00:00 -e 03:20:00
Linux 2.6.32-300.27.1.el5uek (slcn11vmf0029) 11/17/14

00:00:01 CPU %user %nice %system %iowait %steal %idle
00:10:01 all 1.16 0.12 0.48 0.71 0.18 97.35
00:20:02 all 1.30 0.00 0.47 0.95 0.19 97.10
00:30:01 all 1.88 0.00 0.63 1.98 0.19 95.32
00:40:01 all 1.00 0.00 0.35 2.15 0.18 96.32
00:50:01 all 1.09 0.00 0.40 0.47 0.18 97.87
01:00:01 all 1.03 0.00 0.34 0.25 0.16 98.23
01:10:01 all 3.98 0.02 1.72 4.26 0.22 89.80
01:20:01 all 9.98 0.13 5.99 47.40 0.31 36.19
01:30:01 all 1.86 0.00 1.24 48.72 0.16 48.01
01:40:01 all 1.08 0.00 0.82 48.77 0.18 49.15
01:50:01 all 1.54 0.00 0.97 49.32 0.18 47.98
02:00:01 all 1.05 0.00 0.85 48.74 0.18 49.19 --- problem occurred at Mon Nov 17 02:08:51 2014
02:10:01 all 10.14 0.14 8.95 44.75 0.34 35.68
02:20:01 all 0.06 0.00 0.21 1.87 0.07 97.78
02:30:01 all 0.08 0.00 0.29 2.81 0.08 96.74
02:40:01 all 0.09 0.00 0.31 3.08 0.08 96.44
02:50:01 all 0.05 0.00 0.13 0.96 0.06 98.81
03:00:01 all 0.07 0.00 0.26 2.38 0.07 97.22
03:10:01 all 0.06 0.12 0.21 1.52 0.07 98.02
Average: all 1.85 0.03 1.20 15.89 0.16 80.88
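With a full day of samples, it helps to filter out only the intervals with heavy I/O wait. Below is a small awk sketch that assumes the 24-hour column layout shown above (time, CPU, %user, %nice, %system, %iowait, %steal, %idle). Running sar with LC_ALL=C should give that layout, since a 12-hour locale adds an AM/PM column that shifts the fields. The 20% threshold is an arbitrary example, and high_iowait is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list sar CPU intervals whose %iowait exceeds a threshold.
# Assumes the 24-hour layout: time, CPU, %user, %nice, %system, %iowait, %steal, %idle.

high_iowait() {
    local threshold=${1:-20}
    # Keep only per-interval "all" rows (this skips the header and the Average line),
    # then print the timestamp and %iowait when it is above the threshold.
    awk -v t="$threshold" '$1 ~ /^[0-9][0-9]:/ && $2 == "all" && $6 + 0 > t + 0 { print $1, $6 }'
}

# Example, using the sa file from this post:
# LC_ALL=C sar -u -f /var/log/sa/sa17 -s 00:00:00 -e 03:20:00 | high_iowait 20
```

On the output above, this would flag every sample from 01:20:01 through 02:10:01.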

[root@test sa]# sar -f /var/log/sa/sa17 -s 01:10:00 -e 02:11:00 -A
......
......
01:10:01 kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
01:20:01 259940 15482728 98.35 2004 11703696 0 2104504 100.00 194056 -- even all swap space was used up
01:30:01 398584 15344084 97.47 904 11703152 0 2104504 100.00 191728
01:40:01 409104 15333564 97.40 984 11716924 0 2104504 100.00 191404
01:50:01 452844 15289824 97.12 1004 11711548 0 2104504 100.00 189076
02:00:01 440780 15301888 97.20 1424 11757600 0 2104504 100.00 189364
02:10:01 14602712 1139956 7.24 19548 382588 1978020 126484 6.01 3096
Average: 2760661 12982007 82.46 4311 9829251 329670 1774834 84.34 159787
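The same filtering idea works for the memory report. In the combined memory/swap layout above, %swpused is field 9. Newer sysstat releases report swap separately with "sar -S", so the field number would need adjusting there. The sketch assumes the sysstat version that produced this output, where "sar -r" prints both memory and swap. The 90% threshold and the swap_pressure name are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: list sar memory intervals where swap usage is at or above a threshold.
# Column positions match the combined memory/swap report above (%swpused = field 9).

swap_pressure() {
    local threshold=${1:-90}
    # Data rows start with a timestamp and have a numeric kbmemfree;
    # the header row and the Average row do not.
    awk -v t="$threshold" '$1 ~ /^[0-9][0-9]:/ && $2 ~ /^[0-9]+$/ && $9 + 0 >= t + 0 { print $1, $9 }'
}

# Example, using the sa file from this post:
# LC_ALL=C sar -r -f /var/log/sa/sa17 -s 01:10:00 -e 02:11:00 | swap_pressure 90
```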

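This shows the system was under heavy pressure during that window. %iowait stayed around 45 to 49% from 01:20 to 02:10, and from 01:20 through 02:00 memory was over 97% used and swap was 100% used. I then raised the maximum number of user processes (nproc) to 131072 in /etc/security/limits.conf with the following lines (the "*" wildcard applies them to every user, oracle included):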

* soft nproc 131072
* hard nproc 131072

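I also set kernel.pid_max to 139264 in /etc/sysctl.conf. That is 131072 plus 8192, the extra headroom recommended for OS stability. Since the default pid_max is typically 32768, the kernel would otherwise run out of PIDs long before the new nproc limit was reached.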

[root@test ~]# sysctl -a|grep pid_max
kernel.pid_max = 139264
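For reference, here is a sketch of deriving that value and applying it both at runtime and persistently. The 8192 headroom mirrors the figure used in this post; it is not a kernel constant. apply_pid_max is a hypothetical helper that must run as root, and it edits /etc/sysctl.conf in place, so back the file up first.

```bash
#!/usr/bin/env bash
# Sketch: derive kernel.pid_max from the per-user nproc limit plus OS headroom.
# The default headroom of 8192 follows this post; adjust it to your own policy.

pid_max_for() {
    local nproc_limit=$1 headroom=${2:-8192}
    echo $(( nproc_limit + headroom ))
}

# Apply at runtime, then persist across reboots (root only; not called below).
apply_pid_max() {
    local value=$1
    sysctl -w kernel.pid_max="$value" || return 1
    # Replace an existing entry if there is one, otherwise append a new line.
    if grep -q '^kernel.pid_max' /etc/sysctl.conf; then
        sed -i "s/^kernel.pid_max.*/kernel.pid_max = $value/" /etc/sysctl.conf
    else
        echo "kernel.pid_max = $value" >> /etc/sysctl.conf
    fi
}

pid_max_for 131072    # prints 139264
```

Usage as root would be something like: apply_pid_max "$(pid_max_for 131072)".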

Finally, I increased the server's memory from 16 GB to 32 GB and rebooted.
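Because limits.conf is applied by pam_limits when a session starts, an instance started before the change keeps its old limit until it is restarted. After the restart, a running instance's effective limit can be read from /proc/<pid>/limits. The sketch assumes the SID is OIMDB, as in the trace path above, and relies on the standard ora_pmon_<SID> background process name. soft_nproc is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check the nproc limit a running Oracle instance actually inherited.
# Assumption: the SID is OIMDB (from the trace path above); pass another SID as $1.

# Print the soft "Max processes" value from a /proc/<pid>/limits file.
# Line format: "Max processes  <soft>  <hard>  processes".
soft_nproc() {
    awk '/^Max processes/ { print $3 }' "$1"
}

sid=${1:-OIMDB}
pid=$(pgrep -f "ora_pmon_${sid}" | head -n 1)
if [ -n "$pid" ]; then
    echo "ora_pmon_${sid} (pid $pid) soft nproc: $(soft_nproc "/proc/$pid/limits")"
else
    echo "ora_pmon_${sid} is not running" >&2
fi
```

If this still shows the old value, the instance was most likely started from a session that predates the limits.conf change.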

Good Luck!

