
Resolved:[Load Manager Shared Memory]. Error is [28]: [No space left on device](for apache, pmserver etc. running on linux, solaris, unix)

April 23rd, 2011

This error may occur in pmserver, apache, oracle, rsync, up2date and many other services running on Linux, Solaris and other Unix systems. It is a widespread problem: search Google for "[Load Manager Shared Memory].  Error is [28]: [No space left on device]" and you will find plenty of people asking about it.
Let's take pmserver running on Solaris 10 as an example and walk through how to solve this annoying problem step by step.
Given "[No space left on device]" and "Load Manager Shared Memory", our first guess was a shortage of memory. After checking, however, we could see that there was enough memory to allocate:

1. Check the total memory size:

# /usr/sbin/prtconf |grep -i mem
Memory size: 32640 Megabytes
memory (driver not attached)
virtual-memory (driver not attached)

2. Check the memory limit of the application's project:

# su - sbintprd        # as you may have guessed, pmserver runs as user sbintprd on this box
$ id -p
uid=71269(sbintprd) gid=70772(sbintprd) projid=3(default)

This means that pmserver is running inside the 'default' project. Next, let's check the settings of the 'default' project:

# projects -l default
default
projid : 3
comment: ""
users  : (none)
groups : (none)
attribs: project.max-msg-ids=(privileged,256,deny)
project.max-shm-memory=(privileged,17179869184,deny)

# prctl -n project.max-shm-memory -i project default
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      16.0GB      -   deny                                 -
        system          16.0EB    max   deny                                 -
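As a quick sanity check, the raw value 17179869184 in the project attribs is exactly 16 GiB (prctl's "16.0GB" uses 1024-based units), and the 18gb used below corresponds to 19327352832 bytes. A minimal sketch of that arithmetic; the function names are hypothetical helpers, not Solaris commands:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert between bytes and GiB (1 GiB = 1024^3 bytes).
# Needs a shell with 64-bit arithmetic (bash, ksh93, or dash on a 64-bit box).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))   # integer division, truncates
}

gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

bytes_to_gib 17179869184   # 16
gib_to_bytes 18            # 19327352832
```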

So the 'default' project is allowed 16 GB of shared memory. Why, then, would memory run short?

Let's raise max-shm-memory by 2 GB (to 18 GB) and see what happens:

# prctl -n project.max-shm-memory -r -v 18gb -i project default

After this, we restarted pmserver, but the problem was still there:

# tail -f pmserver.log
INFO : LM_36070 [Fri Apr 22 22:19:42 2011] : (25218|1) The server is running on a host with 32 logical processors.
INFO : LM_36039 [Fri Apr 22 22:19:42 2011] : (25218|1) The maximum number of sessions that can run simultaneously is [10].
FATAL ERROR : CMN_1011 [Fri Apr 22 22:19:42 2011] : (25218|1) Error allocating system shared memory of [2000000] bytes for [Load Manager Shared Memory].  Error is [28]: [No space left on device]
FATAL ERROR : SF_34004 [Fri Apr 22 22:19:42 2011] : (25218|1) Server initialization failed.
INFO : SF_34014 [Fri Apr 22 22:19:42 2011] : (25218|1) Server shut down.

So raising the limit did not help; we need to look elsewhere.

Processes on Unix systems (Solaris and Linux alike) can share memory through System V IPC shared memory segments. We can use ipcs to list the active shared memory segments:

# ipcs -m | grep sbintprd
m  671088691   0          --rw------- sbintprd sbintprd

Counting them shows that sbintprd owns 92 segments:

# ipcs -mA | grep sbintprd | wc -l
92
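Before deleting anything, it helps to see which owner holds the most segments. A minimal sketch, assuming the Solaris `ipcs -m` layout shown above (OWNER is field 5); on Linux `ipcs -m` prints different columns, so the field number would need adjusting. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count shared memory segments per owner.
# Reads `ipcs -m` output on stdin. Assumes the Solaris layout
# T ID KEY MODE OWNER GROUP, so the owner is field 5; header lines
# are skipped because their first field is not "m".
count_shm_by_owner() {
    awk '$1 == "m" { n[$5]++ } END { for (o in n) print o, n[o] }' | sort
}

# Usage on a live system:
#   ipcs -m | count_shm_by_owner
```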

Each of them is 2000000 bytes (about 2 MB), the same size pmserver failed to allocate for Load Manager Shared Memory:

IPC status from <running system> as of Sat Apr 23 03:29:51 BST 2011
T         ID      KEY        MODE        OWNER    GROUP  CREATOR   CGROUP NATTCH      SEGSZ  CPID  LPID   ATIME    DTIME    CTIME  ISMATTCH         PROJECT
Shared Memory:
m  671088691   0          --rw------- sbintprd sbintprd sbintprd sbintprd      1    2000000  7781 16109  3:28:35  3:28:50  2:01:43        0         default
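To put a number on the "over-allocated" conclusion, one could total the SEGSZ column for a given owner. A minimal sketch, assuming the Solaris `ipcs -mA` column layout shown above (OWNER is field 5, SEGSZ is field 10); the function name is hypothetical. For 92 segments of 2000000 bytes it would report 184000000 bytes, roughly 184 MB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum SEGSZ (field 10) of all segments owned by $1.
# Reads `ipcs -mA` output on stdin and prints the total in bytes.
# Solaris 10's /usr/bin/awk lacks -v, so run with AWK=nawk there.
shm_bytes_for_owner() {
    ${AWK:-awk} -v owner="$1" '
        $1 == "m" && $5 == owner { total += $10 }
        END { printf "%.0f\n", total }'
}

# Usage on a live system:
#   ipcs -mA | shm_bytes_for_owner sbintprd
```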

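We can conclude that the sbintprd user has allocated segments over and over without freeing them. Their total size (92 × 2000000 bytes, about 184 MB) is far below the 16 GB limit, so it is most likely the number of leftover segments, rather than their size, that runs into a limit: shmget() also fails with ENOSPC when the limit on shared memory identifiers (project.max-shm-ids on Solaris 10) would be exceeded. Either way, the fix is to clear the stale shared memory segments: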

# for i in `ipcs -m | awk '$5 == "sbintprd" {print $2}'`; do ipcrm -m $i; done   # match the OWNER column exactly instead of any line containing "prd"
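The loop above removes every matching segment, whether or not a process is still attached to it. A more cautious variant (a hypothetical sketch, not what we ran) only touches segments owned by the given user whose NATTCH column (field 9 in the Solaris `ipcs -mA` layout) is 0, and by default only prints the ipcrm commands so you can review them first. Segments that are still attached are skipped; if stale segments are held by leftover processes, those processes (see the CPID/LPID columns) have to be stopped first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove orphaned segments (no attached processes)
# owned by $1. Reads `ipcs -mA` output on stdin.
# DRY_RUN defaults to 1, which only prints the ipcrm commands.
# Solaris 10's /usr/bin/awk lacks -v, so run with AWK=nawk there.
remove_orphan_shm() {
    ${AWK:-awk} -v owner="$1" '$1 == "m" && $5 == owner && $9 == 0 { print $2 }' |
    while read -r id; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ipcrm -m $id"
        else
            ipcrm -m "$id"
        fi
    done
}

# Usage: preview first, then actually remove.
#   ipcs -mA | remove_orphan_shm sbintprd
#   ipcs -mA | DRY_RUN=0 remove_orphan_shm sbintprd
```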

After this step, pmserver started successfully, as the log shows:

INFO : LM_36070 [Sat Apr 23 01:51:17 2011] : (5979|1) The server is running on a host with 32 logical processors.
INFO : LM_36039 [Sat Apr 23 01:51:18 2011] : (5979|1) The maximum number of sessions that can run simultaneously is [10].
INFO : CMN_1010 [Sat Apr 23 01:51:18 2011] : (5979|1) Allocated system shared memory [id = 469762275] of [2000000] bytes for [Load Manager Shared Memory].
INFO : LM_36095 [Sat Apr 23 01:51:50 2011] : (5979|1) Persistent session cache file cleanup is scheduled to run on [Sun Apr 24 01:51:50 2011].
INFO : SF_34003 [Sat Apr 23 01:51:50 2011] : (5979|1) Server initialization completed.

Problem resolved!

Problem resolved?