Archive for the ‘Unix’ Category

Resolved: Solaris patch panic – cannot start the system after patching

April 27th, 2011 No comments

Here goes the whole story:

Step 1. Patch with PCA. After reboot -- -r:

Rebooting with command: boot -r

Boot device: /pci@1c,600000/scsi@2/disk@0,0:a  File and args: -r

SunOS Release 5.10 Version Generic_142900-13 64-bit

Copyright 1983-2010 Sun Microsystems, Inc.  All rights reserved.

Use is subject to license terms.

WARNING: mod_load: cannot load module 'sharefs'

WARNING: Cannot mount /etc/dfs/sharetab


Hardware watchdog enabled

/kernel/drv/sparcv9/ip: undefined symbol 'ddi_get_lbolt64'

WARNING: mod_load: cannot load module 'ip'

/kernel/fs/sparcv9/sockfs: undefined symbol 'sock_comm_create_function'

/kernel/fs/sparcv9/sockfs: undefined symbol 'smod_lookup_byname'

/kernel/fs/sparcv9/sockfs: undefined symbol 'sctp_disconnect'

/kernel/fs/sparcv9/sockfs: undefined symbol 'sctp_getsockname'

/kernel/fs/sparcv9/sockfs: undefined symbol 'nd_free'

/kernel/fs/sparcv9/sockfs: undefined symbol 'nd_load'

/kernel/fs/sparcv9/sockfs: undefined symbol 'UDP_WR'

 

Step 2. Roll back with ZFS

ok> boot -F failsafe

# zfs rollback rpool/ROOT/sol10_sparc@pre_patched.142900-13_04.03.2011
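Without -r, zfs rollback only returns to the most recent snapshot of a dataset, so it helps to confirm which snapshot that is before rolling back. A minimal sketch, assuming snapshot names are fed in oldest first (for example from zfs list -H -t snapshot -o name -s creation); the function name latest_snapshot is hypothetical:

```shell
#!/bin/sh
# Sketch: print the most recent snapshot of one dataset.
# Expects snapshot names on stdin, sorted oldest first, e.g. from
#   zfs list -H -t snapshot -o name -s creation
# On Solaris 10, swap awk for nawk or /usr/xpg4/bin/awk (the old awk lacks -v).
latest_snapshot() {
    dataset=$1
    awk -v prefix="$dataset@" '
        index($0, prefix) == 1 { last = $0 }   # snapshot of exactly this dataset
        END { if (last != "") print last }'
}
```

Usage would look like: zfs list -H -t snapshot -o name -s creation | latest_snapshot rpool/ROOT/sol10_sparc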

 

Step 3. Patch with PCA again, then halt. The boot archive is not updated after patching, so we need to move the stale boot_archive aside and rebuild it from failsafe mode:

ok> boot -F failsafe

# mv /a/platform/`uname -i`/boot_archive /a/root/b_back

# /a/sbin/bootadm update-archive -R /a

# reboot
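The failsafe steps above can be wrapped in a small script so they can be reviewed before anything is touched. This is only a sketch: it assumes the root filesystem is mounted at /a as in failsafe mode, and the DRY_RUN and PLATFORM variables and the function names are hypothetical helpers, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: move a stale boot archive aside and rebuild it.
# ALTROOT and BACKUP mirror the paths used in Step 3; adjust as needed.
ALTROOT=${ALTROOT:-/a}
BACKUP=${BACKUP:-$ALTROOT/root/b_back}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_boot_archive() {
    # PLATFORM may be preset; otherwise ask the running system.
    platform=${PLATFORM:-$(uname -i)}
    run mv "$ALTROOT/platform/$platform/boot_archive" "$BACKUP"
    run "$ALTROOT/sbin/bootadm" update-archive -R "$ALTROOT"
}
```

Calling rebuild_boot_archive with the default DRY_RUN=1 only prints the two commands; setting DRY_RUN=0 runs them.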

 

Step 4. Server is patched

root@solaris01~# uname -a

SunOS solaris01 5.10 Generic_144488-06 sun4u sparc SUNW,Sun-Fire-V240

Step 5. Restore the zone. Read more…

Categories: Kernel, Unix

luxadm usage

April 24th, 2011 No comments

For a more detailed write-up on luxadm, see:

 

luxadm forcelip/display on solaris 10

http://www.doxer.org/learn-linux/luxadm-forcelipdisplay-on-solaris-10/

Some of the basic usage:

luxadm probe    (discovers FC-AL devices)
luxadm display Enclosure    (displays information about the FC-AL enclosure)
luxadm reserve /dev/rdsk/c#t#d#s#    (reserves the device so it can't be accessed)
luxadm -e offline /dev/rdsk/c#t#d#s#    (takes the device offline)
luxadm -e bus_quiesce /dev/rdsk/c#t#d#s#    (quiesces the bus)
luxadm -e bus_unquiesce /dev/rdsk/c#t#d#s#    (unquiesces the bus)
luxadm -e online /dev/rdsk/c#t#d#s#    (brings the disk device back online)
luxadm release /dev/rdsk/c#t#d#s#    (releases the reservation so the device can be used again)
luxadm remove_device BAD,f2    (removes the device in slot f2 of enclosure BAD)
luxadm insert_device BAD,f2    (hot-plugs a new device into slot f2 of enclosure BAD)

What’s luxadm used for?

luxadm is a utility that discovers FC devices (luxadm probe), shuts down devices (luxadm shutown_device …), runs firmware upgrades (luxadm download_firmware …) and does many other things. Read more…

Categories: Storage, Unix

Resolved: [Load Manager Shared Memory]. Error is [28]: [No space left on device] (for Apache, pmserver etc. running on Linux, Solaris, Unix)

April 23rd, 2011 No comments

This error can occur in pmserver, Apache, Oracle, rsync, up2date and many other services running on Linux, Solaris and other Unix systems, so it is a widespread and well-known problem, as a Google search for "[Load Manager Shared Memory].  Error is [28]: [No space left on device]" shows.
Let's take pmserver running on Solaris 10 as an example and work through the problem step by step.
From "[No space left on device]" and "Load Manager Shared Memory", our first guess was a shortage of memory, but checking showed that there is enough memory to allocate:

1. Check the total memory size:
# /usr/sbin/prtconf | grep -i mem
Memory size: 32640 Megabytes
memory (driver not attached)
virtual-memory (driver not attached)
2. Check the memory limit of the application's project:
# su - sbintprd    # pmserver runs as user sbintprd on this box
$ id -p
uid=71269(sbintprd) gid=70772(sbintprd) projid=3(default)
This means that pmserver runs inside the 'default' project. Let's check the settings of the 'default' project:
# projects -l default
default
projid : 3
comment: ""
users  : (none)
groups : (none)
attribs: project.max-msg-ids=(privileged,256,deny)
project.max-shm-memory=(privileged,17179869184,deny)
# prctl -n project.max-shm-memory -i project default
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
privileged      16.0GB      -   deny                                 -
system          16.0EB    max   deny                                 -
So 16 GB of shared memory is available to the 'default' project. Why the shortage, then?
Let's raise max-shm-memory by 2 GB, to 18 GB, and see what happens:
# prctl -n project.max-shm-memory -r -v 18gb -i project default
After this we bounced pmserver, but the problem was still there:

# tail -f pmserver.log
INFO : LM_36070 [Fri Apr 22 22:19:42 2011] : (25218|1) The server is running on a host with 32 logical processors.
INFO : LM_36039 [Fri Apr 22 22:19:42 2011] : (25218|1) The maximum number of sessions that can run simultaneously is [10].
FATAL ERROR : CMN_1011 [Fri Apr 22 22:19:42 2011] : (25218|1) Error allocating system shared memory of [2000000] bytes for [Load Manager Shared Memory].  Error is [28]: [No space left on device]
FATAL ERROR : SF_34004 [Fri Apr 22 22:19:42 2011] : (25218|1) Server initialization failed.
INFO : SF_34014 [Fri Apr 22 22:19:42 2011] : (25218|1) Server shut down.

So we need to look elsewhere.
As we know, Unix systems share memory between processes through System V shared memory segments. We can use ipcs to list the active shared memory segments:

# ipcs -m | grep sbintprd
m  671088691   0          --rw------- sbintprd sbintprd
NOTE: pmserver runs as user sbintprd on this box.
Then,
# ipcs -mA | grep sbintprd | wc -l
92
And each of them is 2000000 bytes in size:
IPC status from <running system> as of Sat Apr 23 03:29:51 BST 2011
T         ID      KEY        MODE        OWNER    GROUP  CREATOR   CGROUP NATTCH      SEGSZ  CPID  LPID   ATIME    DTIME    CTIME  ISMATTCH         PROJECT
Shared Memory:
m  671088691   0          --rw------- sbintprd sbintprd sbintprd sbintprd      1    2000000  7781 16109  3:28:35  3:28:50  2:01:43        0         default
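Counting segments and adding up their sizes by hand gets tedious, so here is a minimal sketch that totals them per owner. It assumes the Solaris ipcs -mA column layout shown in the header above (OWNER in field 5, SEGSZ in field 10); the function name shm_usage_by_owner is hypothetical, and other platforms lay the columns out differently.

```shell
#!/bin/sh
# Sketch: segment count and total bytes of System V shared memory per owner.
# Reads `ipcs -mA` output on stdin; field numbers follow the Solaris header
# (OWNER = $5, SEGSZ = $10) and are an assumption for any other platform.
shm_usage_by_owner() {
    awk '$1 == "m" { count[$5]++; bytes[$5] += $10 }
         END { for (o in count) printf "%s %d %.0f\n", o, count[o], bytes[o] }'
}
```

Usage would look like: ipcs -mA | shm_usage_by_owner. Each output line gives the owner, the number of segments and the total size in bytes.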

Now we can conclude that the sbintprd user has over-allocated shared memory segments and is not freeing them.
Let's clear the shared memory:

# for i in `ipcs -m | grep prd | awk '{print $2}'`; do ipcrm -m $i; done
After this step, the pmserver started successfully. From the log we can see:
INFO : LM_36070 [Sat Apr 23 01:51:17 2011] : (5979|1) The server is running on a host with 32 logical processors.
INFO : LM_36039 [Sat Apr 23 01:51:18 2011] : (5979|1) The maximum number of sessions that can run simultaneously is [10].
INFO : CMN_1010 [Sat Apr 23 01:51:18 2011] : (5979|1) Allocated system shared memory [id = 469762275] of [2000000] bytes for [Load Manager Shared Memory].
INFO : LM_36095 [Sat Apr 23 01:51:50 2011] : (5979|1) Persistent session cache file cleanup is scheduled to run on [Sun Apr 24 01:51:50 2011].
INFO : SF_34003 [Sat Apr 23 01:51:50 2011] : (5979|1) Server initialization completed.

Problem resolved!
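The cleanup loop used here removes every segment whose ipcs line matches the grep pattern, including any that a running process still has attached. A more cautious variant, sketched below, only prints ipcrm commands for segments of one owner with no attached processes. It assumes the Solaris ipcs -mA layout (OWNER in field 5, NATTCH in field 9); the function name stale_shm_commands is hypothetical.

```shell
#!/bin/sh
# Sketch: print (not run) ipcrm commands for one owner's shared memory
# segments that no process is attached to.
# Reads `ipcs -mA` output on stdin; ID = $2, OWNER = $5, NATTCH = $9 in the
# Solaris layout. On Solaris 10, swap awk for nawk or /usr/xpg4/bin/awk,
# because the old /usr/bin/awk does not support -v.
stale_shm_commands() {
    owner=$1
    awk -v owner="$owner" \
        '$1 == "m" && $5 == owner && $9 == 0 { print "ipcrm -m " $2 }'
}
```

Usage would look like: ipcs -mA | stale_shm_commands sbintprd. Review the printed commands first, then pipe them to sh to actually remove the segments.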

Install the curl utility on Solaris

April 19th, 2011 1 comment

First, download the curl package from sunfreeware.com, unpack the tarball, and run ./configure:
# ./configure
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether to enable debug build options... no
checking whether to enable compiler optimizer... (assumed) yes
checking whether to enable strict compiler warnings... no
checking whether to enable compiler warnings as errors... no
checking whether to enable curl debug memory tracking... no
checking whether to enable c-ares for DNS lookups... no
checking for sed... /usr/bin/sed
checking for grep... /usr/bin/grep
checking for egrep... /usr/bin/egrep
checking for ar... /usr/local/bin/ar
checking for a BSD-compatible install... ./install-sh -c
checking whether build environment is sane... configure: error: newly created file is older than distributed files!
Check your system clock

This happened because the system time was off, so the timestamps on the source files were in the future. To fix this, all you need to do is copy the files to another directory with the copy command: Read more…
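To confirm this diagnosis on a source tree, you can list the files whose modification time lies in the future. A minimal sketch, assuming mktemp and find are available; the function name find_future_files is hypothetical, and a freshly created reference file stands in for the current time.

```shell
#!/bin/sh
# Sketch: list regular files under a directory whose mtime is later than now.
# A temporary file created at call time serves as the "now" reference.
find_future_files() {
    dir=${1:-.}
    ref=$(mktemp) || return 1
    find "$dir" -type f -newer "$ref" -print
    rm -f "$ref"
}
```

Running find_future_files in the unpacked source directory should print nothing once the clock and the timestamps agree.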

Categories: Unix

Using zpool clear to resolve zpool “cannot open” issue

April 17th, 2011 2 comments

I created a ZFS pool, tank, on a single removable disk, and it worked fine until I rebooted.

Because I unplugged the disk before rebooting, zpool list reported FAULTED in the HEALTH column:

-bash-3.00# zpool list
NAME   SIZE  ALLOC   FREE    CAP  HEALTH  ALTROOT
tank      -      -      -      -  FAULTED  -

Using zpool status -x to check:

-bash-3.00# zpool status -x
  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          c4t0d0    UNAVAIL      0     0     0  cannot open
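For scripted checks, the same health information can be read from zpool list in its script-friendly form. A minimal sketch, assuming input lines of the form name, tab, health as produced by zpool list -H -o name,health; the function name unhealthy_pools is hypothetical.

```shell
#!/bin/sh
# Sketch: print every pool whose health is anything other than ONLINE.
# Reads "name<TAB>health" lines on stdin, e.g. from
#   zpool list -H -o name,health
unhealthy_pools() {
    awk '$2 != "ONLINE" { print $1 ": " $2 }'
}
```

Usage would look like: zpool list -H -o name,health | unhealthy_pools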

c4t0d0 is the name of the removable disk. Using iostat -En, I can see that the disk is not connected to the system: Read more…

Categories: Unix

UNIX printers not working: status and solution

April 17th, 2011 No comments

Sometimes UNIX printers stop working because of a fault in the print service. Usually you have to clear the queue and restart the print spooler service.

For example:

On the system hosting the printer, check the status of the printers (-t shows all status information):

root@localhost# lpstat -t

printer PRINTER_NAME faulted printing PRINTER_NAME-27847. enabled since Feb 28 10:07 2009. available.

PRINTER_NAME: processing

root@localhost# ping PRINTER_NAME

PRINTER_NAME is alive Read more…
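When many printers are defined, the faulted ones can be picked out of the lpstat output automatically. A minimal sketch, assuming status lines of the form "printer NAME faulted ..." as in the output above; the function name faulted_printers is hypothetical, and other lpstat implementations may word the status differently.

```shell
#!/bin/sh
# Sketch: print the names of printers that lpstat reports as faulted.
# Reads lpstat output on stdin and matches lines shaped like
#   printer NAME faulted ...
faulted_printers() {
    awk '$1 == "printer" && $3 ~ /^faulted/ { print $2 }'
}
```

Usage would look like: lpstat -t | faulted_printers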

Categories: Unix