create shared iscsi LUNs from local disk on Linux

January 19th, 2016

We can use iscsitarget (iSCSI Enterprise Target, IET) to share local disks as iSCSI LUNs to clients. The brief steps are below.

First, install the kernel headers and build iscsitarget from source:

yum install kernel-devel -y #'kernel-uek-devel' if you are using oracle linux with UEK
cd iscsitarget-1.4.20.2 #download the package from here
make
make install

And below are some useful facts about iscsitarget:

The iSCSI target consists of a kernel module (/lib/modules/`uname -r`/extra/iscsi/iscsi_trgt.ko), a daemon (/usr/sbin/ietd) and a control tool (/usr/sbin/ietadm)

The kernel module is installed in the module directory of the running kernel

The init script is /etc/init.d/iscsi-target, e.g. /etc/init.d/iscsi-target status

The config files are /etc/iet/{ietd.conf, initiators.allow, targets.allow}

Next, edit the IET (iSCSI Enterprise Target) config file to define the target and its LUNs, then enable and start the service:

vi /etc/iet/ietd.conf

    Target iqn.2001-04.org.doxer:server1-shared-local-disks
        Lun 0 Path=/dev/sdb1,Type=fileio,ScsiId=0,ScsiSN=doxerorg
        Lun 1 Path=/dev/sdb2,Type=fileio,ScsiId=1,ScsiSN=doxerorg
        Lun 2 Path=/dev/sdb3,Type=fileio,ScsiId=2,ScsiSN=doxerorg
        Lun 3 Path=/dev/sdb4,Type=fileio,ScsiId=3,ScsiSN=doxerorg
        Lun 4 Path=/dev/sdb5,Type=fileio,ScsiId=4,ScsiSN=doxerorg
        Lun 5 Path=/dev/sdb6,Type=fileio,ScsiId=5,ScsiSN=doxerorg
        Lun 6 Path=/dev/sdb7,Type=fileio,ScsiId=6,ScsiSN=doxerorg
        Lun 7 Path=/dev/sdb8,Type=fileio,ScsiId=7,ScsiSN=doxerorg
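
The Lun entries above follow a regular pattern, so they can be generated rather than typed. Below is a minimal sketch, assuming the partitions are numbered consecutively from 1 and named <disk><number>; gen_ietd_target is a hypothetical helper name, not part of iscsitarget:

```shell
#!/bin/sh
# Print an ietd.conf target stanza for partitions 1..COUNT of one disk.
# gen_ietd_target is a hypothetical helper, not shipped with iscsitarget.
gen_ietd_target() {
    iqn=$1; disk=$2; count=$3; serial=$4
    echo "Target $iqn"
    i=0
    while [ "$i" -lt "$count" ]; do
        part=$((i + 1))   # LUN numbers start at 0, partition numbers at 1
        echo "    Lun $i Path=${disk}${part},Type=fileio,ScsiId=$i,ScsiSN=$serial"
        i=$((i + 1))
    done
}

gen_ietd_target iqn.2001-04.org.doxer:server1-shared-local-disks /dev/sdb 8 doxerorg
```

Review the output before appending it to /etc/iet/ietd.conf.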

chkconfig iscsi-target on
/etc/init.d/iscsi-target start

Assume the server sharing its local disks as iSCSI LUNs has IP 192.168.10.212. On the client hosts, we can then discover the target, log in, and scan for the LUNs:

[root@client01 ~]# iscsiadm -m discovery -t st -p 192.168.10.212
Starting iscsid:                                           [  OK  ]
192.168.10.212:3260,1 iqn.2001-04.org.doxer:server1-shared-local-disks

[root@client01 ~]# iscsiadm -m node -T iqn.2001-04.org.doxer:server1-shared-local-disks -p 192.168.10.212 -l
Logging in to [iface: default, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260] (multiple)
Login to [iface: default, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260] successful.

[root@client01 ~]# iscsiadm -m session --rescan
Rescanning session [sid: 1, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260]

[root@client01 ~]# iscsiadm -m session -P 3

You can also scan for the LUNs on the server that shares the local disks itself, but then you should make sure the iscsi-target service starts after network and before iscsi at boot:

mv /etc/rc3.d/S39iscsi-target /etc/rc3.d/S12iscsi-target

PS:

  1. The iSCSI target port defaults to 3260. You can check iSCSI connection info in /var/lib/iscsi/send_targets/ - <iscsi portal ip, port>, and /var/lib/iscsi/nodes/ - <target iqn>/<iscsi portal ip, port>.
  2. If there are multiple targets to log in to, we can use "iscsiadm -m node --loginall=all". To log out, use "iscsiadm -m node -T iqn.2001-04.org.doxer:server1-shared-local-disks -p 192.168.10.212 -u".
  3. More info is here (includes Windows iSCSI operation), and here is a note about creating iSCSI LUNs on Oracle ZFS appliance.

 

Categories: Hardware, NAS, SAN, Storage Tags:

systemctl man page

January 7th, 2016
  1. brief introduction https://www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units
  2. systemd introduction http://www.freedesktop.org/software/systemd/man/systemd.html
  3. man page http://www.freedesktop.org/software/systemd/man/systemctl.html

systemctl start application.service
systemctl start application

systemctl reload application
systemctl reload-or-restart application.service

systemctl status application
systemctl is-active autofs

#ls -l /lib/systemd/system|egrep -i 'network|autofs'
ls -l /usr/lib/systemd/system|egrep -i 'network|autofs'

systemctl enable application.service
systemctl is-enabled application
systemctl is-failed application
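
The three state queries (is-active, is-enabled, is-failed) can be combined into one summary per unit. This is only a sketch; unit_summary is a hypothetical helper name and it simply wraps the commands listed above:

```shell
#!/bin/sh
# unit_summary is a hypothetical helper; it only wraps the three
# systemctl state queries and prints one line per query.
unit_summary() {
    unit=$1
    for check in is-active is-enabled is-failed; do
        # each query prints the state on stdout; keep the text even when
        # the exit status is non-zero (e.g. "inactive", "disabled")
        state=$(systemctl "$check" "$unit" 2>/dev/null)
        printf '%-11s %s\n' "$check:" "${state:-unknown}"
    done
}

# Usage (on a systemd host): unit_summary autofs
```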

#ls -l /etc/systemd/system/|grep wants
ls -l /etc/systemd/system/multi-user.target.wants/

systemctl list-units #shows only active units
systemctl list-units --all #all units that systemd has loaded (or attempted to load)
systemctl list-units --all --state=inactive
systemctl list-units --type=service

systemctl list-unit-files #every available unit file within the systemd paths (including ones systemd has not attempted to load)

systemctl cat autofs #display the unit file
systemctl show autofs #low-level properties of a unit
systemctl show autofs -p StartLimitInterval

systemctl edit nginx.service
ls -l /etc/systemd/system|grep '\.d$' #snippets in these .d directories take precedence over the system's unit definition in /lib/systemd/system
systemctl edit --full nginx.service #instead of using snippet overwriting
rm -r /etc/systemd/system/nginx.service.d #remove a snippet
rm /etc/systemd/system/nginx.service #remove a full modified unit file
systemctl daemon-reload
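
For reference, "systemctl edit nginx.service" saves the snippet as override.conf under /etc/systemd/system/nginx.service.d/. The sketch below writes such a snippet by hand. write_dropin is a hypothetical helper, its second argument (an alternative root directory) exists only so it can be tried in a scratch directory, and Restart=on-failure is just an example setting:

```shell
#!/bin/sh
# write_dropin is a hypothetical helper. ROOT defaults to the real
# filesystem root; pass a scratch directory to try it safely.
write_dropin() {
    unit=$1; root=${2:-}
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    # Restart=on-failure is only an example setting for illustration
    printf '[Service]\nRestart=on-failure\n' > "$dir/override.conf"
    echo "$dir/override.conf"
}

# Usage: write_dropin nginx.service && systemctl daemon-reload
```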

systemctl list-dependencies autofs
systemctl list-dependencies autofs --all

systemctl mask nginx.service #prevent the Nginx service from being started, automatically or manually
systemctl unmask nginx.service

systemctl get-default #default target (similar to the old default runlevel); multiple targets can be active at one time
systemctl set-default graphical.target #GUI
systemctl list-unit-files --type=target #available targets
systemctl list-units --type=target #all of the active targets

systemctl list-dependencies multi-user.target
systemctl isolate multi-user.target #all of the graphical units will be stopped, and switch to multi-user.target
systemctl rescue #instead of using systemctl isolate rescue.target to put into rescue (single-user) mode
systemctl halt
systemctl poweroff
systemctl reboot #or just reboot

Categories: IT Architecture, Linux, Systems Tags:

resolved – /etc/rc.local not executed on boot in linux

November 11th, 2015

If your scripts in /etc/rc.local are not executed when the system boots, one possibility is that an earlier subsys script takes too long to execute, as /etc/rc.local is usually the last one to run, i.e. S99local. To find out which subsys is the culprit, you can edit /etc/rc.d/rc (which is invoked from /etc/inittab) and add the four "echo $i>>/var/tmp/process*.txt" lines shown below:

[root@host1 tmp]# vi /etc/rc.d/rc
# Now run the START scripts.
for i in /etc/rc$runlevel.d/S* ; do
        check_runlevel "$i" || continue

        # Check if the subsystem is already up.
        subsys=${i#/etc/rc$runlevel.d/S??}
        [ -f /var/lock/subsys/$subsys -o -f /var/lock/subsys/$subsys.init ] \
                && continue

        # If we're in confirmation mode, get user confirmation
        if [ -f /var/run/confirm ]; then
                confirm $subsys
                test $? = 1 && continue
        fi

        update_boot_stage "$subsys"
        # Bring the subsystem up.
        if [ "$subsys" = "halt" -o "$subsys" = "reboot" ]; then
                export LC_ALL=C
                exec $i start
        fi
        if LC_ALL=C egrep -q "^..*init.d/functions" $i \
                        || [ "$subsys" = "single" -o "$subsys" = "local" ]; then
                echo $i>>/var/tmp/process.txt
                $i start
                echo $i>>/var/tmp/process_end.txt
        else
                echo $i>>/var/tmp/process_self.txt
                action $"Starting $subsys: " $i start
                echo $i>>/var/tmp/process_self_end.txt
        fi
done

Then you can reboot the system and check the files /var/tmp/{process.txt,process_end.txt,process_self.txt,process_self_end.txt}. On one of the hosts, I found the entries below:

[root@host1 tmp]# tail process.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon
/etc/rc3.d/S98gcstartup

[root@host1 tmp]# tail process_end.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon
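
Rather than comparing the two files by eye, the script that started but never finished can be printed directly. A small sketch, where find_stuck is a hypothetical helper name and the default paths are the files written by the modified /etc/rc.d/rc:

```shell
#!/bin/sh
# Print every script listed in the "started" file that is missing from
# the "finished" file. find_stuck is a hypothetical helper name.
find_stuck() {
    started=${1:-/var/tmp/process.txt}
    finished=${2:-/var/tmp/process_end.txt}
    # -v invert match, -x whole line, -F fixed strings, -f patterns from file
    grep -vxFf "$finished" "$started"
}

# Usage: find_stuck
#        find_stuck /var/tmp/process_self.txt /var/tmp/process_self_end.txt
```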

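So from here, we can see that /etc/rc3.d/S98gcstartup started but never finished (it is in process.txt but not in process_end.txt). To make sure the scripts in /etc/rc.local get executed and the stuck script /etc/rc3.d/S98gcstartup still gets executed too, we can rename it so that rc skips it (the loop above only matches S*) and call it at the end of /etc/rc.local instead: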

[root@host1 tmp]# mv /etc/rc3.d/S98gcstartup /etc/rc3.d/s98gcstartup
[root@host1 tmp]# vi /etc/rc.local

#!/bin/sh

touch /var/lock/subsys/local

#put your scripts here - begin

#put your scripts here - end

#put the stuck script here and make sure it's the last line
/etc/rc3.d/s98gcstartup start

After this, reboot the host and check whether scripts in /etc/rc.local got executed.

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

resolved – xend error: (98, ‘Address already in use’)

November 4th, 2015

Today one OVS server had an issue with ovs-agent and needed a reboot. As there were VMs running on it, I tried to live migrate the Xen based VMs using "xm migrate -l", but the error below occurred:

-bash-3.2# xm migrate -l vm1 server1
Error: can't connect: (111, 'Connection refused')
Usage: xm migrate  

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

As Xen migration connects to the xend relocation server (xend-relocation-server) on xend-relocation-port, this "Connection refused" issue was most likely related to it. Below is the configuration in /etc/xen/xend-config.sxp:

-bash-3.2# egrep -v '^#|^$' /etc/xen/xend-config.sxp
(xend-unix-server yes)
(xend-relocation-server yes)
(xend-relocation-ssl-server no)
(xend-unix-path /var/lib/xend/xend-socket)
(xend-relocation-port 8002)
(xend-relocation-server-ssl-key-file /etc/ovs-agent/cert/key.pem)
(xend-relocation-server-ssl-cert-file /etc/ovs-agent/cert/certificate.pem)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
(vif-script vif-bridge)
(dom0-min-mem 0)
(enable-dom0-ballooning no)
(dom0-cpus 0)
(vnc-listen '0.0.0.0')
(vncpasswd '')
(xend-domains-lock-path /opt/ovs-agent-2.3/utils/dlm.py)
(domain-shutdown-hook /opt/ovs-agent-2.3/utils/hook_vm_shutdown.py)

And to check the processes related to these settings:

-bash-3.2# lsof -i :8002
COMMAND   PID     USER   FD   TYPE    DEVICE SIZE NODE NAME
xend    12095 root    5u  IPv4 146473964       TCP *:teradataordbms (LISTEN)

-bash-3.2# ps auxww|egrep '/opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py'
root  3501  0.0  0.0   3924   740 pts/0    S+   08:37   0:00 egrep /opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py
root 19007  0.0  0.0  12660  5840 ?        D    03:44   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm1 --uuid 56f17372-0a86-4446-8603-d82423c54367
root 27446  0.0  0.0  12664  5956 ?        D    05:11   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm2 --uuid eb1a4e84-3572-4543-8b1d-685b856d98c7

When processes go into D state (uninterruptible sleep), it is troublesome, as these processes can only be cleared by rebooting the whole system. However, on this server we had many VMs running, and live migration/relocation was blocked by the very issue that required the reboot, so we had a deadlock. It seemed a reboot was the only way to "resolve" the issue.

First, I tried bouncing xend (/etc/init.d/xend restart), but met the error below, as indicated in /var/log/message:
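
To list only the processes stuck in D state, the ps output can be filtered on the state column. A minimal sketch; filter_d_state is a hypothetical helper name and it expects "ps -eo pid,stat,args" style input:

```shell
#!/bin/sh
# Keep only the lines whose second column (process state) starts with D,
# i.e. uninterruptible sleep. filter_d_state is a hypothetical helper name.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Usage: ps -eo pid,stat,args | filter_d_state
```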

[2015-11-04 04:39:43 24026] INFO (SrvDaemon:227) Xend stopped due to signal 15.
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:332) Xend Daemon started
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:336) Xend changeset: unavailable.
[2015-11-04 04:40:14 24115] ERROR (SrvDaemon:349) Exception starting xend ((98, 'Address already in use'))
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 339, in run
    relocate.listenRelocation()
  File "/usr/lib/python2.4/site-packages/xen/xend/server/relocate.py", line 159, in listenRelocation
    hosts_allow = hosts_allow)
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 36, in __init__
    connection.SocketListener.__init__(self, protocol_class)
  File "/usr/lib/python2.4/site-packages/xen/web/connection.py", line 89, in __init__
    self.sock = self.createSocket()
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 49, in createSocket
    sock.bind((self.interface, self.port))
  File "<string>", line 1, in bind
error: (98, 'Address already in use')

Later, I realized that we could try changing xend-relocation-port. So I made the change below to /etc/xen/xend-config.sxp:

(xend-relocation-port 8003)

And then bounced xend:

/etc/init.d/xend stop; /etc/init.d/xend start

PS: bouncing xend will not affect running VMs; I confirmed this by comparing the qemu output (ps -ef|grep qemu) before and after. A tip here is that when Xen related commands (xm list, and so on) stop working, checking for the "qemu" emulator processes will help you get the VM list.

After this, "xm migrate -l vm1 server1" still failed with the same can't connect: (111, 'Connection refused'). I resolved this by specifying the port explicitly, presumably because the target server still listens on 8002 (you may need to stop iptables too):

-bash-3.2# xm migrate -l -p 8002 vm1 server1

Now the live migration went smoothly, and after all VMs were migrated, I changed xend-relocation-port back to 8002 and rebooted the server to fix the D state (uninterruptible sleep) issue.

PS:

If you still get "Error: can't connect: (111, 'Connection refused')" even after the above workaround, you can change the port back from 8003 to 8002, or even from 8003 to 8004, restart iptables, and try again.

Categories: Clouding, IT Architecture Tags:

resolved – mountd Cannot export /scratch, possibly unsupported filesystem or fsid= required

November 2nd, 2015

Today when I tried to access an NFS exported filesystem through autofs on one client host, it reported an error:

[root@server01 ~]# cd /net/client01/scratch/
-bash: cd: scratch/: No such file or directory

On the server side, we can see it's exported read-write:

[root@client01 ~]# cat /etc/exports
/scratch *(rw,no_root_squash)

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
nas01:/export/generic/share_scratch
                      200G  103G   98G  52% /scratch

So I tried mounting it manually on the client side, but an error was still reported:

[root@server01 ~]# mount client01:/scratch /media
mount: client01:/scratch failed, reason given by server: Permission denied

Here's the log on the server side:

[root@client01 scratch]# tail -f /var/log/messages
Nov  2 03:41:58 client01 mountd[2195]: Caught signal 15, un-registering and exiting.
Nov  2 03:41:58 client01 kernel: nfsd: last server has exited, flushing export cache
Nov  2 03:41:59 client01 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov  2 03:41:59 client01 kernel: NFSD: starting 90-second grace period
Nov  2 03:42:11 client01 mountd[16046]: authenticated mount request from 192.162.100.137:1002 for /scratch (/scratch)
Nov  2 03:42:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:43:05 client01 mountd[16046]: authenticated mount request from 192.162.100.137:764 for /scratch (/scratch)
Nov  2 03:43:05 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:44:11 client01 mountd[16046]: authenticated mount request from 192.165.28.40:670 for /scratch (/scratch)
Nov  2 03:44:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required

After some debugging, the reason turned out to be that the exported filesystem was itself an NFS mount on the server side, and an NFS mount cannot simply be exported again:

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
nas01:/export/generic/share_scratch
                      200G  103G   98G  52% /scratch
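
A quick check like the one below can flag this situation before an entry goes into /etc/exports. It is only a sketch: fstype_of and check_exportable are hypothetical helper names, the second argument (an alternative mounts table) is there for testing, and only exact mount points are recognized, not directories below them:

```shell
#!/bin/sh
# Print the filesystem type of a mount point.
# /proc/mounts fields: device mountpoint fstype options ...
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' "${2:-/proc/mounts}"
}

# Return non-zero if the path is itself an NFS mount.
check_exportable() {
    case $(fstype_of "$1" "$2") in
        nfs|nfs4)
            echo "$1 is an NFS mount; mount the original share on the clients instead"
            return 1 ;;
        *)
            return 0 ;;
    esac
}

# Usage: check_exportable /scratch
```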

To work around this, just mount the original NFS share directly instead of going through the re-export:

[root@server01 ~]# mount nas01:/export/generic/share_scratch /media
Categories: Hardware, IT Architecture, NAS, Storage Tags:

resolved – Yum Error: Cannot retrieve repository metadata (repomd.xml) for repository Please verify its path and try again

October 13th, 2015

Today when I tried to install a package using yum on one Linux host, the error below was reported even though I had set the right proxy:

[root@testvm1 yum.repos.d]# export http_proxy=http://right-proxy.example.com:80
[root@testvm1 yum.repos.d]# export ftp_proxy=http://right-proxy.example.com:80
[root@testvm1 yum.repos.d]# export https_proxy=http://right-proxy.example.com:80
[root@testvm1 yum.repos.d]# yum install xinetd
Loaded plugins: refresh-packagekit
http://public-yum.oracle.com/repo/OracleLinux/OL6/UEK/latest/x86_64/repodata/repomd.xml: [Errno 12] Timeout on http://public-yum.oracle.com/repo/OracleLinux/OL6/UEK/latest/x86_64/repodata/repomd.xml: (28, 'connect() timed out!')
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: ol6_UEK_latest. Please verify its path and try again

After some debugging, I found that there was a wrong proxy setting in /etc/yum.conf (it takes precedence over the proxy settings in the shell). After commenting out the wrong proxy, yum worked.

From:

[root@testvm1 yum.repos.d]# grep proxy /etc/yum.conf
proxy=http://wrong-proxy.example.com:80

To:

[root@testvm1 yum.repos.d]# grep proxy /etc/yum.conf
#proxy=http://wrong-proxy.example.com:80
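
To spot such overrides quickly, all active (uncommented) proxy settings can be listed with the file they come from. A small sketch; yum_proxy_lines is a hypothetical helper name:

```shell
#!/bin/sh
# Print file:line for every uncommented proxy, proxy_username or
# proxy_password setting in the given yum config files.
# yum_proxy_lines is a hypothetical helper name.
yum_proxy_lines() {
    grep -H -E '^[[:space:]]*proxy(_username|_password)?[[:space:]]*=' "$@" 2>/dev/null
}

# Usage: yum_proxy_lines /etc/yum.conf /etc/yum.repos.d/*.repo
```
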
Categories: IT Architecture, Linux, Systems Tags:

resolved – sar -d failed with Requested activities not available in file

September 11th, 2015

Today when I tried to get the activity report for each block device using "sar -d", the error "Requested activities not available in file" was reported:

[root@test01 ~]# sar -f /var/log/sa/sa11 -d
Requested activities not available in file

To fix this, I added "-d" to the sa1 entry in /etc/cron.d/sysstat so that disk activity is collected:

[root@test01 ~]# cat /etc/cron.d/sysstat
# run system activity accounting tool every 10 minutes
# "-d" added below. It was: */10 * * * * root /usr/lib64/sa/sa1 1 1
*/10 * * * * root /usr/lib64/sa/sa1 -d 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

Then move the old /var/log/sa/sa11 aside (it was created without disk activity) and run sa1 with "-d" to generate a new one:

[root@test01 ~]# mv /var/log/sa/sa11{,.bak}
[root@test01 ~]# /usr/lib64/sa/sa1 -d 1 1 #this generated /var/log/sa/sa11
[root@test01 ~]# /usr/lib64/sa/sa1 -d 1 1 #this put data into /var/log/sa/sa11

After this, the disk activity data could be retrieved:

[root@test01 ~]# sar -f /var/log/sa/sa11 -d
Linux 2.6.18-238.0.0.0.1.el5xen (slc03nsv) 09/11/15

09:26:04 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
09:26:22 dev202-0 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
09:26:22 dev202-1 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
09:26:22 dev202-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev202-0 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
Average: dev202-1 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
Average: dev202-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

For the "DEV" column, the name is dev<major>-<minor>, so you can check the mapping against the device nodes in /dev/* (dev202-2 is /dev/xvda2):

[root@test01 ~]# ls -l /dev/xvda2
brw-r----- 1 root disk 202, 2 Jan 26 2015 /dev/xvda2
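
The same lookup can be scripted by splitting the sar name into major and minor numbers and searching /proc/partitions. A minimal sketch; sar_dev_name is a hypothetical helper name and its optional second argument (an alternative partitions table) is only there for testing:

```shell
#!/bin/sh
# Map a sar device name such as dev202-2 to its /dev node.
# sar_dev_name is a hypothetical helper name.
sar_dev_name() {
    dev=$1; table=${2:-/proc/partitions}
    major=${dev#dev}; major=${major%%-*}   # dev202-2 -> 202
    minor=${dev##*-}                       # dev202-2 -> 2
    # /proc/partitions columns: major minor #blocks name
    awk -v ma="$major" -v mi="$minor" '$1 == ma && $2 == mi { print "/dev/" $4 }' "$table"
}

# Usage: sar_dev_name dev202-2
```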

Or you can add "-p" to sar, which is simpler:

[root@test01 ~]# sar -f /var/log/sa/sa11 -d -p
Linux 2.6.18-238.0.0.0.1.el5xen (slc03nsv) 09/11/15

09:26:04 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
09:26:22 xvda 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
09:26:22 root 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
09:26:22 xvda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: xvda 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
Average: root 10.39 0.00 133.63 12.86 0.00 0.06 0.06 0.07
Average: xvda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

PS:

Here is more info about sysstat in Linux.

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – ORA-12578: TNS:wallet open failed

September 1st, 2015

If you meet an error like "ORA-12578: TNS:wallet open failed", one possibility is that the Oracle RAC database is using a local auto-login wallet (created with the parameter -auto_login_local, available from release 11.2 and usually used in highly confidential systems) but the wallet was migrated from another server.

The migrated local wallet can be opened and read without problems on the new host, but the information inside does not match the hostname, and this leads to the error ORA-12578: TNS:wallet open failed. Note that even on the original host, the wallet cannot be used by another OS user.

In TDE (Transparent Data Encryption), the master encryption key is stored in the wallet; it is the key that wraps (encrypts) the Oracle TDE column and tablespace encryption keys. The wallet must be open before you can create an encrypted tablespace and before you can store or retrieve encrypted data. Also, when recovering a database with encrypted tablespaces (for example after a SHUTDOWN ABORT or a catastrophic error that brings down the database instance), you must open the Oracle wallet after database mount and before database open, so the recovery process can decrypt data blocks and redo. When you open the wallet, it is available to all sessions, and it remains open until you explicitly close it or until the database is shut down.

Tablespace encryption encrypts at the physical block level and can perform better than encrypting many columns. When using column encryption for tables, there is only one table key regardless of the number of encrypted columns in a table, and the table key is stored in the data dictionary. When using tablespace encryption, the tablespace key is stored in the header of each datafile of the encrypted tablespace.

Below is from here:

TDE uses a two tier key mechanism. When TDE column encryption is applied to an existing application table column, a new table key is created and stored in the Oracle data dictionary. When TDE tablespace encryption is used, the individual tablespace keys are stored in the header of the underlying OS file(s). The table and tablespace keys are encrypted using the TDE master encryption key. The master encryption key is generated when TDE is initialized and stored outside the database in the Oracle Wallet. Both the master key and table keys can be independently changed (rotated, re-keyed) based on company security policies. Tablespace keys cannot be re-keyed (rotated); work around is to move the data into a new encrypted tablespace. Oracle recommends backing up the wallet before and after each master key change.

Categories: Databases, IT Architecture, Oracle DB Tags:

resolved – nfsv4 Warning: rpc.idmapd appears not to be running. All uids will be mapped to the nobody uid

August 31st, 2015

Today when we tried to mount an NFS share as NFSv4 (mount -t nfs4 testnas:/export/testshare01 /media), the following message was shown:

Warning: rpc.idmapd appears not to be running.
All uids will be mapped to the nobody uid.

And when I checked the files under the mount point, they were owned by nobody, as the message indicated:

[root@testvm ~]# ls -l /u01/local
total 8
drwxr-xr-x 2 nobody nobody 2 Dec 18 2014 SMKIT
drwxr-xr-x 4 nobody nobody 4 Dec 19 2014 ServiceManager
drwxr-xr-x 4 nobody nobody 4 Mar 31 08:47 ServiceManager.15.1.5
drwxr-xr-x 4 nobody nobody 4 May 13 06:55 ServiceManager.15.1.6

However, when I checked, rpc.idmapd was running:

[root@testvm ~]# /etc/init.d/rpcidmapd status
rpc.idmapd (pid 11263) is running...

After some checking, I found it was caused by an old nfs-utils version and some missing NFSv4 packages on the OEL5 boxes. You can do the following to fix this:

yum -y update nfs-utils nfs-utils-lib nfs-utils-lib-devel sblim-cmpi-nfsv4 nfs4-acl-tools
/etc/init.d/nfs restart
/etc/init.d/rpcidmapd restart

If you are using an Oracle Sun ZFS appliance, please also make sure that, on the ZFS side, the anonymous user mapping is set to "root" and the custom NFSv4 identity domain is set to the one in your environment (e.g. example.com), to avoid files showing up as owned by nobody on NFS clients.
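
On the client side, the NFSv4 identity domain is normally taken from the Domain line in /etc/idmapd.conf, so it is worth comparing it with the domain configured on the NAS. A small sketch, assuming the usual "Domain = ..." syntax; idmapd_domain is a hypothetical helper name:

```shell
#!/bin/sh
# Print the value of the uncommented "Domain = ..." line of idmapd.conf.
# idmapd_domain is a hypothetical helper name.
idmapd_domain() {
    sed -n 's/^[[:space:]]*Domain[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "${1:-/etc/idmapd.conf}"
}

# Usage: idmapd_domain
```

If nothing is printed, no Domain line is set in the file.
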

resolved – yum Error performing checksum Trying other mirror and finally No more mirrors to try

August 27th, 2015

Today when I was installing a package on Linux, the error below was reported:

[root@testhost yum.repos.d]# yum list --disablerepo=* --enablerepo=yumpaas
Loaded plugins: rhnplugin, security
This system is not registered with ULN.
ULN support will be disabled.
yumpaas | 2.9 kB 00:00
yumpaas/primary_db | 30 kB 00:00
http://yumrepo.example.com/paas_oel5/repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2: [Errno -3] Error performing checksum
Trying other mirror.
yumpaas/primary_db | 30 kB 00:00
http://yumrepo.example.com/paas_oel5/repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2: [Errno -3] Error performing checksum
Trying other mirror.
Error: failure: repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2 from yumpaas: [Errno 256] No more mirrors to try.

The repo "yumpaas" is hosted on an OEL 6 VM, which by default uses sha2 (sha256) for checksums. However, yum on OEL5 VMs (the VM running yum here) uses sha1 by default. So I worked around this by installing python-hashlib (from the external EPEL repo) to extend yum's capability to handle sha2:

[root@testhost yum.repos.d]# yum install python-hashlib

After this, the problematic repo can be used. But to resolve the issue permanently without a workaround on every OEL5 VM, we should recreate the repo with sha1 as the checksum algorithm (createrepo -s sha1).
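
To confirm which checksum algorithm a repo was built with, look at the checksum elements in its repodata/repomd.xml. A minimal sketch that works on a local copy of the file; repo_checksum_types is a hypothetical helper name:

```shell
#!/bin/sh
# Print the distinct checksum types used in a repomd.xml file.
# repo_checksum_types is a hypothetical helper name.
repo_checksum_types() {
    grep -o '<checksum type="[^"]*"' "$1" | sed 's/.*type="\([^"]*\)"/\1/' | sort -u
}

# Usage: repo_checksum_types /path/to/repomd.xml
```

A repo that OEL5 yum can read without python-hashlib should print only sha (sha1) here.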