resolved – nfsv4 Warning: rpc.idmapd appears not to be running. All uids will be mapped to the nobody uid

August 31st, 2015

Today when we tried to mount a nfs share as NFSv4(mount -t nfs4 testnas:/export/testshare01 /media), the following message prompted:

Warning: rpc.idmapd appears not to be running.
All uids will be mapped to the nobody uid.

And I had a check of file permissions under the mount point, they were owned by nobody as indicated:

[root@testvm~]# ls -l /u01/local
total 8
drwxr-xr-x 2 nobody nobody 2 Dec 18 2014 SMKIT
drwxr-xr-x 4 nobody nobody 4 Dec 19 2014 ServiceManager
drwxr-xr-x 4 nobody nobody 4 Mar 31 08:47 ServiceManager.15.1.5
drwxr-xr-x 4 nobody nobody 4 May 13 06:55 ServiceManager.15.1.6

However, as I checked, rpcidmapd was running:

[root@testvm ~]# /etc/init.d/rpcidmapd status
rpc.idmapd (pid 11263) is running...

After some checking, I found it's caused by low nfs version and some missed nfs4 packages on the OEL5 boxes. You can do below to fix this:

yum -y update nfs-utils nfs-utils-lib nfs-utils-lib-devel sblim-cmpi-nfsv4 nfs4-acl-tools
/etc/init.d/nfs restart
/etc/init.d/rpcidmapd restart

If you are using Oracle SUN ZFS appliance, then please make sure anonymous user mapping is set to "root" on ZFS side.

resolved – yum Error performing checksum Trying other mirror and finally No more mirrors to try

August 27th, 2015

Today when I was installing one package in Linux, below error prompted:

[root@testhost yum.repos.d]# yum list --disablerepo=* --enablerepo=yumpaas
Loaded plugins: rhnplugin, security
This system is not registered with ULN.
ULN support will be disabled.
yumpaas | 2.9 kB 00:00
yumpaas/primary_db | 30 kB 00:00
http://yumrepo.example.com/paas_oel5/repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2: [Errno -3] Error performing checksum
Trying other mirror.
yumpaas/primary_db | 30 kB 00:00
http://yumrepo.example.com/paas_oel5/repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2: [Errno -3] Error performing checksum
Trying other mirror.
Error: failure: repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2 from yumpaas: [Errno 256] No more mirrors to try.

The repo "yumpaas" is hosted on OEL 6 VM, which by default use sha2 for checksum. However, for OEL5 VMs(the VM running yum), yum uses sha1 by default. So I've WA this by install python-hashlib to extend yum's capability to handle sha2(python-hashlib from external repo EPEL).

[root@testhost yum.repos.d]# yum install python-hashlib

And after this, the problematic repo can be used. But to resolve this issue permanently without WA on OEL5 VMs, we should recreate the repo with sha1 for checksum algorithm(createrepo -s sha1).

resolved – Unable to locally verify the issuer’s authority

August 18th, 2015

Today we found below error in weblogic server log:

[2015-08-16 14:38:21,866] [WARN] [N/A] [testhost.example.com] [Thrown exception when ...... javax.net.ssl.SSLHandshakeException: General SSLEngine problem
......
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
......
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:323)

And we had a test on the linux server:

[root@testhost1 ~]# wget https://testurl.example.com/
--2015-08-17 10:15:56-- https://testurl.example.com/
Resolving testurl.example.com... 192.168.77.230
Connecting to testurl.example.com|192.168.77.230|:443... connected.
ERROR: cannot verify testurl.example.com's certificate, issued by `/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4':
Unable to locally verify the issuer's authority.
To connect to testurl.example.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

And tested using openssl s_client:

[root@testhost ~]# openssl s_client -connect testurl.example.com:443 -showcerts
CONNECTED(00000003)
depth=1 /C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4
verify error:num=2:unable to get issuer certificate
issuer= /C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
verify return:0
---
Certificate chain
0 s:/C=US/ST=California/L=Redwood Shores/O=Test Corporation/OU=FOR TESTING PURPOSES ONLY/CN=*.testurl.example.com
i:/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4
-----BEGIN CERTIFICATE-----
MIIFOTCCBCGgAwIBAgIQB1z+1jMRPPv/TrBA2qh5YzANBgkqhkiG9w0BAQsFADB+
MQswCQYDVQQGEwJVUzEdMBsGA
......
-----END CERTIFICATE-----
---
Server certificate
subject=/C=US/ST=California/L=Redwood Shores/O=Test Corporation/OU=FOR TESTING PURPOSES ONLY/CN=*.testurl.example.com
issuer=/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4
---
No client certificate CA names sent
---
SSL handshake has read 1510 bytes and written 447 bytes
---
New, TLSv1/SSLv3, Cipher is AES128-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
Protocol : TLSv1
Cipher : AES128-SHA
Session-ID: 2B5E228E4705C12D7EE7B5C608AE5E3781C5626C1A99A8D60D21DB2350D4F4DE
Session-ID-ctx:
Master-Key: F06632FD67E22EFB1134CDC0EEE8B8B44972747D804D5C7969095FCB2A692E90F111DC4FC6081F4D94C7561BB556DA16
Key-Arg : None
Krb5 Principal: None
Start Time: 1439793731
Timeout : 300 (sec)
Verify return code: 2 (unable to get issuer certificate)
---

From above, the issue should be caused by CA file missing, as the SSL certificate is relatively new using sha2. So I determined to download CA file from symantec, both "Symantec Class 3 Secure Server CA - G4" and "VeriSign Class 3 Public Primary Certification Authority - G5".

[root@testhost ~]# wget --no-check-certificate 'http://symantec.tbs-certificats.com/SymantecSSG4.crt'

[root@testhost ~]# openssl version -a|grep OPENSSLDIR

OPENSSLDIR: "/etc/pki/tls"

[root@testhost ~]# cp /etc/pki/tls/certs/ca-bundle.crt /etc/pki/tls/certs/ca-bundle.crt.bak

[root@testhost ~]# openssl x509 -text -in "SymantecSSG4.crt" >> /etc/pki/tls/certs/ca-bundle.crt

And test wget again, but still the same error:

[root@testhost1 ~]# wget https://testurl.example.com/
--2015-08-17 10:38:59-- https://testurl.example.com/
Resolving myproxy.example.com... 192.168.19.20
Connecting to myproxy.example.com|192.168.19.20|:80... connected.
ERROR: cannot verify testurl.example.com's certificate, issued by `/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4':
unable to get issuer certificate
To connect to testurl.example.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

Then I determined to install another CA indicated in the error log(G4 above may not be needed):

[root@testhost1 ~]# wget --no-check-certificate 'https://www.symantec.com/content/en/us/enterprise/verisign/roots/VeriSign-Class%203-Public-Primary-Certification-Authority-G5.pem'

[root@testhost1 ~]# openssl x509 -text -in "VeriSign-Class 3-Public-Primary-Certification-Authority-G5.pem" >> /etc/pki/tls/certs/ca-bundle.crt

This time, wget passed https:

[root@testhost1 ~]# wget https://testurl.example.com/
--2015-08-17 10:40:03-- https://testurl.example.com/
Resolving myproxy.example.com... 192.168.19.20
Connecting to myproxy.example.com|192.168.19.20|:80... connected.
Proxy request sent, awaiting response... 200 OK
Length: 11028 (11K) [text/html]
Saving to: `index.html'

100%[======================================>] 11,028 --.-K/s in 0.02s

2015-08-17 10:40:03 (526 KB/s) - `index.html' saved [11028/11028]

PS:
You can check more info here about adding trusted root certificates to the server - Mac OS X / Windows / Linux (Ubuntu, Debian) / Linux (CentOs 6) / Linux (CentOs 5)

resolved – passwd: User not known to the underlying authentication module

August 12th, 2015

Today we met the following error when tried to change one user's password:

[root@test ~]# echo 2cool|passwd --stdin test
Changing password for user test.
passwd: User not known to the underlying authentication module

And after some searching work, we found it's caused by /etc/shadow file missing:

[root@test ~]# ls -l /etc/shadow
ls: /etc/shadow: No such file or directory

To generate the /etc/shadow file, use pwconv command:

[root@test ~]# pwconv

[root@test ~]# ls -l /etc/shadow
-r-------- 1 root root 1254 Aug 11 12:13 /etc/shadow

After this, we can reset password without issue:

[root@test ~]# echo mypass|passwd --stdin test
Changing password for user test.
passwd: all authentication tokens updated successfully.

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – ORA-27303: additional information: Invalid protocol requested (2) or protocol not loaded

August 6th, 2015

Today when I tried to start up crs after patching(crsctl start crs), the following error occurred in /u01/app/11.2.0.4/grid/log/test/alerttest.log:

2015-07-31 11:57:54.702:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(18654)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/scratch/app/11.2.0.4/grid/log/slcai081/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-07-31 11:57:59.894:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(18654)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/scratch/app/11.2.0.4/grid/log/slcai081/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-07-31 11:58:05.121:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(18654)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/scratch/app/11.2.0.4/grid/log/slcai081/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-07-31 11:58:10.322:
[ohasd(17944)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2015-07-31 11:58:10.326:
[ohasd(17944)]CRS-2807:Resource 'ora.crsd' failed to start automatically.

And running crsctl check crs, I saw that CRS/EM were not up:

[root@test ~]# /u01/app/11.2.0.4/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

Later I tried manually start up ora.crsd, but still failed:

[root@test ~]# /u01/app/11.2.0.4/grid/bin/crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.asm' on 'test'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-27504: IPC error creating OSD context
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], [], [], [], [], []
ORA-27302: failure occurred at: sskgxplp
ORA-27303: additional information: Invalid protocol requested (2) or protocol not loaded.
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.4/grid/log/test/agent/ohasd/oraagent_oracle//oraagent_oracle.log".
CRS-2674: Start of 'ora.asm' on 'test' failed
CRS-2679: Attempting to clean 'ora.asm' on 'test'
CRS-2681: Clean of 'ora.asm' on 'test' succeeded
CRS-4000: Command Start failed, or completed with errors.

From the output, it's complaining about "Invalid protocol requested". As this RAC is non-exadata, so we should use ipc_g protocol rather than ipc_rds which is for Exadata when relinking oracle binaries. So I made the following change in the script:

grep ins_rdbms.mk apply_BP_and_OneOffs.sh
su $OWNER -c "cd $1/rdbms/lib && export ORACLE_HOME=$1 && /usr/bin/make -f ins_rdbms.mk ipc_rds lbac_off dv_off ioracle > /dev/null
su $OWNER -c "cd $1/rdbms/lib && export ORACLE_HOME=$1 && /usr/bin/make -f ins_rdbms.mk ipc_rds ioracle > /dev/null"

After that, I rollbacked all patches, and re-run patching, and later all were ok.

Categories: Databases, IT Architecture, Oracle DB Tags:

remove usb disk from LVM

July 29th, 2015

On some servers, USB stick may become part of the LVM volume group. The usb is more prone to fail, which will cause big issues when they start. Also they work at a different speed than the drives, and this also causes performance issues.

For example, on one server, you can see below:

[root@test ~]# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  DomUVol   2   1   0 wz--n- 3.77T    0

[root@test~]# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sdb1  DomUVol lvm2 a--  3.76T    0 #all PE allocated
  /dev/sdc1  DomUVol lvm2 a--  3.59G    0 #this is usb device

[root@test~]# lvs
  LV      VG      Attr   LSize Origin Snap%  Move Log Copy%  Convert
  scratch DomUVol -wi-ao 3.77T

[root@test ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/DomUVol-scratch
                      3.7T  257G  3.3T   8% /scratch

[root@test~]# pvdisplay /dev/sdc1
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               DomUVol
  PV Size               3.61 GB / not usable 14.61 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              115
  Free PE               0
  Allocated PE          115 #so physical extends are allocated on this usb device
  PV UUID               a8a0P5-AlCz-Cu5e-acC2-ldEQ-NPCn-kc4Du0

As you can see from above, as the USB device has PE allocated, so to remove that(pvreduce), we need first move PE from it to other PV. We can see the other PV has all space allocated(can also confirm from vgs above):

[root@test~]# pvdisplay /dev/sdb1
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               DomUVol
  PV Size               3.76 TB / not usable 30.22 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              123360
  Free PE               0
  Allocated PE          123360
  PV UUID               5IyCgh-JsiV-EnpO-XKj4-yxNq-pRjI-d7LKGy

Here are the steps for taking usb device out of VG:

umount /scratch
#fsck -y /dev/mapper/DomUVol-scratch
lvreduce --size -5G /dev/mapper/DomUVol-scratch
resize2fs /dev/mapper/DomUVol-scratch

[root@test ~]# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  DomUVol   2   1   0 wz--n- 3.77T 5.00G

[root@test ~]# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sdb1  DomUVol lvm2 a--  3.76T 1.41G
  /dev/sdc1  DomUVol lvm2 a--  3.59G 3.59G #PEs on the usb device are all freed, if not, use pvmove /dev/sdc1

[root@test ~]# pvdisplay /dev/sdc1
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               DomUVol
  PV Size               3.61 GB / not usable 14.61 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              115
  Free PE               115
  Allocated PE          0
  PV UUID               a8a0P5-AlCz-Cu5e-acC2-ldEQ-NPCn-kc4Du0

[root@test ~]# vgreduce DomUVol /dev/sdc1
  Removed "/dev/sdc1" from volume group "DomUVol"

[root@test ~]# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sdb1  DomUVol lvm2 a--  3.76T 1.41G
  /dev/sdc1          lvm2 a--  3.61G 3.61G #VG column is empty for the usb device, this confirms the usb device is taken out of VG

PS:

If you want to shrink lvm volume(lvreduce) on /, then you'll need go to linux rescue mode. Select "Skip" when the system prompts for the option of mounting / to /mnt/sysimage. Run "lvm vgchange -a y" first and then the other steps are more or less the same as above, but you'll need type "lvm" before any lvm command, such as "lvm lvs", "lvm pvs", "lvm lvreduce" etc.