ORA-12154 – TNS:could not resolve the connect identifier specified

December 2nd, 2014

Today I tried to connect to a DB service named pditui using the following easy connect method:

export ORACLE_HOME=/u01/app/oracle/product/11.2.0/client_1
export PATH=$ORACLE_HOME/bin:$PATH

sqlplus "sys/password@(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = scanname.test.example.com)(PORT = 1521))(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = pditui)))" as sysdba

However, the following errors were returned:

SQL*Plus: Release 11.2.0.3.0 Production on Tue Dec 2 14:07:35 2014

Copyright (c) 1982, 2011, Oracle. All rights reserved.

ERROR:
ORA-12154: TNS:could not resolve the connect identifier specified

Enter user-name:
ERROR:
ORA-12162: TNS:net service name is incorrectly specified

The username/password and service name were all correct, yet the error persisted. After some checking, I found it was caused by a wrong NAMES.DIRECTORY_PATH setting in $ORACLE_HOME/network/admin/sqlnet.ora:

[root@centos-doxer ~]# cat /u01/app/oracle/product/11.2.0/client_1/network/admin/sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/app/oracle/product/11.2.0/client_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

#NAMES.DIRECTORY_PATH= (TNSNAMES)
NAMES.DIRECTORY_PATH= (TNSNAMES,ezconnect) # add the ezconnect method here

ADR_BASE = /u01/app/oracle

After this change, the connection worked fine.
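
For reference, with ezconnect listed in NAMES.DIRECTORY_PATH, the same database can also be reached with the shorter EZConnect syntax (user/password@//host:port/service_name). A sketch using the host and service name from the example above:

sqlplus "sys/password@//scanname.test.example.com:1521/pditui" as sysdba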

PS: 

You can read more about the NAMES.DIRECTORY_PATH parameter of sqlnet.ora in the Oracle Net Services Reference.


resolved – switching from Unbreakable Enterprise Kernel Release 2(UEKR2) to UEKR3 on Oracle Linux 6

November 24th, 2014

As we can see from Oracle's documentation, the available kernels for Oracle Linux 6 include the following three:

3.8.13 Unbreakable Enterprise Kernel Release 3 (x86_64 only)
2.6.39 Unbreakable Enterprise Kernel Release 2
2.6.32 (Red Hat compatible kernel)

On one of our OEL6 VMs, we found that it was using UEKR2:

[root@testbox aime]# cat /etc/issue
Oracle Linux Server release 6.4
Kernel \r on an \m

[root@testbox aime]# uname -r
2.6.39-400.211.1.el6uek.x86_64

So how can we switch the kernel to UEKR3 (3.8)?

If your Linux version is 6.4, first run "yum update -y" to upgrade to 6.5 or later, then reboot the host and follow the steps below.

[root@testbox aime]# ls -l /etc/grub.conf
lrwxrwxrwx. 1 root root 22 Aug 21 18:24 /etc/grub.conf -> ../boot/grub/grub.conf

[root@testbox aime]# yum update -y

If your Linux version is 6.5 or later via yum update, you'll find that /etc/grub.conf and /boot/grub/grub.conf are separate files (if the host was installed as OEL6.5 from the start, /etc/grub.conf should still be a symlink):

[root@testbox ~]# ls -l /etc/grub.conf
-rw------- 1 root root 2356 Oct 20 05:26 /etc/grub.conf

[root@testbox ~]# ls -l /boot/grub/grub.conf
-rw------- 1 root root 1585 Nov 23 21:46 /boot/grub/grub.conf

In /etc/grub.conf, you'll see an entry like the one below:

title Oracle Linux Server Unbreakable Enterprise Kernel (3.8.13-44.1.3.el6uek.x86_64)
root (hd0,0)
kernel /vmlinuz-3.8.13-44.1.3.el6uek.x86_64 ro root=/dev/mapper/vg01-lv_root rd_LVM_LV=vg01/lv_root rd_NO_LUKS rd_LVM_LV=vg01/lv_swap LANG=en_US.UTF-8 KEYTABLE=us console=hvc0 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_DM rhgb quiet
initrd /initramfs-3.8.13-44.1.3.el6uek.x86_64.img

What you need to do is copy the entry above from /etc/grub.conf to /boot/grub/grub.conf and then reboot the VM.

After rebooting, you'll find the kernel is now UEKR3 (3.8).
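
A quick check after the reboot (the version shown assumes the 3.8.13-44.1.3 grub entry above is the one you copied):

[root@testbox aime]# uname -r
3.8.13-44.1.3.el6uek.x86_64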

PS:

If you find that the VM is OEL6.5 and /etc/grub.conf is a symlink to /boot/grub/grub.conf, then you can do the following to upgrade the kernel to UEKR3:

1. add the following lines to /etc/yum.repos.d/public-yum-ol6.repo:

[public_ol6_UEKR3]
name=UEKR3 for Oracle Linux 6 ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/UEKR3/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
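
After saving the repo file, a quick check (not in the original post) confirms yum can see the new repo:

[root@testbox aime]# yum repolist enabled | grep -i uekr3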

2. List and install UEKR3:

[root@testbox aime]# yum list|grep kernel-uek|grep public_ol6_UEKR3
kernel-uek.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-debug.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-debug-devel.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-devel.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-doc.noarch 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-firmware.noarch 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-headers.x86_64 3.8.13-26.2.4.el6uek public_ol6_UEKR3

[root@testbox aime]# yum install -y kernel-uek* --disablerepo=* --enablerepo=public_ol6_UEKR3

3. Reboot

resolved – ORA-27300 OS system dependent operation:fork failed with status: 11

November 18th, 2014

Today we observed that all our DBs were down, and the trace file showed:

Errors in file /u01/database/diag/rdbms/oimdb/OIMDB/trace/OIMDB_psp0_3173.trc:

ORA-27300: OS system dependent operation:fork failed with status: 11

ORA-27301: OS failure message: Resource temporarily unavailable

ORA-27302: failure occurred at: skgpspawn5

After some searching for ORA-27300, I found an article suggesting that the system had run out of user processes and could not fork new ones at the time. As the problem happened at Mon Nov 17 02:08:51 2014, I did some checking using sysstat's sar:

[root@test sa]# sar -f /var/log/sa/sa17 -s 00:00:00 -e 03:20:00
Linux 2.6.32-300.27.1.el5uek (slcn11vmf0029) 11/17/14

00:00:01 CPU %user %nice %system %iowait %steal %idle
00:10:01 all 1.16 0.12 0.48 0.71 0.18 97.35
00:20:02 all 1.30 0.00 0.47 0.95 0.19 97.10
00:30:01 all 1.88 0.00 0.63 1.98 0.19 95.32
00:40:01 all 1.00 0.00 0.35 2.15 0.18 96.32
00:50:01 all 1.09 0.00 0.40 0.47 0.18 97.87
01:00:01 all 1.03 0.00 0.34 0.25 0.16 98.23
01:10:01 all 3.98 0.02 1.72 4.26 0.22 89.80
01:20:01 all 9.98 0.13 5.99 47.40 0.31 36.19
01:30:01 all 1.86 0.00 1.24 48.72 0.16 48.01
01:40:01 all 1.08 0.00 0.82 48.77 0.18 49.15
01:50:01 all 1.54 0.00 0.97 49.32 0.18 47.98
02:00:01 all 1.05 0.00 0.85 48.74 0.18 49.19 --- problem occurred at Mon Nov 17 02:08:51 2014
02:10:01 all 10.14 0.14 8.95 44.75 0.34 35.68
02:20:01 all 0.06 0.00 0.21 1.87 0.07 97.78
02:30:01 all 0.08 0.00 0.29 2.81 0.08 96.74
02:40:01 all 0.09 0.00 0.31 3.08 0.08 96.44
02:50:01 all 0.05 0.00 0.13 0.96 0.06 98.81
03:00:01 all 0.07 0.00 0.26 2.38 0.07 97.22
03:10:01 all 0.06 0.12 0.21 1.52 0.07 98.02
Average: all 1.85 0.03 1.20 15.89 0.16 80.88

[root@test sa]# sar -f /var/log/sa/sa17 -s 01:10:00 -e 02:11:00 -A
......
......
01:10:01 kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
01:20:01 259940 15482728 98.35 2004 11703696 0 2104504 100.00 194056 -- note that even swap was completely used up
01:30:01 398584 15344084 97.47 904 11703152 0 2104504 100.00 191728
01:40:01 409104 15333564 97.40 984 11716924 0 2104504 100.00 191404
01:50:01 452844 15289824 97.12 1004 11711548 0 2104504 100.00 189076
02:00:01 440780 15301888 97.20 1424 11757600 0 2104504 100.00 189364
02:10:01 14602712 1139956 7.24 19548 382588 1978020 126484 6.01 3096
Average: 2760661 12982007 82.46 4311 9829251 329670 1774834 84.34 159787

So this proved the system was very busy during that time. I then increased the oracle user's max process count to 131072 in /etc/security/limits.conf with the following entries:

* soft nproc 131072
* hard nproc 131072
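
To confirm the new limit is in effect, log in again as the oracle user (limits.conf is only applied at login) and check the max user processes value; a quick check, assuming the DB runs as the oracle user:

[root@test ~]# su - oracle -c 'ulimit -u'
131072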

I also set kernel.pid_max to 139264 (131072 plus 8192, which is recommended for OS stability) in /etc/sysctl.conf.
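
A minimal sketch of the /etc/sysctl.conf entry and how to apply it without a reboot, using the value described above:

# /etc/sysctl.conf
kernel.pid_max = 139264

[root@test ~]# sysctl -p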

[root@test ~]# sysctl -a|grep pid_max
kernel.pid_max = 139264

Then I increased the box's memory from 16G to 32G and rebooted.

resolved – high value of RX overruns in ifconfig

November 13th, 2014

Today we tried to ssh to one host and the session hung almost immediately. We also observed that the RX overruns counter in the ifconfig output was high:

[root@test /]# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:10:E0:0D:AD:5E
inet6 addr: fe80::210:e0ff:fe0d:ad5e/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:140234052 errors:0 dropped:0 overruns:12665 frame:0
TX packets:47259596 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:34204561358 (31.8 GiB) TX bytes:21380246716 (19.9 GiB)

Receiver overruns usually occur when packets come in faster than the kernel can service the last interrupt. In our case, however, we were also seeing increasing inbound errors on interface Eth105/1/7 of switch bcd-c1z1-swi-5k07a/b (as the output below shows); a shut/no shut on the port made no difference. After some more debugging, we found a bad SFP/cable, and after replacing it the server came back to normal.

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 | i err
4065 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 output error 0 collision 0 deferred 0 late collision

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 | i err
4099 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 output error 0 collision 0 deferred 0 late collision

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 counters errors

--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth105/1/7 3740 483 0 4223 0 0

--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Eth105/1/7 0 0 0 0 0 3740

--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth105/1/7 0 -- 0 0 0 0

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 counters errors

--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth105/1/7 4386 551 0 4937 0 0

--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Eth105/1/7 0 0 0 0 0 4386

--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth105/1/7 0 -- 0 0 0 0
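
On the Linux side, the per-NIC statistics can be watched in parallel while the switch counters are checked; a quick sketch (interface name assumed, and for a bond you would check the physical slave interfaces rather than bond0):

[root@test /]# ethtool -S eth0 | egrep -i 'err|over|drop'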

PS:

During debugging, we also found that on the server side, interface eth0 was running at half duplex and 100Mb/s:

-bash-3.2# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Half
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000003 (3)
Link detected: yes

However, it should have been running at full duplex and 1000Mb/s. So we also changed speed and duplex to auto/auto on the switch port, and after that the OS side showed the expected values:

-bash-3.2# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000003 (3)
Link detected: yes
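
For reference, autonegotiation can also be re-triggered from the server side with ethtool (driver support assumed; in our case the change was made on the switch port):

-bash-3.2# ethtool -s eth0 autoneg on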

resolved – Exception: ha_check_cpu_compatibility failed:

November 12th, 2014

Today, when I tried to add an OVS server to an OVMM server pool, the following error messages were thrown:

2014-11-12 06:25:18.083 WARNING failed:errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteHA.py", line 248, in ha_check_cpu_compatibility
raise Exception("CPU not compatible! %s" % repr(d))
>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 609, in cluster_check_prerequisite
raise Exception(msg)

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 646, in _cluster_setup
#_check(ret)
File "/opt/ovs-agent-2.3/OVSXCluster.py", line 340, in _check
raise OVSException(error=ret["error"])

2014-11-12 06:25:18.083 NOTIFICATION Failed setup cluster for agent 2.2.0...
2014-11-12 06:25:18.083 ERROR Cluster Setup when adding server
2014-11-12 06:25:18.087 ERROR [Server Pool Management][Server Pool][test_serverpool]:During adding servers ([new_ovs_03]) to server pool (test_serverpool), Cluster setup failed: (OVM-1011 OVM Manager communication with new_ovs_03 for operation HA Setup for Oracle VM Agent 2.2.0 failed:
errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

)
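
The error shows that the new server reports CPU model 45 while the existing pool member reports model 44. You can confirm this on each box directly from /proc/cpuinfo; a quick check that is not part of the original error output:

egrep 'vendor_id|cpu family|^model' /proc/cpuinfo | sort -u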

As stated in the error message, adding the server failed at the CPU compatibility check. To work around this, we can comment out the code where the CPU check occurs.

In file /opt/ovs-agent-2.3/OVSSiteCluster.py, around line 646, on each OVS server in the server pool:

#ret = cluster_check_prerequisite(ha_enable=ha_enable)
#_check(ret)

Then bounce ovs-agent on each OVS server and try the add again.
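
A sketch of the restart, assuming the standard init script shipped with ovs-agent:

/etc/init.d/ovs-agent restart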

resolved – /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

November 7th, 2014

An error was thrown when we ran one of our scripts today:

[root@testhost01 ~]# su - user1
[user1@testhost01 ~]$ /home/testuser/run_as_root 'su'
-bash: /usr/local/packages/aime/ias/run_as_root: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

From the output, we can see that it's complaining about the missing file /lib/ld-linux.so.2:

[user1@testhost01 ~]$ ls -l /lib/ld-linux.so.2
ls: cannot access /lib/ld-linux.so.2: No such file or directory
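
In hindsight, a quicker way to spot what is going on would have been to check the binary's architecture directly with file (output abbreviated and assumed for illustration):

[user1@testhost01 ~]$ file /usr/local/packages/aime/ias/run_as_root
/usr/local/packages/aime/ias/run_as_root: ELF 32-bit LSB executable, Intel 80386, dynamically linked (uses shared libs), stripped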

I then checked another host and found that /lib/ld-linux.so.2 belongs to the glibc package:

[root@centos-doxer ~]# ls -l /lib/ld-linux.so.2
lrwxrwxrwx 1 root root 9 May 9 2013 /lib/ld-linux.so.2 -> ld-2.5.so
[root@centos-doxer ~]# rpm -qf /lib/ld-linux.so.2
glibc-2.5-107.el5_9.4

However, on the problematic host, glibc was already installed:

[root@testhost01 user1]# rpm -qa|grep glibc
glibc-headers-2.12-1.149.el6.x86_64
glibc-common-2.12-1.149.el6.x86_64
glibc-devel-2.12-1.149.el6.x86_64
glibc-2.12-1.149.el6.x86_64

I then tried creating a symlink from /lib64/ld-2.12.so to /lib/ld-linux.so.2:

[root@testhost01 ~]# ln -s /lib64/ld-2.12.so /lib/ld-linux.so.2
[root@testhost01 ~]# su - user1
[user1@testhost01 ~]$ /usr/local/packages/aime/ias/run_as_root su
-bash: /usr/local/packages/aime/ias/run_as_root: Accessing a corrupted shared library

Hmm, so it now complained about a corrupted shared library. Maybe we needed the 32-bit glibc? So I removed the symlink and installed glibc.i686:

rm -rf /lib/ld-linux.so.2
yum -y install glibc.i686

After the installation, I found that /lib/ld-linux.so.2 was already in place:

[root@testhost01 user1]# ls -l /lib/ld-linux.so.2
lrwxrwxrwx 1 root root 10 Nov 7 03:46 /lib/ld-linux.so.2 -> ld-2.12.so
[root@testhost01 user1]# rpm -qf /lib/ld-linux.so.2
glibc-2.12-1.149.el6.i686

And when I ran the command again, it worked:

[root@testhost01 user1]# su - user1
[user1@testhost01 ~]$ /home/testuser/run_as_root 'su'
[root@testhost01 user1]#

So from this, we can see that the issue was caused by /usr/local/packages/aime/ias/run_as_root being a 32-bit executable, which requires the 32-bit glibc.
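
If errors had persisted after installing glibc.i686, the next step would have been to check the binary's other shared-library dependencies; any "not found" lines there would point to further 32-bit packages to install:

[root@testhost01 ~]# ldd /usr/local/packages/aime/ias/run_as_root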
