Archive

Author Archive

resolved – error:0D0C50A1:asn1 encoding routines:ASN1_item_verify:unknown message digest algorithm

December 17th, 2014 No comments

Today when I tried using curl to get url info, error occurred like below:

[root@centos-doxer ~]# curl -i --user username:password -H "Content-Type: application/json" -X POST --data @/u01/shared/addcredential.json https://testserver.example.com/actions -v

* About to connect() to testserver.example.com port 443

*   Trying 10.242.11.201... connected

* Connected to testserver.example.com (10.242.11.201) port 443

* successfully set certificate verify locations:

*   CAfile: /etc/pki/tls/certs/ca-bundle.crt

  CApath: none

* SSLv2, Client hello (1):

SSLv3, TLS handshake, Server hello (2):

SSLv3, TLS handshake, CERT (11):

SSLv3, TLS alert, Server hello (2):

error:0D0C50A1:asn1 encoding routines:ASN1_item_verify:unknown message digest algorithm

* Closing connection #0

After some searching, I found that it's caused by the current version of openssl(openssl-0.9.8e) does not support SHA256 Signature Algorithm. To resolve this, there are two ways:

1. add -k parameter to curl to ignore the SSL error

2. upgrade openssl to at least openssl-0.9.8o. Here's the way to upgrade openssl:

wget --no-check-certificate 'https://www.openssl.org/source/old/0.9.x/openssl-0.9.8o.tar.gz'
tar zxvf openssl-0.9.8o.tar.gz
cd openssl-0.9.8o
./config --prefix=/usr --openssldir=/usr/openssl
make
make test
make install

After this, run openssl version to confirm:

[root@centos-doxer openssl-0.9.8o]# /usr/bin/openssl version
OpenSSL 0.9.8o 01 Jun 2010

PS:

If you installed openssl from rpm package, then you'll find the openssl version is still the old one even after you install the new package. This is expected so don't rely too much on rpm:

[root@centos-doxer openssl-0.9.8o]# /usr/bin/openssl version
OpenSSL 0.9.8o 01 Jun 2010

Even after rebuilding rpm DB(rpm --rebuilddb), it's still the old version:

[root@centos-doxer openssl-0.9.8o]# rpm -qf /usr/bin/openssl
openssl-0.9.8e-26.el5_9.1
openssl-0.9.8e-26.el5_9.1

[root@centos-doxer openssl-0.9.8o]# rpm -qa|grep openssl
openssl-0.9.8e-26.el5_9.1
openssl-devel-0.9.8e-26.el5_9.1
openssl-0.9.8e-26.el5_9.1
openssl-devel-0.9.8e-26.el5_9.1

 

output analysis of linux last command

December 9th, 2014 No comments

Here's the output of last on my linux host:

root     pts/9        remote.example   Tue Dec  9 14:51   still logged in
testuser pts/2        :3               Tue Dec  9 14:49   still logged in
aime     pts/1        :2               Tue Dec  9 14:49   still logged in
root     pts/0        :1               Tue Dec  9 14:49   still logged in
testuser pts/13       remote.example   Tue Dec  9 10:48 - 10:52  (00:02)
reboot   system boot  2.6.23           Tue Dec  9 10:11          (04:39)
root     pts/11       10.182.120.179   Thu Dec  4 17:14 - 17:20  (00:06)
root     pts/11       10.182.120.179   Thu Dec  4 17:14 - 17:14  (00:00)
root     pts/10       10.182.120.179   Thu Dec  4 15:55 - 15:55  (00:00)
testuser pts/14       :3.0             Tue Dec  2 15:44 - 15:46  (00:01)
testuser pts/12       :3.0             Tue Dec  2 15:44 - 15:46  (00:01)
testuser pts/13       :3.0             Tue Dec  2 15:44 - 15:46  (00:01)
testuser pts/15       :3.0             Tue Dec  2 15:44 - 15:46  (00:01)
testuser pts/11       :3.0             Tue Dec  2 15:44 - 15:46  (00:01)
testuser pts/16       :3.0             Tue Dec  2 15:44 - 15:46  (00:01)
root     pts/10       10.182.120.179   Tue Dec  2 11:20 - 11:20  (00:00)
root     pts/7        10.182.120.179   Tue Dec  2 10:15 - down  (6+07:39)
root     pts/6        10.182.120.179   Tue Dec  2 10:15 - 17:55 (6+07:39)
root     pts/5        10.182.120.179   Tue Dec  2 10:15 - 17:55 (6+07:39)
root     pts/4        10.182.120.179   Tue Dec  2 10:15 - 17:55 (6+07:39)
root     pts/3        10.182.120.179   Tue Dec  2 10:15 - 17:55 (6+07:39)
root     pts/2        :1               Tue Dec  2 10:00 - down  (6+07:55)
aime     pts/1        :2               Tue Dec  2 10:00 - down  (6+07:55)
testuser pts/0        :3               Tue Dec  2 10:00 - down  (6+07:55)
reboot   system boot  2.6.23           Tue Dec  2 09:58         (6+07:56)

Here's some analysis:

  • User "reboot" is a pseudo-user for system reboot. Entries between two reboots are users who log on the system during two reboots. For info about login shells(.bash_profile) and interactive non-login shells(.bashrc), you can refer to here.
  • Here're columns meanings:

Column 1: User logged on

Column 2: The tty name after logging on

Column 3: Remote IP or hostname from which the user logged on. You can see ":1", ":2", ":3", that's vnc port number which vncserver are rendering against.

Column 4: Begin/End time of the session. If "still logged in", then means the user is still logged on; if there's value in parenthesis, then that's the total time of the logged on. For the latest "reboot"(red line 1), means the uptime till now; For the second "reboot"(red line 2), means the uptime between two reboots. Note however that this time is not always accurate, for example after system crash and unusual restart sequence. last calculates it as time between it and next reboot/shutdown.

 

Categories: IT Architecture, Linux, Systems, Unix Tags:

ORA-12154 – TNS:could not resolve the connect identifier specified

December 2nd, 2014 No comments

Today I try to connect to one Db service named pditui using the following easy connect method:

export ORACLE_HOME=/u01/app/oracle/product/11.2.0/client_1
export PATH=$ORACLE_HOME/bin:$PATH

sqlplus "sys/password@(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = scanname.test.example.com)(PORT = 1521))(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = pditui)))" as sysdba

However, the following error messages prompted:

SQL*Plus: Release 11.2.0.3.0 Production on Tue Dec 2 14:07:35 2014

Copyright (c) 1982, 2011, Oracle. All rights reserved.

ERROR:
ORA-12154: TNS:could not resolve the connect identifier specified

Enter user-name:
ERROR:
ORA-12162: TNS:net service name is incorrectly specified

The username/password and service name were all correct, but the error was there. After some checking, I found that it was caused by wrong configuration of NAMES.DIRECTORY_PATH in file $ORACLE_HOME/network/admin/sqlnet.ora:

[root@centos-doxer ~]# cat /u01/app/oracle/product/11.2.0/client_1/network/admin/sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/app/oracle/product/11.2.0/client_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

#NAMES.DIRECTORY_PATH= (TNSNAMES)
NAMES.DIRECTORY_PATH= (TNSNAMES,ezconnect) -- add ezconnect methond here

ADR_BASE = /u01/app/oracle

After this, the connection was ok.

PS: 

You can read more about NAMES.DIRECTORY_PATH in file $ORACLE_HOME/network/admin/sqlnet.ora here.

Categories: Databases, Oracle DB Tags:

resolved – switching from Unbreakable Enterprise Kernel Release 2(UEKR2) to UEKR3 on Oracle Linux 6

November 24th, 2014 No comments

As we can see from here, the available kernels include the following 3 for Oracle Linux 6:

3.8.13 Unbreakable Enterprise Kernel Release 3 (x86_64 only)
2.6.39 Unbreakable Enterprise Kernel Release 2**
2.6.32 (Red Hat compatible kernel)

On one of our OEL6 VM, we found that it's using UEKR2:

[root@testbox aime]# cat /etc/issue
Oracle Linux Server release 6.4
Kernel \r on an \m

[root@testbox aime]# uname -r
2.6.39-400.211.1.el6uek.x86_64

So how can we switch the kernel to UEKR3(3.8)?

If your linux version is 6.4, first do a "yum update -y" to upgrade to 6.5 and uppper, and then reboot the host, and follow steps below.

[root@testbox aime]# ls -l /etc/grub.conf
lrwxrwxrwx. 1 root root 22 Aug 21 18:24 /etc/grub.conf -> ../boot/grub/grub.conf

[root@testbox aime]# yum update -y

If your linux version is 6.5 and upper, you'll find /etc/grub.conf and /boot/grub/grub.conf are different files(for yum update one. If your host is OEL6.5 when installed, then /etc/grub.conf should be softlink too):

[root@testbox ~]# ls -l /etc/grub.conf
-rw------- 1 root root 2356 Oct 20 05:26 /etc/grub.conf

[root@testbox ~]# ls -l /boot/grub/grub.conf
-rw------- 1 root root 1585 Nov 23 21:46 /boot/grub/grub.conf

In /etc/grub.conf, you'll see entry like below:

title Oracle Linux Server Unbreakable Enterprise Kernel (3.8.13-44.1.3.el6uek.x86_64)
root (hd0,0)
kernel /vmlinuz-3.8.13-44.1.3.el6uek.x86_64 ro root=/dev/mapper/vg01-lv_root rd_LVM_LV=vg01/lv_root rd_NO_LUKS rd_LVM_LV=vg01/lv_swap LANG=en_US.UTF-8 KEYTABLE=us console=hvc0 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_DM rhgb quiet
initrd /initramfs-3.8.13-44.1.3.el6uek.x86_64.img

What you'll need to do is just copying the entries above from /etc/grub.conf to /boot/grub/grub.conf, and then reboot the VM.

After rebooting, you'll find the kernel is now at UEKR3(3.8).

PS:

If you find the VM is OEL6.5 and /etc/grub.conf is a softlink to /boot/grub/grub.conf, then you could do the following to upgrade kernel to UEKR3:

1. add the following lines to /etc/yum.repos.d/public-yum-ol6.repo:

[public_ol6_UEKR3]
name=UEKR3 for Oracle Linux 6 ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/UEKR3/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

2. List and install UEKR3:

[root@testbox aime]# yum list|grep kernel-uek|grep public_ol6_UEKR3
kernel-uek.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-debug.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-debug-devel.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-devel.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-doc.noarch 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-firmware.noarch 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-headers.x86_64 3.8.13-26.2.4.el6uek public_ol6_UEKR3

[root@testbox aime]# yum install -y kernel-uek* --disablerepo=* --enablerepo=public_ol6_UEKR3

3. Reboot

resolved – ORA-27300 OS system dependent operation:fork failed with status: 11

November 18th, 2014 No comments

Today we observed all our DB were in down status, and in the trace file:

Errors in file /u01/database/diag/rdbms/oimdb/OIMDB/trace/OIMDB_psp0_3173.trc:

ORA-27300: OS system dependent operation:fork failed with status: 11

ORA-27301: OS failure message: Resource temporarily unavailable

ORA-27302: failure occurred at: skgpspawn5

After some searching for ORA-27300, I found this article, which suggested that it's the issue of user processes used up and system could not spawn new one at the time. As the problem happened at Mon Nov 17 02:08:51 2014, so I did some check using sysstat sar:

[root@test sa]# sar -f /var/log/sa/sa17 -s 00:00:00 -e 03:20:00
Linux 2.6.32-300.27.1.el5uek (slcn11vmf0029) 11/17/14

00:00:01 CPU %user %nice %system %iowait %steal %idle
00:10:01 all 1.16 0.12 0.48 0.71 0.18 97.35
00:20:02 all 1.30 0.00 0.47 0.95 0.19 97.10
00:30:01 all 1.88 0.00 0.63 1.98 0.19 95.32
00:40:01 all 1.00 0.00 0.35 2.15 0.18 96.32
00:50:01 all 1.09 0.00 0.40 0.47 0.18 97.87
01:00:01 all 1.03 0.00 0.34 0.25 0.16 98.23
01:10:01 all 3.98 0.02 1.72 4.26 0.22 89.80
01:20:01 all 9.98 0.13 5.99 47.40 0.31 36.19
01:30:01 all 1.86 0.00 1.24 48.72 0.16 48.01
01:40:01 all 1.08 0.00 0.82 48.77 0.18 49.15
01:50:01 all 1.54 0.00 0.97 49.32 0.18 47.98
02:00:01 all 1.05 0.00 0.85 48.74 0.18 49.19 --- problem occurred at Mon Nov 17 02:08:51 2014
02:10:01 all 10.14 0.14 8.95 44.75 0.34 35.68
02:20:01 all 0.06 0.00 0.21 1.87 0.07 97.78
02:30:01 all 0.08 0.00 0.29 2.81 0.08 96.74
02:40:01 all 0.09 0.00 0.31 3.08 0.08 96.44
02:50:01 all 0.05 0.00 0.13 0.96 0.06 98.81
03:00:01 all 0.07 0.00 0.26 2.38 0.07 97.22
03:10:01 all 0.06 0.12 0.21 1.52 0.07 98.02
Average: all 1.85 0.03 1.20 15.89 0.16 80.88

[root@test sa]# sar -f /var/log/sa/sa17 -s 01:10:00 -e 02:11:00 -A
......
......
01:10:01 kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
01:20:01 259940 15482728 98.35 2004 11703696 0 2104504 100.00 194056 -- even all SWAP spaces were used up
01:30:01 398584 15344084 97.47 904 11703152 0 2104504 100.00 191728
01:40:01 409104 15333564 97.40 984 11716924 0 2104504 100.00 191404
01:50:01 452844 15289824 97.12 1004 11711548 0 2104504 100.00 189076
02:00:01 440780 15301888 97.20 1424 11757600 0 2104504 100.00 189364
02:10:01 14602712 1139956 7.24 19548 382588 1978020 126484 6.01 3096
Average: 2760661 12982007 82.46 4311 9829251 329670 1774834 84.34 159787

So this proved that system was very busy during that time. I then increased oracle user's user process number to 131072 in /etc/security/limits.conf with the following:

* soft nproc 131072
* hard nproc 131072

And also set kernel.pid_max to 139264(which is 131072 plus 8192 which is recommended for OS stability) in /etc/sysctl.conf.

[root@test ~]# sysctl -a|grep pid_max
kernel.pid_max = 139264

Then increased memory from 16G to 32G of the box, and reboot.

resolved – high value of RX overruns in ifconfig

November 13th, 2014 No comments

Today we tried ssh to one host, it stuck soon there. And we observed that RX overruns in ifconfig output was high:

[root@test /]# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:10:E0:0D:AD:5E
inet6 addr: fe80::210:e0ff:fe0d:ad5e/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:140234052 errors:0 dropped:0 overruns:12665 frame:0
TX packets:47259596 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:34204561358 (31.8 GiB) TX bytes:21380246716 (19.9 GiB)

Receiver overruns usually occur when packets come in faster than the kernel can service the last interrupt. But in our case, we are seeing increasing inbound errors on interface Eth105/1/7 of device bcd-c1z1-swi-5k07a/b, did shut and no shut but no change. And after some more debugging, we found one bad SFP/cable. After replaced that, the server came back to normal.

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 | i err
4065 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 output error 0 collision 0 deferred 0 late collision

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 | i err
4099 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 output error 0 collision 0 deferred 0 late collision

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 counters errors

--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth105/1/7 3740 483 0 4223 0 0

--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Eth105/1/7 0 0 0 0 0 3740

--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth105/1/7 0 -- 0 0 0 0

ucf-c1z1-swi-5k07a# sh int Eth105/1/7 counters errors

--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth105/1/7 4386 551 0 4937 0 0

--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Eth105/1/7 0 0 0 0 0 4386

--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth105/1/7 0 -- 0 0 0 0

PS:

During debugging, we also found on server side, the interface eth0 is half duplex and speed at 100Mb/s:

-bash-3.2# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Half
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000003 (3)
Link detected: yes

However, it should be full duplex and 1000Mb/s. So we also changed the speed, duplex to auto auto on switch and after that, the OS side is now showing the expected value:

-bash-3.2# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000003 (3)
Link detected: yes

resolved – Exception: ha_check_cpu_compatibility failed:

Today when I tried to add one OVS server in OVMM pool, the following error messages prompted:

2014-11-12 06:25:18.083 WARNING failed:errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteHA.py", line 248, in ha_check_cpu_compatibility
raise Exception("CPU not compatible! %s" % repr(d))
>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 609, in cluster_check_prerequisite
raise Exception(msg)

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 646, in _cluster_setup
#_check(ret)
File "/opt/ovs-agent-2.3/OVSXCluster.py", line 340, in _check
raise OVSException(error=ret["error"])

2014-11-12 06:25:18.083 NOTIFICATION Failed setup cluster for agent 2.2.0...
2014-11-12 06:25:18.083 ERROR Cluster Setup when adding server
2014-11-12 06:25:18.087 ERROR [Server Pool Management][Server Pool][test_serverpool]:During adding servers ([new_ovs_03]) to server pool (test_serverpool), Cluster setup failed: (OVM-1011 OVM Manager communication with new_ovs_03 for operation HA Setup for Oracle VM Agent 2.2.0 failed:
errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

)

As stated in the error message, the adding failed at cpu check. To resolve this, we can comment out the code where cpu check occurred.

File /opt/ovs-agent-2.3/OVSSiteCluster.py in line 646 on each OVS server in the server pool:

#ret = cluster_check_prerequisite(ha_enable=ha_enable)
#_check(ret)

Then bounce ovs-agent on each OVS server, and try add again.

Categories: Clouding, IT Architecture, Oracle Cloud Tags:

resolved – /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

November 7th, 2014 No comments

In one of our script, error prompted when we ran it today:

[root@testhost01 ~]# su - user1
[user1@testhost01 ~]$ /home/testuser/run_as_root 'su'
-bash: /usr/local/packages/aime/ias/run_as_root: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

From the output, we can see that it's complaining for not founding file /lib/ld-linux.so.2:

[user1@testhost01 ~]$ ls -l /lib/ld-linux.so.2
ls: cannot access /lib/ld-linux.so.2: No such file or directory

I then checked on another host and found /lib/ld-linux.so.2 belonged to package glibc:

[root@centos-doxer ~]# ls -l /lib/ld-linux.so.2
lrwxrwxrwx 1 root root 9 May 9 2013 /lib/ld-linux.so.2 -> ld-2.5.so
[root@centos-doxer ~]# rpm -qf /lib/ld-linux.so.2
glibc-2.5-107.el5_9.4

However, on the problematic host, glibc was installed:

[root@testhost01 user1]# rpm -qa|grep glibc
glibc-headers-2.12-1.149.el6.x86_64
glibc-common-2.12-1.149.el6.x86_64
glibc-devel-2.12-1.149.el6.x86_64
glibc-2.12-1.149.el6.x86_64

I then tried making a soft link from /lib64/ld-2.12.so to /lib/ld-linux.so.2:

[root@testhost01 ~]# ln -s /lib64/ld-2.12.so /lib/ld-linux.so.2
[root@testhost01 ~]# su - user1
[user1@testhost01 ~]$ /usr/local/packages/aime/ias/run_as_root su
-bash: /usr/local/packages/aime/ias/run_as_root: Accessing a corrupted shared library

Hmmm, so it now complained about corrupted shared library. Maybe we need 32bit of glibc? So I removed the softlink, and then installed glibc.i686:

rm -rf /lib/ld-linux.so.2
yum -y install glibc.i686

After installation, I found /lib/ld-linux.so.2 was there already:

[root@testhost01 user1]# ls -l /lib/ld-linux.so.2
lrwxrwxrwx 1 root root 10 Nov 7 03:46 /lib/ld-linux.so.2 -> ld-2.12.so
[root@testhost01 user1]# rpm -qf /lib/ld-linux.so.2
glibc-2.12-1.149.el6.i686

And when I ran again the command, it returned ok:

[root@testhost01 user1]# su - user1
[user1@testhost01 ~]$ /home/testuser/run_as_root 'su'
[root@testhost01 user1]#

So from this, we can see that the issue was caused by /usr/local/packages/aime/ias/run_as_root supports only 32bit of glibc.

Categories: IT Architecture, Kernel, Linux, Systems Tags:

Resolved – AttributeError: ‘NoneType’ object has no attribute ‘_imgName’

November 6th, 2014 No comments

Today when I tried to list Virtual Machines on one Oracle OVMM, error message prompted:

[root@ovmm_test ~]# ovm -uadmin -pwelcome1 vm ls
Traceback (most recent call last):
  File "/usr/bin/ovm", line 43, in ?
    ovmcli.ovmmain.main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmmain.py", line 122, in main
    return ovm.ovmcli.runcmd(args)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 147, in runcmd
    return method(options)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 1578, in do_vm_ls
    result.append((serverpool._serverPoolName, vm._imgName))
AttributeError: 'NoneType' object has no attribute '_imgName'

Then I tried list VMs by server pool:

[root@ovmm_test ~]# ovm -uadmin -pwelcome1 vm ls -s Pool1_test
Name                 Size(MB) Mem   VCPUs Status  Server_Pool
testvm1              17750    8196  4     Running Pool1_test
testvm2               50518    8196  4     Running Pool1_test
testvm3          19546    8192  2     Running Pool1_test
testvm4          50518    20929 4     Running Pool1_test
testvm5          19546    8192  2     Running Pool1_test
[root@ovmm_test ~]# ovm -uadmin -pwelcome1 vm ls -s Pool1_test_A
Traceback (most recent call last):
  File "/usr/bin/ovm", line 43, in ?
    ovmcli.ovmmain.main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmmain.py", line 122, in main
    return ovm.ovmcli.runcmd(args)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 147, in runcmd
    return method(options)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 1578, in do_vm_ls
    result.append((serverpool._serverPoolName, vm._imgName))
AttributeError: 'NoneType' object has no attribute '_imgName'

One pool was working and the other was not, so the problematic VMs must reside in pool Pool1_test_A.

Another symptom was that, although ovmcli wouldn't work, the OVMM GUI worked as expected and returns all the VMs.

As ovmcli read entries from Oracle DB(SID XE) on OVMM, so the issue maybe caused by the inconsistency between DB & OVMM agent DB.

I got the list of all VMs on the problematic server pool from OVMM GUI, and then ran the following query to get all entries in DB:

select IMG_NAME from OVS_VM_IMG where SITE_ID=110 and length(IMG_NAME)>50; #Pool1_test_A was with ID 110, got from table OVS_SITE. I used length() here because in the problematic server pool, VMs all should have IMG_NAME with more than 50 characters; if less than 50, then they were VM templates which should have no issue

Comparing the output from OVMM GUI & OVM DB, I found some entries which only existed in DB. And for all of these entries, they all had "Status" in "Creating", and also the

select IMG_NAME from OVS_VM_IMG where STATUS='Creating';

Then I removed these weird entries:

create table OVS_VM_IMG_bak20141106 as select * from OVS_VM_IMG;
delete from OVS_VM_IMG where STATUS='Creating'; #you can try rename/drop table OVS_VM_IMG(alter table TBL1 rename to TBL2; drop table TBL1), remove entries in backup table(OVS_VM_IMG_bak20141106), and then rename backup table(OVS_VM_IMG_bak20141106) to OVS_VM_IMG if failed at this step caused by foreign key or other reasons

After this, the issue got resolved.

PS:

After you removed entries with STATUS being "Creating", and if you found some more entries of this kind occurred in OVM DB, then maybe it's caused by VM templates not working or DB table corrupted. In this case, you'll need recover OVMM by rollong back to previous version of your backup, and then import VM templates/VM images etc.

resoved – nfs share chown: changing ownership of ‘blahblah': Invalid argument

October 28th, 2014 No comments

Today I encountered the following error when trying to change ownership of some files:

[root@test webdav]# chown -R apache:apache ./bigfiles/
chown: changing ownership of `./bigfiles/opcmessaging': Invalid argument
chown: changing ownership of `./bigfiles/': Invalid argument

This host is running CentOS 6.2, and in this version of OS, nfs4 is by default used:

[root@test webdav]# cat /proc/mounts |grep u01
nas-server.example.com:/export/share01/ /u01 nfs4 rw,relatime,vers=4,rsize=32768,wsize=32768

However, the NFS server does not support NFSv4 well, so I modified the share to use NFSv3 by force:

nas-server.example.com:/export/share01/ /u01 nfs rsize=32768,wsize=32768,hard,nolock,timeo=14,noacl,intr,mountvers=3,nfsvers=3

After umount/mount, the issue was resolved!

PS:

If the NAS server is SUN ZFS appliance, then the following should be noted, or the issue may occur even on CentOS/Redhat linux 5.x:

protocol_anonymous_user_mapping

root_directory_access

Categories: Hardware, IT Architecture, Linux, NAS, Storage, Systems Tags:

Sun ZFS storage stuck due to incorrect LACP configuration

October 24th, 2014 No comments

Today we met issue with Sun ZFS storage 7320. NFS shares provisioned from the ZFS appliance were not responding to requests, even a "df -h" will stuck there for a long long time. And when we checked from ZFS storage side, we found the following statistics:

1-high-io-before-stuck

 

And during our checking for the traffic source, the ZFS appliance backed to normal by itself:

2-recovered-by-itself

 

As we just configured LACP on this ZFS appliance the day before, so we doubted the issue was caused by incorrect network configuration. Here's the network config:

1-wrong-configuration

For "Policy", we should match with switch setup to even balance incoming/outgoing data flow.  Otherwise, we might experience uneven load balance. Our switch was set to L3, so L3 should be ok. We'll get better load spreading if the policy is L3+4 if the switch supports it.  With L3, all connections from any one IP will only use a single member of the aggregation.  With L3+4, it will load spread by UDP or TCP port too. More is here.

For "Mode", it should be set according to switch. If the switch is "passive" mode then server/storage needs to be on "active" mode, and vice versa.

For "Timer", it's regarding how often to check LACP status.

After checking switch setting, we found that the switch is in "Active" mode, and as ZFS appliance was also on "Active" mode, so that's the culprit. So we changed the setting to the following:

2-right-configurationAfter this, we had some observation and ZFS is now operating normally.

PS:

You should also have a check of disk operations, if there are timeout errors on the disks, then you should try replace them. Sometimes, a single disk may hang the SCSI bus.  Ideally, the system should fail the disk but it didn't happen. You should manually failed the disk to resolve the issue.

The ZFS Storage Appliance core analysis (previous note) confirms that the disk was the cause of the issue.

It was hanging up communication on the SCSI bus but once it was removed the issue was resolved.

It is uncommon for a single disk to hang up the bus, however; since the disks share the SCSI path (each drive does not have its own dedicated cabling and controller) it is sometimes seen.

You can check the ZFS appliance uptime by running "version show" in the console.

zfs-test:configuration> version show
Appliance Name: zfs-test
Appliance Product: Sun ZFS Storage 7320
Appliance Type: Sun ZFS Storage 7320
Appliance Version: 2013.06.05.2.2,1-1.1
First Installed: Sun Jul 22 2012 10:02:24 GMT+0000 (UTC)
Last Updated: Sun Oct 26 2014 22:11:03 GMT+0000 (UTC)
Last Booted: Wed Dec 10 2014 10:03:08 GMT+0000 (UTC)
Appliance Serial Number: d043d335-ae15-4350-ca35-b05ba2749c94
Chassis Serial Number: 1225FMM0GE
Software Part Number: Oracle 000-0000-00
Vendor Product ID: urn:uuid:418bff40-b518-11de-9e65-080020a9ed93
Browser Name: aksh 1.0
Browser Details: aksh
HTTP Server: Apache/2.2.24 (Unix)
SSL Version: OpenSSL 1.0.0k 5 Feb 2013
Appliance Kit: ak/SUNW,maguro_plus@2013.06.05.2.2,1-1.1
Operating System: SunOS 5.11 ak/generic@2013.06.05.2.2,1-1.1 64-bit
BIOS: American Megatrends Inc. 08080102 05/23/2011
Service Processor: 3.0.16.10

Categories: Hardware, NAS, Storage Tags:

resolved – auditd STDERR: Error deleting rule Error sending enable request (Operation not permitted)

September 19th, 2014 No comments

Today when I try to restart auditd, the following error message prompted:

[2014-09-18T19:26:41+00:00] ERROR: service[auditd] (cookbook-devops-kernelaudit::default line 14) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /sbin/service auditd restart ----
STDOUT: Stopping auditd: [  OK  ]
Starting auditd: [FAILED]
STDERR: Error deleting rule (Operation not permitted)
Error sending enable request (Operation not permitted)
---- End output of /sbin/service auditd restart ----
Ran /sbin/service auditd restart returned 1

After some reading of manpage auditd, I realized that when audit "enabled" was set to 2(locked), any attempt to change the configuration in this mode will be audited and denied. And that maybe the reason of "STDERR: Error deleting rule (Operation not permitted)", "Error sending enable request (Operation not permitted)". Here's from man page of auditctl:

-e [0..2] Set enabled flag. When 0 is passed, this can be used to temporarily disable auditing. When 1 is passed as an argument, it will enable auditing. To lock the audit configuration so that it can't be changed, pass a 2 as the argument. Locking the configuration is intended to be the last command in audit.rules for anyone wishing this feature to be active. Any attempt to change the configuration in this mode will be audited and denied. The configuration can only be changed by rebooting the machine.

You can run auditctl -s to check the current setting:

[root@centos-doxer ~]# auditctl -s
AUDIT_STATUS: enabled=1 flag=1 pid=3154 rate_limit=0 backlog_limit=320 lost=0 backlog=0

And you can run auditctl -e <0|1|2> to change this attribute on the fly, or you can add -e <0|1|2> in /etc/audit/audit.rules. Please note after you modify this, a reboot is a must to make this into effect.

PS:

Here's more about linux audit.

resolved – Permission denied even after chmod 777 world readable writable

September 19th, 2014 No comments

Several team members asked me that when they want to change to some directories or read some files ,the system reported error "Permission denied". Even after setting world writable(chmod 777), the error was still there:

-bash-3.2$ cd /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs
-bash: cd: /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs: Permission denied

-bash-3.2$ cat /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs/wls_sdi1.out
cat: /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs/wls_sdi1.out: Permission denied

-bash-3.2$ ls -l /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs/wls_sdi1.out
-rwxrwxrwx 1 oracle oinstall 1100961066 Sep 19 07:37 /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs/wls_sdi1.out

In summary, if you want to read some file(e.g. wls_sdi1.out) under some directory(e.g. /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs), then except for "read bit" set on that file(chmod +r wls_sdi1.out), it's also needed that all parent directories of that file(/u01, /u01/local, /u01/local/wls, ......, /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs) have both "read bit" & "execute bit" set(you can check it by ls -ld <dir name>):

chmod +r wls_sdi1.out #first set "read bit" on the file
chmod +r /u01; chmod +x /u01; chmod +r /u01/local; chmod +x /u01/local; <...skipped...>chmod +r /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs; chmod +x /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs; #then set both "read bit" & "execute bit" on all parent directories

And at last, if you can log on as the file owner, then everything will be smooth. For /u01/local/config/m_domains/tasdc1_domain/servers/wls_sdi1/logs/wls_sdi1.out, it's owned by oracle user. So you can try log on as oracle user and do the operations.

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

arping in linux for getting MAC address and update ARP caches by broadcast

August 27th, 2014 No comments

Suppose we want to know MAC address of 10.182.120.210. then we can log on one linux host which is in the same subnet of 10.182.120.210, e.g. 10.182.120.188:

[root@centos-doxer ~]#arping -U -c 3 -I bond0 -s 10.182.120.188 10.182.120.210
ARPING 10.182.120.210 from 10.182.120.188 bond0
Unicast reply from 10.182.120.210 [00:21:CC:B7:1F:EB] 1.397ms
Unicast reply from 10.182.120.210 [00:21:CC:B7:1F:EB] 1.378ms
Sent 3 probes (1 broadcast(s))
Received 2 response(s)

So 00:21:CC:B7:1F:EB is the MAC address of 10.182.120.210. And from here we can see that IP address 10.182.120.210 is now used in local network.

Another use of arping is to update ARP cache. One scene is that, you assign a new machine with one being used IP address, then you will not able to log on the old machine with the IP address. Even after you shutdown the new machine, you may still not able to access the old machine. And here's the resolution:

Suppose we have configured the new machine NIC eth0 with IP address 192.168.0.2 which is already used by one old machine. Log on the new machine and run the following commands:

arping -A 192.168.0.2 -I eth0 192.168.0.2
arping -U -s 192.168.0.2 -I eth0 192.168.0.1 #this is sending ARP broadcast, and 192.168.0.1 is the gateway address.
/sbin/arping -I eth0 -c 3 -s 192.168.0.2 192.168.0.3 #update neighbours' ARP caches

resolved – IOError: [Errno 2] No such file or directory when creating VMs on Oracle VM Server

August 25th, 2014 No comments

Today when I tried to add one OVS server to Oracle VM Server server pool, there was error message like below:

Start - /OVS/running_pool/vm_test
PowerOn Failed : Result - failed:<Exception: return=>failed:<Exception: failed:<IOError: [Errno 2] No such file or directory: '/var/ovs/mount/85255944BDF24F62831E1C6E7101CF7A/running_pool/vm_test/vm.cfg'>

I log on one OVS server and found the path was there. And later I logged on all OVS servers in that server pool and found one OVS server did not have storage repo. So I removed that OVS server from pool and tried to added it back to pool and want to create the VM again. But this time, the following error messages prompted when I tried to add OVS server back:

2014-08-21 02:52:52.962 WARNING failed:errcode=50006, errmsg=Do 'clusterm_init_root_sr' on servers ('testhost1') failed.
StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 651, in _cluster_setup
_check(ret)
File "/opt/ovs-agent-2.3/OVSXCluster.py", line 340, in _check
raise OVSException(error=ret["error"])

2014-08-21 02:52:52.962 NOTIFICATION Failed setup cluster for agent 2.2.0...
2014-08-21 02:52:52.963 ERROR Cluster Setup when adding server
2014-08-21 02:52:52.970 ERROR [Server Pool Management][Server Pool][test_pool]:During adding servers ([testhost1]) to server pool (test_pool), Cluster setup failed: (OVM-1011 OVM Manager communication with host_master for operation HA Setup for Oracle VM Agent 2.2.0 failed:
errcode=50006, errmsg=Do 'clusterm_init_root_sr' on servers ('testhost1') failed.

From here, I realized that this error was caused by storage repo could not created on that OVS server testhost1. So I logged on testhost1 for a check. As the storage repo was one NFS share, so I tried do a showmount -e <nfs server>, and found it's not working. And then I tried to check the tracert to <nfs server>, and it's not going through.

From another host, showmount -e <nfs server> worked. So the problem was on OVS server testhost1. After more debugging, I found that one NIC was on the host but not pingable. Later I had a check of the switch, and found the NIC was unplugged. I plugged in the NIC and tried again with adding back OVS server, creating VM, and all went smoothly.

PS:

Suppose you want to know the NFS clients which mount one share from the NFS server, then on any client that has access to the NFS server, do the following:

[root@centos-doxer ~]# showmount -a nfs-server.example.com|grep to_be
10.182.120.188:/export/IDM_BR/share01_to_be_removed
test02.example:/export/IDM_BR/share01_to_be_removed

-a or --all

List both the client hostname or IP address and mounted directory in host:dir format. This info should not be considered reliable.

crontab cronjob failed with date single apostrophe date +%d-%b-%Y-%H-%M on linux

August 4th, 2014 No comments

I tried to creat one linux cronjob today, and want to note down date & time when the job was running, and here's the content:

echo '10 10 * * 1 root cd /var/log/ovm-manager/;tar zcvf oc4j.log.`date +%m-%d-%y`.tar.gz oc4j.log;echo "">/var/log/ovm-manager/oc4j.log' > /etc/cron.d/oc4j

However, this entry failed to run, and when check log in /var/log/cron:

Aug 4 06:24:01 testhost crond[1825]: (root) RELOAD (cron/root)
Aug 4 06:24:01 testhost crond[1825]: (root.bak) ORPHAN (no passwd entry)
Aug 4 06:25:01 testhost crond[28376]: (root) CMD (cd /var/log/ovm-manager/;tar zcvf oc4j.log.`date +)

So, the command was intercepted and that's the reason for the failure.

Eventually, I figured out that cron treats the % character specially (it is turned into a newline in the command). You must precede all % characters with a \ in a crontab file, which tells cron to just put a % in the command. And here's the updated version:

echo '10 10 * * 1 root cd /var/log/ovm-manager/;tar zcvf oc4j.log.`date +\%m-\%d-\%y`.tar.gz oc4j.log;echo "">/var/log/ovm-manager/oc4j.log' > /etc/cron.d/oc4j

This time, the job got ran successfully:

Aug 4 06:31:01 testhost crond[1825]: (root) RELOAD (cron/root)
Aug 4 06:31:01 testhost crond[1825]: (root.bak) ORPHAN (no passwd entry)
Aug 4 06:31:01 testhost crond[28503]: (root) CMD (cd /var/log/ovm-manager/;tar zcvf oc4j.log.`date +%m-%d-%y`.tar.gz oc4j.log;echo "">/var/log/ovm-manager/oc4j.log)

PS:

More on here http://stackoverflow.com/questions/1486088/cron-fails-on-single-apostrophe

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – Kernel panic – not syncing: Attempted to kill init

July 29th, 2014 No comments

Today when I tried to poweron one VM hosted on XEN server, the following error messages prompted:

Write protecting the kernel read-only data: 6784k
Kernel panic - not syncing: Attempted to kill init! [failed one]
Pid: 1, comm: init Not tainted 2.6.32-300.29.1.el5uek #1
Call Trace:
[<ffffffff810579a2>] panic+0xa5/0x162
[<ffffffff8109b997>] ? atomic_add_unless+0x2e/0x47
[<ffffffff8109bdf9>] ? __put_css_set+0x29/0x179
[<ffffffff8145744c>] ? _write_lock_irq+0x10/0x20
[<ffffffff81062a65>] ? exit_ptrace+0xa7/0x118
[<ffffffff8105b076>] do_exit+0x7e/0x699
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8105b748>] sys_exit_group+0x17/0x1b
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

This is quite weird as it's ok yesterday:

Write protecting the kernel read-only data: 6784k
blkfront: xvda: barriers enabled (tag) [normal one]
xvda: detected capacity change from 0 to 15126289920
xvda: xvda1 xvda2 xvda3
blkfront: xvdb: barriers enabled (tag)
xvdb: detected capacity change from 0 to 16777216000
xvdb: xvdb1
Setting capacity to 32768000
xvdb: detected capacity change from 0 to 16777216000
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
type=1404 audit(1406281405.511:2): selinux=0 auid=4294967295 ses=4294967295

After some checking, I found that this OVS server was hosting more than 40 VMs, and the VCPUs was tight. So I turned off some unused VMs and then issue resolved.

yum install specified version of packages

July 15th, 2014 No comments

Assume that you want to install one specified version of package, say glibc-2.5-118.el5_10.2.x86_64:

[root@centos-doxer ~]# yum list|grep glibc
glibc.i686 2.5-107.el5_9.4 installed
glibc.x86_64 2.5-107.el5_9.4 installed
glibc-common.x86_64 2.5-107.el5_9.4 installed
glibc-devel.i386 2.5-107.el5_9.4 installed
glibc-devel.x86_64 2.5-107.el5_9.4 installed
glibc-headers.x86_64 2.5-107.el5_9.4 installed
compat-glibc.i386 1:2.3.4-2.26 el5_latest
compat-glibc.x86_64 1:2.3.4-2.26 el5_latest
compat-glibc-headers.x86_64 1:2.3.4-2.26 el5_latest
glibc.i686 2.5-118.el5_10.2 el5_latest
glibc.x86_64 2.5-118.el5_10.2 el5_latest
glibc-common.x86_64 2.5-118.el5_10.2 el5_latest
glibc-devel.i386 2.5-118.el5_10.2 el5_latest
glibc-devel.x86_64 2.5-118.el5_10.2 el5_latest
glibc-headers.x86_64 2.5-118.el5_10.2 el5_latest
glibc-utils.x86_64 2.5-118.el5_10.2 el5_latest

Then you should execute glibc-2.5-118.el5_10.2.x86_64. The format of this command is yum install <packagename>-<version>.<platform, such as x86_64>.

Categories: IT Architecture, Linux, Systems Tags:

linux process accounting set up

July 8th, 2014 No comments

Ensure package psacct is installed and make it boot with system:

rpm -qa|grep -i psacct
chkconfig psacct on
service psacct start

Here're some useful commands

[root@qg-dc2-tas_sdi ~]# ac -p #Display time totals for each user
emcadm 0.00
test1 2.57
aime 37.04
oracle 32819.22
root 12886.86
testuser 1.47
total 45747.15

[root@qg-dc2-tas_sdi ~]# lastcomm testuser #Display command executed by user testuser
top testuser pts/5 0.02 secs Fri Jul 4 03:59
df testuser pts/5 0.00 secs Fri Jul 4 03:59

[root@qg-dc2-tas_sdi ~]# lastcomm top #Search the accounting logs by command name
top testuser pts/5 0.03 secs Fri Jul 4 04:02

[root@qg-dc2-tas_sdi ~]# lastcomm pts/5 #Search the accounting logs by terminal name pts/5
top testuser pts/5 0.03 secs Fri Jul 4 04:02
sleep X testuser pts/5 0.00 secs Fri Jul 4 04:02

[root@qg-dc2-tas_sdi ~]# sa |head #Use sa command to print summarizes information(e.g. the number of times the command was called and the system resources used) about previously executed commands.
332 73.36re 0.03cp 8022k
33 8.76re 0.02cp 7121k ***other*
14 0.02re 0.01cp 26025k perl
7 0.00re 0.00cp 16328k ps
49 0.00re 0.00cp 2620k find
42 0.00re 0.00cp 13982k grep
32 0.00re 0.00cp 952k tmpwatch
11 0.01re 0.00cp 13456k sh
11 0.00re 0.00cp 2179k makewhatis*
8 0.01re 0.00cp 2683k sort

[root@qg-dc2-tas_sdi ~]# sa -u |grep testuser #Display output per-user
testuser 0.00 cpu 14726k mem sleep
testuser 0.03 cpu 4248k mem top
testuser 0.00 cpu 22544k mem sshd *
testuser 0.00 cpu 4170k mem id
testuser 0.00 cpu 2586k mem hostname

[root@qg-dc2-tas_sdi ~]# sa -m | grep testuser #Display the number of processes and number of CPU minutes on a per-user basis
testuser 22 8.18re 0.00cp 7654k

Categories: IT Architecture, Linux, Systems, Unix Tags:

Enable NIS client on linux host

July 2nd, 2014 No comments

After you set up NIS server, you need set up NIS client. Here's the steps for enabling NIS client on linux box.

Ensure required packages are installed

rpm -qa|egrep 'yp-tools|ypbind|portmap'

Edit /etc/sysconfig/network

NISDOMAIN=example.com

Edit /etc/yp.conf
domain example.com server 10.229.169.88
domain example.com server 10.229.192.99

Set NIS domain-name

domainname example.com
ypdomainname example.com

Set /etc/nsswitch.conf

passwd: files nis
shadow: files nis
group: files nis
hosts: files dns nis
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: nisplus
publickey: nisplus
automount: files nisplus
aliases: files nisplus
sudoers: files nis

Make sure the portmap service is running:

service portmap start

chkconfig portmap on

Start ypbind service:

service ypbind start
chkconfig ypbind on

Test it out:

rpcinfo -u localhost ypbind

ypcat passwd|egrep 'username'

If you want to set up sudo privileges for NIS users, then you can refer to this article resolved – /etc/sudoers: syntax error near line 10

PS:

If there's firewall between Linux NIS clients and NIS servers, then you should not startup ypbind(chkconfig ypbind off; service ypbind stop), if you startup ypbind, then the box will try to connect to NIS servers without stopping. Your linux box will get stuck and will take a long time for you to log on even as root. This is rule of thumb.

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – /etc/sudoers: syntax error near line 10

July 2nd, 2014 No comments

When using /usr/sbin/visudo, after modification, errors occurred:

>>> /etc/sudoers: syntax error near line 10 <<<

Here's line 10:

User_Alias Users_SDITAS = username1, username2

Then I changed it as following:

User_Alias USERS_SDITAS = username1, username2

And now everything is ok. So this means that the alias name must all be uppercase.

PS:
1. Here's the explanation about User_Alias Users_SDITAS = username1, username2

The first part is the user,
The second is the terminal from where the user can use sudo command,
The third part is which users he may act as,
The last one, is which commands he may run when using sudo.
For example, root ALL=(ALL) ALL, means the root user can execute from ALL terminals, acting as ALL (any) users, and run ALL (any) command. And USERS_SDITAS ALL=(oracle) NOPASSWD:SETENV: CMD_MIGRATIONDC1DC3 means users in group USERS_SDITAS can execute from ALL terminals, acting as oracle user, and run commands in group CMD_MIGRATIONDC1DC3. (sudo -E -u oracle <command>, -E will pass invoking users env variables to target user if SETENV tag is added to sudo commands in /etc/sudoers. You'll get error message "sudo: sorry, you are not allowed to preserve the environment" if you did not add SETENV tag in /etc/sudoers. You can run sudo -l or sudo -ll to get a list of privilege commands for you or for others if you run sudo -l -U <username> )

2. One sample of /etc/sudoers configuration in linux(use visudo to edit, as visudo can check for errors after modification. You may need set "echo 'export PATH=/usr/bin:$PATH' >> /etc/profile" in some circumstances so that sudo will be /usr/bin/sudo):

Defaults logfile=/var/log/sudo.log

Defaults always_set_home #switched to target user's home directory when running sudo. Note that HOME is already set when the the env_reset option is enabled, so always_set_home is only effective for configurations where either env_reset is disabled(Defaults !env_reset) or HOME is present in the env_keep list(Defaults env_keep += HOME). This flag is off by default.
Host_Alias HOSTS_MIGRATIONDC1DC3 = slcn06vmf0012, slcn06vmf0013
Cmnd_Alias CMD_MIGRATIONDC1DC3 = /u01/local/wls/user_projects/domains/base_domain/bin/tasctl, /u01/shared/wls/Oracle_SDI1/sdictl/sdictl.sh
User_Alias USERS_SDITAS =username1, username2
USERS_SDITAS ALL=(ALL) NOPASSWD: /bin/su - oracle #users in USERS_SDITAS group can now sudo su - oracle without asking for a password
oracle ALL=(ALL) NOPASSWD:SETENV: CMD_MIGRATIONDC1DC3 #oracle user can run all commands in commands group CMD_MIGRATIONDC1DC3.

3. To check  whether some NIS users are using/bin/false shell(means they can not log on the host by ssh), use the following commands:

ypcat passwd|awk -F: '{if($1 ~ /^username1$|^username2$/) print}'|grep false

Categories: IT Architecture, Linux, Systems, Unix Tags: ,

Resolved – Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT

June 12th, 2014 No comments

Today when I tried to install Oracle VM Server on one server, the following error occurred:

Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT. This can happen if there is not enough space on your hard drive(s) for the installation.

So to went on with the installation, I had to think of a way to erase GPT partition table on the drive.

To do this, the first step is to fall into linux rescue mode when booting from CDROM:

rescue

Later, check with fdisk -l, I could see that /dev/sda was the only disk that needed erasing GPT label. So I used dd if=/dev/zero of=/dev/sda bs=512 count=1 to erase GPT table:

 

fdisk_dd

 

After this, run fdisk -l again, I saw that the partition table was now gone:

fdisk_dd_2

Later, re-initializing installation of OVS server. When the following message prompted, select "No":

select_no

And select "yes" when below message prompted so that we can make new partition table:

select_yes

The steps after this was normal ones, and the installation went smoothly.

Resolved – rm cannot remove some files with error message “Device or resource busy”

June 11th, 2014 No comments

If you meet problem when remove one file on linux with below error message:

[root@test-host ~]# rm -rf /u01/shared/*
rm: cannot remove `/u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001': Device or resource busy
rm: cannot remove `/u01/shared/WLS/oracle_common/modules/oracle.jrf_11.1.1/.nfs0000000000005c7a00000002': Device or resource busy
rm: cannot remove `/u01/shared/WLS/OracleHome/soa/modules/oracle.soa.fabric_11.1.1/.nfs0000000000006bcf00000003': Device or resource busy

Then it means that some progresses were still referring to these files. You have to stop these processes before remove these files. You can use linux command lsof to find the processes using specific files:

[root@test-host ~]# lsof |grep nfs0000000000004abf00000001
java 2956 emcadm mem REG 0,21 1095768 19135 /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001 (slce49sn-nas:/export/C9QA123_DC1/tas_central_shared)
java 2956 emcadm 88r REG 0,21 1095768 19135 /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001 (slce49sn-nas:/export/C9QA123_DC1/tas_central_shared)

So from here you can see that processe with PID 2956 is still using file /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001.

However, some systems have no lsof installed by default. Then you can install it or by using the alternative one "fuser":

[root@test-host ~]# fuser -cu /u01/shared/WLS/oracle_common
/u01/shared/WLS/oracle_common: 2956m(emcadm) 7358c(aime)

Then you can see also that progresses with PIDs 2956 and 7358 are referring to the directory /u01/shared/WLS/oracle_common.

so you'll need stop the process first by killing it(or stop it using the processes own stop() method if defined):

kill -9 2956

After that, you can try remove the files again, should be ok this time.

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

Resolved – failed Exception check srv hostname/IP failedException Invalid hostname/IP configuration ocfs2 config failed Obsolete nodes found

June 3rd, 2014 No comments

Today when I tried to add two OVS servers into one server pool, errors were met. The first one was like below:

2014-06-03 04:26:08.965 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:09.485 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:09.497 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:12.463 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:12.985 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:12.997 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:13.004 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:13.522 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:13.535 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:13.980 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:16.307 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:16.831 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:16.844 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:17.284 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:17.290 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:17.814 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:17.827 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:18.272 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:18.279 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:18.799 NOTIFICATION Regisering server:hostname1.example.com...
2014-06-03 04:26:21.749 NOTIFICATION Register Server: hostname1.example.com success
2014-06-03 04:26:21.751 NOTIFICATION Getting host info for server:hostname1.example.com ...
2014-06-03 04:26:23.894 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>
2014-06-03 04:26:33.348 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>

Also there's error message like below:

2014-06-03 04:59:11.003 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:11.524 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:11.536 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:15.484 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.005 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.016 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:16.025 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.546 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.559 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:17.014 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:18.950 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:19.470 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:19.483 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:19.926 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:19.955 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:20.476 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:20.490 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:20.943 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:20.947 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:21.471 NOTIFICATION Regisering server:hostname1-fe.example.com...
2014-06-03 04:59:24.439 NOTIFICATION Register Server: hostname1-fe.example.com success
2014-06-03 04:59:24.439 NOTIFICATION Getting host info for server:hostname1-fe.example.com ...
2014-06-03 04:59:26.577 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >
2014-06-03 04:59:37.100 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1-fe.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >

Then I was confused about "Obsolete nodes found" it complained. I could confirm that I've removed hostname1.example.com, and even after I checked OVM DB in OVS.OVS_SERVER, there's no record about hostname1.example.com.

Then after some searching, these errors were caused by obsolete info in OCFS2(Oracle Cluster File System). We should edit file /etc/ocfs2/cluster.conf and remove obsolete entries.

-bash-3.2# vi /etc/ocfs2/cluster.conf
node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

node:
        ip_port     = 7777
        ip_address  = 10.200.169.191
        number      = 1
        name        = hostname2
        cluster     = ocfs2

cluster:
        node_count  = 2
        name        = ocfs2

So hostname2 was no longer needed or the IP address of hostname2 was changed, then you should remove entries related to hostname2, and modify node_count to 1. Later bounce ocfs2/o2cb services:

service ocfs2 restart

service o2cb restart

Later, I tried add OVS server again, and it worked! (Before adding that OVS server back, we need first remove its ovs-agent db: service ovs-agent stop; mv /etc/ovs-agent/db /var/tmp/db.bak.5; service ovs-agent start, and then configure ovs-agent service ovs-agent configure. You can also use /opt/ovs-agent-2.3/utils/cleanup.py to clean up too.)

 

SSH port forwarding

May 30th, 2014 No comments

As we know, SSH encrypts traffic between ssh client and ssh server. SSH forwarding can encrypt and forward data traffic of other TCP ports. This is AKA tunneling:

SSH is a client/server application that allows secure connectivity to servers. In practice, it is usually used just like Telnet. The advantage of SSH over Telnet is that it encrypts all data before sending it. While not originally designed to be a tunnel in the sense that VPN or GRE would be considered a tunnel, SSH can be used to access remote devices in addition to the one to which you have connected. While this does not have a direct application on Cisco routers, the concept is similar to that of VPN and GRE tunnels, and thus worth mentioning. I use SSH to access my home netork instead of a VPN. Here's PPTP vpn configuration on linux if you're interested. 

Also, when there's firewall blocking other TCP ports but allow SSH port 22, then you can use SSH to forward these TCP ports so that you can communicate to the blocked TCP ports.

SSH Local Port forwarding

img1LDAP server allows only localhost to visit it's 389 port. So how can we connect from another host to its 389 port?

On LdapClientHost:

ssh -L 7001:localhost:389 <user@LdapServerHost> #with format as ssh -L <local port>:<remote host>:<remote port> <SSH hostname>

After this, you can connect to LdapClientHost:7001, then the data flow will be like:

  1. App on LdapClientHost sends data to LdapClientHost:7001;
  2. SSH client on LdapClientHost will encrypt & forward data received on port 7001 to SSH server on LdapServerHost;
  3.  SSH Server will decrypt & forward data to LDAP:389. When got data back from LDAP:389, SSH Server will forward data back forth according to the same way as data comes in.

Maybe you'll ask whether we can connect from another host say LdapClientHost2 to LdapClientHost:7001 so that we can use the tunnel? The answer is no, as SSH Local Port forwarding will bind to loopback interface, you'll get "Connection refused" response when you connect from other hosts. But one good thing is that SSH has "-g" opthin which will allows remote hosts to connect to local forwarded ports:

ssh -g -L 7001:localhost:389 <user@LdapServerHost>

For example, on OVS server server_01, there's one VM named testvm_01, which is on vnc port 5926. I tried to connect from my PC using tightvnc viewer with server_01:5926 but failed as it's blocked by firewall. So on another host proxy_vm01, I run the following command:

ssh -g -L 7001:localhost:5926 root@server_01

Later, I connect from my PC using tightvnc viewer with centos-doxer:7001, then I can connect to testvm_01's console.

Anoter note is that, you will want ssh not to disconnect by itself after some time. So you'll need to modify ssh configuration file. Here's more about it: avoid putty ssh connection sever or disconnect or make ssh on linux not to disconnect after some certain time.

SSH Remote Port forwarding

img2On LdapServerHost:

ssh -R 7001:localhost:389 LdapClientHost #with format as ssh -R <local port>:<remote host>:<remote port> <SSH hostname>

This is called SSH Remote Port forwarding as of now, SSH will connect from LDAP server to LDAP client. And the dataflow is the same except that ssh client is now on LDAP server and ssh server is now on LDAP client:

  1. App on LdapClientHost sends data to LdapClientHost:7001;
  2. SSH server on LdapClientHost will encrypt & forward data received on port 7001 to SSH client on LdapServerHost;
  3. SSH client will decrypt & forward data to LDAP:389. When got data back from LDAP:389, SSH client will forward data back forth according to the same way as data comes in.

img3

On SSH Client(C):

ssh -g -L 7001:<B>:389 <D>

Then configure 7001 port on A and C. Please note that traffic between A)<-> (C) and (B)<->(D) are not encrypted by SSH.

One thing is that LDAP Server(B) is using private IP, and so you'll need to set NAT on SSH Server(D). You can take the following article for reference: NAT forwarding for ssh and vncviewer and NAT binding one priviate ip and one public ip together using linux as router.

SSH Dynamic port forwarding

img4

Sometimes there's no fixed service port, for example when we surf the internet, or talking using MSN. But we need protect our data when we using insecure network such as public WIFI. Here's when SSH Dynamic port forwarding comes into use.

ssh -D 7001 <SSH Server> #with format ssh -D <local port> <SSH Server>

After this, SSH will create a SOCKS proxy service. You can set proxy on MSN or browser to use localhost:7001 as SOCKS proxy, and you can browse internet for sites that are blocked on SSH client.

Here's what -D means:

-D [bind_address:]port
Specifies a local 'dynamic' application-level port forwarding. This works by allocating a socket to listen to port on the local side, optionally bound to the specified bind_address. Whenever a connection is made to this port, the connection is forwarded over the secure channel, and the application protocol is then used to determine where to connect to from the remote machine. Currently the SOCKS4 and SOCKS5 protocols are supported, and ssh will act as a SOCKS server. Only root can forward privileged ports. Dynamic port forwardings can also be specified in the configuration file.

SSH X port forwarding

img5
We can got GUI on Linux/Unix/Solaris/HP through VNC or X windows, here we'll take X windows for example.

In here, X client will be Linux/Unix/Solaris/HP servers, and X Server will be our client host(such as your PC). First, you'll need specify X server's location on X client:

export DISPLAY=myDesktop:1.0 #with format export DISPLAY=<X Server IP>:<display #>.<virtual #>

Then run X app on X client(Linux/Unix/Solaris/HP servers), and the GUI will show on X Server(such as your PC).

All goes smooth when there comes a firewall before Linux/Unix/Solaris/HP servers and X protocol is blocked. We now can use SSH port forwarding except for use VNC. And SSH port forwarding has an advantage of security upon VNC.

On X Server(your PC for example):

ssh -X <SSH Server>

Now you can run X app on remote servers, and GUI will show on client host. You can use XMing for example as Xserver when your PC is running Windows, and as for SSH client, putty or Cygwin are all ok. A more guide is Use xming, xshell, putty, tightvnc to display linux gui on windows desktop (x11 forwarding when behind firewall) which you'll find useful if you only want X windows.

resolved – check backend OHS httpd servers for BIG ip F5 LTM VIP

May 23rd, 2014 No comments

Assume you want to check the OHS or httpd servers one LTM VIP example.vip.com is routing traffic to. Then here's the steps:

  1. get the ip address of VIP example.vip.com;
  2. log on LTM's BUI. Local traffic-> virtual servers -> virtual server list, search ip
  3. click "edit" below column "resource"
  4. note down default pool
  5. search pool name in local traffic -> virtual servers -> pools -> pool list
  6. click the number below column members. Then you'll find the OHS servers and ports the VIP will route traffic to.

test telnet from VLAN on cisco router device

May 22nd, 2014 No comments

If you want to test telnet connection from one vlan to specific destination IP, here is the howto:

test-router# telnet 10.200.244.14 80 source vlan 125
Trying 10.200.244.14...
Connected to 10.200.244.14.
Escape character is '^]'.

Good luck.

Resolved – input_userauth_request: invalid user root

May 15th, 2014 2 comments

Today when I tried to ssh to one linux box but it failed, and /var/log/secure gave the following messages:

May 15 04:05:07 testbox sshd[22925]: User root from 10.120.120.188 not allowed because not listed in AllowUsers
May 15 04:05:07 testbox sshd[22928]: input_userauth_request: invalid user root
May 15 04:05:07 testbox unix_chkpwd[22929]: password check failed for user (root)
May 15 04:05:07 testbox sshd[22925]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.120.120.188 user=root
May 15 04:05:09 testbox sshd[22925]: Failed password for invalid user root from 10.120.120.188 port 50362 ssh2
May 15 04:05:10 testbox unix_chkpwd[22930]: password check failed for user (root)
May 15 04:05:11 testbox sshd[22928]: Connection closed by 10.120.120.188

Then I had a check of /etc/ssh/sshd_config and modified the following:

[root@testbox ~]# egrep 'PermitRoot|AllowUser' /etc/ssh/sshd_config
PermitRootLogin yes #change this to yes
#AllowUsers testuser #comment out this

Later, restart sshd, service sshd restart, and later ssh worked.

Categories: IT Architecture, Linux, Systems Tags: ,

resolved – fsinfo ERROR: Stale NFS file handle POST

May 15th, 2014 No comments

Today when I tried mount NFS share from one NFS server, it timeout with "mount.nfs: Connection timed out".

I tried to search something in /var/log/messages but no useful info there was found. So I used tcpdump on NFS client:

[root@dcs-hm1-qa132 ~]# tcpdump -nn -vvv host 10.120.33.90 #server is 10.120.33.90, client is 10.120.33.130
23:49:11.598407 IP (tos 0x0, ttl 64, id 26179, offset 0, flags [DF], proto TCP (6), length 96)
10.120.33.130.1649240682 > 10.120.33.90.2049: 40 null
23:49:11.598741 IP (tos 0x0, ttl 62, id 61186, offset 0, flags [DF], proto TCP (6), length 80)
10.120.33.90.2049 > 10.120.33.130.1649240682: reply ok 24 null
23:49:11.598812 IP (tos 0x0, ttl 64, id 26180, offset 0, flags [DF], proto TCP (6), length 148)
10.120.33.130.1666017898 > 10.120.33.90.2049: 92 fsinfo fh Unknown/0100010000000000000000000000000000000000000000000000000000000000
23:49:11.599176 IP (tos 0x0, ttl 62, id 61187, offset 0, flags [DF], proto TCP (6), length 88)
10.120.33.90.2049 > 10.120.33.130.1666017898: reply ok 32 fsinfo ERROR: Stale NFS file handle POST:
23:49:11.599254 IP (tos 0x0, ttl 64, id 26181, offset 0, flags [DF], proto TCP (6), length 148)
10.120.33.130.1682795114 > 10.120.33.90.2049: 92 fsinfo fh Unknown/010001000000000000002FFF000002580000012C0007B0C00000000A00000000
23:49:11.599627 IP (tos 0x0, ttl 62, id 61188, offset 0, flags [DF], proto TCP (6), length 88)
10.120.33.90.2049 > 10.120.33.130.1682795114: reply ok 32 fsinfo ERROR: Stale NFS file handle POST:

The reason of "ERROR: Stale NFS file handle POST" may caused by the following reasons:

1.The NFS server is no longer available
2.Something in the network is blocking
3.In a cluster during failover of NFS resource the major & minor numbers on the secondary server taking over is different from that of the primary.

To resolve the issue, you can try bounce NFS service on NFS server using /etc/init.d/nfs restart.

Categories: Hardware, NAS, Storage Tags:

resolved – show kitchen sink buttons when wordpress goes to fullscreen mode

April 11th, 2014 No comments

When you click the full-screen button of wordpress TinyMCE, wordpress will go to "Distraction-Free Writing mode", which benefits as the name suggests. However, you'll also find the toolbox of TinyMCE will only show a limited number of buttons and the second line of the toolbox(kitchen sink) will not show at all(I tried install plugin such as ultimate TinyMCE or advanced TinyMCE, but the issue remained):

full-screenPreviously, you can type ALT+SHIFT+G to go to another type of fullscreen mode, which has all buttons include kitchen sink ones. However, seems now the updated version of wordpress has disabled this feature.

To resolve this issue, we can insert the following code in functions.php of your theme:

function my_mce_fullscreen($buttons) {
$buttons[] = 'fullscreen';
return $buttons;
}
add_filter('mce_buttons', 'my_mce_fullscreen');

Later, the TinyMCE will have two full-screen button:

full-screen buttonsMake sure to click the SECOND full-screen button. When you do so, the editor will transform to the following appearance:

full-screen with kitchen sinkI assume this is what you're trying for, right?

 

 

Categories: Misc Tags: