server not pingable caused by failed switch port

Server testhost01 was found to be unreachable by ping. After locating its port on the switch, the following symptoms were found:

#1, the port is up and connected, and the MAC address is learnt on the interface on both the switch and the L3 router

ucf-a1z6-as50-swi-1#sh int Gi1/27
GigabitEthernet1/27 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet Port, address is 5057.a8c4.99da (bia 5057.a8c4.99da)
Description: testhost01-net0

ucf-a1z6-as50-swi-1#sh mac address-table interface Gi1/27
 vlan   mac address     type        protocols               port
-------+---------------+--------+---------------------+--------------------
 124    0010.e00d.a2ba   dynamic ip                    GigabitEthernet1/27

ucf-a1z6-rtr-1# sh ip arp vrf all | i 0010.e00d.a2ba
10.245.168.147 0.896955 0010.e00d.a2ba Vlan124

#2, blocked transmit queue messages are being logged continuously for the switch port

Oct 12 10:52:31.764: %C4K_HWPORTMAN-4-BLOCKEDTXQUEUE: Blocked transmit queue HwTxQId1 on Switch Phyport Gi1/27, count=583830
Oct 12 10:52:31.764: %C4K_HWPORTMAN-4-BLOCKEDTXQUEUE: Blocked transmit queue HwTxQId3 on Switch Phyport Gi1/27, count=583831
Oct 12 10:53:50.706: %C4K_HWPORTMAN-4-BLOCKEDTXQUEUE: Blocked transmit queue HwTxQId2 on Switch Phyport Gi1/27, count=532423

ucf-a1z6-as50-swi-1#sh clock
11:01:00.229 UTC Fri Oct 12 2018

#3, the output drops on the switch port keep increasing, and there is no traffic on the interface

ucf-a1z6-as50-swi-1#sh int gi1/27 | i drop|res|flap
Hardware is Gigabit Ethernet Port, address is 5057.a8c4.99da (bia 5057.a8c4.99da)
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 601929

ucf-a1z6-as50-swi-1#sh int gi1/27 | i drop|res|flap
Hardware is Gigabit Ethernet Port, address is 5057.a8c4.99da (bia 5057.a8c4.99da)
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 602075

ucf-a1z6-as50-swi-1#sh int gi1/27 | i load|rate
reliability 255/255, txload 1/255, rxload 1/255
Queueing strategy: fifo
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec

From the above, it appears that pause frames are continuously being sent to the switch, which causes the transmit buffer on the switch port to fill up. Once the transmit buffer is full, the switch drops further traffic destined for the server.
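
To confirm that the drop counter is still climbing, the two samples from #3 can be compared with a small helper. This is only a sketch: the function name output_drops is made up for illustration, and it just parses text in the format shown above.

```shell
# Extract the "Total output drops" counter from "show interface" text on stdin.
output_drops() {
    sed -n 's/.*Total output drops: \([0-9][0-9]*\).*/\1/p'
}

# The two samples taken from the switch output above.
first='Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 601929'
second='Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 602075'

before=$(printf '%s\n' "$first" | output_drops)
after=$(printf '%s\n' "$second" | output_drops)

# Number of packets dropped between the two samples.
echo "drops between samples: $((after - before))"
```

A non-zero difference while the 5 minute output rate stays at 0 matches the picture of a port that queues traffic but never transmits it.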

On the server side, the host was not hung, and rebooting it did not help. The LED of Gi1/27 was not amber either.

Based on the above, we planned to try the following:

#1, move the cable from Gi1/27 to Gi1/7, which is in the same VLAN on the switch and in "not connected" status.

#2, if that doesn't help, then most probably the NIC of the server is faulty. We can replace the NIC, or connect the secondary NIC to the Gi1/27 port unplugged earlier.

In the end, we tried option #1, and the issue was resolved.

cp failed because of a write-protected regular file

The following issue was found on a Linux box:

[oracle@testvm ~]$ cp /tmp/stbeehive.cer /u01/shared/
cp: cannot create regular file `/u01/shared/stbeehive.cer': Permission denied

/u01/shared had 777 permissions, and /tmp/stbeehive.cer looked like this:

[oracle@testvm ~]$ ls -l /tmp/stbeehive.cer
-r-xr-xr-x 1 oracle oinstall 1930 Sep 12 06:37 /tmp/stbeehive.cer

After some troubleshooting, it turned out that the destination file /u01/shared/stbeehive.cer already existed (without write permission):

[root@testvm ~]# ls -l /u01/shared/stbeehive.cer
-r-xr-xr-x 1 oracle oinstall 1930 Sep 12 06:36 /u01/shared/stbeehive.cer

After removing the destination file, cp succeeded:

[oracle@testvm ~]$ rm /u01/shared/stbeehive.cer
rm: remove write-protected regular file `/u01/shared/stbeehive.cer'? y

[oracle@testvm ~]$ cp /tmp/stbeehive.cer /u01/shared/
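
The reason is that cp opens an existing destination file for writing, which needs write permission on the file itself, while rm only needs write permission on the directory. cp -f does the remove-and-retry in one step. Below is a minimal sketch in a scratch directory; the helper name copy_over_readonly and the file names are made up for illustration.

```shell
# Copy src over dest even when dest exists without write permission.
# cp -f removes a destination it cannot open and tries again; like rm,
# this needs write permission on the destination directory.
copy_over_readonly() {
    cp -f "$1" "$2"
}

# Demo in a scratch directory (hypothetical file names).
demo_dir=$(mktemp -d)
echo "new content" > "$demo_dir/src.cer"
echo "old content" > "$demo_dir/dest.cer"
chmod 555 "$demo_dir/dest.cer"      # same mode as in the listing above

copy_over_readonly "$demo_dir/src.cer" "$demo_dir/dest.cer"
cat "$demo_dir/dest.cer"
rm -rf "$demo_dir"
```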

OEL linux upgrade kernel howto

First, set up the yum repo according to the OS version (skip this if you already have a yum repo configured):

cd /etc/yum.repos.d; mkdir bak; unalias mv; mv -f *.repo bak

uname -r | grep -q el5 && curl 'http://public-yum.oracle.com/public-yum-el5.repo' -o public-yum-el5.repo

uname -r | grep -q el6 && curl 'http://public-yum.oracle.com/public-yum-ol6.repo' -o public-yum-ol6.repo

uname -r | grep -q el7 && curl 'http://yum.oracle.com/public-yum-ol7.repo' -o public-yum-el7.repo
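
The release-to-URL mapping in the commands above can also be written as a single case statement, which makes it easier to spot an unsupported release. This is a sketch only: the function name repo_url_for_kernel is made up, and the URLs are simply the ones already used above.

```shell
# Print the repo file URL for a kernel release string (as printed by uname -r).
# Returns non-zero for a release that none of the patterns match.
repo_url_for_kernel() {
    case "$1" in
        *el5*) echo 'http://public-yum.oracle.com/public-yum-el5.repo' ;;
        *el6*) echo 'http://public-yum.oracle.com/public-yum-ol6.repo' ;;
        *el7*) echo 'http://yum.oracle.com/public-yum-ol7.repo' ;;
        *)     echo "unsupported kernel release: $1" >&2; return 1 ;;
    esac
}

# Example with the UEK version string used later in this post.
repo_url_for_kernel "2.6.39-400.300.2.el6uek.x86_64"
```

On a real host the argument would be "$(uname -r)", and the printed URL would be passed to curl.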

Now edit the yum repo file to specify the UEK release to upgrade to (search for "UEK" in the file). Taking the OEL6 repo file as an example:

  • ol6_UEK_latest - enabling this upgrades the kernel to the latest version of the current release, e.g. from 2.6.39-200.xxx to 2.6.39-400.xxx
  • ol6_UEKR3_latest - upgrades from 2.xxx to 3.xxx
  • ol6_UEKR4 - upgrades from 2.xxx/3.xxx to 4.xxx

After that, use yum list to confirm which kernel will be installed:

  • yum list|grep kernel-uek

Do the upgrade now

  • yum update kernel-uek*

Or you can specify the version to upgrade to, e.g. to upgrade the OEL kernel to 2.6.39-400.300.2.el6uek:

  • yum update kernel-uek*2.6.39-400.300.2.el6uek*

Check whether the new kernel is listed in /boot/grub/grub.conf. If it's in /etc/grub.conf but NOT in /boot/grub/grub.conf, do the following:

cp /boot/grub/grub.conf /boot/grub/grub.conf.bak;cp /etc/grub.conf /etc/grub.conf.bak

cat /etc/grub.conf > /boot/grub/grub.conf

rm /etc/grub.conf

ln -s /boot/grub/grub.conf /etc/grub.conf
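
The check for whether a kernel version is listed in a grub file can be scripted. A minimal sketch follows; the function name kernel_in_grub is made up, and the sample grub.conf entry is a hypothetical one written to a scratch file, not taken from a real host.

```shell
# Succeed if the given kernel version appears on a "kernel" line
# of the given grub config file.
kernel_in_grub() {
    grep '^[[:space:]]*kernel ' "$2" | grep -qF -- "$1"
}

# Hypothetical sample entry, written to a scratch file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
title Oracle Linux Server (2.6.39-400.300.2.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.39-400.300.2.el6uek.x86_64 ro root=/dev/sda1
        initrd /initramfs-2.6.39-400.300.2.el6uek.x86_64.img
EOF

if kernel_in_grub 2.6.39-400.300.2.el6uek "$sample"; then
    echo "new kernel is listed"
else
    echo "new kernel is NOT listed"
fi
```

On a real host the same function would be run once against /boot/grub/grub.conf and once against /etc/grub.conf.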

linux set or change timezone howto

Suppose that you want to set or change linux local timezone to UTC:

cp /etc/sysconfig/clock /etc/sysconfig/clock.bak

echo -e "ZONE=\"UTC\"\nUTC=true\nARC=false" > /etc/sysconfig/clock

mv /etc/localtime{,.bak}

ln -s /usr/share/zoneinfo/UTC /etc/localtime

echo 'export TZ=UTC' >> /etc/profile

Now run "date" to confirm it's UTC:

[root@andy-doxer ~]# date
Mon Aug 20 07:10:17 UTC 2018
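
If other zones are needed, the clock file content can be generated by a small function. This is a sketch only: the name write_clock_config is made up, and the demo writes to a scratch file instead of /etc/sysconfig/clock.

```shell
# Write the ZONE/UTC/ARC settings for the given zone to the given file.
# printf is used instead of "echo -e" because it behaves the same in every shell.
write_clock_config() {
    printf 'ZONE="%s"\nUTC=true\nARC=false\n' "$1" > "$2"
}

# Try it against a scratch file rather than the real config.
scratch=$(mktemp)
write_clock_config UTC "$scratch"
cat "$scratch"

# TZ can also be checked per process without touching any system file.
TZ=UTC date +%Z
```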

resolved - Remote Certificate has expired, NSS error -8181

When running the curl command below against an HTTPS endpoint, an error occurred:

[test@host1 ~]$ curl -v -X GET -u user:pass https://example.com/v1/api.sh
* About to connect() to example.com port 443 (#0)
*   Trying 192.168.247.61... connected
* Connected to example.com (192.168.247.61) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* Remote Certificate has expired.
* NSS error -8181
* Closing connection #0
* Peer certificate cannot be authenticated with known CA certificates
curl: (60) Peer certificate cannot be authenticated with known CA certificates
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

However, the cert had not expired (it's Aug 2018):

 [test@host1 ~]# echo | openssl s_client -connect example.com:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -noout -dates
 depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
 verify return:1
 depth=1 C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
 verify return:1
 depth=0 C = US, ST = California, L = Redwood City, O = Oracle Corporation, OU = Oracle OCI SALT-LAKE-CITY, CN = example.com
 verify return:1
 DONE
 notBefore=Dec 13 00:00:00 2017 GMT
 notAfter=Dec 13 12:00:00 2018 GMT

After some debugging, it turned out that the system date was set to the year 2000. After setting the system time correctly to 2018, the issue was resolved.
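
A quick way to catch this kind of problem is to compare the local clock with the validity window printed by openssl. Below is a minimal sketch, assuming GNU date (for the -d option); the function name cert_valid_at is made up for illustration.

```shell
# Succeed if the epoch time $1 falls inside [notBefore, notAfter].
# $2 and $3 are date strings as printed by "openssl x509 -noout -dates".
# Assumes GNU date (for the -d option).
cert_valid_at() {
    cva_start=$(date -u -d "$2" +%s) || return 2
    cva_end=$(date -u -d "$3" +%s) || return 2
    [ "$1" -ge "$cva_start" ] && [ "$1" -le "$cva_end" ]
}

# Validity window from the openssl output above.
not_before='Dec 13 00:00:00 2017 GMT'
not_after='Dec 13 12:00:00 2018 GMT'

# A clock stuck at 2000-01-01 00:00:00 UTC (epoch 946684800) is outside the window.
cert_valid_at 946684800 "$not_before" "$not_after" ||
    echo "clock at year 2000: outside the validity window"

# The local clock can be checked the same way.
cert_valid_at "$(date +%s)" "$not_before" "$not_after" ||
    echo "local clock: outside the validity window"
```

If the local clock is reported as outside the window while the certificate dates look fine, check the system date before suspecting the certificate.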

NFS read write access to normal user (deny root read write)

Assume that on a Sun ZFS storage server you want to create an NFS share that is read-write only for a normal user on a client machine (even the root user cannot read/write). You can follow the steps below:

  • When creating the project, enter the normal user's UID/GID and select 755 as the permissions.

  • In the NFS share's Protocols tab, set "anonymous user mapping" to nobody, enter the FQDN of the host that will mount the share, and uncheck "Root access".

  • Now test on the host: root cannot write, but the user matching the specified UID/GID can.

ssh passwordless login with private key

On Server Side:

su - username

cd .ssh/

cat id_rsa.pub >> authorized_keys #if there is no id_rsa/id_rsa.pub, generate them first with "ssh-keygen -t rsa"; when prompted for a passphrase, leave it empty

Still on the server side:

Make sure "RSAAuthentication yes" and "PubkeyAuthentication yes" are set in /etc/ssh/sshd_config, and restart sshd if you modified it (RSAAuthentication applies to SSH protocol 1 only and is deprecated in newer OpenSSH releases)

Make sure .ssh is mode 700 and authorized_keys is mode 600

Copy id_rsa to the client side and rename it to "private.key"

On client side:

chmod 600 private.key

ssh -i private.key username@server
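
Wrong permissions on .ssh or authorized_keys are a common reason for key login to fall back to a password prompt, so the mode check is worth scripting. Below is a minimal sketch, assuming GNU stat (for the -c option); the function names are made up and the demo runs in a scratch directory.

```shell
# Succeed if <dir> is mode 700 and <dir>/authorized_keys is mode 600.
# Assumes GNU stat (for the -c option).
check_ssh_perms() {
    [ "$(stat -c %a "$1")" = "700" ] &&
    [ "$(stat -c %a "$1/authorized_keys")" = "600" ]
}

# Set the modes that the check expects.
fix_ssh_perms() {
    chmod 700 "$1" && chmod 600 "$1/authorized_keys"
}

# Demo in a scratch directory standing in for a home directory.
demo_home=$(mktemp -d)
mkdir "$demo_home/.ssh"
touch "$demo_home/.ssh/authorized_keys"

check_ssh_perms "$demo_home/.ssh" || fix_ssh_perms "$demo_home/.ssh"
check_ssh_perms "$demo_home/.ssh" && echo "permissions ok"
rm -rf "$demo_home"
```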

resolved - ipmitool Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

If you meet the error below on a physical server (not a VM, as VMs do not support IPMI):

    [root@localhost ~]# ipmitool
    Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

Then first make sure your server's system board supports IPMI.
Old system boards might not support IPMI technology.

    [root@localhost ~]# dmidecode | grep -A 6 -i ipmi
    IPMI Device Information
        Interface Type: KCS (Keyboard Control Style)
        Specification Version: 1.5
        I2C Slave Address: 0x10
        NV Storage Device: Not Present
        Base Address: 0x0000000000000CA8 (I/O) #if not all zeros, then it supports IPMI
        Register Spacing: 32-bit Boundaries
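
The "not all zeros" check on the Base Address line can be scripted. This is a sketch only: the function name ipmi_base_address_nonzero is made up, and it only understands dmidecode text in the format shown above.

```shell
# Read dmidecode output on stdin; succeed if a "Base Address:" line is
# present and its value is not all zeros.
ipmi_base_address_nonzero() {
    awk '/Base Address:/ {
             found = 1
             if ($3 ~ /^0x0+$/) zero = 1
         }
         END { exit ((found && !zero) ? 0 : 1) }'
}

# Example with the Base Address line from the output above.
printf '    Base Address: 0x0000000000000CA8 (I/O)\n' |
    ipmi_base_address_nonzero && echo "IPMI base address is set"
```

On a real host the input would come from "dmidecode | grep -A 6 -i ipmi".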

If it's supported, then you need to load the IPMI-related modules:

    modprobe ipmi_devintf
    modprobe ipmi_si

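Then add them to /etc/modules to have them loaded automatically at boot (on systemd-based systems, a .conf file under /etc/modules-load.d/ serves the same purpose):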

    ipmi_devintf
    ipmi_si

To start IPMI:
    
    /etc/init.d/ipmi start
    /etc/init.d/ipmi status

PS:
    1. If there's no ipmitool command, install it with "yum install -y OpenIPMI ipmitool"
    2. You may need to load more modules:

        [root@localhost ~]# modprobe ipmi_devintf
        [root@localhost ~]# modprobe ipmi_si
        [root@localhost ~]# modprobe ipmi_watchdog
        [root@localhost ~]# modprobe ipmi_poweroff
        [root@localhost ~]# modprobe ipmi_msghandler
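
To see which of these modules are still missing, the lsmod output can be checked against the list. Below is a minimal sketch; the function name missing_ipmi_modules is made up, and the lsmod sample in the demo is hypothetical.

```shell
# Read lsmod output on stdin and print each IPMI module from the list
# above that is not loaded. The first line of lsmod output is a header.
missing_ipmi_modules() {
    mim_loaded=$(awk 'NR > 1 { print $1 }')
    for mim_mod in ipmi_devintf ipmi_si ipmi_watchdog ipmi_poweroff ipmi_msghandler; do
        printf '%s\n' "$mim_loaded" | grep -qx "$mim_mod" || echo "$mim_mod"
    done
}

# Hypothetical lsmod sample with three of the five modules loaded.
printf '%s\n' \
    'Module                  Size  Used by' \
    'ipmi_si                53582  0' \
    'ipmi_devintf           17603  0' \
    'ipmi_msghandler        46607  2 ipmi_devintf,ipmi_si' |
    missing_ipmi_modules
```

On a real host the input would be "lsmod | missing_ipmi_modules", and each printed name can be passed to modprobe.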

google chrome installation on rhel7 oel7 centos7 linux

  • Setup root VNC on OEL7
  • Enable NetworkManager

chkconfig --list NetworkManager

service NetworkManager status

cd /etc/sysconfig/network-scripts/ #add or edit below line in ifcfg-eth0/1

NM_CONTROLLED=yes

chkconfig NetworkManager on

service NetworkManager status

service NetworkManager start

  • In root VNC

Search for "network" in "Activities" and select "Network proxy".

Set the proxy to Automatic and use the URL that fits your environment (if needed).

Configure DNS for the NICs in the same dialog (under IPv4, turn off "Automatic" for DNS and enter the DNS servers).

  • Reboot
  • After host is up

wget https://dl.google.com/linux/linux_signing_key.pub

rpm --import linux_signing_key.pub

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm

yum -y localinstall google-chrome-stable_current_x86_64.rpm

rpm -qa|grep chrome

#If you are running chrome as root

google-chrome --no-sandbox &
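
The NM_CONTROLLED edit from the NetworkManager step can be scripted so that it works whether or not the line already exists. Below is a minimal sketch, assuming GNU sed (for in-place editing with -i); the function name set_nm_controlled is made up and the demo uses a scratch file rather than a real ifcfg file.

```shell
# Set NM_CONTROLLED=yes in an ifcfg file: edit the line if it is present,
# append it otherwise. Assumes GNU sed (for the -i option).
set_nm_controlled() {
    if grep -q '^NM_CONTROLLED=' "$1"; then
        sed -i 's/^NM_CONTROLLED=.*/NM_CONTROLLED=yes/' "$1"
    else
        echo 'NM_CONTROLLED=yes' >> "$1"
    fi
}

# Demo on a scratch file with hypothetical content.
ifcfg=$(mktemp)
printf 'DEVICE=eth0\nNM_CONTROLLED=no\n' > "$ifcfg"
set_nm_controlled "$ifcfg"
cat "$ifcfg"
rm -f "$ifcfg"
```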

xfs tips

Before growing an XFS file system with -D size, ensure that the underlying block device is large enough to hold the grown file system (e.g. extend it first with pvcreate/vgextend/lvextend).

xfs_growfs /mount/point

xfs_growfs /mount/point -D size

The -D size option grows the file system to the specified size (expressed in file system blocks). Without the -D size option, xfs_growfs will grow the file system to the maximum size supported by the device.

While XFS file systems can be grown while mounted, their size cannot be reduced at all.
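
Since -D takes file system blocks rather than bytes, a target size has to be converted first. Below is a minimal sketch; the function name gib_to_fs_blocks is made up, and the 4096-byte default is an assumption (the actual block size is reported as bsize by xfs_info).

```shell
# Convert a size in GiB to a count of file system blocks.
# $1 = size in GiB, $2 = block size in bytes (4096 assumed if omitted).
gib_to_fs_blocks() {
    echo $(( $1 * 1024 * 1024 * 1024 / ${2:-4096} ))
}

gib_to_fs_blocks 10 4096    # prints 2621440
```

The result would then be passed as the size argument, e.g. xfs_growfs /mount/point -D 2621440 for a 10 GiB file system with 4096-byte blocks.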