write-protected regular file caused cp to fail

I hit the issue below on a Linux box:

[oracle@testvm ~]$ cp /tmp/stbeehive.cer /u01/shared/
cp: cannot create regular file `/u01/shared/stbeehive.cer': Permission denied

The directory /u01/shared had 777 permissions, and the source file /tmp/stbeehive.cer looked like this:

[oracle@testvm ~]$ ls -l /tmp/stbeehive.cer
-r-xr-xr-x 1 oracle oinstall 1930 Sep 12 06:37 /tmp/stbeehive.cer

After some troubleshooting, it turned out that the destination file /u01/shared/stbeehive.cer already existed and had no write permission, so cp could not overwrite it:

[root@testvm ~]# ls -l /u01/shared/stbeehive.cer
-r-xr-xr-x 1 oracle oinstall 1930 Sep 12 06:36 /u01/shared/stbeehive.cer

After removing the destination file, cp succeeded:

[oracle@testvm ~]$ rm /u01/shared/stbeehive.cer
rm: remove write-protected regular file `/u01/shared/stbeehive.cer'? y

[oracle@testvm ~]$ cp /tmp/stbeehive.cer /u01/shared/
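
The failure and a non-interactive workaround can be reproduced in a scratch directory. This is only a minimal sketch, assuming GNU cp and a non-root user (root bypasses file permission checks, so the plain cp would succeed there); the directory and file names are made up for illustration. With -f, GNU cp removes a destination file it cannot open and tries again, which works here because the directory itself is writable.

```shell
#!/bin/sh
# Scratch area standing in for /tmp and /u01/shared (hypothetical names).
work=$(mktemp -d)
mkdir "$work/shared"
echo "new cert" > "$work/src.cer"
echo "old cert" > "$work/shared/src.cer"
chmod 555 "$work/shared/src.cer"      # destination exists and is read-only

# Plain cp opens the existing file for writing, which fails for non-root.
if cp "$work/src.cer" "$work/shared/" 2>/dev/null; then
    echo "plain cp succeeded (probably running as root)"
else
    echo "plain cp failed: destination is write-protected"
fi

# -f: if the destination cannot be opened, remove it and try again.
cp -f "$work/src.cer" "$work/shared/"
cat "$work/shared/src.cer"
```

This avoids the interactive rm prompt, at the cost of silently replacing whatever was at the destination.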

OEL linux upgrade kernel howto

First, set up the yum repo that matches the OS version (skip this if you already have a yum repo configured):

cd /etc/yum.repos.d

mkdir bak

unalias mv

mv -f *.repo bak

uname -r|grep -q el5 && curl 'http://public-yum.oracle.com/public-yum-el5.repo' -o public-yum-el5.repo

uname -r|grep -q el6 && curl 'http://public-yum.oracle.com/public-yum-ol6.repo' -o public-yum-ol6.repo

uname -r|grep -q el7 && curl 'http://yum.oracle.com/public-yum-ol7.repo' -o public-yum-el7.repo

Now edit the yum repo file to choose the UEK release to upgrade to (search for "UEK" in the file). Taking the OEL6 repo file as an example:

  • ol6_UEK_latest - enabling this upgrades the kernel to the latest version of the current release, e.g. from 2.6.39-200.xxx to 2.6.39-400.xxx
  • ol6_UEKR3_latest - enabling this upgrades from 2.xxx to 3.xxx
  • ol6_UEKR4 - enabling this upgrades from 2.xxx/3.xxx to 4.xxx
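
Instead of editing by hand, the enabled= flag of a single section can be flipped with a small helper. The following is a sketch under assumptions: the sample repo file is cut down and made up (a real one also carries baseurl, gpgkey and so on), and set_enabled is a hypothetical helper name, not part of yum. If yum-utils is installed, yum-config-manager --enable <repo id> is another way to get the same result.

```shell
#!/bin/sh
# Cut-down, made-up sample; only the section names come from the list above.
repo=$(mktemp)
cat > "$repo" <<'EOF'
[ol6_UEK_latest]
name=UEK latest
enabled=1

[ol6_UEKR3_latest]
name=UEK R3 latest
enabled=0
EOF

# set_enabled <file> <section> <0|1>: rewrite enabled= inside one section only.
set_enabled() {
    awk -v sec="[$2]" -v val="$3" '
        /^\[/ { in_sec = ($0 == sec) }            # entering a new section
        in_sec && /^enabled=/ { $0 = "enabled=" val }
        { print }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

set_enabled "$repo" ol6_UEK_latest 0
set_enabled "$repo" ol6_UEKR3_latest 1
cat "$repo"
```

On a real host the first argument would be the repo file under /etc/yum.repos.d, after taking a backup.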

After that, use yum list to confirm which kernel will be installed:

  • yum list|grep kernel-uek

Now do the upgrade:

  • yum update kernel-uek*

Or you can specify the version to upgrade to, e.g. to upgrade the OEL kernel to 2.6.39-400.300.2.el6uek:

  • yum update kernel-uek*2.6.39-400.300.2.el6uek*

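Check whether the new kernel is listed in /boot/grub/grub.conf. /etc/grub.conf is normally a symlink to /boot/grub/grub.conf; if the new kernel is in /etc/grub.conf but NOT in /boot/grub/grub.conf, the link has been replaced by a regular file and you need to do the following: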

cp /boot/grub/grub.conf /boot/grub/grub.conf.bak;cp /etc/grub.conf /etc/grub.conf.bak

cat /etc/grub.conf > /boot/grub/grub.conf

rm /etc/grub.conf

ln -s /boot/grub/grub.conf /etc/grub.conf
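
Before rebooting, it can help to confirm which entry GRUB will boot by default. The sketch below parses a legacy grub.conf; the sample file is made up (the second title line is only a placeholder for an older kernel), default_title is a hypothetical helper name, and it assumes the default= line appears before the title lines, as it usually does.

```shell
#!/bin/sh
# Made-up, trimmed grub.conf for illustration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
default=0
timeout=5
title Oracle Linux Server (2.6.39-400.300.2.el6uek.x86_64)
        root (hd0,0)
title Oracle Linux Server (2.6.32-431.el6.x86_64)
        root (hd0,0)
EOF

# default_title <file>: print the title whose index matches default=N (0-based).
default_title() {
    awk -F= '
        /^default=/ { want = $2 }
        /^title /   { if (n++ == want) { sub(/^title /, ""); print; exit } }
    ' "$1"
}

default_title "$conf"
```

On a real host, run default_title /boot/grub/grub.conf and check that the printed entry is the kernel you just installed.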

linux set or change timezone howto

Suppose you want to set or change the local timezone on a Linux box to UTC:

cp /etc/sysconfig/clock /etc/sysconfig/clock.bak

echo -e "ZONE=\"UTC\"\nUTC=true\nARC=false" > /etc/sysconfig/clock

mv /etc/localtime{,.bak}

ln -s /usr/share/zoneinfo/UTC /etc/localtime

echo 'export TZ=UTC' >> /etc/profile

Now run "date" to confirm it's UTC:

[root@andy-doxer ~]# date
Mon Aug 20 07:10:17 UTC 2018
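
To preview how a zone renders before changing any system file, the TZ variable can be set for a single command. A small sketch, assuming the zone data for UTC is available (it practically always is):

```shell
#!/bin/sh
# TZ set on the command line affects only that one command.
TZ=UTC date +%Z      # zone abbreviation
TZ=UTC date +%z      # numeric offset from UTC
```

The same form works with any zone name found under /usr/share/zoneinfo.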

resolved - Remote Certificate has expired, NSS error -8181

When running the curl command below against an HTTPS endpoint, the following error occurred:

[test@host1 ~]$ curl -v -X GET -u user:pass https://example.com/v1/api.sh
* About to connect() to example.com port 443 (#0)
*   Trying 192.168.247.61... connected
* Connected to example.com (192.168.247.61) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* Remote Certificate has expired.
* NSS error -8181
* Closing connection #0
* Peer certificate cannot be authenticated with known CA certificates
curl: (60) Peer certificate cannot be authenticated with known CA certificates
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

However, the cert had not expired (this was in Aug 2018):

 [test@host1 ~]# echo | openssl s_client -connect example.com:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -noout -dates
 depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
 verify return:1
 depth=1 C = US, O = DigiCert Inc, CN = DigiCert SHA2 Secure Server CA
 verify return:1
 depth=0 C = US, ST = California, L = Redwood City, O = Oracle Corporation, OU = Oracle OCI SALT-LAKE-CITY, CN = example.com
 verify return:1
 DONE
 notBefore=Dec 13 00:00:00 2017 GMT
 notAfter=Dec 13 12:00:00 2018 GMT

After some debugging, it turned out that the system date on the client was set to the year 2000. After setting the system time correctly to 2018, the issue was resolved.
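
The check that went wrong can be sketched as a comparison of the local clock against the certificate's validity window. This assumes GNU date (for -d); check_clock is a hypothetical helper name, and the two dates passed to it are just examples. A clock set to 2000 falls before notBefore, so the certificate is outside its validity window, which the curl/NSS build in the output above reported as an expired certificate.

```shell
#!/bin/sh
# Validity window copied from the openssl output above.
not_before="Dec 13 00:00:00 2017 GMT"
not_after="Dec 13 12:00:00 2018 GMT"

# check_clock <date string>: say where that moment falls relative to the window.
check_clock() {
    now=$(date -u -d "$1" +%s)
    start=$(date -u -d "$not_before" +%s)
    end=$(date -u -d "$not_after" +%s)
    if [ "$now" -lt "$start" ]; then
        echo "not yet valid"
    elif [ "$now" -gt "$end" ]; then
        echo "expired"
    else
        echo "valid"
    fi
}

check_clock "2000-01-01 00:00:00 UTC"   # the wrong system clock
check_clock "2018-08-20 07:10:17 UTC"   # a date inside the window
```

So when a certificate error contradicts what openssl shows, running date on the client is a cheap first check.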

ssh passwordless login with private key

On Server Side:

su - username

cd .ssh/

cat id_rsa.pub >> authorized_keys #if there is no id_rsa/id_rsa.pub, generate them first with "ssh-keygen -t rsa"; when prompted for a passphrase, leave it empty

Still on the server side:

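Make sure "RSAAuthentication yes" and "PubkeyAuthentication yes" are set in /etc/ssh/sshd_config (restart sshd if you modified it). RSAAuthentication applies to SSH protocol 1 only and is deprecated in newer OpenSSH releases, where PubkeyAuthentication alone is enough.

Make sure .ssh is mode 700 and authorized_keys is mode 600

Copy id_rsa to the client side and rename it to "private.key"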

On client side:

chmod 600 private.key

ssh -i private.key username@server
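
The permission requirements matter because sshd (with its default StrictModes setting) ignores key files that are too open, and the ssh client refuses a private key that other users can read. A minimal sketch of the expected modes, run against a throwaway directory standing in for the user's home so that nothing real is modified; it assumes GNU stat:

```shell
#!/bin/sh
# Throwaway stand-in for the user's home directory.
home=$(mktemp -d)
mkdir "$home/.ssh"
touch "$home/.ssh/authorized_keys" "$home/private.key"

chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys" "$home/private.key"

# Print octal mode and name for each path (GNU stat syntax).
stat -c '%a %n' "$home/.ssh" "$home/.ssh/authorized_keys" "$home/private.key"
```

If login still prompts for a password, ssh -v on the client usually shows whether the key was offered and rejected.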

resolved - ipmitool Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

If you hit the error below on a physical server (not a VM, as VMs do not support IPMI):

    [root@localhost ~]# ipmitool
    Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

Then first make sure the server's system board supports IPMI.
Old system boards might not support it.

    [root@localhost ~]# dmidecode | grep -A 6 -i ipmi
    IPMI Device Information
        Interface Type: KCS (Keyboard Control Style)
        Specification Version: 1.5
        I2C Slave Address: 0x10
        NV Storage Device: Not Present
        Base Address: 0x0000000000000CA8 (I/O) #if not all zeros, then it supports IPMI
        Register Spacing: 32-bit Boundaries

If it's supported, load the IPMI-related kernel modules:

    modprobe ipmi_devintf
    modprobe ipmi_si

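Then add them to /etc/modules to have them loaded automatically at boot (/etc/modules is the Debian-style location; on RHEL/OEL 6 the usual equivalent is an executable script under /etc/sysconfig/modules/):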

    ipmi_devintf
    ipmi_si

To start IPMI:
    
    /etc/init.d/ipmi start
    /etc/init.d/ipmi status

PS:
    1. If there's no ipmitool command, try installing it with "yum install -y OpenIPMI ipmitool"
    2. You may need to load more modules:

        [root@localhost ~]# modprobe ipmi_devintf
        [root@localhost ~]# modprobe ipmi_si
        [root@localhost ~]# modprobe ipmi_watchdog
        [root@localhost ~]# modprobe ipmi_poweroff
        [root@localhost ~]# modprobe ipmi_msghandler
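
After loading the modules, a quick check is whether any of the device nodes named in the error message now exists. A small sketch; first_existing is a hypothetical helper name, and the paths are the three from the ipmitool error above:

```shell
#!/bin/sh
# first_existing PATH...: print the first path that exists, return 1 if none does.
first_existing() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "$p"
            return 0
        fi
    done
    return 1
}

if dev=$(first_existing /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0); then
    echo "IPMI device node present: $dev"
else
    echo "no IPMI device node yet; check 'lsmod | grep ipmi'"
fi
```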

VM shutdown stuck in “mount: you must specify the filesystem type, please stand by while rebooting the system”

When you issue "shutdown" or "reboot" on a Linux box and it hangs at "mount: you must specify the filesystem type, please stand by while rebooting the system":

One possible reason is that the wrong mount options are specified for NFS shares in /etc/fstab. For example, for NFSv3, make sure to use the NFS options below when mounting shares:

<share name> <mount dir> nfs rsize=32768,wsize=32768,hard,nolock,timeo=14,noacl,intr,mountvers=3,vers=3 0 0

Using the options below will make VM shutdown get stuck at "mount: you must specify the filesystem type". DO NOT use them:

<share name> <mount dir> nfs vers=3,rsize=32768,wsize=32768,hard,nolock,timeo=14,noacl,intr 0 0

TCP wrappers /etc/hosts.allow /etc/hosts.deny

A simple example on a Linux box:

[root@test ~]# cat /etc/hosts.allow
sshd : ALL EXCEPT host1.example.com
snmpd : ALL EXCEPT host1.example.com
ALL : localhost

[root@test ~]# cat /etc/hosts.deny
ALL:ALL

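Explanation:

hosts.allow is checked first, then hosts.deny. The services sshd and snmpd accept connections from all hosts except host1.example.com. All services accept connections from localhost. Everything else, including sshd/snmpd connections from host1.example.com, falls through to hosts.deny and is denied by ALL:ALL.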

 

resolved - net/core/dev.c:1894 skb_gso_segment+0x298/0x370()

Today on one of our servers, there were a lot of errors in /var/log/messages like below:

Apr 14 21:50:25 test01 kernel: WARNING: at net/core/dev.c:1894 skb_gso_segment+0x298/0x370()
Apr 14 21:50:25 test01 kernel: Hardware name: SUN FIRE X4170 M3
Apr 14 21:50:25 test01 kernel: : caps=(0x60014803, 0x0) len=255 data_len=215 ip_summed=1
Apr 14 21:50:25 test01 kernel: Modules linked in: dm_nfs nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc 8021q garp bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp dm_round_robin libiscsi dm_multipath scsi_transport_iscsi xenfs xen_privcmd dm_mirror video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport sr_mod cdrom ixgbe hwmon dca snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support pcspkr ghes i2c_i801 hed i2c_core dm_region_hash dm_log dm_mod usb_storage ahci libahci sg shpchp megaraid_sas sd_mod crc_t10dif ext3 jbd mbcache
Apr 14 21:50:25 test01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.39-400.264.4.el5uek #1
Apr 14 21:50:25 test01 kernel: Call Trace:
Apr 14 21:50:25 test01 kernel: <IRQ> [<ffffffff8143dab8>] ? skb_gso_segment+0x298/0x370
Apr 14 21:50:25 test01 kernel: [<ffffffff8106f300>] warn_slowpath_common+0x90/0xc0
Apr 14 21:50:25 test01 kernel: [<ffffffff8106f42e>] warn_slowpath_fmt+0x6e/0x70
Apr 14 21:50:25 test01 kernel: [<ffffffff810d73a7>] ? irq_to_desc+0x17/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffff812faf0c>] ? notify_remote_via_irq+0x2c/0x40
Apr 14 21:50:25 test01 kernel: [<ffffffff8100a820>] ? xen_clocksource_read+0x20/0x30
Apr 14 21:50:25 test01 kernel: [<ffffffff812faf4c>] ? xen_send_IPI_one+0x2c/0x40
Apr 14 21:50:25 test01 kernel: [<ffffffff81011f10>] ? xen_smp_send_reschedule+0x10/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffff81056e0b>] ? ttwu_queue_remote+0x4b/0x60
Apr 14 21:50:25 test01 kernel: [<ffffffff81509a7e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Apr 14 21:50:25 test01 kernel: [<ffffffff8143dab8>] skb_gso_segment+0x298/0x370
Apr 14 21:50:25 test01 kernel: [<ffffffff8143dba6>] dev_gso_segment+0x16/0x50
Apr 14 21:50:25 test01 kernel: [<ffffffff8143dfb5>] dev_hard_start_xmit+0x3d5/0x530
Apr 14 21:50:25 test01 kernel: [<ffffffff8145a074>] sch_direct_xmit+0xc4/0x1d0
Apr 14 21:50:25 test01 kernel: [<ffffffff8143e811>] dev_queue_xmit+0x161/0x410
Apr 14 21:50:25 test01 kernel: [<ffffffff815099de>] ? _raw_spin_lock+0xe/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffffa045820c>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81076e77>] ? local_bh_enable+0x27/0xa0
Apr 14 21:50:25 test01 kernel: [<ffffffffa045e7ba>] br_nf_dev_queue_xmit+0x2a/0x90 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045f668>] br_nf_post_routing+0x1f8/0x2e0 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>] nf_hook_slow+0x7c/0x130
Apr 14 21:50:25 test01 kernel: [<ffffffffa04581a0>] ? br_forward_finish+0x70/0x70 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa04581a0>] ? br_forward_finish+0x70/0x70 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ? br_flood_deliver+0x20/0x20 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458186>] br_forward_finish+0x56/0x70 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045eba4>] br_nf_forward_finish+0xb4/0x180 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045f36f>] br_nf_forward_ip+0x26f/0x370 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>] nf_hook_slow+0x7c/0x130
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ? br_flood_deliver+0x20/0x20 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] ? nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ? br_flood_deliver+0x20/0x20 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa04582c8>] __br_forward+0x88/0xc0 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458356>] br_forward+0x56/0x60 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa04591fc>] br_handle_frame_finish+0x1ac/0x240 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045ee1b>] br_nf_pre_routing_finish+0x1ab/0x350 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff8115bfe9>] ? kmem_cache_alloc_trace+0xc9/0x1a0
Apr 14 21:50:25 test01 kernel: [<ffffffffa045fc55>] br_nf_pre_routing+0x305/0x370 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>] nf_hook_slow+0x7c/0x130

To fix this, first disable LRO (large receive offload):

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i lro off;done

And if the NICs are Intel 10G, then GRO (generic receive offload) should be disabled too:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i gro off;done

Here's the command to disable both LRO and GRO:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i gro off;/sbin/ethtool -K $i lro off;done
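
To confirm the change took effect, the output of ethtool -k (lowercase k, which shows the current offload settings) can be filtered for the two features. A sketch, with a trimmed and made-up sample of that output so it runs without a NIC; offloads_still_on is a hypothetical helper name:

```shell
#!/bin/sh
# offloads_still_on: read `ethtool -k` output on stdin and print
# any LRO/GRO line that is still "on".
offloads_still_on() {
    grep -E '^[[:space:]]*(large|generic)-receive-offload: on'
}

# Trimmed, made-up sample of `ethtool -k eth0` output.
sample='Features for eth0:
rx-checksumming: on
generic-receive-offload: on
large-receive-offload: off'

printf '%s\n' "$sample" | offloads_still_on
# On a real host: /sbin/ethtool -k eth0 | offloads_still_on
```

No output means both offloads are off for that interface. Note that settings made with ethtool -K do not persist across reboots by themselves.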

 

resolved - /etc/rc.local not executed on boot in linux

If the scripts in /etc/rc.local are not executed at boot, one possibility is that an earlier subsystem script takes too long (or never finishes), since /etc/rc.local is usually the last one to run, i.e. S99local. To find the culprit, add a few echo lines to /etc/rc.d/rc (which is invoked from /etc/inittab) so that it records each script before and after it runs:

[root@host1 tmp]# vi /etc/rc.d/rc
# Now run the START scripts.
for i in /etc/rc$runlevel.d/S* ; do
        check_runlevel "$i" || continue

        # Check if the subsystem is already up.
        subsys=${i#/etc/rc$runlevel.d/S??}
        [ -f /var/lock/subsys/$subsys -o -f /var/lock/subsys/$subsys.init ] \
                && continue

        # If we're in confirmation mode, get user confirmation
        if [ -f /var/run/confirm ]; then
                confirm $subsys
                test $? = 1 && continue
        fi

        update_boot_stage "$subsys"
        # Bring the subsystem up.
        if [ "$subsys" = "halt" -o "$subsys" = "reboot" ]; then
                export LC_ALL=C
                exec $i start
        fi
        if LC_ALL=C egrep -q "^..*init.d/functions" $i \
                        || [ "$subsys" = "single" -o "$subsys" = "local" ]; then
                echo $i>>/var/tmp/process.txt
                $i start
                echo $i>>/var/tmp/process_end.txt
        else
                echo $i>>/var/tmp/process_self.txt
                action $"Starting $subsys: " $i start
                echo $i>>/var/tmp/process_self_end.txt
        fi
done

Then reboot the system and check the files /var/tmp/{process.txt,process_end.txt,process_self.txt,process_self_end.txt}. On one of the hosts, I found the entries below:

[root@host1 tmp]# tail process.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon
/etc/rc3.d/S98gcstartup

[root@host1 tmp]# tail process_end.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon
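
Comparing the two files by eye works for short listings; for longer ones, the script that started but never finished is whatever appears in process.txt but not in process_end.txt. A sketch using scratch copies of the last few entries above (the temporary directory is only for illustration):

```shell
#!/bin/sh
dir=$(mktemp -d)
# Last entries of the two marker files, as in the listing above.
printf '%s\n' /etc/rc3.d/S97rhnsd /etc/rc3.d/S98avahi-daemon /etc/rc3.d/S98gcstartup > "$dir/process.txt"
printf '%s\n' /etc/rc3.d/S97rhnsd /etc/rc3.d/S98avahi-daemon > "$dir/process_end.txt"

# Print lines of process.txt that have no exact match in process_end.txt.
grep -vxFf "$dir/process_end.txt" "$dir/process.txt"
```

Because the echo lines append, it is worth emptying the four files before the test reboot so that entries from earlier boots don't mix in.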

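So from here, we can see that /etc/rc3.d/S98gcstartup started but never finished: it is in process.txt but not in process_end.txt. To make sure the scripts in /etc/rc.local get executed and the stuck script /etc/rc3.d/S98gcstartup still runs, rename it so that rc no longer picks it up (rc only runs S* entries) and call it from the end of /etc/rc.local: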

[root@host1 tmp]# mv /etc/rc3.d/S98gcstartup /etc/rc3.d/s98gcstartup
[root@host1 tmp]# vi /etc/rc.local

#!/bin/sh

touch /var/lock/subsys/local

#put your scripts here - begin

#put your scripts here - end

#put the stuck script here and make sure it's the last line
/etc/rc3.d/s98gcstartup start

After this, reboot the host and check whether scripts in /etc/rc.local got executed.