Archive

Author Archive

resolved – Artifactory: Return code is: 502, ReasonPhrase:cannot connect

August 22nd, 2016 Comments off

Today when I tried to visit Maven Artifactory, the browser showed the error below:

Return code is: 502, ReasonPhrase:cannot connect.

And here's what was in the log:

2016-08-19 22:48:19,942 [art-init] [ERROR] (o.a.w.s.ArtifactoryContextConfigListener:91) - Application could not be initialized: Connection refused
java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.7.0_51]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) ~[na:1.7.0_51]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_51]

......

At first, I thought it was caused by a JDBC issue. Here's the config in the JDBC config file storage.properties:

# cat /u01/oracle/artifactory_home/etc/storage.properties
type=oracle
driver=oracle.jdbc.OracleDriver
url=jdbc:oracle:thin:@localhost:1521:ORCL
username=artifactory
password=password1

I tried connecting to the DB using sqlplus as a test, and it was OK:

[root@testvm ~]# su - oracle
[oracle@testvm ~]$ export ORACLE_HOME=/u01/oracle/db12c/product/12.1.0/dbhome_1
[oracle@testvm ~]$ ORACLE_SID=orcl
[oracle@testvm ~]$ export PATH=$ORACLE_HOME/bin:$PATH
[oracle@testvm ~]$ sqlplus artifactory/password1

SQL*Plus: Release 12.1.0.1.0 Production on Mon Aug 22 03:28:20 2016

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Last Successful login time: Mon Aug 22 2016 03:21:26 +00:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL>

After some debugging, I found the issue was caused by the proxy setting. We should disable it like below:

[oracle@testvm ~]# grep proxy /etc/profile
#export http_proxy=http://myproxy.example.com:80/

Then log out and log back in to the host, and run the commands below to shut down and start up Artifactory:

[oracle@testvm ~]$ export ORACLE_HOME=/u01/oracle/db12c/product/12.1.0/dbhome_1/
[oracle@testvm ~]$ export ORACLE_SID=orcl
[oracle@testvm ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[oracle@testvm ~]$ /u01/oracle/apache-tomcat-7.0.52/bin/shutdown.sh
[oracle@testvm ~]$ /u01/oracle/apache-tomcat-7.0.52/bin/startup.sh

After this, Artifactory started working again.
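
As a side note, if you can't disable the proxy globally and your Tomcat/Artifactory startup scripts translate the environment proxy into JVM proxy settings, you could instead exclude local addresses via the JVM's standard http.nonProxyHosts property. A minimal sketch (setenv.sh is Tomcat's convention for extra startup options; create it if it doesn't exist):

# /u01/oracle/apache-tomcat-7.0.52/bin/setenv.sh - sourced by catalina.sh at startup
CATALINA_OPTS="$CATALINA_OPTS -Dhttp.nonProxyHosts=localhost|127.0.0.1"
export CATALINA_OPTS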

Categories: IT Architecture, Programming Tags:

lvm volume resize by extending virtual disk image file

June 27th, 2016 Comments off

Below should be run from the Dom0 hosting the DomU:

[root@Dom0 ~]# virt-filesystems -a System.img
/dev/sda1
/dev/vg01/lv_root

[root@Dom0 ~]# virt-filesystems --long --parts --blkdevs -h -a System.img
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   500M  /dev/sda
/dev/sda2  partition  8e   12G   /dev/sda
/dev/sda   device     -    12G   -

[root@Dom0 ~]# truncate -s 20G System_new.img

[root@Dom0 ~]# virt-resize --expand /dev/sda2 System.img System_new.img

[root@Dom0 ~]# mv System.img System.img.bak;mv System_new.img System.img

[root@Dom0 ~]# xm create vm.cfg -c #the first run may fail with "device cannot be connected"; just run it again and the issue should be gone

Below should be run from the DomU:

[root@DomU ~]# vgs
VG #PV #LV #SN Attr VSize VFree
vg01 1 2 0 wz--n- 20.51g 8.00g

[root@DomU ~]# lvextend -L +8g /dev/mapper/vg01-lv_root
[root@DomU ~]# resize2fs /dev/mapper/vg01-lv_root

[root@DomU ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg01-lv_root
20G 10G 10G 50% /
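
As a side note, newer lvm2 releases can do the last two DomU steps in one go: the -r/--resizefs flag of lvextend resizes the filesystem together with the LV. A minimal sketch, assuming the same LV as above:

[root@DomU ~]# lvextend -r -L +8g /dev/mapper/vg01-lv_root #extends the LV and grows the filesystem in one step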

Categories: Clouding, IT Architecture, Oracle Cloud Tags:

linux dhcp config

June 8th, 2016 Comments off
yum install dhcp

vi /etc/dhcpd.conf

    ddns-update-style ad-hoc;
    default-lease-time 600;
    max-lease-time 7200;

    subnet 192.168.144.0 netmask 255.255.248.0 {
        range 192.168.149.112 192.168.149.113;
        option domain-name-servers 10.135.82.180, 10.135.82.188;
        option routers 192.168.144.1;
    }

    subnet 192.168.128.0 netmask 255.255.248.0 {
    }

    host andy {
        hardware ethernet 00:16:3E:86:51:DC;
        fixed-address 10.1.1.100;
    }

/etc/init.d/dhcpd start


[root@dhcpd ~]# ls -l /var/lib/dhcp/dhcpd.leases

[root@dhcpd ~]# lsof -i :67
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    dhcpd 28207 root 5u IPv4 1019206 UDP *:bootps

[root@client ~]# dhclient eth0 -s 192.168.149.109
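
To verify the server is actually answering, you can watch the DHCP traffic on the server while the client runs dhclient; a quick check, assuming the server listens on eth0:

[root@dhcpd ~]# tcpdump -n -i eth0 port 67 or port 68 #DISCOVER/OFFER/REQUEST/ACK go over UDP ports 67/68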

PS:

1. Here's an article about how DHCP works. You can get more info here and here.
2. For security reasons, the DHCP offer packets sent from the DHCP server we build may get blocked. So we need to ask Network Support to configure a DHCP relay on their side to relay the specified requests to our own DHCP server.
  
Otherwise, the client will see errors like below:

      Jun  8 09:22:50 host1 dhclient: No DHCPOFFERS received.

Categories: IT Architecture, Linux, Systems Tags:

TCP wrappers /etc/hosts.allow /etc/hosts.deny

June 2nd, 2016 Comments off

A simple example on linux box:

[root@test ~]# cat /etc/hosts.allow
sshd : ALL EXCEPT host1.example.com
snmpd : ALL EXCEPT host1.example.com
ALL : localhost

[root@test ~]# cat /etc/hosts.deny
ALL:ALL

And here's the explanation:

The "sshd" and "snmpd" services will accept connections from all hosts except host1.example.com. All services will accept connections from localhost. Any other service will deny connections from all hosts.
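
The tcp_wrappers package also ships tcpdmatch, which predicts how the current rules would treat a given service/client pair - handy for testing before rolling a change out:

[root@test ~]# tcpdmatch sshd host1.example.com #predicts whether sshd would accept a connection from host1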


Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – The viewport is not between 320 and 420 pixels wide

April 28th, 2016 Comments off

Today when I tried to enable page-level ads of google adsense, I met below error:

This page cannot display anchor ads for the following reason(s):

  • The viewport is not between 320 and 420 pixels wide.
  • The current browser is not supported.

The cause of this is that the WordPress theme currently in use didn't support the viewport meta tag (more info about viewport is here), i.e. something like <meta name="viewport" content="width=device-width, initial-scale=1"> in the page head. To enable viewport, we can install one plugin - Definitely Allow Mobile Zooming. After activating it, you can try again.


Categories: Misc Tags:

resolved – Windows cannot access the specified device, path, or file. You may not have the appropriate permission to access the item

April 22nd, 2016 Comments off

I've been having this strange problem since this morning, after starting my office laptop from hibernate. A restart didn't fix it either.

For any program or file, it just says that Windows cannot find the file, etc. A quick search showed that this kind of error usually comes from antivirus software - in this case, McAfee Host IPS. As soon as I turned that off from the McAfee tray menu > Quick settings > Host IPS off, programs started working.



So here's the quick fix: right-click the McAfee icon, go to Quick settings, and check 'Host IPS - off'.



Categories: IT Architecture, Systems, Windows Tags:

resolved – net/core/dev.c:1894 skb_gso_segment+0x298/0x370()

April 19th, 2016 Comments off

Today on one of our servers, there were a lot of errors in /var/log/messages like below:

Apr 14 21:50:25 test01 kernel: WARNING: at net/core/dev.c:1894 skb_gso_segment+0x298/0x370()
Apr 14 21:50:25 test01 kernel: Hardware name: SUN FIRE X4170 M3
Apr 14 21:50:25 test01 kernel: : caps=(0x60014803, 0x0) len=255 data_len=215 ip_summed=1
Apr 14 21:50:25 test01 kernel: Modules linked in: dm_nfs nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc 8021q garp bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp dm_round_robin libiscsi dm_multipath scsi_transport_iscsi xenfs xen_privcmd dm_mirror video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport sr_mod cdrom ixgbe hwmon dca snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support pcspkr ghes i2c_i801 hed i2c_core dm_region_hash dm_log dm_mod usb_storage ahci libahci sg shpchp megaraid_sas sd_mod crc_t10dif ext3 jbd mbcache
Apr 14 21:50:25 test01 kernel: Pid: 0, comm: swapper Tainted: G W 2.6.39-400.264.4.el5uek #1
Apr 14 21:50:25 test01 kernel: Call Trace:
Apr 14 21:50:25 test01 kernel: <IRQ> [<ffffffff8143dab8>] ? skb_gso_segment+0x298/0x370
Apr 14 21:50:25 test01 kernel: [<ffffffff8106f300>] warn_slowpath_common+0x90/0xc0
Apr 14 21:50:25 test01 kernel: [<ffffffff8106f42e>] warn_slowpath_fmt+0x6e/0x70
Apr 14 21:50:25 test01 kernel: [<ffffffff810d73a7>] ? irq_to_desc+0x17/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffff812faf0c>] ? notify_remote_via_irq+0x2c/0x40
Apr 14 21:50:25 test01 kernel: [<ffffffff8100a820>] ? xen_clocksource_read+0x20/0x30
Apr 14 21:50:25 test01 kernel: [<ffffffff812faf4c>] ? xen_send_IPI_one+0x2c/0x40
Apr 14 21:50:25 test01 kernel: [<ffffffff81011f10>] ? xen_smp_send_reschedule+0x10/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffff81056e0b>] ? ttwu_queue_remote+0x4b/0x60
Apr 14 21:50:25 test01 kernel: [<ffffffff81509a7e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Apr 14 21:50:25 test01 kernel: [<ffffffff8143dab8>] skb_gso_segment+0x298/0x370
Apr 14 21:50:25 test01 kernel: [<ffffffff8143dba6>] dev_gso_segment+0x16/0x50
Apr 14 21:50:25 test01 kernel: [<ffffffff8143dfb5>] dev_hard_start_xmit+0x3d5/0x530
Apr 14 21:50:25 test01 kernel: [<ffffffff8145a074>] sch_direct_xmit+0xc4/0x1d0
Apr 14 21:50:25 test01 kernel: [<ffffffff8143e811>] dev_queue_xmit+0x161/0x410
Apr 14 21:50:25 test01 kernel: [<ffffffff815099de>] ? _raw_spin_lock+0xe/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffffa045820c>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81076e77>] ? local_bh_enable+0x27/0xa0
Apr 14 21:50:25 test01 kernel: [<ffffffffa045e7ba>] br_nf_dev_queue_xmit+0x2a/0x90 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045f668>] br_nf_post_routing+0x1f8/0x2e0 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>] nf_hook_slow+0x7c/0x130
Apr 14 21:50:25 test01 kernel: [<ffffffffa04581a0>] ? br_forward_finish+0x70/0x70 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa04581a0>] ? br_forward_finish+0x70/0x70 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ? br_flood_deliver+0x20/0x20 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458186>] br_forward_finish+0x56/0x70 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045eba4>] br_nf_forward_finish+0xb4/0x180 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045f36f>] br_nf_forward_ip+0x26f/0x370 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>] nf_hook_slow+0x7c/0x130
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ? br_flood_deliver+0x20/0x20 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] ? nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ? br_flood_deliver+0x20/0x20 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa04582c8>] __br_forward+0x88/0xc0 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa0458356>] br_forward+0x56/0x60 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa04591fc>] br_handle_frame_finish+0x1ac/0x240 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffffa045ee1b>] br_nf_pre_routing_finish+0x1ab/0x350 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff8115bfe9>] ? kmem_cache_alloc_trace+0xc9/0x1a0
Apr 14 21:50:25 test01 kernel: [<ffffffffa045fc55>] br_nf_pre_routing+0x305/0x370 [bridge]
Apr 14 21:50:25 test01 kernel: [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] nf_iterate+0x78/0x90
Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>] nf_hook_slow+0x7c/0x130

To fix this, we should disable LRO(large receive offload) first:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i lro off;done

And if the NICs are Intel 10G ones, then we should disable GRO (generic receive offload) too:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i gro off;done

Here's the command to disable both LRO and GRO:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i gro off;/sbin/ethtool -K $i lro off;done
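
Note that ethtool settings do not survive a reboot. One simple way to persist them (assuming the same interface list as above) is to append the loop to /etc/rc.local:

cat >> /etc/rc.local <<'EOF'
for i in eth0 eth1 eth2 eth3; do
    /sbin/ethtool -K $i gro off
    /sbin/ethtool -K $i lro off
done
EOF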


create shared iscsi LUNs from local disk on Linux

January 19th, 2016 Comments off

We can use iscsitarget to share local disks as iscsi LUNs for clients. Below are brief steps.

First, install some packages:

yum install kernel-devel iscsi-initiator-utils -y #'kernel-uek-devel' if you are using oracle linux with UEK
cd iscsitarget-1.4.20.2 #download the package from here
make
make install

And below are some useful tips about iscsitarget:

The iSCSI target consists of a kernel module (/lib/modules/`uname -r`/extra/iscsi/iscsi_trgt.ko)

The kernel modules will be installed in the module directory of the kernel

The daemon(/usr/sbin/ietd) and the control tool(/usr/sbin/ietadm)

/etc/init.d/iscsi-target status

/etc/iet/{ietd.conf, initiators.allow, targets.allow}

Then we can modify the IET (iSCSI Enterprise Target) config file:

vi /etc/iet/ietd.conf

    Target iqn.2001-04.org.doxer:server1-shared-local-disks
        Lun 0 Path=/dev/sdb1,Type=fileio,ScsiId=0,ScsiSN=doxerorg
        Lun 1 Path=/dev/sdb2,Type=fileio,ScsiId=1,ScsiSN=doxerorg
        Lun 2 Path=/dev/sdb3,Type=fileio,ScsiId=2,ScsiSN=doxerorg
        Lun 3 Path=/dev/sdb4,Type=fileio,ScsiId=3,ScsiSN=doxerorg
        Lun 4 Path=/dev/sdb5,Type=fileio,ScsiId=4,ScsiSN=doxerorg
        Lun 5 Path=/dev/sdb6,Type=fileio,ScsiId=5,ScsiSN=doxerorg
        Lun 6 Path=/dev/sdb7,Type=fileio,ScsiId=6,ScsiSN=doxerorg
        Lun 7 Path=/dev/sdb8,Type=fileio,ScsiId=7,ScsiSN=doxerorg

chkconfig iscsi-target on
/etc/init.d/iscsi-target start
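
At this point you can verify the target on the server itself; IET exposes its state under /proc/net/iet (paths as shipped by iscsitarget):

cat /proc/net/iet/volume #configured targets and LUNs
cat /proc/net/iet/session #initiators currently logged in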

Assume the server sharing its local disks as iSCSI LUNs has the IP 192.168.10.212; we can then do below on client hosts to scan for the iSCSI LUNs:

[root@client01 ~]# iscsiadm -m discovery -t st -p 192.168.10.212
Starting iscsid:                                           [  OK  ]
192.168.10.212:3260,1 iqn.2001-04.org.doxer:server1-shared-local-disks

[root@client01 ~]# iscsiadm -m node -T iqn.2001-04.org.doxer:server1-shared-local-disks -p 192.168.10.212 -l
Logging in to [iface: default, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260] (multiple)
Login to [iface: default, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260] successful.

[root@client01 ~]# iscsiadm -m session --rescan #or iscsiadm -m session -r SID --rescan
Rescanning session [sid: 1, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260]

[root@client01 ~]# iscsiadm -m session -P 3

You can also scan for LUNs on the server with the local disks shared, but you should make sure the iscsi-target service starts between the network and iscsi services at boot:

mv /etc/rc3.d/S39iscsi-target /etc/rc3.d/S12iscsi-target

PS:

  1. The iSCSI target port defaults to 3260. You can check iscsi connection info in /var/lib/iscsi/send_targets/ - <iscsi portal ip, port>, and /var/lib/iscsi/nodes/ - <target iqn>/<iscsi portal ip, port>.
  2. If there are multiple targets to log on to, we can use "iscsiadm -m node --loginall=all". Use "iscsiadm -m node -T iqn.2001-04.org.doxer:server1-shared-local-disks -p 192.168.10.212 -u" to log out.
  3. More info is here (including windows iscsi operation), and here is about creating iSCSI on an Oracle ZFS appliance.


Categories: Hardware, NAS, SAN, Storage Tags:

resolved – /etc/rc.local not executed on boot in linux

November 11th, 2015 Comments off

When you find that your scripts in /etc/rc.local are not executed as the system boots, one possibility is that a previous subsys script takes too long to execute, as /etc/rc.local is usually the last one to run, i.e. S99local. To prove which subsys script is the culprit that gets stuck, you can instrument /etc/rc.d/rc (which is invoked from /etc/inittab) with some echo lines:

[root@host1 tmp]# vi /etc/rc.d/rc
# Now run the START scripts.
for i in /etc/rc$runlevel.d/S* ; do
        check_runlevel "$i" || continue

        # Check if the subsystem is already up.
        subsys=${i#/etc/rc$runlevel.d/S??}
        [ -f /var/lock/subsys/$subsys -o -f /var/lock/subsys/$subsys.init ] \
                && continue

        # If we're in confirmation mode, get user confirmation
        if [ -f /var/run/confirm ]; then
                confirm $subsys
                test $? = 1 && continue
        fi

        update_boot_stage "$subsys"
        # Bring the subsystem up.
        if [ "$subsys" = "halt" -o "$subsys" = "reboot" ]; then
                export LC_ALL=C
                exec $i start
        fi
        if LC_ALL=C egrep -q "^..*init.d/functions" $i \
                        || [ "$subsys" = "single" -o "$subsys" = "local" ]; then
                echo $i>>/var/tmp/process.txt
                $i start
                echo $i>>/var/tmp/process_end.txt
        else
                echo $i>>/var/tmp/process_self.txt
                action $"Starting $subsys: " $i start
                echo $i>>/var/tmp/process_self_end.txt
        fi
done

Then you can reboot the system and check the files /var/tmp/{process.txt,process_end.txt,process_self.txt,process_self_end.txt}. On one of the hosts, I found the entries below:

[root@host1 tmp]# tail process.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon
/etc/rc3.d/S98gcstartup

[root@host1 tmp]# tail process_end.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon

So from here, we can see that /etc/rc3.d/S98gcstartup started (it appears in process.txt) but never finished (it is missing from process_end.txt). To make sure the scripts in /etc/rc.local get executed, and that the stuck script /etc/rc3.d/S98gcstartup still gets executed afterwards, we can do this:

[root@host1 tmp]# mv /etc/rc3.d/S98gcstartup /etc/rc3.d/s98gcstartup
[root@host1 tmp]# vi /etc/rc.local

#!/bin/sh

touch /var/lock/subsys/local

#put your scripts here - begin

#put your scripts here - end

#put the stuck script here and make sure it's the last line
/etc/rc3.d/s98gcstartup start

After this, reboot the host and check whether scripts in /etc/rc.local got executed.

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

resolved – xend error: (98, ‘Address already in use’)

November 4th, 2015 Comments off

Today one OVS server had an issue with ovs-agent and needed a reboot. As there were VMs running on it, I tried live migrating the xen based VMs using "xm migrate -l", but the error below occurred:

-bash-3.2# xm migrate -l vm1 server1
Error: can't connect: (111, 'Connection refused')
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

As xen migration uses the xend-relocation-server on xend-relocation-port, this "Connection refused" issue was most likely related to that. And below is the configuration in /etc/xen/xend-config.sxp:

-bash-3.2# egrep -v '^#|^$' /etc/xen/xend-config.sxp
(xend-unix-server yes)
(xend-relocation-server yes)
(xend-relocation-ssl-server no)
(xend-unix-path /var/lib/xend/xend-socket)
(xend-relocation-port 8002)
(xend-relocation-server-ssl-key-file /etc/ovs-agent/cert/key.pem)
(xend-relocation-server-ssl-cert-file /etc/ovs-agent/cert/certificate.pem)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
(vif-script vif-bridge)
(dom0-min-mem 0)
(enable-dom0-ballooning no)
(dom0-cpus 0)
(vnc-listen '0.0.0.0')
(vncpasswd '')
(xend-domains-lock-path /opt/ovs-agent-2.3/utils/dlm.py)
(domain-shutdown-hook /opt/ovs-agent-2.3/utils/hook_vm_shutdown.py)

And to check the processes related to these:

-bash-3.2# lsof -i :8002
COMMAND   PID     USER   FD   TYPE    DEVICE SIZE NODE NAME
xend    12095 root    5u  IPv4 146473964       TCP *:teradataordbms (LISTEN)

-bash-3.2# ps auxww|egrep '/opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py'
root  3501  0.0  0.0   3924   740 pts/0    S+   08:37   0:00 egrep /opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py
root 19007  0.0  0.0  12660  5840 ?        D    03:44   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm1 --uuid 56f17372-0a86-4446-8603-d82423c54367
root 27446  0.0  0.0  12664  5956 ?        D    05:11   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm2 --uuid eb1a4e84-3572-4543-8b1d-685b856d98c7

When processes go into D state (uninterruptible sleep), it's troublesome, as such processes can only be killed by rebooting the whole system. However, this server had many VMs running, and live migration/relocation was blocked by an issue caused by the migration machinery itself, so a deadlock surfaced. It seemed a reboot was the only way to "resolve" the issue.

Firstly, I tried bouncing xend (/etc/init.d/xend restart), but met the error below in /var/log/messages:

[2015-11-04 04:39:43 24026] INFO (SrvDaemon:227) Xend stopped due to signal 15.
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:332) Xend Daemon started
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:336) Xend changeset: unavailable.
[2015-11-04 04:40:14 24115] ERROR (SrvDaemon:349) Exception starting xend ((98, 'Address already in use'))
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 339, in run
    relocate.listenRelocation()
  File "/usr/lib/python2.4/site-packages/xen/xend/server/relocate.py", line 159, in listenRelocation
    hosts_allow = hosts_allow)
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 36, in __init__
    connection.SocketListener.__init__(self, protocol_class)
  File "/usr/lib/python2.4/site-packages/xen/web/connection.py", line 89, in __init__
    self.sock = self.createSocket()
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 49, in createSocket
    sock.bind((self.interface, self.port))
  File "<string>", line 1, in bind
error: (98, 'Address already in use')

Later, I realized that we could change xend-relocation-port and try again. So I made the change below to /etc/xen/xend-config.sxp:

(xend-relocation-port 8003)

And later, bounced xend:

/etc/init.d/xend stop; /etc/init.d/xend start

PS: bouncing xend does not affect running VMs, which I verified by comparing the qemu processes (ps -ef|grep qemu) before and after. A tip here: when xen related commands (xm list, and so on) stop working, checking for the "qemu" simulator processes will help you get the VM list.

After this, "xm migrate -l vm1 server1" still failed with the same "can't connect: (111, 'Connection refused')" error. I resolved this by specifying the port explicitly (you may need to stop iptables too):

-bash-3.2# xm migrate -l -p 8002 vm1 server1

Now the live migration went smoothly, and after all VMs were migrated, I changed xend-relocation-port back to 8002 and rebooted the server to fix the D state (uninterruptible sleep) issue.

PS:

If you still see "Error: can't connect: (111, 'Connection refused')" even after the above workaround, you can change the port back from 8003 to 8002, or even from 8003 to 8004, restart iptables, and try again.

Categories: Clouding, IT Architecture Tags:

resolved – mountd Cannot export /scratch, possibly unsupported filesystem or fsid= required

November 2nd, 2015 Comments off

Today when I tried to access an exported filesystem through autofs (/net) from one client host, it reported an error:

[root@server01 ~]# cd /net/client01/scratch/
-bash: cd: scratch/: No such file or directory

From server side, we can see it's exported and writable:

[root@client01 ~]# cat /etc/exports
/scratch *(rw,no_root_squash)

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
nas01:/export/generic/share_scratch
                      200G  103G   98G  52% /scratch

So I tried mounting it manually on the client side, but it still reported an error:

[root@server01 ~]# mount client01:/scratch /media
mount: client01:/scratch failed, reason given by server: Permission denied

Here's the log on the server side:

[root@client01 scratch]# tail -f /var/log/messages
Nov  2 03:41:58 client01 mountd[2195]: Caught signal 15, un-registering and exiting.
Nov  2 03:41:58 client01 kernel: nfsd: last server has exited, flushing export cache
Nov  2 03:41:59 client01 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov  2 03:41:59 client01 kernel: NFSD: starting 90-second grace period
Nov  2 03:42:11 client01 mountd[16046]: authenticated mount request from 192.162.100.137:1002 for /scratch (/scratch)
Nov  2 03:42:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:43:05 client01 mountd[16046]: authenticated mount request from 192.162.100.137:764 for /scratch (/scratch)
Nov  2 03:43:05 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:44:11 client01 mountd[16046]: authenticated mount request from 192.165.28.40:670 for /scratch (/scratch)
Nov  2 03:44:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required

After some debugging, the reason turned out to be that the exported filesystem was itself an NFS mount on the server side, and an NFS filesystem cannot be exported again:

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
nas01:/export/generic/share_scratch
                      200G  103G   98G  52% /scratch

To work around this, just mount the original NFS share directly instead of using the autofs export:

[root@server01 ~]# mount nas01:/export/generic/share_scratch /media
Categories: Hardware, IT Architecture, NAS, Storage Tags:

resolved – Yum Error: Cannot retrieve repository metadata (repomd.xml) for repository Please verify its path and try again

October 13th, 2015 Comments off

Today when I tried to install a package using yum on one linux host, the error below prompted even though I had set up the right proxy:

[root@testvm1 yum.repos.d]# export http_proxy=http://right-proxy.example.com:80
[root@testvm1 yum.repos.d]# export ftp_proxy=http://right-proxy.example.com:80
[root@testvm1 yum.repos.d]# export https_proxy=http://right-proxy.example.com:80
[root@testvm1 yum.repos.d]# yum install xinetd
Loaded plugins: refresh-packagekit
http://public-yum.oracle.com/repo/OracleLinux/OL6/UEK/latest/x86_64/repodata/repomd.xml: [Errno 12] Timeout on http://public-yum.oracle.com/repo/OracleLinux/OL6/UEK/latest/x86_64/repodata/repomd.xml: (28, 'connect() timed out!')
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: ol6_UEK_latest. Please verify its path and try again

After some debugging, I found that there was a wrong proxy setting in /etc/yum.conf (it takes precedence over the proxy settings in the shell). After commenting out the wrong proxy, yum worked.

From:

[root@testvm1 yum.repos.d]# grep proxy /etc/yum.conf
proxy=http://wrong-proxy.example.com:80

To:

[root@testvm1 yum.repos.d]# grep proxy /etc/yum.conf
#proxy=http://wrong-proxy.example.com:80
Categories: IT Architecture, Linux, Systems Tags:

resolved – sar -d failed with Requested activities not available in file

September 11th, 2015 Comments off

Today when I tried to get a report of activity for each block device using "sar -d", the error "Requested activities not available in file" prompted:

[root@test01 ~]# sar -f /var/log/sa/sa11 -d
Requested activities not available in file

To fix this, I did the following:

[root@test01 ~]# cat /etc/cron.d/sysstat
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 -d 1 1 #add -d. It was */10 * * * * root /usr/lib64/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

Then, move /var/log/sa/sa11 aside and run sa1 with "-d" to generate a new one:

[root@test01 ~]# mv /var/log/sa/sa11{,.bak}
[root@test01 ~]# /usr/lib64/sa/sa1 -d 1 1 #this generated /var/log/sa/sa11
[root@test01 ~]# /usr/lib64/sa/sa1 -d 1 1 #this put data into /var/log/sa/sa11

After this, the disk activity data could be retrieved:

[root@test01 ~]# sar -f /var/log/sa/sa11 -d
Linux 2.6.18-238.0.0.0.1.el5xen (slc03nsv) 09/11/15

09:26:04     DEV        tps    rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
09:26:22     dev202-0   10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
09:26:22     dev202-1   10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
09:26:22     dev202-2   0.00   0.00      0.00      0.00      0.00      0.00   0.00   0.00
Average:     dev202-0   10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
Average:     dev202-1   10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
Average:     dev202-2   0.00   0.00      0.00      0.00      0.00      0.00   0.00   0.00

For the "DEV" column, you can check the device mapping via the major/minor numbers in /dev/* (dev202-2 is /dev/xvda2):

[root@test01 ~]# ls -l /dev/xvda2
brw-r----- 1 root disk 202, 2 Jan 26 2015 /dev/xvda2

Or you can add "-p" to sar, which is simpler:

[root@test01 ~]# sar -f /var/log/sa/sa11 -d -p
Linux 2.6.18-238.0.0.0.1.el5xen (slc03nsv) 09/11/15

09:26:04     DEV        tps    rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
09:26:22     xvda       10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
09:26:22     root       10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
09:26:22     xvda2      0.00   0.00      0.00      0.00      0.00      0.00   0.00   0.00
Average:     xvda       10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
Average:     root       10.39  0.00      133.63    12.86     0.00      0.06   0.06   0.07
Average:     xvda2      0.00   0.00      0.00      0.00      0.00      0.00   0.00   0.00

PS:

Here is more info about sysstat in Linux.

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – ORA-12578: TNS:wallet open failed

September 1st, 2015 Comments off

If you meet an error like "ORA-12578: TNS:wallet open failed", one possibility is that the Oracle RAC database is using a local wallet (created with the -auto_login_local parameter, available from the 11.2 release; a local wallet is usually used in a highly confidential system) but the wallet was migrated from another server.

The migrated local wallet can be opened and read without problems on the new host, but the information inside does not match the hostname, and this leads to the error ORA-12578: TNS:wallet open failed. Note that even on the original host, the wallet cannot be used by another OS user.

The master encryption key is stored in the wallet in TDE (transparent data encryption); it's the key that wraps (encrypts) the Oracle TDE column and tablespace encryption keys. The wallet must be open before you can create an encrypted tablespace and before you can store or retrieve encrypted data. Also, when recovering a database with encrypted tablespaces (for example after a SHUTDOWN ABORT or a catastrophic error that brings down the database instance), you must open the Oracle wallet after database mount and before database open, so the recovery process can decrypt data blocks and redo. When you open the wallet, it is available to all sessions, and it remains open until you explicitly close it or until the database is shut down.
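
A minimal sketch of that mount, open-wallet, open-database sequence (the wallet password is a placeholder; the ALTER SYSTEM syntax is the 11g TDE one):

sqlplus / as sysdba <<'EOF'
STARTUP MOUNT;
ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "wallet_password";
ALTER DATABASE OPEN;
EOF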

Tablespace encryption encrypts at the physical block level and can perform better than encrypting many columns. When using column encryption for a table, there is only one table key regardless of the number of encrypted columns, and the table key is stored in the data dictionary. When using tablespace encryption, the tablespace key is stored in the header of each datafile of the encrypted tablespace.

Below is from here:

TDE uses a two tier key mechanism. When TDE column encryption is applied to an existing application table column, a new table key is created and stored in the Oracle data dictionary. When TDE tablespace encryption is used, the individual tablespace keys are stored in the header of the underlying OS file(s). The table and tablespace keys are encrypted using the TDE master encryption key. The master encryption key is generated when TDE is initialized and stored outside the database in the Oracle Wallet. Both the master key and table keys can be independently changed (rotated, re-keyed) based on company security policies. Tablespace keys cannot be re-keyed (rotated); work around is to move the data into a new encrypted tablespace. Oracle recommends backing up the wallet before and after each master key change.

Categories: Databases, IT Architecture, Oracle DB Tags:

resolved – nfsv4 Warning: rpc.idmapd appears not to be running. All uids will be mapped to the nobody uid

August 31st, 2015 Comments off

Today when we tried to mount an nfs share as NFSv4 (mount -t nfs4 testnas:/export/testshare01 /media), the following warning prompted:

Warning: rpc.idmapd appears not to be running.
All uids will be mapped to the nobody uid.

And I had a check of the file permissions under the mount point; they were owned by nobody, as the warning indicated:

[root@testvm~]# ls -l /u01/local
total 8
drwxr-xr-x 2 nobody nobody 2 Dec 18 2014 SMKIT
drwxr-xr-x 4 nobody nobody 4 Dec 19 2014 ServiceManager
drwxr-xr-x 4 nobody nobody 4 Mar 31 08:47 ServiceManager.15.1.5
drwxr-xr-x 4 nobody nobody 4 May 13 06:55 ServiceManager.15.1.6

However, as I checked, rpcidmapd was running:

[root@testvm ~]# /etc/init.d/rpcidmapd status
rpc.idmapd (pid 11263) is running...

After some checking, I found it was caused by an old nfs-utils version and some missing NFSv4 packages on the OEL5 boxes. You can do the below to fix this:

yum -y update nfs-utils nfs-utils-lib nfs-utils-lib-devel sblim-cmpi-nfsv4 nfs4-acl-tools
/etc/init.d/nfs restart
/etc/init.d/rpcidmapd restart

If you are using an Oracle Sun ZFS appliance, please make sure to set, on the ZFS side, the anonymous user mapping to "root" and also the custom NFSv4 identity domain to the one in your env (e.g. example.com), to avoid the nobody owner issue on NFS clients.
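
On the Linux client side, the NFSv4 identity domain is configured in /etc/idmapd.conf and must match the server's domain as well, otherwise ids will still map to nobody. A minimal sketch, assuming the example.com domain above:

# /etc/idmapd.conf - Domain must match the server's NFSv4 identity domain
[General]
Domain = example.com

Restart rpcidmapd (/etc/init.d/rpcidmapd restart) and remount the share after changing it.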

resolved – yum Error performing checksum Trying other mirror and finally No more mirrors to try

August 27th, 2015 Comments off

Today when I was installing one package on Linux, the error below prompted:

[root@testhost yum.repos.d]# yum list --disablerepo=* --enablerepo=yumpaas
Loaded plugins: rhnplugin, security
This system is not registered with ULN.
ULN support will be disabled.
yumpaas | 2.9 kB 00:00
yumpaas/primary_db | 30 kB 00:00
http://yumrepo.example.com/paas_oel5/repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2: [Errno -3] Error performing checksum
Trying other mirror.
yumpaas/primary_db | 30 kB 00:00
http://yumrepo.example.com/paas_oel5/repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2: [Errno -3] Error performing checksum
Trying other mirror.
Error: failure: repodata/b8e385ebfdd7bed69b7619e63cd82475c8bacc529db7b8c145609b64646d918a-primary.sqlite.bz2 from yumpaas: [Errno 256] No more mirrors to try.

The repo "yumpaas" is hosted on an OEL6 VM, which by default uses sha2 for checksums. However, on OEL5 VMs (the VM running yum here), yum uses sha1 by default. So I worked around this by installing python-hashlib to extend yum's capability to handle sha2 (python-hashlib comes from the external repo EPEL).

[root@testhost yum.repos.d]# yum install python-hashlib

And after this, the problematic repo could be used. But to resolve this issue permanently, without the workaround on the OEL5 VMs, we should recreate the repo with sha1 as the checksum algorithm (createrepo -s sha1).
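
For reference, the recreation would be run on the repo host against the repo's directory (the path below is a placeholder):

createrepo -s sha1 /var/www/html/paas_oel5 #regenerate metadata with sha1 checksums so OEL5's yum can verify them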

resolved – Unable to locally verify the issuer’s authority

August 18th, 2015 Comments off

Today we found the error below in a weblogic server log:

[2015-08-16 14:38:21,866] [WARN] [N/A] [testhost.example.com] [Thrown exception when ...... javax.net.ssl.SSLHandshakeException: General SSLEngine problem
......
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
......
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:323)

And we had a test on the linux server:

[root@testhost1 ~]# wget https://testurl.example.com/
--2015-08-17 10:15:56-- https://testurl.example.com/
Resolving testurl.example.com... 192.168.77.230
Connecting to testurl.example.com|192.168.77.230|:443... connected.
ERROR: cannot verify testurl.example.com's certificate, issued by `/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4':
Unable to locally verify the issuer's authority.
To connect to testurl.example.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

And tested using openssl s_client:

[root@testhost ~]# openssl s_client -connect testurl.example.com:443 -showcerts
CONNECTED(00000003)
depth=1 /C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4
verify error:num=2:unable to get issuer certificate
issuer= /C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
verify return:0
---
Certificate chain
0 s:/C=US/ST=California/L=Redwood Shores/O=Test Corporation/OU=FOR TESTING PURPOSES ONLY/CN=*.testurl.example.com
i:/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4
-----BEGIN CERTIFICATE-----
MIIFOTCCBCGgAwIBAgIQB1z+1jMRPPv/TrBA2qh5YzANBgkqhkiG9w0BAQsFADB+
MQswCQYDVQQGEwJVUzEdMBsGA
......
-----END CERTIFICATE-----
---
Server certificate
subject=/C=US/ST=California/L=Redwood Shores/O=Test Corporation/OU=FOR TESTING PURPOSES ONLY/CN=*.testurl.example.com
issuer=/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4
---
No client certificate CA names sent
---
SSL handshake has read 1510 bytes and written 447 bytes
---
New, TLSv1/SSLv3, Cipher is AES128-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
Protocol : TLSv1
Cipher : AES128-SHA
Session-ID: 2B5E228E4705C12D7EE7B5C608AE5E3781C5626C1A99A8D60D21DB2350D4F4DE
Session-ID-ctx:
Master-Key: F06632FD67E22EFB1134CDC0EEE8B8B44972747D804D5C7969095FCB2A692E90F111DC4FC6081F4D94C7561BB556DA16
Key-Arg : None
Krb5 Principal: None
Start Time: 1439793731
Timeout : 300 (sec)
Verify return code: 2 (unable to get issuer certificate)
---

From the above, the issue should be caused by a missing CA file, as the SSL certificate is relatively new and uses sha2. So I decided to download the CA files from Symantec, both "Symantec Class 3 Secure Server CA - G4" and "VeriSign Class 3 Public Primary Certification Authority - G5".

[root@testhost ~]# wget --no-check-certificate 'http://symantec.tbs-certificats.com/SymantecSSG4.crt'

[root@testhost ~]# openssl version -a|grep OPENSSLDIR

OPENSSLDIR: "/etc/pki/tls"

[root@testhost ~]# cp /etc/pki/tls/certs/ca-bundle.crt /etc/pki/tls/certs/ca-bundle.crt.bak

[root@testhost ~]# openssl x509 -text -in "SymantecSSG4.crt" >> /etc/pki/tls/certs/ca-bundle.crt

Then I tested wget again, but still got the same error:

[root@testhost1 ~]# wget https://testurl.example.com/
--2015-08-17 10:38:59-- https://testurl.example.com/
Resolving myproxy.example.com... 192.168.19.20
Connecting to myproxy.example.com|192.168.19.20|:80... connected.
ERROR: cannot verify testurl.example.com's certificate, issued by `/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server CA - G4':
unable to get issuer certificate
To connect to testurl.example.com insecurely, use `--no-check-certificate'.
Unable to establish SSL connection.

Then I decided to install the other CA indicated in the error output (the G4 one above may not even be needed):

[root@testhost1 ~]# wget --no-check-certificate 'https://www.symantec.com/content/en/us/enterprise/verisign/roots/VeriSign-Class%203-Public-Primary-Certification-Authority-G5.pem'

[root@testhost1 ~]# openssl x509 -text -in "VeriSign-Class 3-Public-Primary-Certification-Authority-G5.pem" >> /etc/pki/tls/certs/ca-bundle.crt

This time, wget passed https:

[root@testhost1 ~]# wget https://testurl.example.com/
--2015-08-17 10:40:03-- https://testurl.example.com/
Resolving myproxy.example.com... 192.168.19.20
Connecting to myproxy.example.com|192.168.19.20|:80... connected.
Proxy request sent, awaiting response... 200 OK
Length: 11028 (11K) [text/html]
Saving to: `index.html'

100%[======================================>] 11,028 --.-K/s in 0.02s

2015-08-17 10:40:03 (526 KB/s) - `index.html' saved [11028/11028]
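
To double-check the chain against the updated bundle without involving wget, you can pull the server certificate and run openssl verify; a quick sketch:

[root@testhost1 ~]# echo | openssl s_client -connect testurl.example.com:443 2>/dev/null | sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > server.crt
[root@testhost1 ~]# openssl verify -CAfile /etc/pki/tls/certs/ca-bundle.crt server.crt #expect "server.crt: OK"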

PS:
You can check more info here about adding trusted root certificates to the server - Mac OS X / Windows / Linux (Ubuntu, Debian) / Linux (CentOS 6) / Linux (CentOS 5).

resolved – passwd: User not known to the underlying authentication module

August 12th, 2015 Comments off

Today we met the following error when we tried to change one user's password:

[root@test ~]# echo 2cool|passwd --stdin test
Changing password for user test.
passwd: User not known to the underlying authentication module

And after some searching, we found it was caused by a missing /etc/shadow file:

[root@test ~]# ls -l /etc/shadow
ls: /etc/shadow: No such file or directory

To generate the /etc/shadow file from /etc/passwd, use the pwconv command:

[root@test ~]# pwconv

[root@test ~]# ls -l /etc/shadow
-r-------- 1 root root 1254 Aug 11 12:13 /etc/shadow

After this, we could reset the password without issue:

[root@test ~]# echo mypass|passwd --stdin test
Changing password for user test.
passwd: all authentication tokens updated successfully.

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – ORA-27303: additional information: Invalid protocol requested (2) or protocol not loaded

August 6th, 2015 Comments off

Today when I tried to start up CRS after patching (crsctl start crs), the following errors showed in /u01/app/11.2.0.4/grid/log/test/alerttest.log:

2015-07-31 11:57:54.702:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(18654)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/scratch/app/11.2.0.4/grid/log/slcai081/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-07-31 11:57:59.894:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(18654)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/scratch/app/11.2.0.4/grid/log/slcai081/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-07-31 11:58:05.121:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(18654)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/scratch/app/11.2.0.4/grid/log/slcai081/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-07-31 11:58:10.322:
[ohasd(17944)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2015-07-31 11:58:10.326:
[ohasd(17944)]CRS-2807:Resource 'ora.crsd' failed to start automatically.

And running crsctl check crs showed that CRS and EM were not up:

[root@test ~]# /u01/app/11.2.0.4/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

Later I tried to start ora.crsd manually, but it still failed:

[root@test ~]# /u01/app/11.2.0.4/grid/bin/crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.asm' on 'test'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-27504: IPC error creating OSD context
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], [], [], [], [], []
ORA-27302: failure occurred at: sskgxplp
ORA-27303: additional information: Invalid protocol requested (2) or protocol not loaded.
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.4/grid/log/test/agent/ohasd/oraagent_oracle//oraagent_oracle.log".
CRS-2674: Start of 'ora.asm' on 'test' failed
CRS-2679: Attempting to clean 'ora.asm' on 'test'
CRS-2681: Clean of 'ora.asm' on 'test' succeeded
CRS-4000: Command Start failed, or completed with errors.

From the output, it's complaining about "Invalid protocol requested". As this RAC is non-Exadata, we should use the ipc_g protocol rather than ipc_rds (which is for Exadata) when relinking the oracle binaries. So I made the following change in the script:

#su $OWNER -c "cd $1/rdbms/lib && export ORACLE_HOME=$1 && /usr/bin/make -f ins_rdbms.mk ipc_rds lbac_off dv_off ioracle > /dev/null"
su $OWNER -c "cd $1/rdbms/lib && export ORACLE_HOME=$1 && /usr/bin/make -f ins_rdbms.mk ipc_g ioracle > /dev/null"

After that, I rolled back all the patches and re-ran the patching, and afterwards everything was OK.

PS:

You can run $GRID_HOME/bin/skgxpinfo and $ORACLE_HOME/bin/skgxpinfo to check whether the RAC interconnect is using UDP or RDS.

Categories: Databases, IT Architecture, Oracle DB Tags:

remove usb disk from LVM

July 29th, 2015 Comments off

On some servers, a USB stick may become part of an LVM volume group. USB sticks are more prone to failure, which causes big issues when they do fail. They also work at a different speed than the hard drives, which causes performance issues as well.

For example, on one server, you can see below:

[root@test ~]# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  DomUVol   2   1   0 wz--n- 3.77T    0

[root@test~]# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sdb1  DomUVol lvm2 a--  3.76T    0 #all PE allocated
  /dev/sdc1  DomUVol lvm2 a--  3.59G    0 #this is usb device

[root@test~]# lvs
  LV      VG      Attr   LSize Origin Snap%  Move Log Copy%  Convert
  scratch DomUVol -wi-ao 3.77T

[root@test ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/DomUVol-scratch
                      3.7T  257G  3.3T   8% /scratch

[root@test~]# pvdisplay /dev/sdc1
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               DomUVol
  PV Size               3.61 GB / not usable 14.61 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              115
  Free PE               0
  Allocated PE          115 #so physical extents are allocated on this usb device
  PV UUID               a8a0P5-AlCz-Cu5e-acC2-ldEQ-NPCn-kc4Du0

As you can see from above, the USB device has PEs allocated, so before removing it from the VG (vgreduce) we first need to move its PEs to the other PV. And we can see the other PV currently has all of its space allocated (also confirmed by vgs above):

[root@test~]# pvdisplay /dev/sdb1
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               DomUVol
  PV Size               3.76 TB / not usable 30.22 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              123360
  Free PE               0
  Allocated PE          123360
  PV UUID               5IyCgh-JsiV-EnpO-XKj4-yxNq-pRjI-d7LKGy

Here are the steps for taking the usb device out of the VG. Note the order: when shrinking, the filesystem must be reduced before the LV, and lvreduce's --resizefs flag handles both in the safe order:

umount /scratch
#fsck -y /dev/mapper/DomUVol-scratch
lvreduce --resizefs --size -5G /dev/mapper/DomUVol-scratch

[root@test ~]# vgs
  VG      #PV #LV #SN Attr   VSize VFree
  DomUVol   2   1   0 wz--n- 3.77T 5.00G

[root@test ~]# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sdb1  DomUVol lvm2 a--  3.76T 1.41G
  /dev/sdc1  DomUVol lvm2 a--  3.59G 3.59G #PEs on the usb device are all freed, if not, use pvmove /dev/sdc1. More info is here about pvmove

[root@test ~]# pvdisplay /dev/sdc1
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               DomUVol
  PV Size               3.61 GB / not usable 14.61 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              115
  Free PE               115
  Allocated PE          0
  PV UUID               a8a0P5-AlCz-Cu5e-acC2-ldEQ-NPCn-kc4Du0

[root@test ~]# vgreduce DomUVol /dev/sdc1
  Removed "/dev/sdc1" from volume group "DomUVol"

[root@test ~]# pvs
  PV         VG      Fmt  Attr PSize PFree
  /dev/sdb1  DomUVol lvm2 a--  3.76T 1.41G
  /dev/sdc1          lvm2 a--  3.61G 3.61G #VG column is empty for the usb device, this confirms the usb device is taken out of VG. You can run pvremove /dev/sdc1 to remove the pv.

PS:

  1. If you want to shrink an lvm volume (lvreduce) on /, then you'll need to go into linux rescue mode. Select "Skip" when the system prompts for the option of mounting / to /mnt/sysimage. Run "lvm vgchange -a y" first, and then the other steps are more or less the same as above, except that you'll need to type "lvm" before any lvm command, such as "lvm lvs", "lvm pvs", "lvm lvreduce", etc.
  2. You can refer to this article about using vgcfgrestore to restore vg config from /etc/lvm/archive/; a quick sketch follows below.
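
A rough sketch of that vgcfgrestore flow (the archive file name below is a placeholder; pick a real one from the -l listing):

vgcfgrestore -l DomUVol #list the archived configs for the VG
vgcfgrestore -f /etc/lvm/archive/DomUVol_00001-1234567890.vg DomUVol #restore the chosen archive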

resolved – yum install Error: Protected multilib versions

June 30th, 2015 Comments off

Today when I tried to install firefox.i686 on Linux using yum, the following errors occurred:

Protected multilib versions: librsvg2-2.26.0-14.el6.i686 != librsvg2-2.26.0-5.el6_1.1.x86_64
Error: Protected multilib versions: devhelp-2.28.1-6.el6.i686 != devhelp-2.28.1-3.el6.x86_64
Error: Protected multilib versions: ImageMagick-6.5.4.7-7.el6_5.i686 != ImageMagick-6.5.4.7-6.el6_2.x86_64
Error: Protected multilib versions: vte-0.25.1-9.el6.i686 != vte-0.25.1-8.el6_4.x86_64
Error: Protected multilib versions: polkit-gnome-0.96-4.el6.i686 != polkit-gnome-0.96-3.el6.x86_64
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest

To resolve this, just run yum update <package names> for the packages listed, which brings the x86_64 versions up to the same release as the i686 ones yum wants to install; then the problem will go away.
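
In this case, taking the package names from the error output above, that would be something like:

yum update librsvg2 devhelp ImageMagick vte polkit-gnome
yum install firefox.i686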

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – ORA-01102: cannot mount database in EXCLUSIVE mode

June 16th, 2015 1 comment

Today when I tried to startup one RAC DB, it failed with ORA-01102:

[oracle@testvm ~]$ srvctl start database -d testdb -o "open"
PRCR-1079 : Failed to start resource ora.testdb.db
CRS-5017: The resource action "ora.testdb.db start" encountered the following error:
ORA-01102: cannot mount database in EXCLUSIVE mode
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.4/grid/log/testvm/agent/crsd/oraagent_oracle//oraagent_oracle.log".

CRS-2674: Start of 'ora.testdb.db' on 'testvm' failed
CRS-2632: There are no more servers to try to place resource 'ora.testdb.db' on that would satisfy its placement policy

SQL> startup
ORA-01102: cannot mount database in EXCLUSIVE mode

Later, I realized that the DB was still out of cluster mode (cluster_database was FALSE):

SQL> show parameter cluster;

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cluster_database boolean FALSE
cluster_database_instances integer 1
cluster_interconnects string

So I took the following steps to bring it back into cluster mode:

SQL> alter system set cluster_database=true scope=spfile;

System altered.

SQL> alter system set cluster_database_instances=2 scope=spfile;

System altered.

After this, the DB started up normally.

Categories: Databases, IT Architecture, Oracle DB Tags:

generate a load on oracle database

June 5th, 2015 Comments off

Sometimes you're testing an Oracle database and the DB load is low, but you want the load to go high to facilitate your testing. Here's a way, using the SH sample schema:

DECLARE
 N NUMBER;
BEGIN
FOR I IN 1..100000 LOOP
 SELECT /*+ ORDERED USE_NL(C) FULL(C) FULL(S)*/ COUNT(*) INTO N
 FROM SH.SALES S, SH.CUSTOMERS C
 WHERE C.CUST_ID = S.CUST_ID AND CUST_FIRST_NAME='Sarah';
 DBMS_LOCK.SLEEP(1);
END LOOP;
END;
/

You can press Ctrl+C to cancel it.

Categories: Databases, IT Architecture, Oracle DB Tags:

create a big table with one million lines on oracle database for testing

May 10th, 2015 Comments off

#First, create tablespace along with datafile

SQL> create tablespace test datafile '/u01/app/oracle/product/11.2.0.4/dbhome_1/test/datafile1.dbf' size 512m;

#Then create table. We disable logging to avoid unnecessary redo data

SQL> create table bigtab tablespace test as select rownum id, a.* from all_objects a where 1=0;
SQL> alter table bigtab nologging;

#now populate the table named bigtab

DECLARE
  L_CNT NUMBER;
  L_ROWS NUMBER := 1000000;
BEGIN
  INSERT /*+ APPEND */ INTO BIGTAB SELECT ROWNUM, A.* FROM ALL_OBJECTS A;
  L_CNT := SQL%ROWCOUNT;
  COMMIT;
  WHILE (L_CNT < L_ROWS)
  LOOP
    INSERT /*+ APPEND */ INTO BIGTAB
    SELECT ROWNUM+L_CNT,
      OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, DATA_OBJECT_ID, OBJECT_TYPE, CREATED,
      LAST_DDL_TIME, TIMESTAMP, STATUS, TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME
    FROM BIGTAB
      WHERE ROWNUM <= L_ROWS-L_CNT;
    L_CNT := L_CNT + SQL%ROWCOUNT;
    COMMIT;
   END LOOP;
END;
/

#check result

SQL> select count(*) from bigtab;

COUNT(*)
----------
1000000

#get the tracefile of the session

SQL> SELECT TRACEFILE FROM V$SESSION S, V$PROCESS P WHERE S.PADDR=P.ADDR AND S.SID=SYS_CONTEXT('USERENV','SID');

SQL> alter session set events '10046 trace name context forever, level 12';

#to get the maximum number of blocks that can be read

[oracle@testvm ~]$ grep scattered <trace file from above>

WAIT #139828744903832: nam='db file scattered read' ela= 1715 file#=10 block#=14192 blocks=8 obj#=56403 tim=1431233617613776
WAIT #139828744903832: nam='db file scattered read' ela= 6836 file#=10 block#=14268 blocks=8 obj#=56403 tim=1431233617620994

PS: 

More info is here.

Categories: Databases, Oracle DB Tags:

resolved – Checking for glibc-devel-2.12-1.7-i686; Not found. Failed

May 5th, 2015 Comments off

Today when I tried to install Oracle EM Cloud Control 12c, the error below prompted during the prerequisite check:

Checking for glibc-devel-2.12-1.7-i686; Not found. Failed

So from above, we can see that it's complaining about the missing package glibc-devel-2.12-1.7-i686. And I found there were glibc related packages on the system:

[root@testvm ~]# rpm -qa|grep glibc
glibc-common-2.12-1.149.el6_6.7.x86_64
glibc-devel-2.12-1.149.el6_6.7.x86_64
glibc-headers-2.12-1.149.el6_6.7.x86_64
glibc-2.12-1.149.el6_6.7.x86_64

But they were all the x86_64 versions, not the missing i686 one. So I installed the i686 ones:

[root@testvm]# yum install -y glibc.i686 glibc-devel.i686 glibc-static.i686

After this, I pressed "Rerun", and the check succeeded.

Categories: IT Architecture, Linux, Systems, Unix Tags:

install oracle instant client to use sqlplus on linux

April 30th, 2015 Comments off

To use sqlplus to connect remotely to an oracle database, you should have the oracle database client installed on your box. To do this, you can follow the steps below:

1. Download oracle-instantclient11.2-basic-11.2.0.3.0-1.x86_64.rpm and oracle-instantclient11.2-sqlplus-11.2.0.3.0-1.x86_64.rpm from here.

2. Install oracle-instantclient11.2-basic-11.2.0.3.0-1.x86_64.rpm and oracle-instantclient11.2-sqlplus-11.2.0.3.0-1.x86_64.rpm using rpm -i <package name>.

3. Set linux environment variables as below in ~/.bashrc:

export LANG=C
export HISTSIZE=100000
export HISTTIMEFORMAT="%h/%d - %H:%M:%S "
export ORACLE_HOME=/usr/lib/oracle/11.2/client64
export PATH=$PATH:/doxer/tools/bin:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/oracle/11.2/client64/lib
export PS1='[\u@\h-doxer \W]\$ '

4. Connect using sqlplus, e.g. sqlplus sys/pass@scan-example.test.com:1521/service1 as sysdba

Categories: Databases, IT Architecture, Oracle DB Tags:

raid10 and raid01

April 21st, 2015 Comments off

RAID 0 over RAID 1 (RAID 1+0, RAID 10, stripe of mirrors, better)

(RAID 1) A = Drive A1 + Drive A2 (Mirrored)
(RAID 1) B = Drive B1 + Drive B2 (Mirrored)
RAID 0 = (RAID 1) A + (RAID 1) B (Striped)



RAID 1 over RAID 0 (RAID 0+1, RAID 01, mirror of stripes)

(RAID 0) A = Drive A1 + Drive A2 (Striped)
(RAID 0) B = Drive B1 + Drive B2 (Striped)
RAID 1 = (RAID 0) A + (RAID 0) B (Mirrored)

PS:

For write performance: raid0 > raid10 > raid5

For read performance: raid1

For data protection: raid1

Raid2 - puts parity data on multiple dedicated disks (Hamming code). Stripes using bits/bytes.

Raid3 - puts parity data on a single disk. Good for sequential data, but for random data the parity disk becomes a bottleneck. Stripes using bits/bytes.

Raid4 - like Raid3, puts parity data on a single dedicated disk, but stripes using blocks/records.

Raid5 - distributes parity data across all disks. Good for small/random access. Has a write penalty (one write generates two reads, for the old parity/data, and two writes, for the new parity/data).

Raid6 - adds a second parity block. Can tolerate two disks failing. Has a worse write penalty.
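
If you want to experiment with the better layout on Linux, mdadm can build a stripe of mirrors directly (the device names below are placeholders):

mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 #4-disk RAID 10
cat /proc/mdstat #watch the initial resync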

Categories: Hardware, IT Architecture, Storage, Systems Tags:

printtbl8.sql – oracle sqlplus print output vertically

April 17th, 2015 Comments off

Save the following as printtbl8.sql in $ORACLE_HOME/rdbms/admin/:

set serveroutput on
set linesize 200
declare
    l_theCursor    integer default dbms_sql.open_cursor;
    l_columnValue    varchar2(4000);
    l_status        integer;
    l_descTbl    dbms_sql.desc_tab;
    l_colCnt        number;
    procedure execute_immediate( p_sql in varchar2 )
    is
    BEGIN
        dbms_sql.parse(l_theCursor,p_sql,dbms_sql.native);
        l_status := dbms_sql.execute(l_theCursor);
    END;
begin
    execute_immediate( 'alter session set nls_date_format=
                        ''dd-mon-yyyy hh24:mi:ss'' ');
    dbms_sql.parse(    l_theCursor,
                    replace( '&1', '"', ''''),
                    dbms_sql.native );
    dbms_sql.describe_columns( l_theCursor,
                            l_colCnt, l_descTbl );
    for i in 1 .. l_colCnt loop
        dbms_sql.define_column( l_theCursor, i,
                                l_columnValue, 4000 );
    end loop;
    l_status := dbms_sql.execute(l_theCursor);
    while ( dbms_sql.fetch_rows(l_theCursor) > 0 ) loop
        for i in 1 .. l_colCnt loop
            dbms_sql.column_value( l_theCursor, i,
                                l_columnValue );
            dbms_output.put_line
                ( rpad( l_descTbl(i).col_name,
         35 ) || ': ' || l_columnValue );
        end loop;
        dbms_output.put_line( '-----------------' );
    end loop;
    execute_immediate( 'alter session set nls_date_format=
                        ''dd-MON-yy'' ');
exception
    when others then
        execute_immediate( 'alter session set
                        nls_date_format=''dd-MON-yy'' ');
        raise;
end;
/

Now you can have a test in oracle sqlplus:

SQL> @?/rdbms/admin/printtbl8.sql 'select name,LOG_MODE,OPEN_MODE from v$database'
old 17: replace( '&1', '"', ''''),
new 17: replace( 'select name,LOG_MODE,OPEN_MODE from v$database', '"', ''''),

NAME : TEST
LOG_MODE : ARCHIVELOG
OPEN_MODE : READ WRITE
-----------------

PL/SQL procedure successfully completed.

Cool, right?

Categories: Databases, IT Architecture, Oracle DB Tags:

resolved – file filelists.xml.gz [Errno 5] OSError: [Errno 2] No such file or directory [Errno 256] No more mirrors to try

April 8th, 2015 Comments off

Today the error below prompted when running yum install for some packages on linux:

file://localhost/tmp/common1/x86_64/redhat/50/base/ga/Server/repodata/filelists.xml.gz: [Errno 5] OSError: [Errno 2] No such file or directory: '/tmp/common1/x86_64/redhat/50/base/ga/Server/repodata/filelists.xml.gz'
Trying other mirror.
Error: failure: repodata/filelists.xml.gz from base: [Errno 256] No more mirrors to try.
You could try running: package-cleanup --problems
package-cleanup --dupes
rpm -Va --nofiles --nodigest

After some checking (yum clean all, re-downloading the repo files to /etc/yum.repos.d, etc.), I finally found it was caused by the following entries in /etc/yum.conf:

[base]
name=Red Hat Linux - Base
baseurl=file://localhost/tmp/common1/x86_64/redhat/50/base/ga/Server

After I commented them out, yum install worked.


Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – Starting MySQL.The server quit without updating PID file (/var/lib/mysql/testvm.pid).

April 3rd, 2015 Comments off

Today when I tried to start mysql, it failed with the error below:

[root@testvm ~]# /etc/init.d/mysql start
Starting MySQL.The server quit without updating PID file (/var/lib/mysql/testvm.pid).

First I had a check of /var/lib/mysql/testvm.err, and it had the entries below:

2015-04-03 00:11:39 2925 [Note] InnoDB: Using CPU crc32 instructions
/usr/sbin/mysqld: Can't create/write to file '/tmp/ibDvk6bb' (Errcode: 13 - Permission denied)
2015-04-03 00:11:39 7f28af6c6720 InnoDB: Error: unable to create temporary file; errno: 13
2015-04-03 00:11:39 2925 [ERROR] Plugin 'InnoDB' init function returned error.
2015-04-03 00:11:39 2925 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2015-04-03 00:11:39 2925 [ERROR] Unknown/unsupported storage engine: InnoDB
2015-04-03 00:11:39 2925 [ERROR] Aborting

I had a check of the /tmp permissions, and they were not correct:

[root@testvm ~]# ls -ld /tmp
drwx------ 19 root root 4096 Apr 3 07:15 /tmp

So I changed the permissions on /tmp to 777 with the sticky bit:

[root@testvm ~]# chmod 1777 /tmp

[root@testvm ~]# ls -ld /tmp
drwxrwxrwt 19 root root 4096 Apr 3 07:15 /tmp

However, when I tried to start mysql, it failed again, with the errors below in /var/lib/mysql/testvm.err:

2015-04-03 00:20:42 18724 [ERROR] InnoDB: auto-extending data file ./ibdata1 is of a different size 640 pages (rounded down to MB) than specified in the .cnf file: initial 768 pages, max 0 (relevant if non-zero) pages!
2015-04-03 00:20:42 18724 [ERROR] InnoDB: Could not open or create the system tablespace. If you tried to add new data files to the system tablespace, and it failed here, you should now edit innodb_data_file_path in my.cnf back to what it was, and remove the new ibdata files InnoDB created in this failed attempt. InnoDB only wrote those files full of zeros, but did not yet use them in any way. But be careful: do not remove old data files which contain your precious data!
2015-04-03 00:20:42 18724 [ERROR] Plugin 'InnoDB' init function returned error.
2015-04-03 00:20:42 18724 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2015-04-03 00:20:42 18724 [ERROR] Unknown/unsupported storage engine: InnoDB
2015-04-03 00:20:42 18724 [ERROR] Aborting

So it was all about the InnoDB engine. As InnoDB was not required in our env, I decided to disable it:

[root@testvm ~]# vi /etc/my.cnf
[mysqld]
innodb=OFF
ignore-builtin-innodb
skip-innodb
default-storage-engine=myisam
default-tmp-storage-engine=myisam

After this, mysql started successfully.
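
If you do need InnoDB, note the first error above: 640 pages at InnoDB's default 16KB page size is a 10MB ibdata1, while the config expected 768 pages (12MB). So instead of disabling the engine, pointing innodb_data_file_path at the actual file size should also work; a sketch under that assumption:

[mysqld]
innodb_data_file_path=ibdata1:10M:autoextend #match the existing 10MB ibdata1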

Categories: Databases, IT Architecture, MySQL DB Tags: