Archive for the ‘Clouding’ Category

lvm volume resize by extending virtual disk image file

June 27th, 2016 Comments off

The commands below should be run from the Dom0 hosting the DomU:

[root@Dom0 ~]# virt-filesystems -a System.img
/dev/sda1
/dev/vg01/lv_root

[root@Dom0 ~]# virt-filesystems --long --parts --blkdevs -h -a System.img
Name Type MBR Size Parent
/dev/sda1 partition 83 500M /dev/sda
/dev/sda2 partition 8e 12G /dev/sda
/dev/sda device - 12G -

[root@Dom0 ~]# truncate -s 20G System_new.img

[root@Dom0 ~]# virt-resize --expand /dev/sda2 System.img System_new.img

[root@Dom0 ~]# mv System.img System.img.bak;mv System_new.img System.img

[root@Dom0 ~]# xm create vm.cfg -c #the first run may fail with "device cannot be connected"; just run it again and the error should be gone

The commands below should be run from the DomU:

[root@DomU ~]# vgs
VG #PV #LV #SN Attr VSize VFree
vg01 1 2 0 wz--n- 20.51g 8.00g

[root@DomU ~]# lvextend -L +8g /dev/mapper/vg01-lv_root
[root@DomU ~]# resize2fs /dev/mapper/vg01-lv_root
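As a side note, recent lvm2 releases let lvextend grow the filesystem in the same step via the -r/--resizefs flag, so the two commands above can optionally be collapsed into one (a sketch using the same LV):

lvextend -r -L +8g /dev/mapper/vg01-lv_root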

[root@DomU ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg01-lv_root
20G 10G 10G 50% /

Categories: Clouding, IT Architecture, Oracle Cloud Tags:

resolved – xend error: (98, ‘Address already in use’)

November 4th, 2015 Comments off

Today one OVS server hit an issue with ovs-agent and needed a reboot. As there were VMs running on it, I tried live migrating the Xen based VMs using "xm migrate -l", but the error below occurred:

-bash-3.2# xm migrate -l vm1 server1
Error: can't connect: (111, 'Connection refused')
Usage: xm migrate <Domain> <Host>

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

As xen migration uses xend-relocation-server on xend-relocation-port, this "Connection refused" issue was most likely related to that. Below is the configuration in /etc/xen/xend-config.sxp:

-bash-3.2# egrep -v '^#|^$' /etc/xen/xend-config.sxp
(xend-unix-server yes)
(xend-relocation-server yes)
(xend-relocation-ssl-server no)
(xend-unix-path /var/lib/xend/xend-socket)
(xend-relocation-port 8002)
(xend-relocation-server-ssl-key-file /etc/ovs-agent/cert/key.pem)
(xend-relocation-server-ssl-cert-file /etc/ovs-agent/cert/certificate.pem)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
(vif-script vif-bridge)
(dom0-min-mem 0)
(enable-dom0-ballooning no)
(dom0-cpus 0)
(vnc-listen '0.0.0.0')
(vncpasswd '')
(xend-domains-lock-path /opt/ovs-agent-2.3/utils/dlm.py)
(domain-shutdown-hook /opt/ovs-agent-2.3/utils/hook_vm_shutdown.py)

And to check the processes related to these:

-bash-3.2# lsof -i :8002
COMMAND   PID     USER   FD   TYPE    DEVICE SIZE NODE NAME
xend    12095 root    5u  IPv4 146473964       TCP *:teradataordbms (LISTEN)

-bash-3.2# ps auxww|egrep '/opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py'
root  3501  0.0  0.0   3924   740 pts/0    S+   08:37   0:00 egrep /opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py
root 19007  0.0  0.0  12660  5840 ?        D    03:44   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm1 --uuid 56f17372-0a86-4446-8603-d82423c54367
root 27446  0.0  0.0  12664  5956 ?        D    05:11   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm2 --uuid eb1a4e84-3572-4543-8b1d-685b856d98c7

When processes go into D state (uninterruptible sleep), it's troublesome, as such processes can only be cleared by rebooting the whole system. However, this server had many VMs running, and live migration/relocation was now blocked by the very issue it was meant to work around, so a deadlock surfaced. It seemed a reboot was the only way to "resolve" the issue.
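For reference, a quick way to spot such uninterruptible (D state) processes is to filter on the STAT column (a minimal sketch):

ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'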

First, I tried bouncing xend (/etc/init.d/xend restart), but met the error below in /var/log/messages:

[2015-11-04 04:39:43 24026] INFO (SrvDaemon:227) Xend stopped due to signal 15.
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:332) Xend Daemon started
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:336) Xend changeset: unavailable.
[2015-11-04 04:40:14 24115] ERROR (SrvDaemon:349) Exception starting xend ((98, 'Address already in use'))
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 339, in run
    relocate.listenRelocation()
  File "/usr/lib/python2.4/site-packages/xen/xend/server/relocate.py", line 159, in listenRelocation
    hosts_allow = hosts_allow)
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 36, in __init__
    connection.SocketListener.__init__(self, protocol_class)
  File "/usr/lib/python2.4/site-packages/xen/web/connection.py", line 89, in __init__
    self.sock = self.createSocket()
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 49, in createSocket
    sock.bind((self.interface, self.port))
  File "<string>", line 1, in bind
error: (98, 'Address already in use')

Later, I realized we could try changing xend-relocation-port. So I made the change below in /etc/xen/xend-config.sxp:

(xend-relocation-port 8003)

And later, bounced xend:

/etc/init.d/xend stop; /etc/init.d/xend start

PS: bouncing xend will not affect running VMs; I confirmed this by comparing the qemu processes (ps -ef|grep qemu) before and after. A tip here: when xen related commands (xm list and so on) stop working, checking for the "qemu" device-model processes will still give you the VM list.
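For example, something like the below lists those device-model processes, and the VM names normally show up in their arguments (a minimal sketch):

ps -ef | grep [q]emu    # the [q] keeps the grep from matching itself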

After this, "xm migrate -l vm1 server1" still failed with the same "can't connect: (111, 'Connection refused')". I resolved this by specifying the port explicitly (you may need to stop iptables too):

-bash-3.2# xm migrate -l -p 8002 vm1 server1

Now the live migration went on smoothly, and after all VMs were migrated, I changed xend-relocation-port back to 8002 and rebooted the server to fix the D state (uninterruptible sleep) issue.

PS:

If you still get "Error: can't connect: (111, 'Connection refused')" even after the above workaround, you can change the port back from 8003 to 8002, or even from 8003 to 8004, restart iptables, and try again.
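If you'd rather not stop iptables completely, opening just the relocation port is usually enough (a sketch; use whatever port your xend-relocation-port is set to):

iptables -I INPUT -p tcp --dport 8002 -j ACCEPT
service iptables save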

Categories: Clouding, IT Architecture Tags:

resolved – Error: Unable to connect to xend: Connection reset by peer. Is xend running?

January 7th, 2015 Comments off

Today I met some issue when trying to run xm commands on a XEN server:

[root@xenhost1 ~]# xm list
Error: Unable to connect to xend: Connection reset by peer. Is xend running?

I had a check, and found xend was actually running:

[root@xenhost1 ~]# /etc/init.d/xend status
xend daemon running (pid 8329)

After some debugging, I found it was caused by libvirtd & xend getting into a bad state. So I bounced both of them:

[root@xenhost1 ~]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]

[root@xenhost1 ~]# /etc/init.d/xend restart #this may not be needed 
restarting xend...
xend daemon running (pid 19684)

After that, the xm commands worked fine.

PS:

  • If you meet an issue like "xend error: (98, 'Address already in use')" when restarting xend, or "can't connect: (111, 'Connection refused')" when doing an xm live migration, you can refer to the post above.
  • For more information about libvirt, you can check here.

 

Categories: Clouding, IT Architecture, Oracle Cloud Tags:

resolved – cssh installation on linux server

December 29th, 2014 Comments off

ClusterSSH can be used if you need to control a number of xterm windows via a single graphical console window, so that you can run commands interactively on multiple servers over ssh connections. This guide shows the process of installing clusterssh on a linux box from a tarball.

First of all, download the cssh tarball App-ClusterSSH-4.03_04.tar.gz from sourceforge. You may need to export proxy settings if your environment requires them:

export https_proxy=http://my-proxy.example.com:80/
export http_proxy=http://my-proxy.example.com:80/
export ftp_proxy=http://my-proxy.example.com:80/

After the proxy setting, you can now get the package:

wget 'http://sourceforge.net/projects/clusterssh/files/latest/download'
tar zxvf App-ClusterSSH-4.03_04.tar.gz
cd App-ClusterSSH-4.03_04
cat README

Before installing, let's install some prerequisite packages:

yum install gcc libX11-devel gnome* -y
yum groupinstall "X Window System" -y
yum groupinstall "GNOME Desktop Environment" -y
yum groupinstall "Graphical Internet" -y
yum groupinstall "Graphics" -y

Now run "perl Build.PL" as indicated by README:

[root@centos-32bits App-ClusterSSH-4.03_04]# perl Build.PL
Can't locate Module/Build.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 .) at Build.PL line 5.
BEGIN failed--compilation aborted at Build.PL line 5.

As it complained, you need to install Module::Build first. Let's use cpan to install that module.

Run "cpan" and enter "follow" when the prompt below appears:

Policy on building prerequisites (follow, ask or ignore)? [ask] follow

If you have already run cpan before, you can configure the policy as below:

cpan> o conf prerequisites_policy follow
cpan> o conf commit

Now Let's install Module::Build:

cpan> install Module::Build

After the installation, let's run "perl Build.PL" again:

[root@centos-32bits App-ClusterSSH-4.03_04]# perl Build.PL
Checking prerequisites...
  requires:
    !  Exception::Class is not installed
    !  Tk is not installed
    !  Try::Tiny is not installed
    !  X11::Protocol is not installed
  build_requires:
    !  CPAN::Changes is not installed
    !  File::Slurp is not installed
    !  File::Which is not installed
    !  Readonly is not installed
    !  Test::Differences is not installed
    !  Test::DistManifest is not installed
    !  Test::PerlTidy is not installed
    !  Test::Pod is not installed
    !  Test::Pod::Coverage is not installed
    !  Test::Trap is not installed

ERRORS/WARNINGS FOUND IN PREREQUISITES.  You may wish to install the versions
of the modules indicated above before proceeding with this installation

Run 'Build installdeps' to install missing prerequisites.

Created MYMETA.yml and MYMETA.json
Creating new 'Build' script for 'App-ClusterSSH' version '4.03_04'

As the output says, run "./Build installdeps" to install the missing packages. Make sure you're in a GUI environment (through vncserver maybe), as the build has a step that tests the GUI.

[root@centos-32bits App-ClusterSSH-4.03_04]# ./Build installdeps

......

Running Mkbootstrap for Tk::Xlib ()
chmod 644 "Xlib.bs"
"/usr/bin/perl" "/usr/lib/perl5/5.8.8/ExtUtils/xsubpp" -typemap "/usr/lib/perl5/5.8.8/ExtUtils/typemap" -typemap "/root/.cpan/build/Tk-804.032/Tk/typemap" Xlib.xs > Xlib.xsc && mv Xlib.xsc Xlib.c
make[1]: *** No rule to make target `pTk/tkInt.h', needed by `Xlib.o'. Stop.
make[1]: Leaving directory `/root/.cpan/build/Tk-804.032/Xlib'
make: *** [subdirs] Error 2
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible

Errors again; we can see it's complaining about something Tk related. To resolve this, I manually installed the latest perl-tk module as below:

wget --no-check-certificate 'https://github.com/eserte/perl-tk/archive/master.zip'
unzip master
cd perl-tk-master
perl Makefile.PL
make
make install

After this, let's run "./Build installdeps" and "perl Build.PL" again, which both went through fine:

[root@centos-32bits App-ClusterSSH-4.03_04]# ./Build installdeps

[root@centos-32bits App-ClusterSSH-4.03_04]# perl Build.PL

And let's run ./Build now:

[root@centos-32bits App-ClusterSSH-4.03_04]# ./Build
Building App-ClusterSSH
Generating: ccon
Generating: crsh
Generating: cssh
Generating: ctel

And now "./Build install" which is the last step:

[root@centos-32bits App-ClusterSSH-4.03_04]# ./Build install

After installation, let's have a test:

[root@centos-32bits App-ClusterSSH-4.03_04]# echo 'svr testserver1 testserver2' > /etc/clusters

Now run 'cssh svr', and you'll get the charm!

[screenshot: clusterssh]

PS: 

If you met error like below:

Can't connect to display `unix:1': No such file or directory at /usr/local/share/perl5/X11/Protocol.pm line 2264.

And you are connecting to vnc session like below:

root 3291 1 0 07:36 ? 00:00:02 /usr/bin/Xvnc :1 -desktop Yue-test:1 (root) -auth /root/.Xauthority -geometry 1600x900 -rfbwait 30000 -rfbauth /root/.vnc/passwd -rfbport 5901 -fp catalogue:/etc/X11/fontpath.d -pn

Then make sure to do below:

export DISPLAY=localhost:1.0

Categories: Clouding, IT Architecture, Linux, Systems, Unix Tags:

resolved – switching from Unbreakable Enterprise Kernel Release 2(UEKR2) to UEKR3 on Oracle Linux 6

November 24th, 2014 Comments off

As we can see from here, the available kernels include the following 3 for Oracle Linux 6:

3.8.13 Unbreakable Enterprise Kernel Release 3 (x86_64 only)
2.6.39 Unbreakable Enterprise Kernel Release 2**
2.6.32 (Red Hat compatible kernel)

On one of our OEL6 VM, we found that it's using UEKR2:

[root@testbox aime]# cat /etc/issue
Oracle Linux Server release 6.4
Kernel \r on an \m

[root@testbox aime]# uname -r
2.6.39-400.211.1.el6uek.x86_64

So how can we switch the kernel to UEKR3(3.8)?

If your linux version is 6.4, first do a "yum update -y" to upgrade to 6.5 or higher, then reboot the host and follow the steps below.

[root@testbox aime]# ls -l /etc/grub.conf
lrwxrwxrwx. 1 root root 22 Aug 21 18:24 /etc/grub.conf -> ../boot/grub/grub.conf

[root@testbox aime]# yum update -y

If your linux version is 6.5 or higher, you'll find /etc/grub.conf and /boot/grub/grub.conf are different files (this is for the yum-updated case; if your host was installed as OEL6.5 from the start, /etc/grub.conf should be a softlink too):

[root@testbox ~]# ls -l /etc/grub.conf
-rw------- 1 root root 2356 Oct 20 05:26 /etc/grub.conf

[root@testbox ~]# ls -l /boot/grub/grub.conf
-rw------- 1 root root 1585 Nov 23 21:46 /boot/grub/grub.conf

In /etc/grub.conf, you'll see entry like below:

title Oracle Linux Server Unbreakable Enterprise Kernel (3.8.13-44.1.3.el6uek.x86_64)
root (hd0,0)
kernel /vmlinuz-3.8.13-44.1.3.el6uek.x86_64 ro root=/dev/mapper/vg01-lv_root rd_LVM_LV=vg01/lv_root rd_NO_LUKS rd_LVM_LV=vg01/lv_swap LANG=en_US.UTF-8 KEYTABLE=us console=hvc0 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_DM rhgb quiet
initrd /initramfs-3.8.13-44.1.3.el6uek.x86_64.img

What you need to do is simply copy the entry above (the title/root/kernel/initrd lines) from /etc/grub.conf into /boot/grub/grub.conf (make sure /boot/grub/grub.conf is not a softlink, or you may meet the error "Boot loader didn't return any data"), and then reboot the VM.
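A minimal sketch of that edit (back up first; the grep just shows which entry "default=" currently points to):

cp /boot/grub/grub.conf /boot/grub/grub.conf.bak
vi /boot/grub/grub.conf    # paste the UEKR3 title/root/kernel/initrd stanza here
grep -n '^default\|^title' /boot/grub/grub.conf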

After rebooting, you'll find the kernel is now at UEKR3(3.8).

PS:

If you find the VM is OEL6.5 and /etc/grub.conf is a softlink to /boot/grub/grub.conf, then you could do the following to upgrade kernel to UEKR3:

1. add the following lines to /etc/yum.repos.d/public-yum-ol6.repo:

[public_ol6_UEKR3]
name=UEKR3 for Oracle Linux 6 ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/UEKR3/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

2. List and install UEKR3:

[root@testbox aime]# yum list|grep kernel-uek|grep public_ol6_UEKR3
kernel-uek.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-debug.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-debug-devel.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-devel.x86_64 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-doc.noarch 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-firmware.noarch 3.8.13-44.1.5.el6uek public_ol6_UEKR3
kernel-uek-headers.x86_64 3.8.13-26.2.4.el6uek public_ol6_UEKR3

[root@testbox aime]# yum install -y kernel-uek* --disablerepo=* --enablerepo=public_ol6_UEKR3

3. Reboot, then verify the running kernel as shown below.
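A quick check once the VM is back up (the exact version string will depend on which kernel-uek build you installed):

uname -r    # expect something like 3.8.13-44.1.5.el6uek.x86_64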

resolved – Exception: ha_check_cpu_compatibility failed:

November 12th, 2014 Comments off

Today when I tried to add one OVS server to an OVMM server pool, the following error messages prompted:

2014-11-12 06:25:18.083 WARNING failed:errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteHA.py", line 248, in ha_check_cpu_compatibility
raise Exception("CPU not compatible! %s" % repr(d))
>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 609, in cluster_check_prerequisite
raise Exception(msg)

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 646, in _cluster_setup
#_check(ret)
File "/opt/ovs-agent-2.3/OVSXCluster.py", line 340, in _check
raise OVSException(error=ret["error"])

2014-11-12 06:25:18.083 NOTIFICATION Failed setup cluster for agent 2.2.0...
2014-11-12 06:25:18.083 ERROR Cluster Setup when adding server
2014-11-12 06:25:18.087 ERROR [Server Pool Management][Server Pool][test_serverpool]:During adding servers ([new_ovs_03]) to server pool (test_serverpool), Cluster setup failed: (OVM-1011 OVM Manager communication with new_ovs_03 for operation HA Setup for Oracle VM Agent 2.2.0 failed:
errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

)

As stated in the error message, the add operation failed at the cpu compatibility check. To resolve this, we can comment out the code where the cpu check occurs.

File /opt/ovs-agent-2.3/OVSSiteCluster.py in line 646 on each OVS server in the server pool:

#ret = cluster_check_prerequisite(ha_enable=ha_enable)
#_check(ret)

Then bounce ovs-agent on each OVS server and try the add again. Please note that this workaround will make live migration between these hosts impossible (strictly speaking, it's the differing cpu models/architectures that make live migration impossible).
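Bouncing the agent is just a stop/start of its service on each OVS server, the same commands used elsewhere on this blog:

service ovs-agent stop
service ovs-agent start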

Resolved – AttributeError: ‘NoneType’ object has no attribute ‘_imgName’

November 6th, 2014 Comments off

Today when I tried to list Virtual Machines on one Oracle OVMM, error message prompted:

[root@ovmm_test ~]# ovm -uadmin -ppassword vm ls
Traceback (most recent call last):
  File "/usr/bin/ovm", line 43, in ?
    ovmcli.ovmmain.main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmmain.py", line 122, in main
    return ovm.ovmcli.runcmd(args)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 147, in runcmd
    return method(options)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 1578, in do_vm_ls
    result.append((serverpool._serverPoolName, vm._imgName))
AttributeError: 'NoneType' object has no attribute '_imgName'

Then I tried list VMs by server pool:

[root@ovmm_test ~]# ovm -uadmin -ppassword vm ls -s Pool1_test
Name                 Size(MB) Mem   VCPUs Status  Server_Pool
testvm1              17750    8196  4     Running Pool1_test
testvm2               50518    8196  4     Running Pool1_test
testvm3          19546    8192  2     Running Pool1_test
testvm4          50518    20929 4     Running Pool1_test
testvm5          19546    8192  2     Running Pool1_test
[root@ovmm_test ~]# ovm -uadmin -ppassword vm ls -s Pool1_test_A
Traceback (most recent call last):
  File "/usr/bin/ovm", line 43, in ?
    ovmcli.ovmmain.main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmmain.py", line 122, in main
    return ovm.ovmcli.runcmd(args)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 147, in runcmd
    return method(options)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 1578, in do_vm_ls
    result.append((serverpool._serverPoolName, vm._imgName))
AttributeError: 'NoneType' object has no attribute '_imgName'

One pool was working and the other was not, so the problematic VMs must reside in pool Pool1_test_A.

Another symptom was that, although ovmcli wouldn't work, the OVMM GUI worked as expected and returns all the VMs.

As ovmcli reads its entries from the Oracle DB (SID XE) on the OVMM host, the issue was probably caused by an inconsistency between that DB and the OVM agent DB.

I got the list of all VMs on the problematic server pool from OVMM GUI, and then ran the following query to get all entries in DB:

select IMG_NAME from OVS_VM_IMG where SITE_ID=110 and STATUS !='Running' and length(IMG_NAME)>50;

(Pool1_test_A had SITE_ID 110, taken from table OVS_SITE. length() is used here because in the problematic server pool all VMs should have an IMG_NAME longer than 50 characters; entries shorter than that are VM templates, which had no issue.)

Comparing the output from the OVMM GUI and the OVM DB, I found some entries which only existed in the DB. All of these entries had "Status" set to "Creating", and they can also be listed with the query below:

select IMG_NAME from OVS_VM_IMG where STATUS='Creating';

Then I removed these weird entries:

delete from OVS_VM_IMG where SITE_ID=110 and STATUS !='Running' and length(IMG_NAME)>50;

(If this step fails because of foreign keys or other reasons, you might be tempted to rename/drop table OVS_VM_IMG (alter table TBL1 rename to TBL2; drop table TBL1), remove the entries in the backup table (OVS_VM_IMG_bak20141106), and then rename the backup table back to OVS_VM_IMG. Don't do this - it will cause problems with constraints etc.)

After this, the issue got resolved.

PS:

  1. If, after removing the entries with STATUS "Creating", you find more entries of this kind appearing in the OVM DB, it may be caused by broken VM templates or a corrupted DB table. In this case, you'll need to recover OVMM by rolling back to a previous backup, and then re-import the VM templates/VM images etc.
  2. Actually the issue was caused by a broken constraint (constraint name conflicts introduced by table backups, so it's better not to back up tables when running sql directly against the OVMM DB). This can be resolved by altering the constraint name and later removing the backup tables.

resolved – IOError: [Errno 2] No such file or directory when creating VMs on Oracle VM Server

August 25th, 2014 Comments off

Today when I tried to add one OVS server to Oracle VM Server server pool, there was error message like below:

Start - /OVS/running_pool/vm_test
PowerOn Failed : Result - failed:<Exception: return=>failed:<Exception: failed:<IOError: [Errno 2] No such file or directory: '/var/ovs/mount/85255944BDF24F62831E1C6E7101CF7A/running_pool/vm_test/vm.cfg'>

I logged on to one OVS server and found the path was there. Later I logged on to all OVS servers in that server pool and found one OVS server did not have the storage repo. So I removed that OVS server from the pool and tried to add it back, intending to create the VM again. But this time, the following error messages prompted when I tried to add the OVS server back:

2014-08-21 02:52:52.962 WARNING failed:errcode=50006, errmsg=Do 'clusterm_init_root_sr' on servers ('testhost1') failed.
StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 651, in _cluster_setup
_check(ret)
File "/opt/ovs-agent-2.3/OVSXCluster.py", line 340, in _check
raise OVSException(error=ret["error"])

2014-08-21 02:52:52.962 NOTIFICATION Failed setup cluster for agent 2.2.0...
2014-08-21 02:52:52.963 ERROR Cluster Setup when adding server
2014-08-21 02:52:52.970 ERROR [Server Pool Management][Server Pool][test_pool]:During adding servers ([testhost1]) to server pool (test_pool), Cluster setup failed: (OVM-1011 OVM Manager communication with host_master for operation HA Setup for Oracle VM Agent 2.2.0 failed:
errcode=50006, errmsg=Do 'clusterm_init_root_sr' on servers ('testhost1') failed.

From here, I realized that this error was caused by the storage repo not being creatable on OVS server testhost1. So I logged on to testhost1 for a check. As the storage repo was an NFS share, I tried showmount -e <nfs server>, and found it was not working. Then I checked the route to <nfs server> with traceroute, and it was not going through.

From another host, showmount -e <nfs server> worked, so the problem was on OVS server testhost1. After more debugging, I found that one NIC was present on the host but not pingable. I then checked the switch and found the NIC was unplugged. After plugging it back in, I tried adding back the OVS server and creating the VM again, and all went smoothly.

PS:

Suppose you want to know the NFS clients which mount one share from the NFS server, then on any client that has access to the NFS server, do the following:

[root@centos-doxer ~]# showmount -a nfs-server.example.com|grep to_be
10.182.120.188:/export/IDM_BR/share01_to_be_removed
test02.example:/export/IDM_BR/share01_to_be_removed

-a or --all

List both the client hostname or IP address and mounted directory in host:dir format. This info should not be considered reliable.

PPS:

If you met below error when trying to poweron VM:

2016-08-19 02:58:28.474 NOTIFICATION [PowerOn][testvm]: Start - /OVS/running_pool/testvm
2016-08-19 02:58:39.802 NOTIFICATION [PowerOn][testvm]: Result - failed:<error: (113, 'No route to host')>
StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteVM.py", line 168, in start_vm
raise e
2016-08-19 02:58:39.802 WARNING [PowerOn][testvm]: Result - failed:<error: (113, 'No route to host')>
StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteVM.py", line 168, in start_vm
raise e

Or below error when trying to import template:

Register an empty virtual machine

register the virtual machine genernal information

get the virtual machine detail information...

Invalid virtual machine type(HVM/PVM).

Then the cause is likely a problematic OVS server in the pool. You should check the ocfs2 config (/etc/ocfs2/cluster.conf), or remove any OVS server that is under maintenance, and then test again.

resolved – Kernel panic – not syncing: Attempted to kill init

July 29th, 2014 Comments off

Today when I tried to poweron one VM hosted on XEN server, the following error messages prompted:

Write protecting the kernel read-only data: 6784k
Kernel panic - not syncing: Attempted to kill init! [failed one]
Pid: 1, comm: init Not tainted 2.6.32-300.29.1.el5uek #1
Call Trace:
[<ffffffff810579a2>] panic+0xa5/0x162
[<ffffffff8109b997>] ? atomic_add_unless+0x2e/0x47
[<ffffffff8109bdf9>] ? __put_css_set+0x29/0x179
[<ffffffff8145744c>] ? _write_lock_irq+0x10/0x20
[<ffffffff81062a65>] ? exit_ptrace+0xa7/0x118
[<ffffffff8105b076>] do_exit+0x7e/0x699
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8105b748>] sys_exit_group+0x17/0x1b
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

This was quite weird, as it had been ok the day before:

Write protecting the kernel read-only data: 6784k
blkfront: xvda: barriers enabled (tag) [normal one]
xvda: detected capacity change from 0 to 15126289920
xvda: xvda1 xvda2 xvda3
blkfront: xvdb: barriers enabled (tag)
xvdb: detected capacity change from 0 to 16777216000
xvdb: xvdb1
Setting capacity to 32768000
xvdb: detected capacity change from 0 to 16777216000
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
type=1404 audit(1406281405.511:2): selinux=0 auid=4294967295 ses=4294967295

After some checking, I found that this OVS server was hosting more than 40 VMs and its VCPUs were oversubscribed. So I turned off some unused VMs and the issue was resolved.
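For a rough view of how oversubscribed a Dom0 is, something like the following works with the xm toolstack used in this post (a sketch):

xm info | grep nr_cpus
xm list | awk 'NR>1 {v+=$4} END {print "VCPUs allocated (including Dom-0):", v}'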

Resolved – Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT

June 12th, 2014 1 comment

Today when I tried to install Oracle VM Server on one server, the following error occurred:

Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT. This can happen if there is not enough space on your hard drive(s) for the installation.

So to go on with the installation, I had to think of a way to erase the GPT partition table on the drive.

To do this, the first step is to drop into linux rescue mode when booting from the CDROM (another way: when installing OVS, use Alt-F2 to access a terminal screen other than the installer, then use fdisk from the command line to manually repartition the disk with a dos partition table):

rescue

Later, checking with fdisk -l, I could see that /dev/sda was the only disk whose GPT label needed erasing. So I used dd if=/dev/zero of=/dev/sda bs=512 count=1 to erase the GPT table:

 

[screenshot: fdisk_dd]

 

After this, run fdisk -l again, I saw that the partition table was now gone:

[screenshot: fdisk_dd_2]
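As a quicker check than eyeballing the fdisk output, parted can report the disk label type directly (a sketch; run it before the dd to confirm the label is gpt, and afterwards to confirm it is gone):

parted -s /dev/sda print | grep -i 'partition table'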

Later, I re-initiated the installation of the OVS server. When the following message prompted, select "No":

[screenshot: select_no]

And select "Yes" when the message below prompted, so that we can make a new partition table:

[screenshot: select_yes]

The steps after this were the normal ones, and the installation went smoothly.

PS:

If the disk is larger than 2T, there's no way to soft-convert from GPT to MBR, so you'll need to decrease the disk size through the RAID controller during the BIOS boot process. Here's an example of using the LSI MegaRAID BIOS Config Utility (Drive 252) to reconfigure disks from the Oracle iLOM GUI console (you can check here for MegaCLI). (Tips - the first sector contains the MBR <446 bytes> and the partition table <64 bytes>; MBR allows up to 4 primary partitions, or 3 primary plus 1 extended partition, and the extended partition can hold many logical partitions.)

Alt + A enables/disables shortcut select.

When shortcuts are disabled - TAB moves between items, and Enter works as expected.

When shortcuts are enabled - the Space key acts as Enter.

If you want to go into the BIOS, press F2.

Press Ctrl + H to go to the MegaRAID WebBIOS (in newer versions, press Ctrl + R):

[screenshots: MegaRAID WebBIOS walkthrough - entry screen, disks, clear configuration, add configuration, manual configuration, partition, raid hole, raid-1, raid-1 (continued), another one, set boot, raid5, overview 1, overview 2]

Below is for the newer version (use a small stripe size for DB workloads):

[screenshot: webBIOS-megaraid]

 

Resolved – failed Exception check srv hostname/IP failedException Invalid hostname/IP configuration ocfs2 config failed Obsolete nodes found

June 3rd, 2014 Comments off

Today when I tried to add two OVS servers into one server pool, errors were met. The first one was like below:

2014-06-03 04:26:08.965 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:09.485 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:09.497 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:12.463 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:12.985 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:12.997 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:13.004 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:13.522 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:13.535 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:13.980 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:16.307 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:16.831 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:16.844 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:17.284 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:17.290 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:17.814 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:17.827 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:18.272 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:18.279 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:18.799 NOTIFICATION Regisering server:hostname1.example.com...
2014-06-03 04:26:21.749 NOTIFICATION Register Server: hostname1.example.com success
2014-06-03 04:26:21.751 NOTIFICATION Getting host info for server:hostname1.example.com ...
2014-06-03 04:26:23.894 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>
2014-06-03 04:26:33.348 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>

Also there's error message like below:

2014-06-03 04:59:11.003 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:11.524 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:11.536 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:15.484 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.005 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.016 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:16.025 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.546 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.559 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:17.014 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:18.950 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:19.470 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:19.483 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:19.926 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:19.955 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:20.476 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:20.490 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:20.943 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:20.947 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:21.471 NOTIFICATION Regisering server:hostname1-fe.example.com...
2014-06-03 04:59:24.439 NOTIFICATION Register Server: hostname1-fe.example.com success
2014-06-03 04:59:24.439 NOTIFICATION Getting host info for server:hostname1-fe.example.com ...
2014-06-03 04:59:26.577 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >
2014-06-03 04:59:37.100 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1-fe.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >

Then I was confused by the "Obsolete nodes found" it complained about. I could confirm that I had removed hostname1.example.com, and even after checking the OVM DB table OVS.OVS_SERVER, there was no record of hostname1.example.com.

After some searching, it turned out these errors were caused by obsolete info in OCFS2 (Oracle Cluster File System). We should edit /etc/ocfs2/cluster.conf and remove the obsolete entries.

-bash-3.2# vi /etc/ocfs2/cluster.conf
node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

node:
        ip_port     = 7777
        ip_address  = 10.200.169.191
        number      = 1
        name        = hostname2
        cluster     = ocfs2

cluster:
        node_count  = 2
        name        = ocfs2

So if hostname2 is no longer needed, or the IP address of hostname2 has changed, you should remove the entries related to hostname2 and change node_count to 1. Then bounce the ocfs2/o2cb services:

service ocfs2 restart

service o2cb restart

Later, I tried adding the OVS server again, and it worked! (Before adding that OVS server back, we first need to reset its ovs-agent db: service ovs-agent stop; mv /etc/ovs-agent/db /var/tmp/db.bak.5; service ovs-agent start, and then reconfigure the agent with service ovs-agent configure. You can also use /opt/ovs-agent-2.3/utils/cleanup.py to clean up.)

PS:

Here is more about ocfs2 and o2cb http://www.doxer.org/o2cb-for-ocfs2/

resolved – check backend OHS httpd servers for BIG ip F5 LTM VIP

May 23rd, 2014 Comments off

Assume you want to check which backend OHS or httpd servers an LTM VIP example.vip.com is routing traffic to. Here are the steps (a tmsh CLI alternative is sketched after the list):

  1. get the ip address of VIP example.vip.com;
  2. log on LTM's BUI. Local traffic-> virtual servers -> virtual server list, search ip
  3. click "edit" below column "resource"
  4. note down default pool
  5. search pool name in local traffic -> virtual servers -> pools -> pool list
  6. click the number below column members. Then you'll find the OHS servers and ports the VIP will route traffic to.
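If you prefer the command line, roughly the same information can be pulled from tmsh on the LTM (a sketch; the virtual server and pool names are placeholders for whatever you found in the GUI steps above):

list /ltm virtual <vip-name> pool
show /ltm pool <pool-name> members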

PS:

  • To check connections including one specific IP, run below
    • show /sys connection |grep -w <IP>

Oracle VM operations – poweron, poweroff, status, stat -r

January 27th, 2014 Comments off

Here's the script:
#!/usr/bin/perl
#1.OVM must be running before operations
#2.run ovm_vm_operation.pl status before running ovm_vm_operation.pl poweroff or poweron
use Net::SSH::Perl;
$host = $ARGV[0];
$operation = $ARGV[1];
$user = 'root';
$password = 'password';

$newname=$ARGV[2];
$newcpu=$ARGV[3];
$newmemory=$ARGV[4];
$newpool=$ARGV[5];
$newtmpl=$ARGV[6];
$newbridge=$ARGV[7];
$newbridge2=$ARGV[8];
$newvif='vif0';
$newvif2='VIF1';

if($host eq "help") {
print "$0 OVM-name status|poweron|poweroff|reboot|stat-r|stat-r-all|pool|new vmname 1 4096 poolname tmplname FE BE\n";
exit;
}

$ssh = Net::SSH::Perl->new($host);
$ssh->login($user,$password);

if($operation eq "status") {
($stdout,$stderr,$exit) = $ssh->cmd("ovm -uadmin -ppassword vm ls|grep -v VM_test");
open($host_fd,'>',"/var/tmp/${host}.status");
select $host_fd;
print $stdout;
close $host_fd;
} elsif($operation eq "poweroff") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ "Server_Pool|OVM|Powered") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm poweroff -n $1 -s $6 -f");
sleep 12;
}
}
} elsif($operation eq "reboot") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ "Server_Pool|OVM|Powered") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm reboot -n $1 -s $6");
sleep 12;
}
}
} elsif($operation eq "poweron") {
open($poweron_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweron_fd>){
if($_ =~ "Server_Pool|OVM|Running|used|poweroff") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+Off(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm poweron -n $1 -s $6");
#print "ovm -uadmin -ppassword vm poweron -n $1 -s $6";
sleep 15;
}
}
} elsif($operation eq "stat-r") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+(Shutting\sDown|Initializing|Error|Unknown|Rebooting|Deleting)\s+(.*)/){
#print "ovm -uadmin -ppassword vm stat -r -n $1 -s $6";
$ssh->cmd("ovm -uadmin -ppassword vm stat -r -n $1 -s $6");
sleep 1;
}
}
} elsif($operation eq "stat-r-all") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
# match the VM name (first field) and server pool (last field); the bare $1/$6 used before had no preceding match
if($_ =~ /^(.*?)\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+.*\s(\S+)\s*$/){
$ssh->cmd("ovm -uadmin -ppassword vm stat -r -n $1 -s $2");
sleep 1;
}
}
} elsif($operation eq "pool") {
($stdoutp,$stderrp,$exitp) = $ssh->cmd("ovm -uadmin -ppassword svrp ls|grep Inactive");
open($host_fdp,'>',"/var/tmp/${host}-poolstatus");
select $host_fdp;
print $stdoutp;
close $host_fdp;
} elsif($operation eq "new") {
($stdoutp,$stderrp,$exitp) = $ssh->cmd("ovm -uadmin -ppassword tmpl ls -s $newpool | grep $newtmpl");
if($stdoutp =~ /$newtmpl/){
($stdoutp2,$stderrp2,$exitp2) = $ssh->cmd("ovm -uadmin -ppassword vm new -m template -s $newpool -t $newtmpl -n $newname -c password");
if($stdoutp2 =~ /is being created/){
print "Creating VM $newname in pool $newpool on OVMM $host now!"."\n";
while(1){
($stdoutp3,$stderrp3,$exitp3) = $ssh->cmd("ovm -uadmin -ppassword vm stat -n $newname -s $newpool");
if($stdoutp3 =~ /Powered Off/){
print "Done VM creation."."\n";
last;
}
sleep 300
}

print "Setting Cpu/Memory now."."\n";
($stdoutp32,$stderrp32,$exitp32) = $ssh->cmd("ovm -uadmin -ppassword vm conf -n $newname -s $newpool -x $newmemory -m $newmemory -c $newcpu -P");
sleep 2;

print "Creating NICs now."."\n";
($stdoutp4,$stderrp4,$exitp4) = $ssh->cmd("ovm -uadmin -ppassword vm nic conf -n $newname -s $newpool -N $newvif -i VIF0 -b $newbridge");
sleep 2;
($stdoutp5,$stderrp5,$exitp5) = $ssh->cmd("ovm -uadmin -ppassword vm nic add -n $newname -s $newpool -N $newvif2 -b $newbridge2");
sleep 2;

print "Powering on VM now."."\n";
($stdoutp6,$stderrp6,$exitp6) = $ssh->cmd("ovm -uadmin -ppassword vm poweron -n $newname -s $newpool");
sleep 30;

while(1){
($stdoutp7,$stderrp7,$exitp7) = $ssh->cmd("ovm -uadmin -ppassword vm info -n $newname -s $newpool");
if($stdoutp7 =~ /Running on: sl/){
print "VM is now Running, you can configure VM on hypervisor now:"."\n";
print $stdoutp7."\n";
last;
}
sleep 30;
}

#($stdoutp8,$stderrp8,$exitp8) = $ssh->cmd("ovm -uadmin -ppassword vm ls -l | grep $newname");
#print "You can configure VM on hypervisor now:"."\n";
#print $stdoutp8."\n";
} else {
print $stdoutp2."\n";
exit;
}
} else {
print "No template named $newtmpl in pool $newpool\n";
exit;
}
}

You can use the following to make the script run in parallel:

for i in <all OVMs>;do (./ovm_vm_operation.pl $i status &);done

resolved – ESXi Failed to lock the file

January 13th, 2014 Comments off

When I was powering on one VM in ESXi, an error occurred:

An error was received from the ESX host while powering on VM doxer-test.
Cannot open the disk '/vmfs/volumes/4726d591-9c3bdf6c/doxer-test/doxer-test_1.vmdk' or one of the snapshot disks it depends on.
Failed to lock the file

And also:

unable to access file since it is locked

This apparently was caused by some storage issue. At first I googled it and found most posts were explaining ESXi's locking mechanics; I tried some of the suggestions, but with no luck.

Then I remembered that our datastore was backed by NFS/ZFS, and NFS is known for stale file lock issues. So I mounted the nfs share the datastore was using and removed one file named lck-c30d000000000000. After this, the VM booted up successfully! (Alternatively, we can log on to the ESXi host and remove the lock file from there.)
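Roughly, that cleanup looks like the following (a sketch; the NFS server, export and VM directory are placeholders, and the lock file name is the one mentioned above):

mount -t nfs nfs-server:/export/datastore /mnt/ds
ls /mnt/ds/doxer-test/ | grep -i lck
rm /mnt/ds/doxer-test/lck-c30d000000000000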

Common storage multi path Path-Management Software

December 12th, 2013 Comments off
Vendor Path-Management Software URL
Hewlett-Packard AutoPath, SecurePath www.hp.com
Microsoft MPIO www.microsoft.com
Hitachi Dynamic Link Manager www.hds.com
EMC PowerPath www.emc.com
IBM RDAC, MultiPath Driver www.ibm.com
Sun MPXIO www.sun.com
VERITAS Dynamic Multipathing (DMP) www.veritas.com

VLAN in windows hyper-v

November 26th, 2013 Comments off

Briefly, a virtual LAN (VLAN) can be regarded as a broadcast domain. It operates on OSI network layer 2. The exact protocol definition is known as 802.1Q. Each network packet belonging to a VLAN has an identifier. This is just a number between 0 and 4095, with both 0 and 4095 reserved for other uses. Let's assume a VLAN with an identifier of 10. A NIC configured with the VLAN ID of 10 will pick up network packets with the same ID and will ignore all other IDs. The point of VLANs is that switches and routers enabled for 802.1Q can present VLANs to different switch ports in the network. In other words, where a normal IP subnet is limited to a set of ports on a physical switch, a subnet defined in a VLAN can be present on any switch port, if so configured, of course.

Getting back to the VLAN functionality in Hyper-V: both virtual switches and virtual NICs can detect and use VLAN IDs. Both can accept and reject network packets based on VLAN ID, which means that the VM does not have to do it itself. The use of VLANs enables Hyper-V to participate in more advanced network designs. One limitation in the current implementation is that a virtual switch can have just one VLAN ID, although that should not matter too much in practice. The default setting is to accept all VLAN IDs.

hadoop installation on centos linux – pseudodistributed mode

September 18th, 2013 Comments off

First, install JDK and set JAVA_HOME:

yum install jdk-1.6.0_30-fcs

export JAVA_HOME=/usr/java/jdk1.6.0_30

Now install hadoop rpm:

rpm -Uvh hadoop-1.2.1-1.x86_64.rpm

run hadoop version to verify that hadoop was successfully installed:

[root@node3 hadoop]# hadoop version

Hadoop 1.2.1 

After this, let's config hadoop to run in Pseudodistributed mode:

[root@node3 hadoop]# cat /etc/hadoop/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>

[root@node3 hadoop]# cat /etc/hadoop/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

[root@node3 hadoop]# cat /etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>

We need to configure password-less ssh to localhost if we're running in pseudodistributed mode. This mainly means you can "ssh localhost" without being prompted for a password (generate a key with ssh-keygen -t rsa and append id_rsa.pub to authorized_keys). A minimal sketch of that setup:
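# assumes RSA keys in the default paths; skip ssh-keygen if a key already exists
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost date    # should log in without prompting for a password

After the above ssh configuration, let's format the HDFS filesystem: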

hadoop namenode -format

Now we can start daemons:

start-dfs.sh
start-mapred.sh

PS: I found that start-dfs.sh, start-mapred.sh and some other hadoop related scripts do not have execute permission initially, so you may run the following to fix this:

for i in `find /usr/sbin/ -type f ! -perm -u+x`;do chmod u+x $i;done

That's all for hadoop installation on linux. You can now visit http://<ip of node>:50030/jobtracker.jsp and http://<ip of node>:50070/dfshealth.jsp to see status of hadoop jobtracker/namenode respectively.

PS:

  1. <Hadoop: The Definitive Guide> is a good book about hadoop.
  2. Here's about YARN which is mapreduce v2

[diagram: yarn]

 

Categories: Clouding, IT Architecture Tags:

chef installation on centos linux

July 19th, 2013 Comments off

We need to install the chef server, a chef workstation and chef nodes to get chef working. Here are the steps:

###Install chef server
1.Go to http://www.opscode.com/chef/install.
2.Click the Chef Server tab.
3.Select the operating system, version, and architecture.
4.Select the version of Chef Server 11.x to download, and then click the link that appears to download the package.
5.Install the downloaded package using the correct method for the operating system on which Chef Server 11.x will be installed.
6.Configure Chef Server 11.x by running the following command:
$ sudo chef-server-ctl reconfigure

Add sudo even if you're root, or you may encounter errors like the following:
/opt/chef-server/embedded/service/chef-pedant/lib/pedant/config.rb:34:in `from_argv': Configuration file '/var/opt/chef-server/chef-pedant/etc/pedant_config.rb' not found! (RuntimeError)
from /opt/chef-server/embedded/service/chef-pedant/lib/pedant.rb:44:in `setup'
from ./bin/chef-pedant:26:in `<main>'
Error reading file /etc/chef-server/chef-webui.pem

This command will set up all of the required components, including Erchef, RabbitMQ, PostgreSQL, and all of the cookbooks that are used by chef-solo to maintain Chef Server 11.x.

7.Verify the hostname for the Chef Server by running the hostname command. The hostname for the Chef Server must be an FQDN (test.domain.name).

8.Verify the installation of Chef Server 11.x by running the following command:

$ sudo chef-server-ctl test
This will run the chef-pedant test suite against the installed Chef Server 11.x and will report back that everything is working and installed correctly.

###Install chef workstation
1.Go to: http://www.opscode.com/chef/install/, select the operating system, version, and architecture appropriate for your environment, and identify the URL that will be used to download the package or download the package directly.
2.Run the commands identified above: curl -L http://www.opscode.com/chef/install.sh | sudo bash (you can also wget the script and run it)
3.After installation, you can run chef-client -v to see the version of chef
4.Install git(yum install git for centos/redhat if you're using EPEL repo)
5.Clone the Chef repository: cd ~; git clone git://github.com/opscode/chef-repo.git (If you met some connection errors, you can change git:// to http://, i.e. git clone http://github.com/opscode/chef-repo.git)
6.Create the .chef directory: mkdir -p ~/chef-repo/.chef
7.Copy admin.pem and chef-validator.pem from the chef server (they are under /etc/chef-server/ on the chef server) to the workstation (into /etc/chef; if the directory does not exist, create it)
8.Now generate knife.rb using command knife configure --initial

Overwrite /root/.chef/knife.rb? (Y/N) y
Please enter the chef server URL: [http://server.domain.name:4000] https://server.domain.name
Please enter a name for the new user: [root] user1
Please enter the existing admin name: [admin]
Please enter the location of the existing admin's private key: [/etc/chef/admin.pem]
Please enter the validation clientname: [chef-validator]
Please enter the location of the validation key: [/etc/chef/validation.pem] /etc/chef/chef-validator.pem
Please enter the path to a chef repository (or leave blank):
Creating initial API user...
Please enter a password for the new user:
Created user[user1]
Configuration file written to /root/.chef/knife.rb

[root@workstation]# cp /root/.chef/knife.rb /root/chef-repo/.chef/
[root@workstation]# cp /etc/chef/admin.pem /root/chef-repo/.chef/
[root@workstation]# cp /etc/chef/chef-validator.pem /root/chef-repo/.chef/

9.Add ruby to PATH: echo 'export PATH="/opt/chef/embedded/bin:$PATH"' >> ~/.bash_profile && source ~/.bash_profile
10.Verify the chef workstation install
cd ~/chef-repo
knife client list
knife node show <node name>
knife user list

###Install chef client on nodes
On chef workstation:
knife bootstrap <node ip or FQDN> -x username -P password
client01
knife client show <node ip or FQDN just added>

Here are some definitions that may help you get through the mouthful of chef terminology (from http://www.jasongrimes.org/2012/06/managing-lamp-environments-with-chef-vagrant-and-ec2-1-of-3/):

  • Declare policy using resources
  • Collect resources into recipes
  • Package recipes and supporting code into cookbooks
  • Apply cookbooks to nodes using roles
  • Run Chef to configure nodes according to their assigned roles

###First try of chef populating

I want to install php on a client named chef-client01, which is a redhat box. Here are the steps to do this:

The following steps are executed on chef workstation:

First, we need to create an environment and assign the node to it:

cd /root/chef-repo
vi environments/prod.rb

name "prod"
description "The production environment"

git add environments
git commit -m 'Add production environments.'
knife environment from file environments/prod.rb

knife node edit <node name> #change environment from _default to prod
Now let's install php cookbook and its dependencies:

knife cookbook site install yum
knife cookbook site install runit
knife cookbook site install ohai
knife cookbook site install php
knife cookbook upload --all

Now let's create a role and make recipes available to environment "prod" which contains our node:

vi roles/db_master.rb

name "db_master"
description "Master database server"

all_env = [
"recipe[php]"
]

run_list(all_env)

env_run_lists(
"_default" => all_env,
"prod" => all_env,
"dev" => all_env,
)

git add roles
git commit -m 'Add LAMP roles.'
knife role from file roles/db_master.rb

Finally, we should make the node install php:

knife ssh "name:<node name>" "chef-client" -x root -P <your password on chef node>

Or you can run chef-client from chef node.

PS:
1.Here's all aspects of chef http://docs.opscode.com/
2.Here's more detailed installation guide of installing chef http://docs.opscode.com/install_server.html
3.You may need set http proxy when doing some of the downloading or knife bootstrap steps. You may try export http_proxy=http://<your_proxy:port> for wget or try knife bootstrap <other options> --bootstrap-proxy http://<your_proxy:port> for knife bootstrap

4.For chef solo install and configure, you can refer to the following article http://gettingstartedwithchef.com/first-steps-with-chef.html

 

Categories: Clouding, IT Architecture Tags:

make linux image template for use on OVM EC2 Esxi

July 18th, 2013 Comments off

No default gateway:

[root@centos images]# cat /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=localhost.localdomain

Comment out the udev NIC rules (the SUBSYSTEM lines in the file below; a sketch of the commenting follows the listing):

[root@centos images]# cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x15ad:0x07b0 (vmxnet3)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:f5:c1:86", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

# PCI device 0x15ad:0x07b0 (vmxnet3)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:f5:c1:90", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
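One way to do that commenting in place (a sketch; the rule file path is the one shown above):

sed -i '/^SUBSYSTEM==/s/^/#/' /etc/udev/rules.d/70-persistent-net.rules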

Use DHCP:

[root@centos images]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="dhcp"
ONBOOT="yes"
TYPE="Ethernet"

Categories: Clouding, IT Architecture Tags:

install vm with virt-install libvirt in Xen – vm.cfg

July 8th, 2013 Comments off
  • Check whether your cpu supports full virtualization, aka HVM

egrep '(svm|vmx)' --color=always /proc/cpuinfo

xm info|grep -i hvm

#on xen 3.4

virt_caps : hvm
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64

#on xen 4.1

virt_caps : hvm hvm_directio
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64

  • Install required packages

yum install kvm libvirt libvirt-python python-virtinst bridge-utils virt-viewer virt-manager

  • Now add a bridge and attach an interface to it

#brctl addbr virbr0 #you'll need bridge set up before using virt-install

#brctl addif virbr0 eth1 #eth1 will disconnect

vi ifcfg-eth0

DEVICE=eth0
HWADDR=00:16:3E:71:43:2B
ONBOOT=yes
TYPE=Ethernet
IPV6INIT=no
USERCTL=no
BRIDGE=virbr0


vi ifcfg-virbr0

DEVICE=virbr0
TYPE=Bridge
BOOTPROTO=static
DNS1=192.135.82.180
GATEWAY=192.168.144.1
IPADDR=192.168.149.111
NETMASK=255.255.255.0
ONBOOT=yes
SEARCH="example.com"

  • Prepare storage (20G image)

dd if=/dev/zero of=oel5.8_20G.img bs=1 count=0 seek=20G #creates a 20G sparse file (seek takes a size suffix)
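
Alternatively, if qemu-img is installed, the same sparse image can be created with (a sketch, same size and file name as above):

qemu-img create -f raw oel5.8_20G.img 20G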

  • Connect to vm console in VNC and ensure the status is Active

virt-manager

PS:

1. Change 'no' to 'yes' for xend-unix-server in /etc/xen/xend-config.sxp and restart xend if you hit the error "Unable to open a connection to the Xen hypervisor/daemon".

2. To manage VMs created by virt-install (-v/-p specifies HVM/PVM), you can use virsh (setmaxmem/setmem/console/start/destroy/detach-disk/list/rename, and so on).
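
A few typical virsh calls, as a sketch (the VM name oel5.8-15G matches the virt-install example further down; the setmem value is in KiB):

virsh list --all #all defined VMs, running or not
virsh start oel5.8-15G
virsh console oel5.8-15G #press Ctrl+] to detach
virsh setmem oel5.8-15G 524288 #shrink current memory to 512M (value in KiB)
virsh destroy oel5.8-15G #hard power-off, not deletion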

  • Starting to install vm

/etc/init.d/libvirtd start

virt-install --name=oel5.8-15G --arch=x86_64 --vcpus=2 --ram=1024 --os-type=linux --os-variant=rhel5 --virt-type=kvm --connect=qemu:///system --network bridge:virbr0 --cdrom=iso/OLinux5.8_64.iso --disk path=oel5.8_15g.img,size=15 --accelerate --vnc --keymap=us

PS:

1. You can also install the OS from a CD (such VMs will be HVM); to do this you need to set the cdrom in vm.cfg and then run xm create vm.cfg -c:

disk = ['file:/test/test_cellnode1/System.img,xvda,w',

'file:/test/OEL64.iso,xvdc:cdrom,r']

boot = 'cd'

2.You can read more docs about virtualization here https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Virtualization/

3.If you're using OVS/OVM, this command may help you (using XEN by default):

virt-install -n <name of VM> -r 1024 --vcpus=2 -f <path to System.img> -b <bridge name> --vif-type=netfront -p -l <http://path-to-mount-dir-of-OS-ISO> --nographics --os-type=linux --os-variant=rhel6

4. You can use oraclevm-template --config --force to configure the VM's network/gateway/hostname automatically. Here's the package: http://public-yum.oracle.com/repo/EnterpriseLinux/EL5/addons/x86_64/getPackage/ovm-template-config-1.1.0-9.el5.noarch.rpm  or http://public-yum.oracle.com/repo/OracleLinux/OL6/addons/x86_64/getPackage/ol-template-config-1.1.0-11.el6.noarch.rpm (version may change)

5. To use HVM instead of PVM, you can put below into vm.cfg:

acpi = 1
apic = 1
pae = 1
builder = 'hvm' #'generic' is for pvm
kernel = '/usr/lib/xen/boot/hvmloader'
device_model = '/usr/lib/xen/bin/qemu-dm'
cpuid = ['1:edx=xxxxxxxxxxxxxxxxxxx0xxxxxxxxxxxx']
disk = ['file:/EXAVMIMAGES/test_cellnode1/System.img,xvda,w',

'file:/OVS/Repositories/0004fb00000300008a3e7f5e0671eeb5/ISOs/0004fb00001500003119b8ff5f3ccf93.iso,xvdb:cdrom,r'
]
boot = 'cd'
memory = '4096'
maxmem = '4096'
OVM_simple_name = 'cellnode1'
name = 'cellnode1'
OVM_os_type = 'Oracle Linux 6'
vcpus = '2'
uuid = '42bd1a4f-241f-4a8f-b7a3-09aadfb28bf1'
on_crash = 'restart'
on_reboot = 'restart'
serial = 'pty'
keymap = 'en-us'
vif = ['type=netfront,mac=00:16:3e:e3:64:d1,bridge=vmbondeth0']
timer_mode = 2
pci=['0000:30:00.1']

#xen_platform_pci = "1"

#sdl = "0"
#stdvga = "0"

#pci=[
#'0000:3a:10.0,power_mgmt=0,msitranslate=0,permissive=0,rdm_policy=relaxed',
#'0000:3a:10.2,power_mgmt=0,msitranslate=0,permissive=0,rdm_policy=relaxed',
#]
#pvh = "1"
#cpus="8-23"
vnc = "1"
vnclisten = "0.0.0.0"
vncconsole = "1"
vncdisplay = "1"
serial = "pty"

Among the settings above, pci=['0000:30:00.1'] enables InfiniBand pass-through. To install the InfiniBand driver, you can do the following:

yum groupinstall "Infiniband Support" -y
yum install infiniband-diags perftest qperf opensm -y
rm -f /etc/modprobe.conf
chkconfig rdma on
chkconfig opensm on
/etc/rc.d/init.d/rdma restart
/etc/rc.d/init.d/opensm restart #Opensm cannot start on OEL6.4, but works on linux kernel 3.5.16. – IB works.

After this, ifconfig will show the InfiniBand devices, and you can configure IPs on them.
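
A minimal ifcfg sketch for one of the IB interfaces (the IP and netmask are placeholders; CONNECTED_MODE is optional):

vi /etc/sysconfig/network-scripts/ifcfg-ib0

DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.10.11
NETMASK=255.255.255.0
CONNECTED_MODE=yes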

6. Another vm.cfg for HVM:

builder = "hvm"
#builder = "generic"
hap = "1"
nestedhvm = "1"
name = "oel71"
memory = "4096"
vcpus = "2"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
disk = [ "file:/vmrepo/oel71/System.img,xvda,w",
'file:/vmrepo/oel71/OEL7.1.iso,xvdb:cdrom,r'
]
#vif = ['bridge=v130,mac=00:16:3E:5B:5C:2E,type=netfront',]
#vif = [ "mac=00:50:56:4F:11:23,bridge=v307_FE,model=e1000", ]
vif = [ "bridge=v307_FE,mac=00:50:56:4F:11:23,type=netfront", ]
#d(cdrom), c(disk), more info about boot order is here, and here is all options for vm.cfg
boot = "d"
vfb = ['type=vnc,vncunused=1,vnclisten=0.0.0.0,vncpasswd=test']
#uuid = '9afe4c23-6c68-4ce9-9418-b23d5d8b00ca'
#vnc = "1"
#vnclisten = "0.0.0.0"
#vncconsole = "1"
#vncdisplay = "1"
#serial = "pty"

Below is a vm.cfg from OVS3 for a VM created from OVMM3 (Oracle VM Manager 3):

vif = ['mac=00:21:f6:04:30:18,bridge=xenbr0', 'mac=00:21:f6:04:30:4b,bridge=xenbr0']
OVM_simple_name = 'test'
disk = ['file:/OVS/Repositories/0004fb00000300008eb56cf6d1201964/VirtualDisks/0004fb0000120000465f2b5ce4f23b1b.img,xvda,w', 'file:/OVS/Repositories/0004fb00000300008eb56cf6d1201964/VirtualDisks/0004fb00001200000e7f0f5c8078439b.img,xvdb,w', 'file:/OVS/Repositories/0004fb00000300008eb56cf6d1201964/VirtualDisks/0004fb000012000022ecb2ec0d44be24.img,xvdc,w']
bootargs = ''
disk_other_config = []
uuid = '0004fb00-0006-0000-a432-8b8d10231fd5'
on_reboot = 'restart'
cpu_weight = 27500
memory = 73728
cpu_cap = 0
maxvcpus = 8
OVM_high_availability = False
OVM_description = 'Import URLs: [http://test.example.com/templates/OEL6_latest/System.img, http://test.example.com/templates/OEL6_latest/oem.img, http://test.example.com/templates/OEL6_latest/u02.img, http://test.example.com/templ'
on_poweroff = 'destroy'
on_crash = 'restart'
bootloader = '/usr/bin/pygrub'
name = '0004fb0000060000a4328b8d10231fd5'
guest_os_type = 'default'
vif_other_config = []
vfb = ['type=vnc,vncunused=1,vnclisten=127.0.0.1,keymap=en-us']
vcpus = 8
OVM_os_type = 'None'
OVM_cpu_compat_group = ''
OVM_domain_type = 'xen_pvm'

7. Sometimes you may need to change the baud rate in the console= parameters on the kernel line of grub.conf if you find that xm console won't work (but I've observed that the VM may fail to boot after changing this, so be prepared):

console=tty0 console=ttyS0,9600n8

If it still fails with a Windows VNC client, you can try the Linux vncviewer (yum install vnc first), e.g. vncviewer <xenserver:port>.

8. To check whether a VM is HVM or PVM, run "xm list -l <vm name> | grep -i hvm".

9. Here is more info about libvirt tools(virt-install/virt-manager)

create vm image from qemu-kvm

July 1st, 2013 Comments off

yum install kvm libvirt libvirt-python python-virtinst -y

virt-install --name=vmname1 --arch=x86_64 --vcpus=2 --ram=8192 --os-type=linux --os-variant=rhel6 --virt-type=kvm --connect=qemu:///system --network bridge:v115_FE --cdrom=/OVS/iso_pool/OLinux6.6_64.iso --disk path=/OVS/seed_pool/vmname1/System.img,size=40 --accelerate --vnc --keymap=us

virsh list #images will be in /var/lib/libvirt/images/.
virsh console

PS:

  1. For Xen and more about virt-install, you can refer to this article.
  2. KVM is a full virtualization solution, while Xen can do full or para-virtualization. Only HVM needs QEMU emulation. More info is here.
  3. For KVM HVM guests, we can use the PVHVM drivers (virtio; more info is here http://my.oschina.net/davehe/blog/130124) – see the sketch after this list.
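
A virt-install sketch using virtio for both the disk and the NIC, based on the command above (vmname2 and the paths are placeholders):

virt-install --name=vmname2 --arch=x86_64 --vcpus=2 --ram=2048 --os-type=linux --os-variant=rhel6 --virt-type=kvm --connect=qemu:///system --network bridge=v115_FE,model=virtio --cdrom=/OVS/iso_pool/OLinux6.6_64.iso --disk path=/OVS/seed_pool/vmname2/System.img,size=40,bus=virtio --vnc --keymap=us
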
Categories: Clouding, IT Architecture Tags:

o2cb for OCFS2

July 1st, 2013 Comments off
o2cb is the default cluster stack for the OCFS2 file system; it includes
  • a node manager (o2nm) to keep track of the nodes in the cluster,
  • a heartbeat agent (o2hb) to detect live nodes
  • a network agent (o2net) for intra-cluster node communication
  • a distributed lock manager (o2dlm) to keep track of lock resources
  • All these components are in-kernel.
  • It also includes an in-memory file system, dlmfs, to allow userspace to access the in-kernel dlm
  • main configuration/runtime locations: /etc/ocfs2/cluster.conf, /etc/sysconfig/o2cb, /sys/kernel/config/cluster (a minimal cluster.conf sketch follows this list)
  • https://oss.oracle.com/projects/ocfs2-tools/dist/documentation/v1.4/o2cb.html
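
For reference, a minimal two-node cluster.conf sketch (node names and IPs are placeholders, and the indented lines must be tab-indented in the real file; on OVS this file is normally generated by the agent, so treat this only as an illustration of the format):

cluster:
        node_count = 2
        name = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.101
        number = 0
        name = ovs1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.102
        number = 1
        name = ovs2
        cluster = ocfs2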

Table 6.1 Cluster services

o2net – The o2net process creates TCP/IP intra-cluster node communication channels on port 7777 and sends regular keep-alive packets to each node in the cluster to validate whether the nodes are alive. The intra-cluster node communication uses the network with the Cluster Heartbeat role. By default, this is the Server Management network; you can however create a separate network for this function. See Section 5.2, "Network Usage" for information about the Cluster Heartbeat role. Make sure the firewall on each Oracle VM Server in the cluster allows network traffic on the heartbeat network. By default, the firewall is disabled on Oracle VM Servers after installation.

o2hb-diskid – The server pool cluster also employs a disk heartbeat check. The o2hb process is responsible for the global disk heartbeat component of the cluster. The heartbeat feature uses a file in the hidden region of the server pool file system. Each pool member writes to its own block of this region every two seconds, indicating it is alive. It also reads the region to maintain a map of live nodes. If a server pool member's block is no longer updated, the Oracle VM Server is considered dead. If an Oracle VM Server dies, it is fenced. Fencing forcefully removes dead members from the server pool to make sure active pool members are not obstructed from accessing the fenced Oracle VM Server's resources.

o2cb – The o2cb service is central to cluster operations. When an Oracle VM Server boots, the o2cb service starts automatically. This service must be up for the mount of shared repositories to succeed.

ocfs2 – The ocfs2 service is responsible for the file system operations. This service also starts automatically.

ocfs2_dlm and ocfs2_dlmfs – The DLM modules (ocfs2_dlm, ocfs2_dlmfs) and processes (user_dlm, dlm_thread, dlm_wq, dlm_reco_thread, and so on) are part of the Distributed Lock Manager.

OCFS2 uses a DLM to track and manage locks on resources across the cluster. It is called distributed because each Oracle VM Server in the cluster only maintains lock information for the resources it is interested in. If an Oracle VM Server dies while holding locks for resources in the cluster, for example, a lock on a virtual machine, the remaining Oracle VM Servers in the server pool gather information to reconstruct the lock state maintained by the dead Oracle VM Server.

PS:

Here is more about ocfs2 and o2cb http://docs.oracle.com/cd/E37670_01/E37355/html/ol_ocfs2.html

vsphere esxi tips

July 1st, 2013 Comments off
vicfg-* commands (esxcfg-* is deprecated) and the other vCLI commands, including ESXCLI: run from a server with the vCLI package installed, from the vMA virtual machine, or through vCenter Server (-vihost parameter).
esxcli: better to use vCLI or PowerCLI instead; run directly from the ESXi shell (console), from a server with the vCLI package installed, from the vMA virtual machine, from a vSphere PowerCLI prompt via Get-EsxCli, or through vCenter Server (-vihost parameter).
localcli: equivalent to the ESXCLI commands but bypasses hostd. Only for situations when hostd is unavailable and cannot be restarted; after running a localcli command you must restart hostd and use ESXCLI commands from then on. Using localcli in other situations can leave the system in an inconsistent state and lead to failures.
PowerCLI cmdlets: Windows PowerShell.
Some examples:
vicfg-hostops <conn_options> --operation shutdown --force
vicfg-hostops <conn_options> --operation shutdown --cluster <my_cluster>
vmware-cmd --config esxhome.cfg -l
vmware-cmd --config esxhome.cfg '/vmfs/volumes/505f5efb-38f8b83f-e1ce-1c6f65d2477b/OracleLinux/OracleLinux.vmx' getuptime
esxcli [options] {namespace}+ {cmd} [cmd options]
esxcli --config esxhome.cfg network ip interface list
esxcli --config esxhome.cfg fcoe adapter list
esxcli --config esxhome.cfg storage nfs add -H <hostname> -s <sharepoint> -v <volumename>
esxcli --config esxhome.cfg --formatter=csv network ip interface list
esxcli --config esxhome.cfg --reason <reason> system shutdown poweroff <must be in maintenance mode>
esxcli --config esxhome.cfg --reason <reason> system shutdown reboot
esxcli <conn_options> system maintenanceMode set --enable true
~ # esxcli vm process list
UCF-ZFS001
World ID: 35425
Process ID: 0
VMX Cartel ID: 35356
UUID: 42 29 c5 ae 06 c7 19 f2-1e 85 88 eb 3f 19 6f 65
Display Name: UCF-ZFS001
Config File: /vmfs/volumes/5739ec95-8876d0ed-193d-0010e03ca4e8/UCF-ZFS001/UCF-ZFS001.vmx
~ # vim-cmd vmsvc/getallvms
Vmid Name File Guest OS Version Annotation
2 UCF-ZFS001 [hyper01] UCF-ZFS001/UCF-ZFS001.vmx solaris11_64Guest vmx-10 Oracle

enable vm virtualization support in esxi

June 24th, 2013 Comments off

If you want to enable your newly created VM's virtualization support, you can follow these steps:

  1. In VM Settings -> Options -> CPU/MMU Virtualization, select either the third or the fourth option.
  2. Go to the ESXi console, locate your VM's vmx configuration file (under /vmfs/volumes/Datastore/Nimbula_Node05 in my case), and add a line:

vhv.enable = TRUE

After these steps, your VM should support nested virtualization. You can run egrep '(vmx|svm)' --color=always /proc/cpuinfo inside the guest to confirm whether the virtualization flags are now exposed.

cpu usage in xen vm – using xentop

June 7th, 2013 Comments off

To check how much CPU a VM is consuming, we can use xentop:

[test@test ~]# xentop -b -i 2 -d 1
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID
11572_test_0106_us_oracle_com --b--- 8412 0.0 34603008 34.4 34603008 34.4 6 2 196796 1111779 2 90 37651 3174172 0
16026_test_0093_us_oracle_com --b--- 4255 0.0 1048576 1.0 1048576 1.0 2 2 2092803 2914101 3 851 49446 1918010 0
16051_test_0094_us_oracle_com -----r 3636909 0.0 56623104 56.3 56623104 56.3 24 2 1553871 970055 2 417 101921 10195220 0
Domain-0 -----r 36197 0.0 2621440 2.6 no limit n/a 24 0 0 0 0 0 0 0 0
NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID
11572_test_0106_us_oracle_com --b--- 8412 0.1 34603008 34.4 34603008 34.4 6 2 196796 1111780 2 90 37651 3174172 0
16026_test_0093_us_oracle_com --b--- 4255 0.1 1048576 1.0 1048576 1.0 2 2 2092803 2914102 3 851 49446 1918015 0
16051_test_0094_us_oracle_com -----r 3636933 2396.8 56623104 56.3 56623104 56.3 24 2 1553895 970090 2 417 101921 10195220 0
Domain-0 -----r 36197 2.7 2621440 2.6 no limit n/a 24 0 0 0 0 0 0 0 0

For the VM '16051_test_0094_us_oracle_com', it has 24 vcpus and its CPU(%) has reached 2396.8. Dividing 2396.8 by 24 gives almost 100% usage on every vcpu, so this VM is quite busy.

For network traffic, in the one second between the two samples (-d 1), NETTX(k) increased by 24 and NETRX(k) by 35 (both in KB).
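
If you want the per-vcpu figure directly, a rough awk one-liner over the same batch output works (assuming the default column order shown above, where CPU(%) is field 4 and VCPUS is field 9; the first iteration is only a baseline):

xentop -b -i 2 -d 1 | awk '$1 ~ /^16051_/ {printf "%s: %.1f%% per vcpu\n", $1, $4/$9}'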

howto about xen vm live migration from standalone Oracle Virtual Server(OVS) to Oracle VM Manager

May 24th, 2013 Comments off
  • PRECHECK
    1. The standalone OVS (source server) and the OVS managed by Oracle VM Manager (destination server) must be the same type of machine; use dmidecode | grep 'Product Name' to confirm.
    2. Make sure the xend relocation server has been configured and is running; run the following commands to confirm:

grep xend-relocation /etc/xen/xend-config.sxp |grep -v '#'
(xend-relocation-server yes)
(xend-relocation-ssl-server yes)
(xend-relocation-port 8002)
(xend-relocation-server-ssl-key-file /etc/ovs-agent/cert/key.pem)
(xend-relocation-server-ssl-cert-file /etc/ovs-agent/cert/certificate.pem)
(xend-relocation-address '')

lsof -i :8002
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
xend 8372 root 5u IPv4 17979 TCP *:teradataordbms (LISTEN)

3. Make sure port 8002 is open between the source and destination servers; run telnet <server_name> 8002 to confirm.

4. Make sure the source and destination servers are in the same subnet.

  • CREATE STORAGE REPO

To live migrate a Xen VM, the source and destination servers must have the same NFS share mounted. In Oracle VM, we can fulfill this by creating another storage repo for the current server pool.
The steps for creating the storage repo:
First, make sure the NFS share is writable by the OVSes managed by Oracle VM Manager;
Second, run "/opt/ovs-agent-2.3/utils/repos.py -n <NFS share>" on the master OVS;
Third, run "/opt/ovs-agent-2.3/utils/repos.py -i" on the master OVS to make the storage repo visible to all OVSes managed by Oracle VM Manager;

  • CREATE SYMBOLIC LINK

For live migration, the mount directories of the NFS share must be the same on the source and destination OVS. Since the mount directory is created automatically by Oracle VM when the storage repo is created, we must create a symbolic link on the destination OVS.
Assuming the Xen VM configuration on the source OVS looks like the following:


disk = ['file:/repo_standalone/testvm/System.img,xvda,w']

Then we'll link the storage repo directory to /repo_standalone on the destination OVS:

cd /
ln -s /var/ovs/mount/<uuid> /repo_standalone

  • LIVE MIGRATE

Now, on the source OVS, let's migrate the VM to a destination OVS that has enough free memory:

time xm migrate -l <vm> <destination OVS>
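
A quick sanity check: the domain should now be listed on the destination OVS and gone from the source:

xm list | grep <vm> #on the destination OVS: the VM should be listed (state r or b)
xm list | grep <vm> #on the source OVS: no output expected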

  • IMPORT IMAGE

After the VM has been live migrated to the destination OVS, we need to import it into Oracle VM Manager. Create another soft link under running_pool so that Oracle VM Manager can see the image:

cd /var/ovs/mount/<uuid>/running_pool
ln -s /var/ovs/mount/<uuid>/<vm> .

After this, open the Oracle VM Manager GUI and import & approve the system image.

PS:
You don't need to change the VM configuration file (vm.cfg) manually; after the image is imported into Oracle VM Manager, the configuration file is updated automatically by Oracle VM.

vmware vsphere esxi vicfg esxcli localcli PowerCLI

May 21st, 2013 Comments off
vicfg-* commands (esxcfg-* is deprecated) and the other vCLI commands, including ESXCLI: run from a server with the vCLI package installed, from the vMA virtual machine, or through vCenter Server (-vihost parameter).
esxcli: better to use vCLI or PowerCLI instead; run directly from the ESXi shell (console), from a server with the vCLI package installed, from the vMA virtual machine, from a vSphere PowerCLI prompt via Get-EsxCli, or through vCenter Server (-vihost parameter).
localcli: equivalent to the ESXCLI commands but bypasses hostd. Only for situations when hostd is unavailable and cannot be restarted; after running a localcli command you must restart hostd and use ESXCLI commands from then on. Using localcli in other situations can leave the system in an inconsistent state and lead to failures.
PowerCLI cmdlets: Windows PowerShell.
Some examples:
vicfg-hostops <conn_options> --operation shutdown --force
vicfg-hostops <conn_options> --operation shutdown --cluster <my_cluster>
vmware-cmd --config esxhome.cfg -l
vmware-cmd --config esxhome.cfg '/vmfs/volumes/505f5efb-38f8b83f-e1ce-1c6f65d2477b/OracleLinux/OracleLinux.vmx' getuptime
esxcli [options] {namespace}+ {cmd} [cmd options]
esxcli --config esxhome.cfg network ip interface list
esxcli --config esxhome.cfg fcoe adapter list
esxcli --config esxhome.cfg storage nfs add -H <hostname> -s <sharepoint> -v <volumename>
esxcli --config esxhome.cfg --formatter=csv network ip interface list
esxcli --config esxhome.cfg --reason <reason> system shutdown poweroff <must be in maintenance mode>
esxcli --config esxhome.cfg --reason <reason> system shutdown reboot
esxcli <conn_options> system maintenanceMode set --enable true

oracle ocfs2 cluster filesystem best practise

May 21st, 2013 Comments off
  • To check the current o2cb settings, look at the files under /sys/kernel/config/cluster/ocfs2/ (see the example after the commands below)
  • To set new value for o2cb:

service o2cb unload
service o2cb configure

heartbeat dead threshold 151 #Iterations before a node is considered dead
network idle timeout 120000 #Time in ms before a network connection is considered dead
network keepalive delay 5000 #Max time in ms before a keepalive packet is sent
network reconnect delay 5000 #Min time in ms between connection attempts

service o2cb load

service o2cb status #will show new configuration if OVS in server pool; or it will show offline
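
To read the values actually in effect (the files mentioned in the first bullet above), you can cat them from configfs once the cluster is online; a sketch, and the exact attribute names may vary slightly between o2cb versions:

cat /sys/kernel/config/cluster/*/heartbeat/dead_threshold
cat /sys/kernel/config/cluster/*/idle_timeout_ms
cat /sys/kernel/config/cluster/*/keepalive_delay_ms
cat /sys/kernel/config/cluster/*/reconnect_delay_ms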

PS:

o2cb - Default cluster stack for the OCFS2 file system, it includes
  • a node manager (o2nm) to keep track of the nodes in the cluster,
  • a heartbeat agent (o2hb) to detect live nodes
  • a network agent (o2net) for intra-cluster node communication
  • a distributed lock manager (o2dlm) to keep track of lock resources
  • All these components are in-kernel.
  • It also includes an in-memory file system, dlmfs, to allow userspace to access the in-kernel dlm
  • main conf files: /etc/ocfs2/cluster.conf, /etc/sysconfig/o2cb
  • more info here https://oss.oracle.com/projects/ocfs2-tools/dist/documentation/v1.4/o2cb.html

SaaS, PaaS, IaaS cloud differences in three illustrations

May 21st, 2013 Comments off
SaaS: [illustration]

PaaS: [illustration]

IaaS: [illustration]

Categories: Clouding, IT Architecture Tags: , ,

resolved – change xen vm root password

May 21st, 2013 Comments off

You can change a Xen virtual machine's root password as follows:

losetup -f #to check the next usable loop device
#vgs;pvs #if LVM is implemented in Virtual Machine
losetup <output of losetup -f> System.img #associate loop devices with regular file System.img. Read/Write to /dev/loop<x> will be redirected to System.img
fdisk -l /dev/loop0

  • If there're multiple partitions:

kpartx -av /dev/loop0
#vgchange -a y <VGroup> #may need run vgscan first
#vgs;pvs
#mount /dev/mapper/<vg name>-<lv name> /mnt
mount -t ext3 /dev/mapper/<partition name of /etc> /mnt

  • If there's only one root partition:

#vgchange -a y <VGroup>

mount /dev/loop0 /mnt

After mounting, you can change the root password:

vi /mnt/etc/rc.local #add a line: echo <new password> | passwd --stdin root
sync;sync;sync
umount /mnt
#vgchange -a n <VGroup>
kpartx -d /dev/loop0
losetup  -d /dev/loop0
vi /etc/rc.local #once the VM is up, comment out the "echo <new password> | passwd --stdin root" line inside the VM

After all these steps, boot up the VM using xm create vm.cfg, and you'll find the root password has been changed.
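
PS: on newer systems with the libguestfs tools installed (yum install libguestfs-tools), the same result can arguably be achieved in one step while the VM is shut down; a sketch, with <new password> as a placeholder:

virt-customize -a System.img --root-password password:<new password>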