LVM volume resize by extending the virtual disk image file

The following should be run from the Dom0 hosting the DomU:

[root@Dom0 ~]# virt-filesystems -a System.img
/dev/sda1
/dev/vg01/lv_root

[root@Dom0 ~]# virt-filesystems --long --parts --blkdevs -h -a System.img
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   500M  /dev/sda
/dev/sda2  partition  8e   12G   /dev/sda
/dev/sda   device     -    12G   -
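If qemu-img happens to be installed on Dom0, it can serve as a quick cross-check of the image size before and after the resize (optional; qemu-img being available is an assumption here):

[root@Dom0 ~]# qemu-img info System.img #reports the virtual size (12G at this point) and the size on disk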

[root@Dom0 ~]# truncate -s 20G System_new.img

[root@Dom0 ~]# virt-resize --expand /dev/sda2 System.img System_new.img

[root@Dom0 ~]# mv System.img System.img.bak;mv System_new.img System.img

[root@Dom0 ~]# xm create vm.cfg -c #the first run may fail with "device cannot be connected"; just run it again and the error should be gone

The following should be run from the DomU:

[root@DomU ~]# vgs
VG   #PV #LV #SN Attr   VSize  VFree
vg01   1   2   0 wz--n- 20.51g 8.00g

[root@DomU ~]# lvextend -L +8g /dev/mapper/vg01-lv_root
[root@DomU ~]# resize2fs /dev/mapper/vg01-lv_root

[root@DomU ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg01-lv_root
20G 10G 10G 50% /
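Note that resize2fs applies to ext2/ext3/ext4, which is what this DomU uses. If the root LV were formatted with xfs instead, the last step would be the following sketch (xfs_growfs works on the mounted filesystem and takes the mount point):

[root@DomU ~]# xfs_growfs /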

resolved - Error: Unable to connect to xend: Connection reset by peer. Is xend running?

Today I met an issue when trying to run xm commands on a Xen server:

[root@xenhost1 ~]# xm list
Error: Unable to connect to xend: Connection reset by peer. Is xend running?

I had a check, and found xend was actually running:

[root@xenhost1 ~]# /etc/init.d/xend status
xend daemon running (pid 8329)

After some debugging, I found it was caused by corrupted state in libvirtd and xend. So I did a bounce of both:

[root@xenhost1 ~]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]

[root@xenhost1 ~]# /etc/init.d/xend restart #this may not be needed 
restarting xend...
xend daemon running (pid 19684)

After that, the xm commands worked fine.
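If a bounce like this doesn't help, the xend log is the next place to look (default location on a typical Xen installation):

[root@xenhost1 ~]# tail -n 50 /var/log/xen/xend.log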

PS:

  • If you meet an issue like "xend error: (98, 'Address already in use')" when restarting xend, or "can't connect: (111, 'Connection refused')" when doing xm live migration, then you can refer to this article.
  • For more information about libvirt, you can check here.

resolved - switching from Unbreakable Enterprise Kernel Release 2 (UEKR2) to UEKR3 on Oracle Linux 6

As we can see from here, the following three kernels are available for Oracle Linux 6:

3.8.13 Unbreakable Enterprise Kernel Release 3 (x86_64 only)
2.6.39 Unbreakable Enterprise Kernel Release 2
2.6.32 (Red Hat compatible kernel)

On one of our OEL6 VMs, we found it was using UEKR2:

[root@testbox aime]# cat /etc/issue
Oracle Linux Server release 6.4
Kernel \r on an \m

[root@testbox aime]# uname -r
2.6.39-400.211.1.el6uek.x86_64

So how can we switch the kernel to UEKR3 (3.8)?

If your Linux version is 6.4, first do a "yum update -y" to upgrade to 6.5 or later, then reboot the host and follow the steps below.

[root@testbox aime]# ls -l /etc/grub.conf
lrwxrwxrwx. 1 root root 22 Aug 21 18:24 /etc/grub.conf -> ../boot/grub/grub.conf

[root@testbox aime]# yum update -y

If your Linux version is 6.5 or later, you'll find /etc/grub.conf and /boot/grub/grub.conf are different files (this is the case for hosts upgraded via yum update; if the host was installed as OEL6.5 from the start, /etc/grub.conf should still be a symlink):

[root@testbox ~]# ls -l /etc/grub.conf
-rw------- 1 root root 2356 Oct 20 05:26 /etc/grub.conf

[root@testbox ~]# ls -l /boot/grub/grub.conf
-rw------- 1 root root 1585 Nov 23 21:46 /boot/grub/grub.conf

In /etc/grub.conf, you'll see an entry like the one below:

title Oracle Linux Server Unbreakable Enterprise Kernel (3.8.13-44.1.3.el6uek.x86_64)
root (hd0,0)
kernel /vmlinuz-3.8.13-44.1.3.el6uek.x86_64 ro root=/dev/mapper/vg01-lv_root rd_LVM_LV=vg01/lv_root rd_NO_LUKS rd_LVM_LV=vg01/lv_swap LANG=en_US.UTF-8 KEYTABLE=us console=hvc0 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_DM rhgb quiet
initrd /initramfs-3.8.13-44.1.3.el6uek.x86_64.img

All you need to do is copy the entry above from /etc/grub.conf to /boot/grub/grub.conf (make sure /boot/grub/grub.conf is not a symlink, or you may hit the error "Boot loader didn't return any data"), and then reboot the VM.

After rebooting, you'll find the kernel is now UEKR3 (3.8).
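You can verify with uname -r; with the grub entry above, it should now report the 3.8.13 uek build:

[root@testbox aime]# uname -r
3.8.13-44.1.3.el6uek.x86_64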

PS:

If you find the VM is OEL6.5 and /etc/grub.conf is a symlink to /boot/grub/grub.conf, then you can do the following to upgrade the kernel to UEKR3:

1. Add the following lines to /etc/yum.repos.d/public-yum-ol6.repo:

[public_ol6_UEKR3]
name=UEKR3 for Oracle Linux 6 ($basearch)
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL6/UEKR3/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

2. List and install UEKR3:

[root@testbox aime]# yum list|grep kernel-uek|grep public_ol6_UEKR3
kernel-uek.x86_64              3.8.13-44.1.5.el6uek  public_ol6_UEKR3
kernel-uek-debug.x86_64        3.8.13-44.1.5.el6uek  public_ol6_UEKR3
kernel-uek-debug-devel.x86_64  3.8.13-44.1.5.el6uek  public_ol6_UEKR3
kernel-uek-devel.x86_64        3.8.13-44.1.5.el6uek  public_ol6_UEKR3
kernel-uek-doc.noarch          3.8.13-44.1.5.el6uek  public_ol6_UEKR3
kernel-uek-firmware.noarch     3.8.13-44.1.5.el6uek  public_ol6_UEKR3
kernel-uek-headers.x86_64      3.8.13-26.2.4.el6uek  public_ol6_UEKR3

[root@testbox aime]# yum install -y kernel-uek* --disablerepo=* --enablerepo=public_ol6_UEKR3

3. Reboot

resolved - Exception: ha_check_cpu_compatibility failed:

Today when I tried to add one OVS server to an OVMM server pool, the following error messages prompted:

2014-11-12 06:25:18.083 WARNING failed:errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteHA.py", line 248, in ha_check_cpu_compatibility
raise Exception("CPU not compatible! %s" % repr(d))
>

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 609, in cluster_check_prerequisite
raise Exception(msg)

StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 646, in _cluster_setup
#_check(ret)
File "/opt/ovs-agent-2.3/OVSXCluster.py", line 340, in _check
raise OVSException(error=ret["error"])

2014-11-12 06:25:18.083 NOTIFICATION Failed setup cluster for agent 2.2.0...
2014-11-12 06:25:18.083 ERROR Cluster Setup when adding server
2014-11-12 06:25:18.087 ERROR [Server Pool Management][Server Pool][test_serverpool]:During adding servers ([new_ovs_03]) to server pool (test_serverpool), Cluster setup failed: (OVM-1011 OVM Manager communication with new_ovs_03 for operation HA Setup for Oracle VM Agent 2.2.0 failed:
errcode=00000, errmsg=Unexpected error: <Exception: ha_check_cpu_compatibility
failed:<Exception: CPU not compatible! {'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45', 'slce27vmf1002': 'vendor_id=GenuineIntel;cpu_family=6;model=44', 'new_ovs_03': 'vendor_id=GenuineIntel;cpu_family=6;model=45'}>

)

As stated in the error message, adding the server failed at the CPU compatibility check. To work around this, we can comment out the code where the CPU check occurs.

In file /opt/ovs-agent-2.3/OVSSiteCluster.py at line 646, on each OVS server in the server pool:

#ret = cluster_check_prerequisite(ha_enable=ha_enable)
#_check(ret)

Then bounce ovs-agent on each OVS server and try adding again. Please note that with this workaround, live migration of VMs between these servers will not be possible (strictly speaking, it is the differing CPU models, not the workaround itself, that make live migration impossible).
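To spot such a mismatch before adding a server, you can compare the fields the agent checks (vendor_id, cpu family, model) straight from /proc/cpuinfo on each OVS server, for example:

[root@new_ovs_03 ~]# grep -E 'vendor_id|cpu family|^model' /proc/cpuinfo | sort -u

Run the same command on every server in the pool; if the model numbers differ (45 vs 44 here), the check will fail.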

Resolved - AttributeError: 'NoneType' object has no attribute '_imgName'

Today when I tried to list virtual machines on one Oracle OVMM, an error message prompted:

[root@ovmm_test ~]# ovm -uadmin -ppassword vm ls
Traceback (most recent call last):
  File "/usr/bin/ovm", line 43, in ?
    ovmcli.ovmmain.main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmmain.py", line 122, in main
    return ovm.ovmcli.runcmd(args)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 147, in runcmd
    return method(options)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 1578, in do_vm_ls
    result.append((serverpool._serverPoolName, vm._imgName))
AttributeError: 'NoneType' object has no attribute '_imgName'

Then I tried listing VMs by server pool:

[root@ovmm_test ~]# ovm -uadmin -ppassword vm ls -s Pool1_test
Name     Size(MB) Mem   VCPUs Status  Server_Pool
testvm1  17750    8196  4     Running Pool1_test
testvm2  50518    8196  4     Running Pool1_test
testvm3  19546    8192  2     Running Pool1_test
testvm4  50518    20929 4     Running Pool1_test
testvm5  19546    8192  2     Running Pool1_test
[root@ovmm_test ~]# ovm -uadmin -ppassword vm ls -s Pool1_test_A
Traceback (most recent call last):
  File "/usr/bin/ovm", line 43, in ?
    ovmcli.ovmmain.main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmmain.py", line 122, in main
    return ovm.ovmcli.runcmd(args)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 147, in runcmd
    return method(options)
  File "/usr/lib/python2.4/site-packages/ovmcli/ovmcli.py", line 1578, in do_vm_ls
    result.append((serverpool._serverPoolName, vm._imgName))
AttributeError: 'NoneType' object has no attribute '_imgName'

One pool was working and the other was not, so the problematic VMs must reside in pool Pool1_test_A.

Another symptom was that, although ovmcli wouldn't work, the OVMM GUI worked as expected and returned all the VMs.

Since ovmcli reads its entries from the Oracle DB (SID XE) on the OVMM host, the issue was probably caused by an inconsistency between that DB and the OVM agent DB.
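To inspect that DB directly, you can connect to the local XE instance as the OVS schema owner (a sketch: it assumes sqlplus is in PATH and the default XE TNS alias; the OVS password is whatever was set when OVMM was installed):

[root@ovmm_test ~]# sqlplus ovs@XE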

I got the list of all VMs in the problematic server pool from the OVMM GUI, and then ran the following query to get the corresponding entries in the DB:

select IMG_NAME from OVS_VM_IMG where SITE_ID=110 and STATUS !='Running' and length(IMG_NAME)>50;

(Pool1_test_A had SITE_ID 110, taken from table OVS_SITE. I used length() here because in the problematic server pool all VMs should have an IMG_NAME longer than 50 characters; anything shorter was a VM template, which had no issue.)

Comparing the output from the OVMM GUI and the OVM DB, I found some entries that existed only in the DB, and all of them had STATUS 'Creating'. These entries can also be listed directly with:

select IMG_NAME from OVS_VM_IMG where STATUS='Creating';

Then I removed these weird entries:

delete from OVS_VM_IMG where SITE_ID=110 and STATUS !='Running' and length(IMG_NAME)>50;

(If this step fails because of a foreign key or other reason, you could in theory rename or drop the table (alter table TBL1 rename to TBL2; drop table TBL1), remove the entries in the backup table (OVS_VM_IMG_bak20141106), and then rename the backup table back to OVS_VM_IMG. Don't do this, though: it will cause problems with constraints etc.)

After this, the issue got resolved.

PS:

  1. If, after removing the entries with STATUS 'Creating', more entries of this kind keep appearing in the OVM DB, it may be caused by broken VM templates or a corrupted DB table. In that case, you'll need to recover OVMM by rolling back to a previous backup, and then re-import the VM templates/VM images etc.
  2. Actually, the issue was caused by a broken constraint (constraint name conflicts introduced by table backups; so better not to back up tables when operating directly on the OVMM DB with SQL). It can be resolved by renaming the constraint and then removing the backup tables.

resolved - IOError: [Errno 2] No such file or directory when creating VMs on Oracle VM Server

Today when I tried to create and power on one VM on an Oracle VM Server pool, there was an error message like the below:

Start - /OVS/running_pool/vm_test
PowerOn Failed : Result - failed:<Exception: return=>failed:<Exception: failed:<IOError: [Errno 2] No such file or directory: '/var/ovs/mount/85255944BDF24F62831E1C6E7101CF7A/running_pool/vm_test/vm.cfg'>

I logged on to one OVS server and found the path was there. Later I logged on to all OVS servers in that server pool and found one OVS server did not have the storage repo. So I removed that OVS server from the pool, intending to add it back and then create the VM again. But this time, the following error messages prompted when I tried to add the OVS server back:

2014-08-21 02:52:52.962 WARNING failed:errcode=50006, errmsg=Do 'clusterm_init_root_sr' on servers ('testhost1') failed.
StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteCluster.py", line 651, in _cluster_setup
_check(ret)
File "/opt/ovs-agent-2.3/OVSXCluster.py", line 340, in _check
raise OVSException(error=ret["error"])

2014-08-21 02:52:52.962 NOTIFICATION Failed setup cluster for agent 2.2.0...
2014-08-21 02:52:52.963 ERROR Cluster Setup when adding server
2014-08-21 02:52:52.970 ERROR [Server Pool Management][Server Pool][test_pool]:During adding servers ([testhost1]) to server pool (test_pool), Cluster setup failed: (OVM-1011 OVM Manager communication with host_master for operation HA Setup for Oracle VM Agent 2.2.0 failed:
errcode=50006, errmsg=Do 'clusterm_init_root_sr' on servers ('testhost1') failed.

From here, I realized that this error was caused by the storage repo failing to be created on OVS server testhost1. So I logged on to testhost1 for a check. As the storage repo was an NFS share, I tried a showmount -e <nfs server>, and found it was not working. Then I ran a traceroute to <nfs server>, and it was not going through.

From another host, showmount -e <nfs server> worked, so the problem was on OVS server testhost1 itself. After more debugging, I found that one NIC was up on the host but its IP was not pingable. I then checked the switch and found the NIC was unplugged. After plugging it back in, I retried adding the OVS server back and creating the VM, and everything went smoothly.
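For reference, a quick checklist for testing NFS reachability from a client like testhost1 (<nfs server> is a placeholder, as above):

[root@testhost1 ~]# ping -c 3 <nfs server>
[root@testhost1 ~]# traceroute <nfs server>
[root@testhost1 ~]# rpcinfo -p <nfs server> #portmapper/mountd/nfs should be listed
[root@testhost1 ~]# showmount -e <nfs server> #exported shares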

PS:

Suppose you want to know which NFS clients mount a given share from the NFS server; then, on any client that has access to the NFS server, do the following:

[root@centos-doxer ~]# showmount -a nfs-server.example.com|grep to_be
10.182.120.188:/export/IDM_BR/share01_to_be_removed
test02.example:/export/IDM_BR/share01_to_be_removed

-a or --all

List both the client hostname or IP address and mounted directory in host:dir format. This info should not be considered reliable.

PPS:

If you meet the below error when trying to power on a VM:

2016-08-19 02:58:28.474 NOTIFICATION [PowerOn][testvm]: Start - /OVS/running_pool/testvm
2016-08-19 02:58:39.802 NOTIFICATION [PowerOn][testvm]: Result - failed:<error: (113, 'No route to host')>
StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteVM.py", line 168, in start_vm
raise e
2016-08-19 02:58:39.802 WARNING [PowerOn][testvm]: Result - failed:<error: (113, 'No route to host')>
StackTrace:
File "/opt/ovs-agent-2.3/OVSSiteVM.py", line 168, in start_vm
raise e

Or the below error when trying to import a template:

Register an empty virtual machine

register the virtual machine genernal information

get the virtual machine detail information...

Invalid virtual machine type(HVM/PVM).

Then the cause is likely a problematic OVS server in the pool. Check /etc/ocfs2/cluster.conf, or remove any OVS server that is under maintenance, and then test again.

resolved - Kernel panic - not syncing: Attempted to kill init

Today when I tried to power on one VM hosted on a Xen server, the following error messages prompted:

Write protecting the kernel read-only data: 6784k
Kernel panic - not syncing: Attempted to kill init! [failed one]
Pid: 1, comm: init Not tainted 2.6.32-300.29.1.el5uek #1
Call Trace:
[<ffffffff810579a2>] panic+0xa5/0x162
[<ffffffff8109b997>] ? atomic_add_unless+0x2e/0x47
[<ffffffff8109bdf9>] ? __put_css_set+0x29/0x179
[<ffffffff8145744c>] ? _write_lock_irq+0x10/0x20
[<ffffffff81062a65>] ? exit_ptrace+0xa7/0x118
[<ffffffff8105b076>] do_exit+0x7e/0x699
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8105b748>] sys_exit_group+0x17/0x1b
[<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

This was quite weird, as it had been OK the day before:

Write protecting the kernel read-only data: 6784k
blkfront: xvda: barriers enabled (tag) [normal one]
xvda: detected capacity change from 0 to 15126289920
xvda: xvda1 xvda2 xvda3
blkfront: xvdb: barriers enabled (tag)
xvdb: detected capacity change from 0 to 16777216000
xvdb: xvdb1
Setting capacity to 32768000
xvdb: detected capacity change from 0 to 16777216000
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
type=1404 audit(1406281405.511:2): selinux=0 auid=4294967295 ses=4294967295

After some checking, I found that this OVS server was hosting more than 40 VMs and VCPUs were oversubscribed. So I turned off some unused VMs, and the issue was resolved.
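A quick way to gauge VCPU oversubscription on a Xen host is to compare physical CPUs against the sum of allocated VCPUs (a sketch; in xm list output the fourth column is VCPUs):

[root@xenhost1 ~]# xm info | grep nr_cpus
[root@xenhost1 ~]# xm list | awk 'NR>1 {sum+=$4} END {print "VCPUs allocated: " sum}'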

Resolved - Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT

Today when I tried to install Oracle VM Server on one server, the following error occurred:

Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT. This can happen if there is not enough space on your hard drive(s) for the installation.

So to go on with the installation, I had to think of a way to erase the GPT partition table on the drive.

To do this, the first step is to drop into Linux rescue mode when booting from the CDROM (another way: when installing OVS, use Alt-F2 to access a different terminal from the installer, and use fdisk from the command line to manually repartition the disk with a DOS partition table):

rescue

Later, checking with fdisk -l, I could see that /dev/sda was the only disk whose GPT label needed erasing. So I used dd if=/dev/zero of=/dev/sda bs=512 count=1 to erase the GPT table:

 

[screenshot: fdisk_dd]

 

After this, running fdisk -l again showed that the partition table was now gone:

[screenshot: fdisk_dd_2]
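Note that dd with count=1 only zeroes the first sector, i.e. the protective MBR; the primary GPT header sits in the second sector, and GPT also keeps a backup copy at the end of the disk. That was enough in this case, but if the installer still detects GPT, a fuller wipe of the primary structures would be the following (a sketch, still assuming /dev/sda is the target; double-check the device name before running dd):

dd if=/dev/zero of=/dev/sda bs=512 count=34 #protective MBR + primary GPT header + 32 sectors of partition entries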

Later, I restarted the installation of the OVS server. When the following message prompted, select "No":

[screenshot: select_no]

And select "Yes" when the below message prompted, so that we can make a new partition table:

[screenshot: select_yes]

The steps after this were the normal ones, and the installation went smoothly.

PS:

  • You can disable internal USB devices by going into the BIOS:

To keep them from showing up as system disks during installation, do the following (see the screenshots below for OEL5 and OEL6/7):

[screenshots: OEL5, OEL6/7]

  • If the disk is larger than 2T, there's no way to soft-convert from GPT to MBR, so you'll need to decrease the disk size from the RAID controller BIOS during boot. Here's an example of using the LSI MegaRAID BIOS Config Utility (Drive 252) to reconfigure disks from the Oracle iLOM GUI console (you can check here for MegaCLI). (Tips: the first sector contains the MBR <446 bytes> and the partition table <64 bytes>; an MBR disk can have up to four primary partitions, or three primary plus one extended, and the extended partition can contain many logical partitions.)

Alt + A enables/disables shortcut selection.

With shortcuts disabled, Tab moves between items and Enter works as expected.

With shortcuts enabled, the Space key acts as Enter.

If you want to go into the BIOS, press F2.

Press Ctrl + H to go into the MegaRAID WebBIOS (in newer versions, press Ctrl + R). Make sure progress reaches 100% for a newly created VD.

[screenshots: WebBIOS walkthrough - Screenshot 2016-04-11 17.37.27, disks, clear configuration, add configuration, manual configuration, partition, raid-hole, raid-1 (x2), another-one, set-boot, raid5, overview (x2)]

Below is for the newer version (use a small stripe size for DB):

[screenshot: webBIOS-megaraid]

Resolved - "check srv hostname/IP failed: Invalid hostname/IP configuration" and "ocfs2 config failed: Obsolete nodes found"

Today when I tried to add two OVS servers into one server pool, I met two errors. The first one was like below:

2014-06-03 04:26:08.965 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:09.485 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:09.497 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:12.463 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:12.985 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:12.997 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:13.004 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:13.522 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:13.535 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:13.980 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:16.307 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:16.831 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:16.844 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:17.284 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:17.290 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:17.814 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:17.827 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:18.272 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:18.279 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:18.799 NOTIFICATION Regisering server:hostname1.example.com...
2014-06-03 04:26:21.749 NOTIFICATION Register Server: hostname1.example.com success
2014-06-03 04:26:21.751 NOTIFICATION Getting host info for server:hostname1.example.com ...
2014-06-03 04:26:23.894 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>
2014-06-03 04:26:33.348 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>

There was also an error message like below:

2014-06-03 04:59:11.003 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:11.524 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:11.536 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:15.484 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.005 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.016 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:16.025 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.546 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.559 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:17.014 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:18.950 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:19.470 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:19.483 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:19.926 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:19.955 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:20.476 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:20.490 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:20.943 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:20.947 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:21.471 NOTIFICATION Regisering server:hostname1-fe.example.com...
2014-06-03 04:59:24.439 NOTIFICATION Register Server: hostname1-fe.example.com success
2014-06-03 04:59:24.439 NOTIFICATION Getting host info for server:hostname1-fe.example.com ...
2014-06-03 04:59:26.577 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >
2014-06-03 04:59:37.100 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1-fe.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >

I was confused by the "Obsolete nodes found" it complained about. I could confirm that I had removed hostname1.example.com, and even after checking the OVM DB table OVS.OVS_SERVER, there was no record of hostname1.example.com.

After some searching, it turned out these errors were caused by obsolete info in the OCFS2 (Oracle Cluster File System) configuration. We should edit the file /etc/ocfs2/cluster.conf and remove the obsolete entries.

-bash-3.2# vi /etc/ocfs2/cluster.conf
node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

node:
        ip_port     = 7777
        ip_address  = 10.200.169.191
        number      = 1
        name        = hostname2
        cluster     = ocfs2

cluster:
        node_count  = 2
        name        = ocfs2

So if hostname2 is no longer needed, or the IP address of hostname2 has changed, you should remove the entries related to hostname2 and change node_count to 1. Then bounce the ocfs2/o2cb services:

service ocfs2 restart

service o2cb restart
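For reference, with hostname2 removed, /etc/ocfs2/cluster.conf would look like this:

node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

cluster:
        node_count  = 1
        name        = ocfs2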

Later, I tried adding the OVS server again, and it worked! (Before adding that OVS server back, we first need to remove its ovs-agent db: service ovs-agent stop; mv /etc/ovs-agent/db /var/tmp/db.bak.5; service ovs-agent start, and then configure ovs-agent with service ovs-agent configure. You can also use /opt/ovs-agent-2.3/utils/cleanup.py to clean up.)

PS:

Here is more about ocfs2 and o2cb: http://www.doxer.org/o2cb-for-ocfs2/

Oracle VM operations - poweron, poweroff, status, stat -r

Here's the script:
#!/usr/bin/perl
#1.OVM must be running before operations
#2.run ovm_vm_operation.pl status before running ovm_vm_operation.pl poweroff or poweron
use Net::SSH::Perl;
$host = $ARGV[0];
$operation = $ARGV[1];
$user = 'root';
$password = 'password';

$newname=$ARGV[2];
$newcpu=$ARGV[3];
$newmemory=$ARGV[4];
$newpool=$ARGV[5];
$newtmpl=$ARGV[6];
$newbridge=$ARGV[7];
$newbridge2=$ARGV[8];
$newvif='vif0';
$newvif2='VIF1';

if($host eq "help") {
print "$0 OVM-name status|poweron|poweroff|reboot|stat-r|stat-r-all|pool|new vmname 1 4096 poolname tmplname FE BE\n";
exit;
}

$ssh = Net::SSH::Perl->new($host);
$ssh->login($user,$password);

if($operation eq "status") {
($stdout,$stderr,$exit) = $ssh->cmd("ovm -uadmin -ppassword vm ls|grep -v VM_test");
open($host_fd,'>',"/var/tmp/${host}.status");
select $host_fd;
print $stdout;
close $host_fd;
} elsif($operation eq "poweroff") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ "Server_Pool|OVM|Powered") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm poweroff -n $1 -s $6 -f");
sleep 12;
}
}
} elsif($operation eq "reboot") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ "Server_Pool|OVM|Powered") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm reboot -n $1 -s $6");
sleep 12;
}
}
} elsif($operation eq "poweron") {
open($poweron_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweron_fd>){
if($_ =~ "Server_Pool|OVM|Running|used|poweroff") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+Off(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm poweron -n $1 -s $6");
#print "ovm -uadmin -ppassword vm poweron -n $1 -s $6";
sleep 15;
}
}
} elsif($operation eq "stat-r") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+(Shutting\sDown|Initializing|Error|Unknown|Rebooting|Deleting)\s+(.*)/){
#print "ovm -uadmin -ppassword vm stat -r -n $1 -s $6";
$ssh->cmd("ovm -uadmin -ppassword vm stat -r -n $1 -s $6");
sleep 1;
}
}
} elsif($operation eq "stat-r-all") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
$ssh->cmd("ovm -uadmin -ppassword vm stat -r -n $1 -s $6");
sleep 1;
}
} elsif($operation eq "pool") {
($stdoutp,$stderrp,$exitp) = $ssh->cmd("ovm -uadmin -ppassword svrp ls|grep Inactive");
open($host_fdp,'>',"/var/tmp/${host}-poolstatus");
select $host_fdp;
print $stdoutp;
close $host_fdp;
} elsif($operation eq "new") {
($stdoutp,$stderrp,$exitp) = $ssh->cmd("ovm -uadmin -ppassword tmpl ls -s $newpool | grep $newtmpl");
if($stdoutp =~ /$newtmpl/){
($stdoutp2,$stderrp2,$exitp2) = $ssh->cmd("ovm -uadmin -ppassword vm new -m template -s $newpool -t $newtmpl -n $newname -c password");
if($stdoutp2 =~ /is being created/){
print "Creating VM $newname in pool $newpool on OVMM $host now!"."\n";
while(1){
($stdoutp3,$stderrp3,$exitp3) = $ssh->cmd("ovm -uadmin -ppassword vm stat -n $newname -s $newpool");
if($stdoutp3 =~ /Powered Off/){
print "Done VM creation."."\n";
last;
}
sleep 300
}

print "Setting Cpu/Memory now."."\n";
($stdoutp32,$stderrp32,$exitp32) = $ssh->cmd("ovm -uadmin -ppassword vm conf -n $newname -s $newpool -x $newmemory -m $newmemory -c $newcpu -P");
sleep 2;

print "Creating NICs now."."\n";
($stdoutp4,$stderrp4,$exitp4) = $ssh->cmd("ovm -uadmin -ppassword vm nic conf -n $newname -s $newpool -N $newvif -i VIF0 -b $newbridge");
sleep 2;
($stdoutp5,$stderrp5,$exitp5) = $ssh->cmd("ovm -uadmin -ppassword vm nic add -n $newname -s $newpool -N $newvif2 -b $newbridge2");
sleep 2;

print "Powering on VM now."."\n";
($stdoutp6,$stderrp6,$exitp6) = $ssh->cmd("ovm -uadmin -ppassword vm poweron -n $newname -s $newpool");
sleep 30;

while(1){
($stdoutp7,$stderrp7,$exitp7) = $ssh->cmd("ovm -uadmin -ppassword vm info -n $newname -s $newpool");
if($stdoutp7 =~ /Running on: sl/){
print "VM is now Running, you can configure VM on hypervisor now:"."\n";
print $stdoutp7."\n";
last;
}
sleep 30;
}

#($stdoutp8,$stderrp8,$exitp8) = $ssh->cmd("ovm -uadmin -ppassword vm ls -l | grep $newname");
#print "You can configure VM on hypervisor now:"."\n";
#print $stdoutp8."\n";
} else {
print $stdoutp2."\n";
exit;
}
} else {
print "No template named $newtmpl in pool $newpool\n";
exit;
}
}

You can use the following to make the script run in parallel:

for i in <all OVMs>;do (./ovm_vm_operation.pl $i status &);done
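And, for example, to create a new VM from a template (arguments follow the help text in the script; the OVMM host, pool, and template names below are made up):

./ovm_vm_operation.pl ovmm1.example.com new testvm6 4 8192 Pool1_test EL5U7_PVM_template FE BE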