
Posts Tagged ‘troubleshooting’

Resolved – failed Exception: check srv hostname/IP failed; Exception: Invalid hostname/IP configuration; ocfs2 config failed: Obsolete nodes found

June 3rd, 2014

Today when I tried to add two OVS servers to a server pool, I ran into errors. The first one looked like this:

2014-06-03 04:26:08.965 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:09.485 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:09.497 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:12.463 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:12.985 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:12.997 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:13.004 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:13.522 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:13.535 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:13.980 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:16.307 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:16.831 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:16.844 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:17.284 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:17.290 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:17.814 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:17.827 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:18.272 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:18.279 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:18.799 NOTIFICATION Regisering server:hostname1.example.com...
2014-06-03 04:26:21.749 NOTIFICATION Register Server: hostname1.example.com success
2014-06-03 04:26:21.751 NOTIFICATION Getting host info for server:hostname1.example.com ...
2014-06-03 04:26:23.894 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>
2014-06-03 04:26:33.348 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>
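
The first exception suggests a mismatch between the hostname the OVS agent reports (hostname1) and the IP configuration it expects (10.200.225.127). A quick way to see what the OVS server itself resolves to is something like the following (just a diagnostic sketch, reusing the names from the error above):

hostname                          # what the server calls itself
hostname -i                       # the IP address that hostname resolves to
grep -i hostname1 /etc/hosts      # the /etc/hosts mapping for hostname1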

There was also an error message like the one below:

2014-06-03 04:59:11.003 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:11.524 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:11.536 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:15.484 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.005 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.016 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:16.025 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.546 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.559 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:17.014 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:18.950 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:19.470 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:19.483 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:19.926 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:19.955 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:20.476 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:20.490 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:20.943 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:20.947 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:21.471 NOTIFICATION Regisering server:hostname1-fe.example.com...
2014-06-03 04:59:24.439 NOTIFICATION Register Server: hostname1-fe.example.com success
2014-06-03 04:59:24.439 NOTIFICATION Getting host info for server:hostname1-fe.example.com ...
2014-06-03 04:59:26.577 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >
2014-06-03 04:59:37.100 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1-fe.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >

I was confused by the "Obsolete nodes found" message it complained about. I could confirm that I had already removed hostname1.example.com, and even after checking the OVM database table OVS.OVS_SERVER, there was no record of hostname1.example.com.
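
For reference, the OVM database check mentioned above can be run on the OVM Manager host with something like this (a sketch only: the connect string and credentials below are placeholders and depend on your manager installation):

sqlplus -s ovs/password@XE <<'EOF'
select * from OVS.OVS_SERVER;
EOF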

After some searching, it turned out these errors were caused by obsolete information in OCFS2 (Oracle Cluster File System). The fix is to edit /etc/ocfs2/cluster.conf on the OVS server and remove the obsolete entries.

-bash-3.2# vi /etc/ocfs2/cluster.conf
node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

node:
        ip_port     = 7777
        ip_address  = 10.200.169.191
        number      = 1
        name        = hostname2
        cluster     = ocfs2

cluster:
        node_count  = 2
        name        = ocfs2

If, for example, hostname2 is no longer needed, or the IP address of hostname2 has changed, you should remove the entries related to hostname2 and change node_count to 1. Then bounce the ocfs2/o2cb services:

service ocfs2 restart

service o2cb restart
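
For reference, after removing the stale hostname2 entry, the cleaned-up cluster.conf would look roughly like this (a sketch based on the file shown above):

node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

cluster:
        node_count  = 1
        name        = ocfs2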

Later, I tried adding the OVS server again, and it worked! (Before adding that OVS server back, we first need to remove its ovs-agent DB and then reconfigure the agent, as shown in the command sequence below. You can also use /opt/ovs-agent-2.3/utils/cleanup.py to do the cleanup.)
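
Laid out as commands, that cleanup sequence looks like this (the backup directory name /var/tmp/db.bak.5 is arbitrary; any unused path will do):

service ovs-agent stop
mv /etc/ovs-agent/db /var/tmp/db.bak.5      # move the agent's local DB out of the way
service ovs-agent start
service ovs-agent configure                 # reconfigure the agent before re-adding the server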

PS:

Here is more about ocfs2 and o2cb: http://www.doxer.org/o2cb-for-ocfs2/

Resolved – kernel panic not syncing: Fatal exception (Pid: comm: Not tainted)

November 13th, 2013

We were installing IDM OAM today, and the Linux server panicked every time we ran the startup script. The panic info looked like this:

Pid: 4286, comm: emdctl Not tainted 2.6.32-300.29.1.el5uek #1
Process emdctl (pid: 4286, threadinfo ffff88075bf20000, task ffff88073d0ac480)
Stack:
ffff88075bf21958 ffffffffa02b1769 ffff88075bf21948 ffff8807cdcce500
<0> ffff88075bf95cc8 ffff88075bf95ee0 ffff88075bf21998 ffffffffa01fd5c6
<0> ffffffffa02b1732 ffff8807bc2543f0 ffff88075bf95cc8 ffff8807bc2543f0
Call Trace:
[<ffffffffa02b1769>] nfs3_xdr_writeargs+0x37/0x7a [nfs]
[<ffffffffa01fd5c6>] rpcauth_wrap_req+0x7f/0x8b [sunrpc]
[<ffffffffa02b1732>] ? nfs3_xdr_writeargs+0x0/0x7a [nfs]
[<ffffffffa01f612a>] call_transmit+0x199/0x21e [sunrpc]
[<ffffffffa01fc8ba>] __rpc_execute+0x85/0x270 [sunrpc]
[<ffffffffa01fcae2>] rpc_execute+0x26/0x2a [sunrpc]
[<ffffffffa01f5546>] rpc_run_task+0x57/0x5f [sunrpc]
[<ffffffffa02abd86>] nfs_write_rpcsetup+0x20b/0x22d [nfs]
[<ffffffffa02ad1e8>] nfs_flush_one+0x97/0xc3 [nfs]
[<ffffffffa02a86b4>] nfs_pageio_doio+0x37/0x60 [nfs]
[<ffffffffa02a87c5>] nfs_pageio_complete+0xe/0x10 [nfs]
[<ffffffffa02ac264>] nfs_writepages+0xa7/0xe4 [nfs]
[<ffffffffa02ad151>] ? nfs_flush_one+0x0/0xc3 [nfs]
[<ffffffffa02acd2e>] nfs_write_mapping+0x63/0x9e [nfs]
[<ffffffff810f02fe>] ? __pmd_alloc+0x5d/0xaf
[<ffffffffa02acd9c>] nfs_wb_all+0x17/0x19 [nfs]
[<ffffffffa029f6f7>] nfs_do_fsync+0x21/0x4a [nfs]
[<ffffffffa029fc9c>] nfs_file_flush+0x67/0x70 [nfs]
[<ffffffff81117025>] filp_close+0x46/0x77
[<ffffffff81059e6b>] put_files_struct+0x7c/0xd0
[<ffffffff81059ef9>] exit_files+0x3a/0x3f
[<ffffffff8105b240>] do_exit+0x248/0x699
[<ffffffff8100e6a1>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8106898a>] ? freezing+0x13/0x15
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8106bd03>] get_signal_to_deliver+0x303/0x328
[<ffffffff8101120a>] do_notify_resume+0x90/0x6d7
[<ffffffff81459f06>] ? kretprobe_table_unlock+0x1c/0x1e
[<ffffffff8145ac6f>] ? kprobe_flush_task+0x71/0x7c
[<ffffffff8103164c>] ? paravirt_end_context_switch+0x17/0x31
[<ffffffff81123e8f>] ? path_put+0x22/0x27
[<ffffffff8101207e>] int_signal+0x12/0x17
Code: 55 48 89 e5 0f 1f 44 00 00 48 8b 06 0f c8 89 07 48 8b 46 08 0f c8 89 47 04 c9 48 8d 47 08 c3 55 48 89 e5 0f 1f 44 00 00 48 0f ce <48> 89 37 c9 48 8d 47 08 c3 55 48 89 e5 53 0f 1f 44 00 00 f6 06
RIP [<ffffffffa02b03c3>] xdr_encode_hyper+0xc/0x15 [nfs]
RSP <ffff88075bf21928>
---[ end trace 04ad5382f19cf8ad ]---
Kernel panic - not syncing: Fatal exception
Pid: 4286, comm: emdctl Tainted: G D 2.6.32-300.29.1.el5uek #1
Call Trace:
[<ffffffff810579a2>] panic+0xa5/0x162
[<ffffffff81450075>] ? threshold_create_device+0x242/0x2cf
[<ffffffff8100ed2f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff814574b0>] ? _spin_unlock_irqrestore+0x16/0x18
[<ffffffff810580f5>] ? release_console_sem+0x194/0x19d
[<ffffffff810583be>] ? console_unblank+0x6a/0x6f
[<ffffffff8105766f>] ? print_oops_end_marker+0x23/0x25
[<ffffffff814583a6>] oops_end+0xb7/0xc7
[<ffffffff8101565d>] die+0x5a/0x63
[<ffffffff81457c7c>] do_trap+0x115/0x124
[<ffffffff81013731>] do_alignment_check+0x99/0xa2
[<ffffffff81012cb5>] alignment_check+0x25/0x30
[<ffffffffa02b03c3>] ? xdr_encode_hyper+0xc/0x15 [nfs]
[<ffffffffa02b06be>] ? xdr_encode_fhandle+0x15/0x17 [nfs]
[<ffffffffa02b1769>] nfs3_xdr_writeargs+0x37/0x7a [nfs]
[<ffffffffa01fd5c6>] rpcauth_wrap_req+0x7f/0x8b [sunrpc]
[<ffffffffa02b1732>] ? nfs3_xdr_writeargs+0x0/0x7a [nfs]
[<ffffffffa01f612a>] call_transmit+0x199/0x21e [sunrpc]
[<ffffffffa01fc8ba>] __rpc_execute+0x85/0x270 [sunrpc]
[<ffffffffa01fcae2>] rpc_execute+0x26/0x2a [sunrpc]
[<ffffffffa01f5546>] rpc_run_task+0x57/0x5f [sunrpc]
[<ffffffffa02abd86>] nfs_write_rpcsetup+0x20b/0x22d [nfs]
[<ffffffffa02ad1e8>] nfs_flush_one+0x97/0xc3 [nfs]
[<ffffffffa02a86b4>] nfs_pageio_doio+0x37/0x60 [nfs]
[<ffffffffa02a87c5>] nfs_pageio_complete+0xe/0x10 [nfs]
[<ffffffffa02ac264>] nfs_writepages+0xa7/0xe4 [nfs]
[<ffffffffa02ad151>] ? nfs_flush_one+0x0/0xc3 [nfs]
[<ffffffffa02acd2e>] nfs_write_mapping+0x63/0x9e [nfs]
[<ffffffff810f02fe>] ? __pmd_alloc+0x5d/0xaf
[<ffffffffa02acd9c>] nfs_wb_all+0x17/0x19 [nfs]
[<ffffffffa029f6f7>] nfs_do_fsync+0x21/0x4a [nfs]
[<ffffffffa029fc9c>] nfs_file_flush+0x67/0x70 [nfs]
[<ffffffff81117025>] filp_close+0x46/0x77
[<ffffffff81059e6b>] put_files_struct+0x7c/0xd0
[<ffffffff81059ef9>] exit_files+0x3a/0x3f
[<ffffffff8105b240>] do_exit+0x248/0x699
[<ffffffff8100e6a1>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8106898a>] ? freezing+0x13/0x15
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8106bd03>] get_signal_to_deliver+0x303/0x328
[<ffffffff8101120a>] do_notify_resume+0x90/0x6d7
[<ffffffff81459f06>] ? kretprobe_table_unlock+0x1c/0x1e
[<ffffffff8145ac6f>] ? kprobe_flush_task+0x71/0x7c
[<ffffffff8103164c>] ? paravirt_end_context_switch+0x17/0x31
[<ffffffff81123e8f>] ? path_put+0x22/0x27
[<ffffffff8101207e>] int_signal+0x12/0x17

We tried a lot (application coredump, kdump, etc.) but still had no solution, until we noticed that there were many NFS-related messages in the kernel panic output (the nfs/sunrpc frames in the call traces above).

Since our Linux server was not actively using NFS or autofs, we upgraded the NFS client (nfs-utils) and disabled autofs:

yum update nfs-utils

chkconfig autofs off
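
To double-check that the server really has no active NFS or autofs usage before and after the change, a few quick checks like these can help (a sketch; exact output depends on the distribution):

mount -t nfs                # list any currently mounted NFS filesystems
grep nfs /proc/mounts       # another view of active NFS mounts
chkconfig --list autofs     # confirm autofs is off for all runlevels
rpm -q nfs-utils            # confirm the nfs-utils version after the update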

After this, the IDM startup succeeded, and no more server panics occurred!