
resolved – xend error: (98, ‘Address already in use’)

November 4th, 2015

Today one OVS (Oracle VM Server) host ran into an issue with ovs-agent and needed a reboot. Because VMs were still running on it, I tried to live-migrate the Xen-based VMs with "xm migrate -l", but got the error below:

-bash-3.2# xm migrate -l vm1 server1
Error: can't connect: (111, 'Connection refused')
Usage: xm migrate  

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

Xen migration goes through the xend relocation server (xend-relocation-server), which listens on xend-relocation-port, so this "Connection refused" error was most likely related to it. Below is the active configuration in /etc/xen/xend-config.sxp (comments and blank lines filtered out):

-bash-3.2# egrep -v '^#|^$' /etc/xen/xend-config.sxp
(xend-unix-server yes)
(xend-relocation-server yes)
(xend-relocation-ssl-server no)
(xend-unix-path /var/lib/xend/xend-socket)
(xend-relocation-port 8002)
(xend-relocation-server-ssl-key-file /etc/ovs-agent/cert/key.pem)
(xend-relocation-server-ssl-cert-file /etc/ovs-agent/cert/certificate.pem)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
(vif-script vif-bridge)
(dom0-min-mem 0)
(enable-dom0-ballooning no)
(dom0-cpus 0)
(vnc-listen '0.0.0.0')
(vncpasswd '')
(xend-domains-lock-path /opt/ovs-agent-2.3/utils/dlm.py)
(domain-shutdown-hook /opt/ovs-agent-2.3/utils/hook_vm_shutdown.py)
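When several hosts are involved, it helps to read back which relocation port each one is actually configured with. A minimal sketch; get_relocation_port is a hypothetical helper name, and it assumes the setting sits on a single uncommented line such as (xend-relocation-port 8002):

```shell
# Hypothetical helper: print the xend-relocation-port value from an
# xend-config.sxp file. Commented-out lines (starting with "#") are ignored;
# only the first uncommented occurrence is printed.
get_relocation_port() {
    conf=${1:-/etc/xen/xend-config.sxp}
    sed -n 's/^[[:space:]]*(xend-relocation-port[[:space:]]\{1,\}\([0-9]\{1,\}\)).*/\1/p' "$conf" | head -n 1
}
```

For the configuration shown above, `get_relocation_port /etc/xen/xend-config.sxp` should print 8002. Running it on both the source and the target makes a port mismatch easy to spot.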

To check the processes related to these settings (note that lsof prints port 8002 under its /etc/services name, teradataordbms):

-bash-3.2# lsof -i :8002
COMMAND   PID     USER   FD   TYPE    DEVICE SIZE NODE NAME
xend    12095 root    5u  IPv4 146473964       TCP *:teradataordbms (LISTEN)

-bash-3.2# ps auxww|egrep '/opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py'
root  3501  0.0  0.0   3924   740 pts/0    S+   08:37   0:00 egrep /opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py
root 19007  0.0  0.0  12660  5840 ?        D    03:44   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm1 --uuid 56f17372-0a86-4446-8603-d82423c54367
root 27446  0.0  0.0  12664  5956 ?        D    05:11   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm2 --uuid eb1a4e84-3572-4543-8b1d-685b856d98c7

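Both dlm.py lock processes (the xend-domains-lock-path hook from the configuration above) were stuck in D state (uninterruptible sleep). That is troublesome: such processes do not respond to signals, not even SIGKILL, until the I/O they are waiting on completes, so in practice a reboot of the whole system is usually the only way to clear them. However, this server still had many VMs running, and live migration (relocation) was blocked by the very problem that called for the reboot: a deadlock. It seemed that a reboot was the only way to "resolve" the issue.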
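To spot processes in this state on a host, filter the ps output on the STAT column. A minimal sketch; d_state_procs is a hypothetical helper name, and it expects the output of `ps -eo pid,stat,args` on stdin:

```shell
# Print processes whose state starts with "D" (uninterruptible sleep).
# Input: "ps -eo pid,stat,args" output; line 1 is the header and is skipped.
d_state_procs() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Usage on a live host:
#   ps -eo pid,stat,args | d_state_procs
```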

First, I tried bouncing xend (/etc/init.d/xend restart), but it failed with the error below, as logged in /var/log/messages:

[2015-11-04 04:39:43 24026] INFO (SrvDaemon:227) Xend stopped due to signal 15.
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:332) Xend Daemon started
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:336) Xend changeset: unavailable.
[2015-11-04 04:40:14 24115] ERROR (SrvDaemon:349) Exception starting xend ((98, 'Address already in use'))
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 339, in run
    relocate.listenRelocation()
  File "/usr/lib/python2.4/site-packages/xen/xend/server/relocate.py", line 159, in listenRelocation
    hosts_allow = hosts_allow)
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 36, in __init__
    connection.SocketListener.__init__(self, protocol_class)
  File "/usr/lib/python2.4/site-packages/xen/web/connection.py", line 89, in __init__
    self.sock = self.createSocket()
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 49, in createSocket
    sock.bind((self.interface, self.port))
  File "", line 1, in bind
error: (98, 'Address already in use')
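Errno 98 is EADDRINUSE: the restarted xend could not bind the relocation port because another socket was still listening on it, most likely the xend process with PID 12095 seen in the lsof output earlier. A hypothetical helper to find who holds a port, assuming the usual net-tools `netstat -tlnp` column layout (PID/Program name in column 7, which is only filled in for other users' processes when run as root):

```shell
# Hypothetical helper: given "netstat -tlnp" output on stdin, print the
# "PID/Program name" column of whatever is LISTENing on TCP port $1.
port_listener() {
    awk -v suffix=":$1" '
        $6 == "LISTEN" {
            addr = $4
            # compare the end of the local address, e.g. 0.0.0.0:8002 or :::8002
            if (substr(addr, length(addr) - length(suffix) + 1) == suffix)
                print $7
        }'
}

# Usage: netstat -tlnp | port_listener 8002
```

Matching on the address suffix (rather than a plain grep for 8002) avoids false hits on ports such as 18002.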

Later, I realized I could work around the conflict by moving xend-relocation-port to another port, so I made the change below in /etc/xen/xend-config.sxp:

(xend-relocation-port 8003)
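The same edit can be scripted with sed. A minimal sketch; set_relocation_port is a hypothetical helper name, it only rewrites the uncommented (xend-relocation-port N) line, and it keeps the previous file as a .bak copy:

```shell
# Hypothetical helper: set xend-relocation-port in an xend-config.sxp file.
# Commented-out lines are left alone; the original is saved as FILE.bak.
set_relocation_port() {
    conf=$1
    port=$2
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    sed -i.bak 's/^\([[:space:]]*\)(xend-relocation-port[[:space:]]\{1,\}[0-9]\{1,\})/\1(xend-relocation-port '"$port"')/' "$conf"
}

# Usage: set_relocation_port /etc/xen/xend-config.sxp 8003
```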

Then I bounced xend:

/etc/init.d/xend stop; /etc/init.d/xend start

PS: Bouncing xend does not affect running VMs; I verified this by comparing the qemu processes (ps -ef|grep qemu) before and after the restart. A tip: when Xen commands (xm list and so on) stop working, checking the "qemu" emulator processes can still give you the list of VMs.
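The qemu check can be narrowed to just the guest names. A minimal sketch under a loud assumption: it assumes each device-model command line carries "-domain-name <name>", as xend-started qemu-dm processes commonly do, and guests that run without a qemu device model will not show up at all. The helper name qemu_domain_names is hypothetical:

```shell
# List guest names from qemu device-model processes when xm is unusable.
# ASSUMPTION: the qemu command line contains "-domain-name <name>".
qemu_domain_names() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "-domain-name") { print $(i + 1); break }
    }'
}

# Usage ("[q]emu" keeps grep from matching its own process):
#   ps -ef | grep '[q]emu' | qemu_domain_names
```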

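After this, "xm migrate -l vm1 server1" still failed with the same "can't connect: (111, 'Connection refused')". This is most likely because, without -p, xm migrate connects to the target on the local xend-relocation-port (now 8003), while server1 was still listening on 8002. I resolved it by specifying the port explicitly (you may need to stop iptables too):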

-bash-3.2# xm migrate -l -p 8002 vm1 server1

Now the live migration went smoothly. After all VMs were migrated, I changed xend-relocation-port back to 8002 and rebooted the server to clear the D state (uninterruptible sleep) processes.
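With many VMs on the host, the explicit-port migration can be looped over every guest. A minimal sketch; guest_names and migrate_all are hypothetical helper names, and it assumes the standard "xm list" layout (one header line, then Domain-0 and the guests, name in column 1):

```shell
# Print guest names from "xm list" output on stdin, skipping the header
# line and Domain-0.
guest_names() {
    awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

# Hypothetical helper: live-migrate every guest to host $1 using relocation
# port $2 (default 8002), one at a time, stopping at the first failure.
migrate_all() {
    target=$1
    port=${2:-8002}
    for vm in $(xm list | guest_names); do
        echo "migrating $vm -> $target:$port"
        xm migrate -l -p "$port" "$vm" "$target" || return 1
    done
}

# Usage: migrate_all server1 8002
```

Migrating sequentially keeps the load on the target predictable and makes it obvious which guest failed if the connection drops again.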

PS:

If you still get "Error: can't connect: (111, 'Connection refused')" after the workaround above, try changing xend-relocation-port from 8003 back to 8002, or on to 8004, restart iptables, and try again.
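Rather than stopping iptables altogether, a narrower option on the receiving server is to accept inbound TCP on the relocation port. A minimal sketch; allow_relocation_port and its DRY_RUN switch are hypothetical, and a rule inserted with iptables -I is lost when iptables is restarted from its saved rules or the host reboots, so re-add it after such a restart:

```shell
# Hypothetical helper: insert an ACCEPT rule for TCP port $1 (default 8002)
# at the top of the INPUT chain. With DRY_RUN=1 it only prints the command.
allow_relocation_port() {
    port=${1:-8002}
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    set -- iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (as root on the migration target): allow_relocation_port 8002
```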

Good Luck!


Categories: Clouding, IT Architecture