lvm volume resize by extending virtual disk image file

June 27th, 2016

Below should be ran from Dom0 hosting DomU:

[root@Dom0 ~]# virt-filesystems -a System.img
/dev/sda1
/dev/vg01/lv_root

[root@Dom0 ~]# virt-filesystems --long --parts --blkdevs -h -a System.img
Name Type MBR Size Parent
/dev/sda1 partition 83 500M /dev/sda
/dev/sda2 partition 8e 12G /dev/sda
/dev/sda device - 12G -

[root@Dom0 ~]# truncate -s 20G System_new.img

[root@Dom0 ~]# virt-resize --expand /dev/sda2 System.img System_new.img

[root@Dom0 ~]# mv System.img System.img.bak;mv System_new.img System.img

[root@Dom0 ~]# xm create vm.cfg -c #the first run may get issue "device cannot be connected", you can just run it again, the issue should be gone

Below should be ran from DomU:

[root@DomU ~]# vgs
VG #PV #LV #SN Attr VSize VFree
vg01 1 2 0 wz--n- 20.51g 8.00g

[root@DomU ~]# lvextend -L +8g /dev/mapper/vg01-lv_root
[root@DomU ~]# resize2fs /dev/mapper/vg01-lv_root

[root@DomU ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg01-lv_root
20G 10G 10G 50% /

Categories: Clouding, IT Architecture, Oracle Cloud Tags:

linux dhcp config

June 8th, 2016
yum install dhcp

vi /etc/dhcpd.conf

    ddns-update-style ad-hoc;
    default-lease-time 600;
    max-lease-time 7200;

    subnet 192.168.144.0 netmask 255.255.248.0 {
        range 192.168.149.112 192.168.149.113;
        option domain-name-servers 10.135.82.180, 10.135.82.188;
        option routers 192.168.144.1;
    }

    subnet 192.168.128.0 netmask 255.255.248.0 {
    }

    host andy {
        hardware ethernet 00:16:3E:86:51:DC;
        fixed-address 10.1.1.100;
    }

/etc/init.d/dhcpd start


[root@dhcpd ~]# ls -l /var/lib/dhcp/dhcpd.leases

[root@dhcpd ~]# lsof -i :67
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    dhcpd 28207 root 5u IPv4 1019206 UDP *:bootps

[root@client ~]# dhclient eth0 -s 192.168.149.109

PS:

1. Here's an article about how DHCP works. You can get more info here and here.
2.For security reason, the DHCP offer packages sent from DHCP server we build may
  get blocked. So we need ask Network Support to config DHCP relay on their DHCP 
  server to relay specified requests to use our own DHCP server. 
  
  As indicated below: 

      Jun  8 09:22:50 host1 dhclient: No DHCPOFFERS received.

Categories: IT Architecture, Linux, Systems Tags:

TCP wrappers /etc/hosts.allow /etc/hosts.deny

June 2nd, 2016

A simple example on linux box:

[root@test ~]# cat /etc/hosts.allow
sshd : ALL EXCEPT host1.example.com
snmpd : ALL EXCEPT host1.example.com
ALL : localhost

[root@test ~]# cat /etc/hosts.deny
ALL:ALL

And here's explaining:

Service "sshd/snmpd" will accept connections from all hosts except host1.example.com. All services will accept connections from localhost. Other services will deny connections from all hosts.

 

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – The viewport is not between 320 and 420 pixels wide

April 28th, 2016

Today when I tried to enable page-level ads of google adsense, I met below error:

This page cannot display anchor ads for the following reason(s):

  • The viewport is not between 320 and 420 pixels wide.
  • The current browser is not supported.

The cause by this is that the wordpress theme currently used didn't support viewport(more info about viewport is here). To enable viewport, we can install one plugin - Definitely allow mobile zooming. After activating them, you can try again.

 

Categories: Misc Tags:

resolved – Windows cannot access the specified device, path, or file. You may not have the appropriate permission to access the item

April 22nd, 2016

I've been having this strange problem since this morning after starting office laptop from Hibernate. Restart is also not fixing it.

For any program, file, it just says that Windows cannot find the file ...etc . Simple search shows that it comes from anti virus. So, it is coming from Mcafee Host IPS. As soon as I turn that off from Mcaffee tray menu > Quick settings > Host IPS off, programs start working.

cannot_access

 

And here's a quick fix for this (Right click McAfee icon, then Quick settings, uncheck 'Host IPS - off'):

IPS_off

 

Categories: IT Architecture, Systems, Windows Tags:

resolved – net/core/dev.c:1894 skb_gso_segment+0x298/0x370()

April 19th, 2016

Today on one of our servers, there were a lot of errors in /var/log/messages like below:

║Apr 14 21:50:25 test01 kernel: WARNING: at net/core/dev.c:1894
║skb_gso_segment+0x298/0x370()
║Apr 14 21:50:25 test01 kernel: Hardware name: SUN FIRE X4170 M3
║Apr 14 21:50:25 test01 kernel: : caps=(0x60014803, 0x0) len=255
║data_len=215 ip_summed=1
║Apr 14 21:50:25 test01 kernel: Modules linked in: dm_nfs nfs fscache
║auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd
║ @ sunrpc 8021q garp bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser
║ @ rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio
║ @ ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp dm_round_robin libiscsi
║ @ dm_multipath scsi_transport_iscsi xenfs xen_privcmd dm_mirror video sbs sbshc
║acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport sr_mod cdrom
║ixgbe hwmon dca snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
║snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore
║snd_page_alloc iTCO_wdt iTCO_vendor_support pcspkr ghes i2c_i801 hed i2c_core
║dm_region_hash dm_log dm_mod usb_storage ahci libahci sg shpchp megaraid_sas
║sd_mod crc_t10dif ext3 jbd mbcache
║Apr 14 21:50:25 test01 kernel: Pid: 0, comm: swapper Tainted: G W
║ 2.6.39-400.264.4.el5uek #1
║Apr 14 21:50:25 test01 kernel: Call Trace:
║Apr 14 21:50:25 test01 kernel: <IRQ> [<ffffffff8143dab8>] ?
║skb_gso_segment+0x298/0x370
║Apr 14 21:50:25 test01 kernel: [<ffffffff8106f300>]
║warn_slowpath_common+0x90/0xc0
║Apr 14 21:50:25 test01 kernel: [<ffffffff8106f42e>]
║warn_slowpath_fmt+0x6e/0x70
║Apr 14 21:50:25 test01 kernel: [<ffffffff810d73a7>] ?
║irq_to_desc+0x17/0x20
║Apr 14 21:50:25 test01 kernel: [<ffffffff812faf0c>] ?
║notify_remote_via_irq+0x2c/0x40
║Apr 14 21:50:25 test01 kernel: [<ffffffff8100a820>] ?
║xen_clocksource_read+0x20/0x30
║Apr 14 21:50:25 test01 kernel: [<ffffffff812faf4c>] ?
║xen_send_IPI_one+0x2c/0x40
║Apr 14 21:50:25 test01 kernel: [<ffffffff81011f10>] ?
║xen_smp_send_reschedule+0x10/0x20
║Apr 14 21:50:25 test01 kernel: [<ffffffff81056e0b>] ?
║ttwu_queue_remote+0x4b/0x60
║Apr 14 21:50:25 test01 kernel: [<ffffffff81509a7e>] ?
║_raw_spin_unlock_irqrestore+0x1e/0x30
║Apr 14 21:50:25 test01 kernel: [<ffffffff8143dab8>]
║skb_gso_segment+0x298/0x370
║Apr 14 21:50:25 test01 kernel: [<ffffffff8143dba6>]
║dev_gso_segment+0x16/0x50
║Apr 14 21:50:25 test01 kernel: [<ffffffff8143dfb5>]
║dev_hard_start_xmit+0x3d5/0x530
║Apr 14 21:50:25 test01 kernel: [<ffffffff8145a074>]
║sch_direct_xmit+0xc4/0x1d0
║Apr 14 21:50:25 test01 kernel: [<ffffffff8143e811>]
║dev_queue_xmit+0x161/0x410
║Apr 14 21:50:25 test01 kernel: [<ffffffff815099de>] ?
║_raw_spin_lock+0xe/0x20
║Apr 14 21:50:25 test01 kernel: [<ffffffffa045820c>]
║br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffff81076e77>] ?
║local_bh_enable+0x27/0xa0
║Apr 14 21:50:25 test01 kernel: [<ffffffffa045e7ba>]
║br_nf_dev_queue_xmit+0x2a/0x90 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa045f668>]
║br_nf_post_routing+0x1f8/0x2e0 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>]
║nf_iterate+0x78/0x90
║Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>]
║nf_hook_slow+0x7c/0x130
║Apr 14 21:50:25 test01 kernel: [<ffffffffa04581a0>] ?
║br_forward_finish+0x70/0x70 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa04581a0>] ?
║br_forward_finish+0x70/0x70 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ?
║br_flood_deliver+0x20/0x20 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa0458186>]
║br_forward_finish+0x56/0x70 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa045eba4>]
║br_nf_forward_finish+0xb4/0x180 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa045f36f>]
║br_nf_forward_ip+0x26f/0x370 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>]
║nf_iterate+0x78/0x90
║Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>]
║nf_hook_slow+0x7c/0x130
║Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ?
║br_flood_deliver+0x20/0x20 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>] ?
║nf_iterate+0x78/0x90
║Apr 14 21:50:25 test01 kernel: [<ffffffffa0458130>] ?
║br_flood_deliver+0x20/0x20 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa04582c8>]
║__br_forward+0x88/0xc0 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa0458356>]
║br_forward+0x56/0x60 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa04591fc>]
║br_handle_frame_finish+0x1ac/0x240 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffffa045ee1b>]
║br_nf_pre_routing_finish+0x1ab/0x350 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffff8115bfe9>] ?
║kmem_cache_alloc_trace+0xc9/0x1a0
║Apr 14 21:50:25 test01 kernel: [<ffffffffa045fc55>]
║br_nf_pre_routing+0x305/0x370 [bridge]
║Apr 14 21:50:25 test01 kernel: [<ffffffff8100122a>] ?
║xen_hypercall_xen_version+0xa/0x20
║Apr 14 21:50:25 test01 kernel: [<ffffffff81467428>]
║nf_iterate+0x78/0x90
║Apr 14 21:50:25 test01 kernel: [<ffffffff8146777c>]
║nf_hook_slow+0x7c/0x130

To fix this, we should disable LRO(large receive offload) first:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i lro off;done

And if the NICs are of Intel 10G, the we should disable GRO(generic receive offload) too:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i gro off;done

Here's the command to disable both of LRO/GRO:

for i in eth0 eth1 eth2 eth3;do /sbin/ethtool -K $i gro off;/sbin/ethtool -K $i lro off;done

 

create shared iscsi LUNs from local disk on Linux

January 19th, 2016

We can use iscsitarget to share local disks as iscsi LUNs for clients. Below are brief steps.

First, install some packages:

yum install kernel-devel iscsi-initiator-utils -y #'kernel-uek-devel' if you are using oracle linux with UEK
cd iscsitarget-1.4.20.2 #download the package from here
make
make install

And below are some useful tips about iscsitarget:

The iSCSI target consists of a kernel module (/lib/modules/`uname -r`/extra/iscsi/iscsi_trgt.ko)

The kernel modules will be installed in the module directory of the kernel

The daemon(/usr/sbin/ietd) and the control tool(/usr/sbin/ietadm)

/etc/init.d/iscsi-target status

/etc/iet/{ietd.conf, initiators.allow, targets.allow}

Later, we can modify IET(iSCSI Enterprise Target) config file:

vi /etc/iet/ietd.conf

    Target iqn.2001-04.org.doxer:server1-shared-local-disks
        Lun 0 Path=/dev/sdb1,Type=fileio,ScsiId=0,ScsiSN=doxerorg
        Lun 1 Path=/dev/sdb2,Type=fileio,ScsiId=1,ScsiSN=doxerorg
        Lun 2 Path=/dev/sdb3,Type=fileio,ScsiId=2,ScsiSN=doxerorg
        Lun 3 Path=/dev/sdb4,Type=fileio,ScsiId=3,ScsiSN=doxerorg
        Lun 4 Path=/dev/sdb5,Type=fileio,ScsiId=4,ScsiSN=doxerorg
        Lun 5 Path=/dev/sdb6,Type=fileio,ScsiId=5,ScsiSN=doxerorg
        Lun 6 Path=/dev/sdb7,Type=fileio,ScsiId=6,ScsiSN=doxerorg
        Lun 7 Path=/dev/sdb8,Type=fileio,ScsiId=7,ScsiSN=doxerorg

chkconfig iscsi-target on
/etc/init.d/iscsi-target start

Assume the server sharing local disks for iscsi LUN is with IP 192.168.10.212, and we can do below on client hosts to scan for iscsi LUNs:

[root@client01 ~]# iscsiadm -m discovery -t st -p 192.168.10.212
Starting iscsid:                                           [  OK  ]
192.168.10.212:3260,1 iqn.2001-04.org.doxer:server1-shared-local-disks

[root@client01 ~]# iscsiadm -m node -T iqn.2001-04.org.doxer:server1-shared-local-disks -p 192.168.10.212 -l
Logging in to [iface: default, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260] (multiple)
Login to [iface: default, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260] successful.

[root@client01 ~]# iscsiadm -m session --rescan #or iscsiadm -m session -r SID --rescan
Rescanning session [sid: 1, target: iqn.2001-04.org.doxer:server1-shared-local-disks, portal: 192.168.10.212,3260]

[root@client01 ~]# iscsiadm -m session -P 3

You can also scan for LUNs on the server with local disk shared, but you should make sure iscsi-target service boot up between network & iscsi:

mv /etc/rc3.d/S39iscsi-target /etc/rc3.d/S12iscsi-target

PS:

  1. iSCSI target port default to 3260. You can check iscsi connection info in /var/lib/iscsi/send_targets/ - <iscsi portal ip, port>, and /var/lib/iscsi/nodes/ - <target iqn>/<iscsi portal ip, port>.
  2. If there are multiple targets to log on, we can use "iscsiadm -m node --loginall=all".  "iscsiadm -m node -T iqn.2001-04.org.doxer:server1-shared-local-disks -p 192.168.10.212 -u" to log out.
  3. More info is here (includes windows iscsi operation), and here is about create iSCSI on Oracle ZFS appliance.

 

Categories: Hardware, NAS, SAN, Storage Tags:

resolved – /etc/rc.local not executed on boot in linux

November 11th, 2015

When you find your scripts in /etc/rc.local not executed along with system boots, then one possibility is that the previous subsys script takes too long to execute, as /etc/rc.local is usually the last one to execute, i.e. S99local. To prove which is the culprit subsys that gets stuck, you can edit /etc/rc.d/rc(which is from /etc/inittab):

[root@host1 tmp] vi /etc/rc.d/rc
# Now run the START scripts.
for i in /etc/rc$runlevel.d/S* ; do
        check_runlevel "$i" || continue

        # Check if the subsystem is already up.
        subsys=${i#/etc/rc$runlevel.d/S??}
        [ -f /var/lock/subsys/$subsys -o -f /var/lock/subsys/$subsys.init ] \
                && continue

        # If we're in confirmation mode, get user confirmation
        if [ -f /var/run/confirm ]; then
                confirm $subsys
                test $? = 1 && continue
        fi

        update_boot_stage "$subsys"
        # Bring the subsystem up.
        if [ "$subsys" = "halt" -o "$subsys" = "reboot" ]; then
                export LC_ALL=C
                exec $i start
        fi
        if LC_ALL=C egrep -q "^..*init.d/functions" $i \
                        || [ "$subsys" = "single" -o "$subsys" = "local" ]; then
                echo $i>>/var/tmp/process.txt
                $i start
                echo $i>>/var/tmp/process_end.txt
        else
                echo $i>>/var/tmp/process_self.txt
                action $"Starting $subsys: " $i start
                echo $i>>/var/tmp/process_self_end.txt
        fi
done

Then you can reboot the system, and check files /var/tmp/{process.txt,process_end.txt,process_self.txt,process_self_end.txt}. In one of the host, I found below entries:

[root@host1 tmp]# tail process.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon
/etc/rc3.d/S98gcstartup

[root@host1 tmp]# tail process_end.txt
/etc/rc3.d/S85gpm
/etc/rc3.d/S90crond
/etc/rc3.d/S90xfs
/etc/rc3.d/S91vncserver
/etc/rc3.d/S95anacron
/etc/rc3.d/S95atd
/etc/rc3.d/S95emagent_public
/etc/rc3.d/S97rhnsd
/etc/rc3.d/S98avahi-daemon

So from here, we can see /etc/rc3.d/S98gcstartup tried start, but it took too long to finish. To make sure scripts in /etc/rc.local get executed and also the stuck script /etc/rc3.d/S98gcstartup get executed also, we can do this:

[root@host1 tmp]# mv /etc/rc3.d/S98gcstartup /etc/rc3.d/s98gcstartup
[root@host1 tmp]# vi /etc/rc.local

#!/bin/sh

touch /var/lock/subsys/local

#put your scripts here - begin

#put your scripts here - end

#put the stuck script here and make sure it's the last line
/etc/rc3.d/s98gcstartup start

After this, reboot the host and check whether scripts in /etc/rc.local got executed.

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

resolved – xend error: (98, ‘Address already in use’)

November 4th, 2015

Today one OVS server met issue with ovs-agent and need reboot. As there were VMs running on it, so I tried live migrating xen based VMs using "xm migrate -l", but below error occurred:

-bash-3.2# xm migrate -l vm1 server1
Error: can't connect: (111, 'Connection refused')
Usage: xm migrate  

Migrate a domain to another machine.

Options:

-h, --help           Print this help.
-l, --live           Use live migration.
-p=portnum, --port=portnum
                     Use specified port for migration.
-n=nodenum, --node=nodenum
                     Use specified NUMA node on target.
-s, --ssl            Use ssl connection for migration.

As xen migration use xend-relocation-server of xend-relocation-port, so this "Connection refused" issue was most likely related to this. And below is the configuration of /etc/xen/xend-config.sxp:

-bash-3.2# egrep -v '^#|^$' /etc/xen/xend-config.sxp
(xend-unix-server yes)
(xend-relocation-server yes)
(xend-relocation-ssl-server no)
(xend-unix-path /var/lib/xend/xend-socket)
(xend-relocation-port 8002)
(xend-relocation-server-ssl-key-file /etc/ovs-agent/cert/key.pem)
(xend-relocation-server-ssl-cert-file /etc/ovs-agent/cert/certificate.pem)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')
(vif-script vif-bridge)
(dom0-min-mem 0)
(enable-dom0-ballooning no)
(dom0-cpus 0)
(vnc-listen '0.0.0.0')
(vncpasswd '')
(xend-domains-lock-path /opt/ovs-agent-2.3/utils/dlm.py)
(domain-shutdown-hook /opt/ovs-agent-2.3/utils/hook_vm_shutdown.py)

And to check the progresses related with these:

-bash-3.2# lsof -i :8002
COMMAND   PID     USER   FD   TYPE    DEVICE SIZE NODE NAME
xend    12095 root    5u  IPv4 146473964       TCP *:teradataordbms (LISTEN)

-bash-3.2# ps auxww|egrep '/opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py'
root  3501  0.0  0.0   3924   740 pts/0    S+   08:37   0:00 egrep /opt/ovs-agent-2.3/utils/dlm.py|/opt/ovs-agent-2.3/utils/hook_vm_shutdown.py
root 19007  0.0  0.0  12660  5840 ?        D    03:44   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm1 --uuid 56f17372-0a86-4446-8603-d82423c54367
root 27446  0.0  0.0  12664  5956 ?        D    05:11   0:00 python /opt/ovs-agent-2.3/utils/dlm.py --lock --name vm2 --uuid eb1a4e84-3572-4543-8b1d-685b856d98c7

When processes went into D state(uninterruptable sleep), it'll be troublesome, as these processes can only be killed by reboot the whole system. However, on this server, we had many VMs running, and now live migration/relocation was blocked by issue caused by itself, and deadlock surfaced. And seems reboot was the only way to "resolve" the issue.

Firstly, I tried bounce xend(/etc/init.d/xend restart), but met below error indicated in /var/log/message:

[2015-11-04 04:39:43 24026] INFO (SrvDaemon:227) Xend stopped due to signal 15.
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:332) Xend Daemon started
[2015-11-04 04:39:43 24115] INFO (SrvDaemon:336) Xend changeset: unavailable.
[2015-11-04 04:40:14 24115] ERROR (SrvDaemon:349) Exception starting xend ((98, 'Address already in use'))
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 339, in run
    relocate.listenRelocation()
  File "/usr/lib/python2.4/site-packages/xen/xend/server/relocate.py", line 159, in listenRelocation
    hosts_allow = hosts_allow)
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 36, in __init__
    connection.SocketListener.__init__(self, protocol_class)
  File "/usr/lib/python2.4/site-packages/xen/web/connection.py", line 89, in __init__
    self.sock = self.createSocket()
  File "/usr/lib/python2.4/site-packages/xen/web/tcp.py", line 49, in createSocket
    sock.bind((self.interface, self.port))
  File "", line 1, in bind
error: (98, 'Address already in use')

And later, I realized that we can change xend-relocation-port to have a try. So I made below changes to /etc/xen/xend-config.sxp:

(xend-relocation-port 8003)

And later, bounced xend:

/etc/init.d/xend stop; /etc/init.d/xend start

PS: xend bouncing will not affect running VMs, as I had compared qemu output(ps -ef|grep qemu). A tip here is that when xen related commands(xm list, and so on) stopped working, checking for "qemu" simulator processes will help you get the VM list.

After this, "xm migrate -l vm1 server1" still failed with the same can't connect: (111, 'Connection refused'). And I resolved this by specifying port:(you may need stop iptables too):

-bash-3.2# xm migrate -l -p 8002 vm1 server1

Now the live migration went on smoothly, and after all VMs were migrated, I changed xend-relocation-port back to 8002 and reboot the server to fix the D state(uninterruptable sleep) issue.

PS:

If you find error "Error: can't connect: (111, 'Connection refused')" even after above WA, then you can change back from 8003 to 8002, or even from 8003 to 8004, restart iptables, and try again.

Categories: Clouding, IT Architecture Tags:

resolved – mountd Cannot export /scratch, possibly unsupported filesystem or fsid= required

November 2nd, 2015

Today when I tried to access one autofs exported filesystem on one client host, it reported error:

[root@server01 ~]# cd /net/client01/scratch/
-bash: cd: scratch/: No such file or directory

From server side, we can see it's exported and writable:

[root@client01 ~]# cat /etc/exports
/scratch *(rw,no_root_squash)

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
nas01:/export/generic/share_scratch
                      200G  103G   98G  52% /scratch

So I tried mount manually on client side, but still error reported:

[root@server01 ~]# mount client01:/scratch /media
mount: client01:/scratch failed, reason given by server: Permission denied

Here's log on server side:

[root@client01 scratch]# tail -f /var/log/messages
Nov  2 03:41:58 client01 mountd[2195]: Caught signal 15, un-registering and exiting.
Nov  2 03:41:58 client01 kernel: nfsd: last server has exited, flushing export cache
Nov  2 03:41:59 client01 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov  2 03:41:59 client01 kernel: NFSD: starting 90-second grace period
Nov  2 03:42:11 client01 mountd[16046]: authenticated mount request from 192.162.100.137:1002 for /scratch (/scratch)
Nov  2 03:42:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:43:05 client01 mountd[16046]: authenticated mount request from 192.162.100.137:764 for /scratch (/scratch)
Nov  2 03:43:05 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:44:11 client01 mountd[16046]: authenticated mount request from 192.165.28.40:670 for /scratch (/scratch)
Nov  2 03:44:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required

After some debugging, the reason was that the exported FS was already NFS filesystem on server side, and NFS FS cannot be exported again:

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
nas01:/export/generic/share_scratch
                      200G  103G   98G  52% /scratch

To WA this, just do normal mount of the NFS share instead of using autofs export:

[root@server01 ~]# mount nas01:/export/generic/share_scratch /media
Categories: Hardware, IT Architecture, NAS, Storage Tags: