Archive

Author Archive

Enable NIS client on linux host

July 2nd, 2014 1 comment

After setting up the NIS server, you need to set up the NIS client. Here are the steps for enabling the NIS client on a Linux box.

Ensure required packages are installed

rpm -qa|egrep 'yp-tools|ypbind|portmap'
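
If any of these packages are missing, install them first from the distribution repositories (a quick sketch; the package names are the same ones checked above):

yum install -y yp-tools ypbind portmap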

Edit /etc/sysconfig/network

NISDOMAIN=example.com

Edit /etc/yp.conf
domain example.com server 10.229.169.88
domain example.com server 10.229.192.99

Set NIS domain-name

domainname example.com
ypdomainname example.com

Set /etc/nsswitch.conf

passwd: files nis
shadow: files nis
group: files nis
hosts: files dns nis
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: nisplus
publickey: nisplus
automount: files nisplus
aliases: files nisplus
sudoers: files nis

Make sure the portmap service is running:

service portmap start

chkconfig portmap on

Start ypbind service:

service ypbind start
chkconfig ypbind on

Test it out:

rpcinfo -u localhost ypbind

ypcat passwd|egrep 'username'
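
You can also confirm which NIS server the client is bound to and that user lookups actually go through NIS (replace username with a real NIS account):

ypwhich #shows the NIS server this client is currently bound to
getent passwd username #should return the NIS entry if nsswitch.conf is set up correctly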

If you want to set up sudo privileges for NIS users, you can refer to this article: resolved – /etc/sudoers: syntax error near line 10

PS:

If there's a firewall between the Linux NIS clients and the NIS servers, then you should not start ypbind (chkconfig ypbind off; service ypbind stop). If ypbind is running, the box will keep trying to connect to the NIS servers without stopping, your Linux box will get stuck, and it will take a long time to log on even as root. This is a rule of thumb.

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – /etc/sudoers: syntax error near line 10

July 2nd, 2014 No comments

When editing with /usr/sbin/visudo, the following error occurred after my modification:

>>> /etc/sudoers: syntax error near line 10 <<<

Here's line 10:

User_Alias Users_SDITAS = username1, username2

Then I changed it to the following:

User_Alias USERS_SDITAS = username1, username2

And now everything is ok. So this means the alias name must be all uppercase.

PS:
1. Here's the explanation of the sudoers entries involved. User_Alias USERS_SDITAS = username1, username2 simply defines an alias for a list of users; a privilege line such as root ALL=(ALL) ALL has four parts:

The first part is the user,
The second is the host(s) from which the user may run sudo,
The third part is which users he may act as,
The last one is which commands he may run when using sudo.
For example, root ALL=(ALL) ALL means the root user can execute from ALL hosts, acting as ALL (any) users, and run ALL (any) commands. And USERS_SDITAS ALL=(oracle) NOPASSWD:SETENV: CMD_MIGRATIONDC1DC3 means users in alias USERS_SDITAS can execute from ALL hosts, acting as the oracle user, and run the commands in alias CMD_MIGRATIONDC1DC3 (invoked as sudo -E -u oracle <command>; -E will pass the invoking user's env variables to the target user if the SETENV tag is added to the sudo commands in /etc/sudoers. You'll get the error message "sudo: sorry, you are not allowed to preserve the environment" if you did not add the SETENV tag in /etc/sudoers). You can run sudo -l or sudo -ll to get a list of privileged commands for yourself, or for others if you run sudo -l -U <username>.
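
For instance, with the USERS_SDITAS rule above, a member of that alias could do something like the following (a sketch; the command path is taken from the sample configuration in point 2 below):

sudo -l -U username1 #list the sudo privileges granted to username1 (needs to be run as root)
sudo -E -u oracle /u01/shared/wls/Oracle_SDI1/sdictl/sdictl.sh #run an allowed command as oracle while preserving the invoking user's environment (SETENV)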

2. Here's one sample /etc/sudoers configuration on Linux (use visudo to edit, as visudo checks for syntax errors after modification. You may need to set "echo 'export PATH=/usr/bin:$PATH' >> /etc/profile" in some circumstances so that sudo resolves to /usr/bin/sudo):

Defaults logfile=/var/log/sudo.log

Defaults always_set_home #switch to the target user's home directory when running sudo. Note that HOME is already set when the env_reset option is enabled, so always_set_home is only effective for configurations where either env_reset is disabled (Defaults !env_reset) or HOME is present in the env_keep list (Defaults env_keep += HOME). This flag is off by default.
Host_Alias HOSTS_MIGRATIONDC1DC3 = slcn06vmf0012, slcn06vmf0013
Cmnd_Alias CMD_MIGRATIONDC1DC3 = /u01/local/wls/user_projects/domains/base_domain/bin/tasctl, /u01/shared/wls/Oracle_SDI1/sdictl/sdictl.sh
User_Alias USERS_SDITAS =username1, username2
USERS_SDITAS ALL=(ALL) NOPASSWD: /bin/su - oracle #users in USERS_SDITAS group can now sudo su - oracle without asking for a password
oracle ALL=(ALL) NOPASSWD:SETENV: CMD_MIGRATIONDC1DC3 #oracle user can run all commands in commands group CMD_MIGRATIONDC1DC3.

3. To check whether some NIS users are using the /bin/false shell (meaning they can not log on to the host by ssh), use the following command:

ypcat passwd|awk -F: '{if($1 ~ /^username1$|^username2$/) print}'|grep false

Categories: IT Architecture, Linux, Systems, Unix Tags: ,

Resolved – Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT

June 12th, 2014 1 comment

Today when I tried to install Oracle VM Server on one server, the following error occurred:

Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT. This can happen if there is not enough space on your hard drive(s) for the installation.

So to go on with the installation, I had to find a way to erase the GPT partition table on the drive.

To do this, the first step is to drop into Linux rescue mode when booting from the CDROM:

rescue

Then, checking with fdisk -l, I could see that /dev/sda was the only disk whose GPT label needed erasing. So I used dd if=/dev/zero of=/dev/sda bs=512 count=1 to erase the GPT table:

 

fdisk_dd

 

After this, running fdisk -l again, I saw that the partition table was gone:

fdisk_dd_2

Later, I restarted the installation of the OVS server. When the following message is prompted, select "No":

select_no

And select "yes" when below message prompted so that we can make new partition table:

select_yes

The steps after this were the normal ones, and the installation went smoothly.
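
PS: GPT also keeps a backup header and partition-entry table at the end of the disk, so zeroing only the first sector can leave GPT remnants behind. A more thorough wipe from rescue mode would look roughly like this (a hedged sketch; /dev/sda is the disk from the example above, and sgdisk is only available if the gdisk package is present):

dd if=/dev/zero of=/dev/sda bs=512 count=34 #zero the protective MBR plus the primary GPT header and partition entries
dd if=/dev/zero of=/dev/sda bs=512 count=34 seek=$(( $(blockdev --getsz /dev/sda) - 34 )) #zero the backup GPT structures at the end of the disk
sgdisk --zap-all /dev/sda #alternative: wipe GPT and MBR structures in one command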

Resolved – rm cannot remove some files with error message “Device or resource busy”

June 11th, 2014 No comments

If you meet a problem when removing files on Linux with the error messages below:

[root@test-host ~]# rm -rf /u01/shared/*
rm: cannot remove `/u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001': Device or resource busy
rm: cannot remove `/u01/shared/WLS/oracle_common/modules/oracle.jrf_11.1.1/.nfs0000000000005c7a00000002': Device or resource busy
rm: cannot remove `/u01/shared/WLS/OracleHome/soa/modules/oracle.soa.fabric_11.1.1/.nfs0000000000006bcf00000003': Device or resource busy

Then it means that some processes are still referring to these files. You have to stop those processes before removing the files. You can use the Linux command lsof to find the processes using specific files:

[root@test-host ~]# lsof |grep nfs0000000000004abf00000001
java 2956 emcadm mem REG 0,21 1095768 19135 /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001 (slce49sn-nas:/export/C9QA123_DC1/tas_central_shared)
java 2956 emcadm 88r REG 0,21 1095768 19135 /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001 (slce49sn-nas:/export/C9QA123_DC1/tas_central_shared)

So from here you can see that the process with PID 2956 is still using file /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001.

However, some systems have no lsof installed by default. Then you can install it, or use the alternative "fuser":

[root@test-host ~]# fuser -cu /u01/shared/WLS/oracle_common
/u01/shared/WLS/oracle_common: 2956m(emcadm) 7358c(aime)

Then you can also see that processes with PIDs 2956 and 7358 are referring to the directory /u01/shared/WLS/oracle_common.
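
To see what those PIDs actually are before killing anything, a quick check like the following may help (the PIDs are taken from the example output above):

ps -fp 2956,7358 #show the full command lines of the processes holding the files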

So you'll need to stop the processes first by killing them (or stop them gracefully using the processes' own stop methods if defined):

kill -9 2956

After that, you can try removing the files again; it should be ok this time.

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

Resolved – failed Exception check srv hostname/IP failedException Invalid hostname/IP configuration ocfs2 config failed Obsolete nodes found

June 3rd, 2014 No comments

Today when I tried to add two OVS servers into one server pool, I ran into errors. The first one was like below:

2014-06-03 04:26:08.965 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:09.485 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:09.497 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:12.463 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:12.985 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:12.997 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:13.004 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:13.522 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:13.535 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:13.980 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:16.307 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:16.831 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:16.844 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:17.284 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:17.290 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:17.814 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:17.827 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:18.272 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:18.279 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:18.799 NOTIFICATION Regisering server:hostname1.example.com...
2014-06-03 04:26:21.749 NOTIFICATION Register Server: hostname1.example.com success
2014-06-03 04:26:21.751 NOTIFICATION Getting host info for server:hostname1.example.com ...
2014-06-03 04:26:23.894 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>
2014-06-03 04:26:33.348 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>

Also there was an error message like below:

2014-06-03 04:59:11.003 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:11.524 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:11.536 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:15.484 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.005 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.016 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:16.025 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.546 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.559 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:17.014 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:18.950 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:19.470 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:19.483 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:19.926 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:19.955 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:20.476 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:20.490 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:20.943 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:20.947 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:21.471 NOTIFICATION Regisering server:hostname1-fe.example.com...
2014-06-03 04:59:24.439 NOTIFICATION Register Server: hostname1-fe.example.com success
2014-06-03 04:59:24.439 NOTIFICATION Getting host info for server:hostname1-fe.example.com ...
2014-06-03 04:59:26.577 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >
2014-06-03 04:59:37.100 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1-fe.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >

Then I was confused about the "Obsolete nodes found" it complained about. I could confirm that I had removed hostname1.example.com, and even after checking the OVM DB table OVS.OVS_SERVER, there was no record of hostname1.example.com.

After some searching, it turned out these errors were caused by obsolete info in OCFS2 (Oracle Cluster File System). We should edit the file /etc/ocfs2/cluster.conf and remove the obsolete entries.

-bash-3.2# vi /etc/ocfs2/cluster.conf
node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

node:
        ip_port     = 7777
        ip_address  = 10.200.169.191
        number      = 1
        name        = hostname2
        cluster     = ocfs2

cluster:
        node_count  = 2
        name        = ocfs2

So if hostname2 is no longer needed, or the IP address of hostname2 has changed, you should remove the entries related to hostname2 and change node_count to 1. Then bounce the ocfs2/o2cb services:

service ocfs2 restart

service o2cb restart
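
For reference, assuming hostname2 was the obsolete node, the trimmed /etc/ocfs2/cluster.conf would then look roughly like this:

node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

cluster:
        node_count  = 1
        name        = ocfs2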

Later, I tried adding the OVS server again, and it worked! (Before adding that OVS server back, we need to first remove its ovs-agent db: service ovs-agent stop; mv /etc/ovs-agent/db /var/tmp/db.bak.5; service ovs-agent start, and then configure ovs-agent with service ovs-agent configure. You can also use /opt/ovs-agent-2.3/utils/cleanup.py to clean up.)

 

SSH port forwarding

May 30th, 2014 No comments

As we know, SSH encrypts traffic between the ssh client and the ssh server. SSH forwarding can also encrypt and forward the traffic of other TCP ports. This is also known as tunneling:

SSH is a client/server application that allows secure connectivity to servers. In practice, it is usually used just like Telnet. The advantage of SSH over Telnet is that it encrypts all data before sending it. While not originally designed to be a tunnel in the sense that VPN or GRE would be considered a tunnel, SSH can be used to access remote devices in addition to the one to which you have connected. While this does not have a direct application on Cisco routers, the concept is similar to that of VPN and GRE tunnels, and thus worth mentioning. I use SSH to access my home network instead of a VPN. Here's PPTP vpn configuration on linux if you're interested.

Also, when there's a firewall blocking other TCP ports but allowing SSH port 22, you can use SSH to forward those TCP ports so that you can still reach the blocked services.

SSH Local Port forwarding

img1
The LDAP server allows only localhost to visit its port 389. So how can we connect from another host to its port 389?

On LdapClientHost:

ssh -L 7001:localhost:389 <user@LdapServerHost> #with format as ssh -L <local port>:<remote host>:<remote port> <SSH hostname>

Below is for putty setting equivalent to above command:

a b

After this, you can connect to LdapClientHost:7001, then the data flow will be like:

  1. The app on LdapClientHost sends data to LdapClientHost:7001;
  2. The SSH client on LdapClientHost encrypts & forwards the data received on port 7001 to the SSH server on LdapServerHost;
  3. The SSH server decrypts & forwards the data to LDAP:389. When it gets data back from LDAP:389, the SSH server forwards it back along the same path.

Maybe you'll ask whether we can connect from another host, say LdapClientHost2, to LdapClientHost:7001 so that we can use the tunnel? The answer is no, as SSH local port forwarding binds to the loopback interface by default, so you'll get a "Connection refused" response when connecting from other hosts. But one good thing is that SSH has the "-g" option, which allows remote hosts to connect to locally forwarded ports:

ssh -g -L 7001:localhost:389 <user@LdapServerHost>

For example, on OVS server server_01, there's one VM named testvm_01, whose console is on VNC port 5926. I tried to connect from my PC using tightvnc viewer to server_01:5926 but failed as it's blocked by a firewall. So on another host proxy_vm01, I ran the following command:

ssh -g -L 7001:localhost:5926 root@server_01

Later, I connected from my PC using tightvnc viewer to proxy_vm01:7001, and then I could reach testvm_01's console.

Another note is that you will want ssh not to disconnect by itself after some time, so you'll need to modify the ssh configuration file. Here's more about it: avoid putty ssh connection sever or disconnect or make ssh on linux not to disconnect after some certain time.

SSH Remote Port forwarding

img2
On LdapServerHost:

ssh -R 7001:localhost:389 LdapClientHost #with format as ssh -R <remote port>:<local host>:<local port> <SSH hostname>; the remote port is opened on the SSH server side

This is called SSH remote port forwarding because this time SSH connects from the LDAP server to the LDAP client. The dataflow is the same except that the ssh client is now on the LDAP server and the ssh server is now on the LDAP client:

  1. The app on LdapClientHost sends data to LdapClientHost:7001;
  2. The SSH server on LdapClientHost encrypts & forwards the data received on port 7001 to the SSH client on LdapServerHost;
  3. The SSH client decrypts & forwards the data to LDAP:389. When it gets data back from LDAP:389, the SSH client forwards it back along the same path.

img3

On SSH Client(C):

ssh -g -L 7001:<B>:389 <D>

Then configure the application to use port 7001 on (A) and (C). Please note that the traffic between (A)<->(C) and (B)<->(D) is not encrypted by SSH.

One more thing: since LDAP Server (B) is using a private IP, you'll need to set up NAT on SSH Server (D). You can take the following articles for reference: NAT forwarding for ssh and vncviewer and NAT binding one priviate ip and one public ip together using linux as router.

SSH Dynamic port forwarding

img4

Sometimes there's no fixed service port, for example when we surf the internet or chat using MSN. But we still need to protect our data when using an insecure network such as public WIFI. This is where SSH dynamic port forwarding comes into use.

ssh -D 7001 <SSH Server> #with format ssh -D <local port> <SSH Server>

After this, SSH will create a SOCKS proxy service. You can set the proxy in MSN or your browser to use localhost:7001 as a SOCKS proxy, and then you can browse internet sites that are blocked on the SSH client's side.
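
For a quick command-line test of the SOCKS proxy (a sketch; the URL is just a placeholder), curl can be pointed at it like this:

curl --socks5-hostname localhost:7001 http://www.example.com/ #with --socks5-hostname, DNS resolution also happens on the SSH server side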

Here's what -D means:

-D [bind_address:]port
Specifies a local 'dynamic' application-level port forwarding. This works by allocating a socket to listen to port on the local side, optionally bound to the specified bind_address. Whenever a connection is made to this port, the connection is forwarded over the secure channel, and the application protocol is then used to determine where to connect to from the remote machine. Currently the SOCKS4 and SOCKS5 protocols are supported, and ssh will act as a SOCKS server. Only root can forward privileged ports. Dynamic port forwardings can also be specified in the configuration file.

SSH X port forwarding

img5
We can get a GUI on Linux/Unix/Solaris/HP hosts through VNC or X windows; here we'll take X windows as the example.

Here, the X clients are the Linux/Unix/Solaris/HP servers, and the X server is our client host (such as your PC). First, you'll need to specify the X server's location on the X client:

export DISPLAY=myDesktop:1.0 #with format export DISPLAY=<X Server IP>:<display #>.<virtual #>

Then run the X app on the X client (the Linux/Unix/Solaris/HP servers), and the GUI will show on the X server (such as your PC).

All goes smoothly until there is a firewall in front of the Linux/Unix/Solaris/HP servers and the X protocol is blocked. We can then use SSH X forwarding instead of VNC, and SSH forwarding has the advantage over VNC of being encrypted.

On X Server(your PC for example):

ssh -X <SSH Server>

Now you can run X apps on the remote servers, and the GUI will show on the client host. You can use Xming, for example, as the X server when your PC is running Windows, and as for the SSH client, putty or Cygwin are both ok. A more detailed guide is Use xming, xshell, putty, tightvnc to display linux gui on windows desktop (x11 forwarding when behind firewall), which you'll find useful if you only want X windows.

resolved – check backend OHS httpd servers for BIG ip F5 LTM VIP

May 23rd, 2014 No comments

Assume you want to check which OHS or httpd servers an LTM VIP example.vip.com is routing traffic to. Here are the steps:

  1. get the ip address of VIP example.vip.com (see the lookup example after this list);
  2. log on LTM's BUI. Local traffic-> virtual servers -> virtual server list, search ip
  3. click "edit" below column "resource"
  4. note down default pool
  5. search pool name in local traffic -> virtual servers -> pools -> pool list
  6. click the number below column members. Then you'll find the OHS servers and ports the VIP will route traffic to.
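
For step 1, a simple DNS lookup from any host is usually enough (a sketch; the name is the example VIP above):

nslookup example.vip.com #or: dig +short example.vip.com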

test telnet from VLAN on cisco router device

May 22nd, 2014 No comments

If you want to test a telnet connection from one VLAN to a specific destination IP, here is how:

test-router# telnet 10.200.244.14 80 source vlan 125
Trying 10.200.244.14...
Connected to 10.200.244.14.
Escape character is '^]'.

Good luck.

Resolved – input_userauth_request: invalid user root

May 15th, 2014 2 comments

Today I tried to ssh to one Linux box but it failed, and /var/log/secure gave the following messages:

May 15 04:05:07 testbox sshd[22925]: User root from 10.120.120.188 not allowed because not listed in AllowUsers
May 15 04:05:07 testbox sshd[22928]: input_userauth_request: invalid user root
May 15 04:05:07 testbox unix_chkpwd[22929]: password check failed for user (root)
May 15 04:05:07 testbox sshd[22925]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.120.120.188 user=root
May 15 04:05:09 testbox sshd[22925]: Failed password for invalid user root from 10.120.120.188 port 50362 ssh2
May 15 04:05:10 testbox unix_chkpwd[22930]: password check failed for user (root)
May 15 04:05:11 testbox sshd[22928]: Connection closed by 10.120.120.188

Then I had a check of /etc/ssh/sshd_config and modified the following:

[root@testbox ~]# egrep 'PermitRoot|AllowUser' /etc/ssh/sshd_config
PermitRootLogin yes #change this to yes
#AllowUsers testuser #comment out this

Later, I restarted sshd with service sshd restart, and then ssh worked.
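
If you'd rather keep the AllowUsers restriction instead of commenting it out, adding root to the list also works (a sketch; testuser is the user from the example above):

PermitRootLogin yes
AllowUsers testuser root #AllowUsers takes a space-separated list of login names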

Categories: IT Architecture, Linux, Systems Tags: ,

resolved – fsinfo ERROR: Stale NFS file handle POST

May 15th, 2014 No comments

Today when I tried to mount an NFS share from one NFS server, it timed out with "mount.nfs: Connection timed out".

I tried to find something in /var/log/messages but no useful info was there. So I used tcpdump on the NFS client:

[root@dcs-hm1-qa132 ~]# tcpdump -nn -vvv host 10.120.33.90 #server is 10.120.33.90, client is 10.120.33.130
23:49:11.598407 IP (tos 0x0, ttl 64, id 26179, offset 0, flags [DF], proto TCP (6), length 96)
10.120.33.130.1649240682 > 10.120.33.90.2049: 40 null
23:49:11.598741 IP (tos 0x0, ttl 62, id 61186, offset 0, flags [DF], proto TCP (6), length 80)
10.120.33.90.2049 > 10.120.33.130.1649240682: reply ok 24 null
23:49:11.598812 IP (tos 0x0, ttl 64, id 26180, offset 0, flags [DF], proto TCP (6), length 148)
10.120.33.130.1666017898 > 10.120.33.90.2049: 92 fsinfo fh Unknown/0100010000000000000000000000000000000000000000000000000000000000
23:49:11.599176 IP (tos 0x0, ttl 62, id 61187, offset 0, flags [DF], proto TCP (6), length 88)
10.120.33.90.2049 > 10.120.33.130.1666017898: reply ok 32 fsinfo ERROR: Stale NFS file handle POST:
23:49:11.599254 IP (tos 0x0, ttl 64, id 26181, offset 0, flags [DF], proto TCP (6), length 148)
10.120.33.130.1682795114 > 10.120.33.90.2049: 92 fsinfo fh Unknown/010001000000000000002FFF000002580000012C0007B0C00000000A00000000
23:49:11.599627 IP (tos 0x0, ttl 62, id 61188, offset 0, flags [DF], proto TCP (6), length 88)
10.120.33.90.2049 > 10.120.33.130.1682795114: reply ok 32 fsinfo ERROR: Stale NFS file handle POST:

The reason of "ERROR: Stale NFS file handle POST" may caused by the following reasons:

1.The NFS server is no longer available
2.Something in the network is blocking
3.In a cluster during failover of NFS resource the major & minor numbers on the secondary server taking over is different from that of the primary.

To resolve the issue, you can try bounce NFS service on NFS server using /etc/init.d/nfs restart.
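
On the client side, it can also help to confirm the export is still visible before retrying the mount (a hedged sketch; the server IP is from the tcpdump example above, while the export and mount point names are hypothetical):

showmount -e 10.120.33.90 #list the exports the server currently advertises
umount -f /mnt/share; mount -t nfs 10.120.33.90:/export/share /mnt/share #force-unmount the stale mount and mount it again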

Categories: Hardware, NAS, Storage Tags:

resolved – show kitchen sink buttons when wordpress goes to fullscreen mode

April 11th, 2014 No comments

When you click the full-screen button of the wordpress TinyMCE editor, wordpress goes to "Distraction-Free Writing mode", which has the benefits the name suggests. However, you'll also find that the TinyMCE toolbox only shows a limited number of buttons and the second line of the toolbox (the kitchen sink) does not show at all (I tried installing plugins such as Ultimate TinyMCE or Advanced TinyMCE, but the issue remained):

full-screen
Previously, you could type ALT+SHIFT+G to go to another type of fullscreen mode, which has all buttons including the kitchen sink ones. However, it seems the updated version of wordpress has disabled this feature.

To resolve this issue, we can insert the following code in functions.php of your theme:

function my_mce_fullscreen($buttons) {
$buttons[] = 'fullscreen';
return $buttons;
}
add_filter('mce_buttons', 'my_mce_fullscreen');

Afterwards, TinyMCE will have two full-screen buttons:

full-screen buttons
Make sure to click the SECOND full-screen button. When you do so, the editor will transform to the following appearance:

full-screen with kitchen sink
I assume this is what you're after, right?

 

 

Categories: Misc Tags:

add horizontal line button in wordpress

April 11th, 2014 No comments

There're three methods for you to add a horizontal line button in wordpress:

Firstly, switch to "Text" mode and enter <hr />.

Secondly, add the following in functions.php of your wordpress theme:

function enable_more_buttons($buttons) {
$buttons[] = 'hr';
return $buttons;
}
add_filter("mce_buttons", "enable_more_buttons");

horizontal line

Thirdly, you can install the plugin "Ultimate TinyMCE", and in its settings you can enable the horizontal line button with one click! This is my recommendation.

ultimate tinymce

Categories: Misc Tags: ,

linux tips

April 10th, 2014 No comments
Linux Performance & Troubleshooting
For Linux Performance & Troubleshooting, please refer to another post - Linux tips - Performance and Troubleshooting
Linux system tips
ls -lu(access time, like cat file) -lt(modification time, like vi, ls -l defaults to use this) -lc(change time, chmod), stat ./aa.txt <UTC>
ctrl +z #bg and stopped
%1     #fg and running
%1 & #bg and running
man dd > dd.txt #or "man dd | col -b > dd.txt"
cat > listbkup.rman << EOF
CONNECT TARGET /
LIST BACKUP;
EOF
pgrep -flu oracle  # processes owned by the user oracle
watch free -m #refresh every 2 seconds
pmap -x 30420 #memory mapping.
openssl s_client -connect localhost:636 -showcerts #verify ssl certificates, or 443
echo | openssl s_client -connect your_url_without_https:443 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -noout -dates #get expire date
echo | openssl s_client -connect your_url_without_https:443 < /dev/null 2>/dev/null | openssl x509 -text -in /dev/stdin | grep "Signature Algorithm" #get Signature Algorithm.
Signature Algorithm: sha1WithRSAEncryption for SHA1
Signature Algorithm: sha256WithRSAEncryption for SHA2
"unable to load certificate" is shown for hosts that are not connectable
And here's a script to fulfill this:

#!/bin/sh
for CERT in \
url1:443 \
url2:443 \
url3:443
do
echo "===begin cert ${CERT}==="
echo | openssl s_client -connect ${CERT} < /dev/null 2>/dev/null | openssl x509 -text -in /dev/stdin | grep "Signature Algorithm"|head -n1
done

openssl s_client -connect url:443
openssl x509 -text -in cacert.pem -noout
openssl x509 -dates -in cacert.pem -noout
openssl x509 -purpose -in cacert.pem -noout
openssl req -text -in robots.req.pem  -verify -noout
openssl genrsa -out private.pem 1024
openssl rsa -in private.pem -out public.pem -outform PEM -pubout #get public key from private key
echo 'too many secrets' > file.txt
openssl rsautl -encrypt -inkey public.pem -pubin -in file.txt -out file.ssl #it will only encrypt data up to the key size
openssl rsautl -decrypt -inkey private.pem -in file.ssl -out decrypted.txt
You can generate a self-signed certificate with steps as described here (private key, CSR, generate certificate). More on how SSL works is described here and here. (Signing a message means authenticating that you have yourself assured the authenticity of the message (most of the time it means you are the author, but not necessarily). The message can be a text message, or someone else's certificate. To sign a message, you create its hash, then encrypt the hash with your private key; you then attach the encrypted hash and your signed certificate to the message. The recipient recreates the message hash, decrypts the encrypted hash using your well-known public key stored in your signed certificate, checks that both hashes are equal and finally checks the certificate. When an encrypted session is established, the encryption strength (40-bit, 56-bit, 128-bit, 256-bit) is determined by the capability of the web browser, SSL certificate, web server, and client computer operating system.)
blockdev --getbsz /dev/xvda1 #get blocksize of FS
dumpe2fs /dev/xvda1 |grep 'Block size'
sed -i.bak2014 's/HISTSIZE=1000/HISTSIZE=100000/' /etc/profile
echo 71_testhost_emgc | sed -re 's/.*_(slce.*)_.*/\1/g' #output testhost
sed -re 's/User_Alias USERS_SDITAS.*/&, xiaozliu/g' a.txt #'&' refers to matched content
echo 'export HISTTIMEFORMAT=%h/%d - %H:%M:%S' >> /etc/profile
Strings
ovm svr ls|sort -rn -k 4 #sort by column 4
cat a1|sort|uniq -c |sort #SUS
ovm svr ls|uniq -f3 #skip the first three columns, this will list only 1 server per pool
for i in <all OVMs>;do (test.sh $i &);done #instead of using nohup &
ovm vm ls|egrep "`echo testhost{0\|,1\|,2\|,3\|,4}|tr -d '[:space:]'`"
cat a|awk '{print $5}'|tr '\n' ' '
awk '{print NF" "FILENAME;exit}' file.txt #get column count of file along with filenames
getopt #getopts is builtin
date -d '1970-1-1 1276059000 sec utc'
date -d '2010-09-11 23:20' +%s
find . -name '*txt'|xargs tar cvvf a.tar
find . -maxdepth 1
for i in `find /usr/sbin/ -type f ! -perm -u+x`;do chmod +x $i;done #files that has no execute permisson for owner
cd /u01/app/oracle/admin/andy/adump; find . -type f -name "*.aud" -mtime +5 -print -exec rm -rf {} \;
find ./test -type f -mtime +2 -print #files modified two days ago, older
find ./test -type f -mtime -2 -print #files modified in recent two days, recent
cd /u01/app/oracle/admin/andy/adump; find . -type f -name "*.aud" -mtime +5 -print -exec rm {} \;
cd /u01/app/oracle/diag/rdbms/andy/andy2/trace/; find . -type f -name "*.trc" -or -name "*.trm" -mtime +5 -print -exec rm {} \;
rm -rf /u01/app/oracle/diag/rdbms/andy/andy2/alert/log_*xml
find ./* -prune -print #-prune,do not cascade
find . -fprint file #put result to file
tar tvf a.tar  --wildcards "*ipp*" #globbing patterns
tar xvf bfiles.tar --wildcards --no-anchored 'b*'
tar --show-defaults
tar cvf a.tar --totals *.txt #show speed
tar --append --file=collection.tar rock #add rock to collection.tar
tar --update -v -f collection.tar blues folk rock classical #only append new or updated ones, not replace
tar --delete --file=collection.tar blues #not on tapes
tar -c -f archive.tar --mode='a+rw'
tar -C sourcedir -cf - . | tar -C targetdir -xf - #copy directories
tar -c -f jams.tar grape prune -C food cherry #-C,change dir, foot file cherry under foot directory
find . -size -400 -print > small-files
tar -c -v -z -T small-files -f little.tgz
tar -cf src.tar --exclude='*.o' src #multiple --exclude can be specified
expr 5 - 1
rpm2cpio ./ash-1.0.1-1.x86_64.rpm |cpio -ivd
eval $cmd
exec menu.viewcards #same to .
ls . | xargs -0 -i cp ./{} /etc #-i uses \n as separator, just like find -exec. -0 handles spaces in filenames. find -print0 separates with NUL instead of newline. (-i or -I {} for placing filenames in the middle of the command)
ls | xargs -t -i mv {} {}.old #mv source should exclude /,or unexpected errors may occur
mv --strip-trailing-slashes source destination
ls |xargs file /dev/fd/0 #replace -
find . -type d |xargs -i du -sh {} |awk '$1 ~ /G/'
ovm svr ls|awk '$NF ~ /QA_GA_DC2$/'
ypcat passwd|awk -F: '{if($1 ~ /^user1$|^user2$/) print}'|grep false
syminq -pdevfile |awk '!/^#/ {print $1,$4,$5}' #ignore lines started with #
sed -i '/virt[0-9]\{5\}/!d' /var/tmp/*.status #only show SDI names.
ls -l -I "*out*" #not include out
for i in `ls -I shared -I oracle`;do du -sh $i;done #exclude shared and oracle directories
find . -type f -name "*20120606" -exec rm {} \; #do not need rm -rf. find . -type f -exec bash -c "ls -l '{}'" \;
ps -ef|grep init|sed -n '1p'
pstree -aAhlup [ PID | USER ]
cut -d ' ' -f1,3 /etc/mtab #first and third
seq 15 21 #print 15 to 21
seq -s" " 15 21 #or echo {15..21}. use space as separator

 

Categories: IT Architecture, Linux, Systems Tags:

perl tips

April 2nd, 2014 No comments
##arrays
#!/usr/bin/perl -w
my @animals = ("dog", "pig", "cat");
print "The last element of array \$animals is : ".$animals[$#animals]."\n";
print "@animals"."\n"; #will print values of array, delimitered by space
print $#animals."\n"; #the last key number of array, $#animals+1 is the number of array
if(@animals>2){
print "more than 2 animals found\n";
}
else{
print "less than 2 animals found\n"
}
foreach(@animals){
print $_."\n";
}
##hashes
my %fruit_color=("apple", "red", "banana", "yellow");
print "Color of banana is : ".$fruit_color{"banana"}."\n";

for $char (keys %fruit_color)
{
print("$char => $fruit_color{$char}\n");
}

##references
my $variables = {
scalar  =>  {
description => "single item",
sigil => '$',
},
array   =>  {
description => "ordered list of items",
sigil => '@',
},
hash    =>  {
description => "key/value pairs",
sigil => '%',
},
};
print "Scalars begin with a $variables->{'scalar'}->{'sigil'}\n";

##Files and I/O
open (my $passwd, "<", "/etc/passwd2") or die ("cannot open");
while (<$passwd>) {
print $_ if $_ =~ "test";
}
close $passwd or die "$passwd: $!";
my $next = "doing a first";
$next =~ s/first/second/;
print $next."\n";

my $email = "testaccount\@doxer.org";
if ($email =~ /([^@]+)@(.+)/) {
print "Username is : $1\n";
print "Hostname is : $2\n";
}

##Subroutines
sub multiply{
my ($num1, $num2) = @_;
my $result = $num1 * $num2;
return $result;
}

my $result2 = multiply(3, 5);
print "3 * 5 = $result2\n";

##or
! system('date') or die("failed it"); #if a subroutine returns ok, it'll return 0
Categories: IT Architecture, Perl, Programming Tags:

resolved – /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

April 1st, 2014 No comments

When I ran a perl command today, I met the problem below:

[root@test01 bin]# /usr/local/bin/perl5.8
-bash: /usr/local/bin/perl5.8: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

Now let's check which package /lib/ld-linux.so.2 belongs to on a good linux box:

[root@test02 ~]# rpm -qf /lib/ld-linux.so.2
glibc-2.5-118.el5_10.2

So here's the resolution to the issue:

[root@test01 bin]# yum install -y glibc.x86_64 glibc.i686 glibc-devel.i686 glibc-devel.x86_64 glibc-headers.x86_64
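
PS: the root cause here is usually a 32-bit binary on a 64-bit box that has no 32-bit glibc installed; /lib/ld-linux.so.2 is the 32-bit dynamic loader. A quick way to confirm (the path is the perl binary from the example above):

file /usr/local/bin/perl5.8 #a 32-bit program will report "ELF 32-bit LSB executable", which needs the 32-bit loader /lib/ld-linux.so.2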

Categories: IT Architecture, Kernel, Linux, Systems Tags:

resolved – sudo: sorry, you must have a tty to run sudo

April 1st, 2014 4 comments

The error message below will sometimes occur when you run sudo <command>:

sudo: sorry, you must have a tty to run sudo

To resolve this, you may comment out "Defaults requiretty" in /etc/sudoers (edit it by running visudo). Here is more info about this method.

However, sometimes it's not convenient or even not possible to modify /etc/sudoers, then you can consider the following:

echo -e "<password>\n"|sudo -S <sudo command>

For -S parameter of sudo, you may refer to sudo man page:

-S  The -S (stdin) option causes sudo to read the password from the standard input instead of the terminal device. The password must be followed by a newline character.

So here -S bypasses the tty (terminal device) and reads the password from the standard input. And by this, we can now pipe the password to sudo.
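
If you can edit sudoers but only want to relax requiretty for a single account, a per-user override is another option (a sketch; the user name deployuser is hypothetical):

Defaults:deployuser !requiretty #disable the requiretty check only for deployuser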

Resolved – print() on closed filehandle $fh at ./perl.pl line 6.

March 19th, 2014 No comments

You may find that print sometimes won't work as expected in perl, for example:

[root@centos-doxer test]# cat perl.pl
#!/usr/bin/perl
use warnings;
open($fh,"test.txt");
select $fh;
close $fh;
print "test";

You may expect "test" to be printed, but actually you got error message:

print() on closed filehandle $fh at ./perl.pl line 6.

So how did this happen? Please see my explanation:

[root@centos-doxer test]# cat perl.pl
#!/usr/bin/perl
use warnings;
open($fh,"test.txt");
select $fh;
close $fh; #here you closed $fh filehandle, but you should now reset filehandle to STDOUT
print "test";

Now here's the updated script:

#!/usr/bin/perl
use warnings;
open($fh,"test.txt");
select $fh;
close $fh;
select STDOUT;
print "test";

This way, you'll get "test" as expected!

 

Categories: IT Architecture, Perl, Programming Tags:

set vnc not asking for OS account password

March 18th, 2014 No comments

As you may know, vncpasswd (part of the vnc-server package) is used to set the password users are asked for when connecting to vnc with a vnc client (such as tightvnc). When you connect to vnc-server, it'll ask for that password:

vnc-0
After you connect to the host using VNC, you may also find that the remote server asks again for the OS password (the one set by passwd):

vnc-01
In some cases, you may not want this second prompt. So here's the way to cancel this behavior:

vnc-1vnc-2

 

 

Categories: IT Architecture, Linux, Systems Tags: ,

stuck in PXE-E51: No DHCP or proxyDHCP offers were received, PXE-M0F: Exiting Intel Boot Agent, Network boot canceled by keystroke

March 17th, 2014 No comments

If you installed your OS and tried to boot it up but got stuck with the following messages:

stuck_pxe

Then one possibility is that the configuration of your host's storage array is not right. For instance, it should be JBOD but you configured it as RAID6.

Please note that this is only one possibility for this error; you may search for the PXE error codes you encountered for more details.

PS:

  • Sometimes, DHCP snooping may prevent PXE functioning, you can read more http://en.wikipedia.org/wiki/DHCP_snooping.
  • STP (Spanning-Tree Protocol) makes each port wait up to 50 seconds before data is allowed to be sent on the port. This delay in turn can cause problems with some applications/protocols (PXE, Bootworks, etc.). To alleviate the problem, Portfast was implemented on Cisco devices; the terminology might differ between different vendor devices. You can read more http://www.symantec.com/business/support/index?page=content&id=HOWTO6019
  • ARP caching http://www.networkers-online.com/blog/2009/02/arp-caching-and-timeout/

Oracle BI Publisher reports – send mail when filesystems getting full

March 17th, 2014 No comments

Let's assume you have one Oracle BI Publisher report for filesystem checking, and now you want to write a script that checks that report page and sends mail to the system admins when filesystems are getting full. As the default output of an Oracle BI Publisher report needs javascript to work, and as you may know wget/curl cannot handle javascript, the next step after logging on is to find the html version's url of that report to use in your script (the html page also has all records, while the javascript one has only part of them):

BI_report_login

BI_export_html

 

Let's assume that the html's url is "http://www.example.com:9703/report.html", and the display of it was like the following:

bi report

Then here goes the script that will check this page for hosts that have less than 10% available space and send mail to the system admins:

#!/usr/bin/perl
use HTML::Strip;
#hosts that do not need reporting
my @remove_list = qw(host1.example.com host2.example.com);
system("rm -f spacereport.html");
system("wget -q --no-proxy --no-check-certificate --post-data 'id=admin&passwd=password' 'http://www.example.com:9703/report.html' -O spacereport.html");
open($fh,"spacereport.html");

#or just @spacereport=<$fh>;
foreach(<$fh>){
push(@spacereport,$_);
}

#change array to hash
$index=0;
map {$pos{$index++}=$_} @spacereport;

#get location of <table> and </table>
#sort numerically ascending
for $char (sort {$a<=>$b} (keys %pos))
{
if($pos{$char} =~ /<table class="c27">/)
{
$table_start=$char;
}

if($pos{$char} =~ /<\/table>/)
{
$table_end=$char;
}

}

#get contents between <table> and </table>
for($i=$table_start;$i<=$table_end;$i++){
push(@table_array,$spacereport[$i]);
}

$table_htmlstr=join("",@table_array);

#get clear text between <table> and </table>
my $hs=HTML::Strip->new();
my $clean_text = $hs->parse($table_htmlstr);
$hs->eof;

@array_filtered=split("\n",$clean_text);

#remove empty array element
@array_filtered=grep { !/^\s+$/ } @array_filtered;

#remove entries from showing
$remove_list_s=join('|',@remove_list);
@index_all = grep { $array_filtered[$_] =~ /$remove_list_s/ } 0..$#array_filtered;

for($i=0;$i<=$#index_all;$i++) {
@index_all_one = grep { $array_filtered[$_] =~ /$remove_list_s/ } 0..$#array_filtered;
splice(@array_filtered,$index_all_one[0],4);
}

system("rm -f space_mail_warning.txt");
open($fh_mail_warning,">","space_mail_warning.txt");
select $fh_mail_warning;
for($j=4;$j<=$#array_filtered;$j=$j+4){
#put lines that has free space lower than 10% to space_mail_warning.txt
if($array_filtered[$j+2] <= 10){
print "Host: ".$array_filtered[$j]."\n";
print "Part: ".$array_filtered[$j+1]."\n";
print "Free(%): ".$array_filtered[$j+2]."\n";
print "Free(GB): ".$array_filtered[$j+3]."\n";
print "============\n\n";
}
}
close $fh_mail_warning;

system("rm -f space_mail_info.txt");
open($fh_mail_info,">","space_mail_info.txt");
select $fh_mail_info;
for($j=4;$j<=$#array_filtered;$j=$j+4){
#put lines that has free space lower than 15% to space_mail_info.txt
if($array_filtered[$j+2] <= 15){
print "Host: ".$array_filtered[$j]."\n";
print "Part: ".$array_filtered[$j+1]."\n";
print "Free(%): ".$array_filtered[$j+2]."\n";
print "Free(GB): ".$array_filtered[$j+3]."\n";
print "============\n\n";
}
}
close $fh_mail_info;

#send mail
#select STDOUT;
if(-s "space_mail_warning.txt"){
system('cat space_mail_warning.txt | /bin/mailx -s "Space Warning - please work with component owners to free space" sysadmins@example.com');
} elsif(-s "space_mail_info.txt"){
system('cat space_mail_info.txt | /bin/mailx -s "Space Info - Space checking mail" sysadmins@example.com');
}

Categories: IT Architecture, Perl, Programming Tags:

wget and curl tips

March 14th, 2014 No comments

Imagine you want to download all files under http://www.example.com/2013/downloads, but no other files under http://www.example.com/2013; then you can do this:

wget -r --level 100 -nd --no-proxy --no-parent --reject "index.htm*" --reject "*gif" 'http://www.example.com/2013/downloads/' #--level 100 is large enough, as I've seen no site has more than 100 levels of sub-directories so far.

wget -p -k --no-proxy --no-check-certificate --post-data 'id=username&passwd=password' <url> -O output.html

wget --no-proxy --no-check-certificate --save-cookies cookies.txt <url>

wget --no-proxy --no-check-certificate --load-cookies cookies.txt <url>

curl -k -u 'username:password' <url>

curl -k -L -d id=username -d passwd=password <url>

curl --data "loginform:id=username&loginform:passwd=password" -k -L <url>

curl -i -u username:password -H X-Oracle-UserId:myname@example.com -H X-Oracle-IdentityDomain:domainname -X GET "https://login.example.com:443/api/v1/users?userLogin"

Here's one curl example to get SSL certs info on LTM:

#!/bin/bash
path="/var/tmp"
path_root="/var/tmp"

agent="Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; InfoPath.2)"

curl -v -L -k -A "$agent" -c ${path}/cookie "https://ltm-url/tmui/login.jsp?msgcode=1&"

curl -v -L -k -A "$agent" -b ${path}/cookie -c ${path}/cookie -e "https://ltm-url/tmui/login.jsp?msgcode=1&" -d "username=myusername&passwd=mypassword" "https://ltm-url/tmui/logmein.html?msgcode=1&"

curl -v -L -k -A "$agent" -b ${path}/cookie -c ${path}/cookie -o ${path_root}/certs-env.html "https://ltm-url/tmui/Control/jspmap/tmui/locallb/ssl_certificate/list.jsp?&startListIndex=0&showAll=true"

Now you can have a check of /var/tmp/certs-env.html for SSL certs info of Big IP VIPs.

PS:

You can install root certificates to fix errors like "ERROR: cannot verify certificate ... issued by ... Unable to locally verify the issuer's authority ... To connect to download.mozilla.org insecurely, use `--no-check-certificate'... Unable to establish SSL connection..". Here are the steps (the cacert.pem file is what you are looking for. This file contains > 250 CA certs (don't know how to trust this number of ppl). You need to download this file, split it into individual certificates, put them into /usr/ssl/certs (your CApath) and index them.):

# openssl version -a|grep OPENSSLDIR
# curl http://curl.haxx.se/ca/cacert.pem | \awk 'split_after==1{n++;split_after=0} /-----END CERTIFICATE-----/ {split_after=1} {print > "/etc/pki/tls/certs/cert" n ".pem"}'
# c_rehash

resolved – ssh Read from socket failed: Connection reset by peer and Write failed: Broken pipe

March 13th, 2014 No comments

If you meet the following errors when you ssh to a Linux box:

Read from socket failed: Connection reset by peer

Write failed: Broken pipe

Then one possibility is that the Linux box's filesystem is corrupted. In my case there was also output to stdout:

EXT3-fs error ext3_lookup: deleted inode referenced

To resolve this, you need to bring Linux into single user mode and run fsck -y <filesystem>. You can get the corrupted filesystem names when booting:

[/sbin/fsck.ext3 (1) -- /usr] fsck.ext3 -a /dev/xvda2
/usr contains a file system with errors, check forced.
/usr: Directory inode 378101, block 0, offset 0: directory corrupted

/usr: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)

[/sbin/fsck.ext3 (1) -- /oem] fsck.ext3 -a /dev/xvda5
/oem: recovering journal
/oem: clean, 8253/1048576 files, 202701/1048233 blocks
[/sbin/fsck.ext3 (1) -- /u01] fsck.ext3 -a /dev/xvdb
u01: clean, 36575/14548992 files, 2122736/29081600 blocks
[FAILED]

So in this case, I did fsck -y /dev/xvda2 && fsck -y /dev/xvda5. Then I rebooted the host, and everything went well.

PS:

If two VMs are booted up on two hypervisors and these VMs share the same filesystem (like NFS), then after you fsck -y that FS and boot up the VM, the FS will soon corrupt again, as another copy of the VM is still using that FS. So you need to first make sure that only one copy of the VM is running across the hypervisors of the same server pool.

Categories: IT Architecture, Kernel, Linux, Systems Tags:

tcpdump & wireshark tips

March 13th, 2014 No comments

tcpdump [ -AdDefIKlLnNOpqRStuUvxX ] [ -B buffer_size ] [ -c count ]

[ -C file_size ] [ -G rotate_seconds ] [ -F file ]
[ -i interface ] [ -m module ] [ -M secret ]
[ -r file ] [ -s snaplen ] [ -T type ] [ -w file ]
[ -W filecount ]
[ -E spi@ipaddr algo:secret,... ]
[ -y datalinktype ] [ -z postrotate-command ] [ -Z user ] [ expression ]

#general format of a tcp protocol line

src > dst: flags data-seqno ack window urgent options
Src and dst are the source and destination IP addresses and ports.
Flags are some combination of S (SYN), F (FIN), P (PUSH), R (RST), W (ECN CWR) or E (ECN-Echo), or a single '.'(means no flags were set)
Data-seqno describes the portion of sequence space covered by the data in this packet.
Ack is sequence number of the next data expected the other direction on this connection.
Window is the number of bytes of receive buffer space available the other direction on this connection.
Urg indicates there is 'urgent' data in the packet.
Options are tcp options enclosed in angle brackets (e.g., <mss 1024>).

tcpdump -D #list of the network interfaces available
tcpdump -e #Print the link-level header on each dump line
tcpdump -S #Print absolute, rather than relative, TCP sequence numbers
tcpdump -s <snaplen> #Snarf snaplen bytes of data from each packet rather than the default of 65535 bytes
tcpdump -i eth0 -S -nn -XX vlan
tcpdump -i eth0 -S -nn -XX arp
tcpdump -i bond0 -S -nn -vvv udp dst port 53
tcpdump -i bond0 -S -nn -vvv host testhost
tcpdump -nn -S -vvv "dst host host1.example.com and (dst port 1521 or dst port 6200)"

tcpdump -vv -x -X -s 1500 -i eth0 'port 25' #traffic on SMTP. -xX to print data in addition to header in both hex/ASCII. use -s 192 to watch NFS traffic(NFS requests are very large and much of the detail won't be printed unless snaplen is increased).

tcpdump -nn -S udp dst port 111 #note that telnet is based on tcp protocol, NOT udp. So if you want to test UDP connection(udp is connection-less), then you must start up the app, then use tcpdump to test.

tcpdump -nn -S udp dst portrange 1-1023

Wireshark Capture Filters (in Capture -> Options)

Wireshark DisplayFilters (in toolbar)
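
The two filter languages are different: capture filters use BPF syntax, while display filters use Wireshark's own syntax. A couple of hedged examples (the host and port values are placeholders):

host 10.0.0.1 and tcp port 443 #capture filter (BPF syntax), set in Capture -> Options
ip.addr == 10.0.0.1 && tcp.port == 443 #display filter, typed into the toolbar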

 

EVENT DIAGRAM
Host A sends a TCP SYNchronize packet to Host B
Host B receives A's SYN
Host B sends a SYNchronize-ACKnowledgement
Host A receives B's SYN-ACK
Host A sends ACKnowledge

Host B receives ACK.
TCP socket connection is ESTABLISHED.

3-way-handshake
TCP Three Way Handshake
(SYN,SYN-ACK,ACK)

TCP-CLOSE_WAIT

 

The upper part shows the states on the end-point initiating the termination.

The lower part the states on the other end-point.

So the initiating end-point (i.e. the client) sends a termination request to the server and waits for an acknowledgement in state FIN-WAIT-1. The server sends an acknowledgement and goes in state CLOSE_WAIT. The client goes into FIN-WAIT-2 when the acknowledgement is received and waits for an active close. When the server actively sends its own termination request, it goes into LAST-ACK and waits for an acknowledgement from the client. When the client receives the termination request from the server, it sends an acknowledgement and goes into TIME_WAIT and after some time into CLOSED. The server goes into CLOSED state once it receives the acknowledgement from the client.

PS:

You can refer to this article for a detailed explanation of tcp three-way handshake establishing/terminating a connection. And for tcpdump one, you can check below:

[root@host2 ~]# telnet host1 14100
Trying 10.240.249.139...
Connected to host1.us.oracle.com (10.240.249.139).
Escape character is '^]'.
^]
telnet> quit
Connection closed.

[root@host1 ~]# tcpdump -vvv -S host host2
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
03:16:39.188951 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: TCP (6), length: 60) host1.us.oracle.com.14100 > host2.us.oracle.com.18890: S, cksum 0xa806 (correct), 3445765853:3445765853(0) ack 3946095098 win 5792 <mss 1460,sackOK,timestamp 854077220 860674218,nop,wscale 7> #2. host1 ack SYN package by host2, and add it by 1 as the number to identify this connection(3946095098). Then host1 send a SYN(3445765853).
03:16:41.233807 IP (tos 0x0, ttl 64, id 6650, offset 0, flags [DF], proto: TCP (6), length: 52) host1.us.oracle.com.14100 > host2.us.oracle.com.18890: F, cksum 0xdd48 (correct), 3445765854:3445765854(0) ack 3946095099 win 46 <nop,nop,timestamp 854079265 860676263> #5. host1 Ack F(3946095099), and then it send a F just as host2 did(3445765854 unchanged). 

[root@host2 ~]# tcpdump -vvv -S host host1
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
03:16:39.188628 IP (tos 0x10, ttl 64, id 31059, offset 0, flags [DF], proto: TCP (6), length: 60) host2.us.oracle.com.18890 > host1.us.oracle.com.14100: S, cksum 0x265b (correct), 3946095097:3946095097(0) win 5792 <mss 1460,sackOK,timestamp 860674218 854045985,nop,wscale 7> #1. host2 send a SYN package to host1(3946095097)
03:16:39.188803 IP (tos 0x10, ttl 64, id 31060, offset 0, flags [DF], proto: TCP (6), length: 52) host2.us.oracle.com.18890 > host1.us.oracle.com.14100: ., cksum 0xed44 (correct), 3946095098:3946095098(0) ack 3445765854 win 46 <nop,nop,timestamp 860674218 854077220> #3. host2 ack the SYN sent by host1, and add 1 to identify this connection. The tcp connection is now established(3946095098 unchanged, ack 3445765854).
03:16:41.233397 IP (tos 0x10, ttl 64, id 31061, offset 0, flags [DF], proto: TCP (6), length: 52) host2.us.oracle.com.18890 > host1.us.oracle.com.14100: F, cksum 0xe546 (correct), 3946095098:3946095098(0) ack 3445765854 win 46 <nop,nop,timestamp 860676263 854077220> #4. host2 send a F(in) with a Ack, F will inform host1 that no more data needs sent(3946095098 unchanged), and ack is uded to identify the connection previously established(3445765854 unchanged)
03:16:41.233633 IP (tos 0x10, ttl 64, id 31062, offset 0, flags [DF], proto: TCP (6), length: 52) host2.us.oracle.com.18890 > host1.us.oracle.com.14100: ., cksum 0xdd48 (correct), 3946095099:3946095099(0) ack 3445765855 win 46 <nop,nop,timestamp 860676263 854079265> #6. host2 ack host1's F(3445765855), and the empty flag to identify the connection(3946095099 unchanged).

psftp through a proxy

March 5th, 2014 No comments

You may know that we can set a proxy in putty for ssh to a remote host, as shown below:

putty_proxy
And if you want to scp files from a remote site to your local box, you can use putty's psftp.exe. There are many options for psftp.exe:

C:\Users\test>d:\PuTTY\psftp.exe -h
PuTTY Secure File Transfer (SFTP) client
Release 0.62
Usage: psftp [options] [user@]host
Options:
-V print version information and exit
-pgpfp print PGP key fingerprints and exit
-b file use specified batchfile
-bc output batchfile commands
-be don't stop batchfile processing if errors
-v show verbose messages
-load sessname Load settings from saved session
-l user connect with specified username
-P port connect to specified port
-pw passw login with specified password
-1 -2 force use of particular SSH protocol version
-4 -6 force use of IPv4 or IPv6
-C enable compression
-i key private key file for authentication
-noagent disable use of Pageant
-agent enable use of Pageant
-batch disable all interactive prompts

Although there's a proxy setting option in putty.exe, there's no proxy option for psftp.exe! So what should you do if you want to copy files back to your local box, a firewall blocks you from doing this directly, and you must use a proxy?

As you may notice, there's a "-load sessname" option in psftp.exe:

-load sessname Load settings from saved session

This option means that if you have a session saved in putty.exe, you can use psftp.exe -load <session name> to copy files from the remote site. For example, suppose you saved a session named mysession in putty.exe with the proxy configured in it; then you can use "psftp.exe -load mysession" to copy files from the remote site (no need for username/password, as those are taken from the saved putty.exe session):

C:\Users\test>d:\PuTTY\psftp.exe -load mysession
Using username "root".
Remote working directory is /root
psftp> ls
Listing directory /root
drwx------ 3 ec2-user ec2-user 4096 Mar 4 09:27 .
drwxr-xr-x 3 root root 4096 Dec 10 23:47 ..
-rw------- 1 ec2-user ec2-user 388 Mar 5 05:07 .bash_history
-rw-r--r-- 1 ec2-user ec2-user 18 Sep 4 18:23 .bash_logout
-rw-r--r-- 1 ec2-user ec2-user 176 Sep 4 18:23 .bash_profile
-rw-r--r-- 1 ec2-user ec2-user 124 Sep 4 18:23 .bashrc
drwx------ 2 ec2-user ec2-user 4096 Mar 4 09:21 .ssh
psftp> help
! run a local command
bye finish your SFTP session
cd change your remote working directory
chmod change file permissions and modes
close finish your SFTP session but do not quit PSFTP
del delete files on the remote server
dir list remote files
exit finish your SFTP session
get download a file from the server to your local machine
help give help
lcd change local working directory
lpwd print local working directory
ls list remote files
mget download multiple files at once
mkdir create directories on the remote server
mput upload multiple files at once
mv move or rename file(s) on the remote server
open connect to a host
put upload a file from your local machine to the server
pwd print your remote working directory
quit finish your SFTP session
reget continue downloading files
ren move or rename file(s) on the remote server
reput continue uploading files
rm delete files on the remote server
rmdir remove directories on the remote server
psftp>

Now you can get/put files through the proxy just as usual.

PS:

If you do not need a proxy to reach the remote site, then you can use the psftp.exe command line to get remote files directly. For example:

d:\PuTTY\psftp.exe root@54.185.16.132 -i d:\PuTTY\aws.ppk -b d:\PuTTY\script.scr -bc -be -v

And d:\PuTTY\script.scr contains the batch commands for putting/getting files:

cd /backup
lcd c:\
mget *.tar.gz
close
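
The -load and -b options can also be combined, so the same proxied session can be driven non-interactively. A minimal sketch, assuming a saved putty session named mysession with the proxy configured in it:

d:\PuTTY\psftp.exe -load mysession -b d:\PuTTY\script.scr -bc -be -v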

Categories: IT Architecture, Linux, Systems Tags: ,

checking MTU or Jumbo Frame settings with ping

February 14th, 2014 No comments

You may set your linux box's MTU to a jumbo frame size of 9000 bytes or larger, but if the switch your box is connected to does not have jumbo frames enabled, then your linux box may run into problems when sending & receiving packets.

So how can we find out whether jumbo frames are enabled on the switch or on the linux box?

Of course you can log on to the switch and check, but we can also verify this from the linux box that connects to the switch.

On the linux box, you can see the MTU setting of each interface using ifconfig:

[root@centos-doxer ~]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 08:00:27:3F:C5:08
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:50502 errors:0 dropped:0 overruns:0 frame:0
TX packets:4579 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9835512 (9.3 MiB) TX bytes:1787223 (1.7 MiB)
Base address:0xd010 Memory:f0000000-f0020000

As stated above, the 9000 here doesn't mean that jumbo frames actually work between your box and the switch, as you can verify with the command below:

[root@testbox ~]# ping -c 2 -M do -s 1472 testbox2
PING testbox2.example.com (192.168.29.184) 1472(1500) bytes of data. #so here 1500 bytes go through the network
1480 bytes from testbox2.example.com (192.168.29.184): icmp_seq=1 ttl=252 time=0.319 ms
1480 bytes from testbox2.example.com (192.168.29.184): icmp_seq=2 ttl=252 time=0.372 ms

--- testbox2.example.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.319/0.345/0.372/0.032 ms
[root@testbox ~]#
[root@testbox ~]#
[root@testbox ~]# ping -c 2 -M do -s 1473 testbox2
PING testbox2.example.com (192.168.29.184) 1473(1501) bytes of data. #so here 1501 bytes cannot go through. From this we can see that the effective path MTU is 1500, even though ifconfig reports 9000
From testbox.example.com (192.168.28.40) icmp_seq=1 Frag needed and DF set (mtu = 1500)
From testbox.example.com (192.168.28.40) icmp_seq=1 Frag needed and DF set (mtu = 1500)

--- testbox2.example.com ping statistics ---
0 packets transmitted, 0 received, +2 errors
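
To check whether a 9000-byte MTU really works end to end, send a payload of 8972 bytes (9000 minus 20 bytes IP header and 8 bytes ICMP header) with fragmentation disallowed. A minimal sketch, assuming both hosts are configured with MTU 9000:

ping -c 2 -M do -s 8972 testbox2 #8972 + 28 header bytes = 9000; "Frag needed and DF set" replies mean jumbo frames are not usable along the path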

Also, if the switch is a Cisco one, you can verify whether the switch port connecting to the server has jumbo frames enabled by sniffing a CDP (Cisco Discovery Protocol) packet. Here's one example:

-bash-4.1# tcpdump -i eth0 -nn -v -c 1 ether[20:2] == 0x2000 #ether[20:2] == 0x2000 means capture only packets that have a 2 byte value of hex 2000 starting at byte 20
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
03:44:14.221022 CDPv2, ttl: 180s, checksum: 692 (unverified), length 287
Device-ID (0x01), length: 46 bytes: 'ucf-c1z3-swi-5k01b.ucf.oracle.com(SSI16010QJH)'
Address (0x02), length: 13 bytes: IPv4 (1) 192.168.0.242
Port-ID (0x03), length: 16 bytes: 'Ethernet111/1/12'
Capability (0x04), length: 4 bytes: (0x00000228): L2 Switch, IGMP snooping
Version String (0x05), length: 66 bytes:
Cisco Nexus Operating System (NX-OS) Software, Version 5.2(1)N1(4)
Platform (0x06), length: 11 bytes: 'N5K-C5548UP'
Native VLAN ID (0x0a), length: 2 bytes: 123
AVVID trust bitmap (0x12), length: 1 byte: 0x00
AVVID untrusted ports CoS (0x13), length: 1 byte: 0x00
Duplex (0x0b), length: 1 byte: full
MTU (0x11), length: 4 bytes: 1500 bytes #so here MTU size was set to 1500 bytes
System Name (0x14), length: 18 bytes: 'ucf-c1z3-swi-5k01b'
System Object ID (not decoded) (0x15), length: 14 bytes:
0x0000: 060c 2b06 0104 0109 0c03 0103 883c
Management Addresses (0x16), length: 13 bytes: IPv4 (1) 10.131.144.17
Physical Location (0x17), length: 13 bytes: 0x00/snmplocation
1 packets captured
1 packets received by filter
0 packets dropped by kernel
110 packets dropped by interface
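
If the switch runs LLDP rather than CDP, a similar capture is possible. A minimal sketch, assuming the switch sends LLDP frames on this port (ethertype 0x88cc; the MTU only shows up if the switch advertises the optional Maximum Frame Size TLV):

tcpdump -i eth0 -nn -v -c 1 ether proto 0x88cc #capture one LLDP frame and let tcpdump decode its TLVs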

PS:

  1. As for the "-M do" parameter of ping, you may refer to man ping for more info. And as for DF (don't fragment) and Path MTU Discovery mentioned in the manpage, you may read more at http://en.wikipedia.org/wiki/Path_MTU_discovery and http://en.wikipedia.org/wiki/IP_fragmentation
  2. Here's more on tcpdump tips: http://dazdaztech.wordpress.com/2013/05/17/using-tcpdump-to-see-cdp-or-lldp-packets/ and http://the-welters.com/professional/tcpdump.html
  3. Maximum packet size is the MTU plus the data-link header length. Packets are not always transmitted at the maximum packet size, as we can see from the output of iptraf -z eth0.
  4. Here's more about MTU:

The link layer, which is typically Ethernet, sends information onto the network as a series of frames. Even though the layers above may have pieces of information much larger than the frame size, the link layer breaks everything up into frames (whose payload encloses an IP packet carrying TCP/UDP/ICMP) to send them over the network. The maximum size of data in a frame is known as the maximum transmission unit (MTU). You can use network configuration tools such as ip or ifconfig to set the MTU.
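
For example, a minimal sketch of setting the MTU on eth0 with either tool (the change is lost on reboot unless you also persist it, e.g. with MTU=9000 in /etc/sysconfig/network-scripts/ifcfg-eth0 on RHEL-style systems):

ip link set dev eth0 mtu 9000
ifconfig eth0 mtu 9000 #equivalent to the ip command above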

The size of the MTU has a direct impact on the efficiency of the network. Each frame in the link layer has a small header, so using a large MTU increases the ratio of user data to overhead (header). When using a large MTU, however, each frame of data has a higher chance of being corrupted or dropped. For clean physical links, a high MTU usually leads to better performance because it requires less overhead; for noisy links, however, a smaller MTU may actually enhance performance because less data has to be re-sent when a single frame is corrupted.

Here's one image of layers of network frames:

layers-of-network-frames


Oracle VM operations – poweron, poweroff, status, stat -r

January 27th, 2014 No comments

Here's the script:
#!/usr/bin/perl
#1.OVM must be running before operations
#2.run ovm_vm_operation.pl status before running ovm_vm_operation.pl poweroff or poweron
use Net::SSH::Perl;
$host = $ARGV[0];
$operation = $ARGV[1];
$user = 'root';
$password = 'password';

$newname=$ARGV[2];
$newcpu=$ARGV[3];
$newmemory=$ARGV[4];
$newpool=$ARGV[5];
$newtmpl=$ARGV[6];
$newbridge=$ARGV[7];
$newbridge2=$ARGV[8];
$newvif='vif0';
$newvif2='VIF1';

if($host eq "help") {
print "$0 OVM-name status|poweron|poweroff|reboot|stat-r|stat-r-all|pool|new vmname 1 4096 poolname tmplname FE BE\n";
exit;
}

$ssh = Net::SSH::Perl->new($host);
$ssh->login($user,$password);

if($operation eq "status") {
($stdout,$stderr,$exit) = $ssh->cmd("ovm -uadmin -ppassword vm ls|grep -v VM_test");
open($host_fd,'>',"/var/tmp/${host}.status");
select $host_fd;
print $stdout;
close $host_fd;
} elsif($operation eq "poweroff") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ "Server_Pool|OVM|Powered") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm poweroff -n $1 -s $6 -f");
sleep 12;
}
}
} elsif($operation eq "reboot") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ "Server_Pool|OVM|Powered") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm reboot -n $1 -s $6");
sleep 12;
}
}
} elsif($operation eq "poweron") {
open($poweron_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweron_fd>){
if($_ =~ "Server_Pool|OVM|Running|used|poweroff") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+Off(.*)/){
$ssh->cmd("ovm -uadmin -ppassword vm poweron -n $1 -s $6");
#print "ovm -uadmin -ppassword vm poweron -n $1 -s $6";
sleep 15;
}
}
} elsif($operation eq "stat-r") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+(Shutting\sDown|Initializing|Error|Unknown|Rebooting|Deleting)\s+(.*)/){
#print "ovm -uadmin -ppassword vm stat -r -n $1 -s $6";
$ssh->cmd("ovm -uadmin -ppassword vm stat -r -n $1 -s $6");
sleep 1;
}
}
} elsif($operation eq "stat-r-all") {
open($poweroff_fd,'<',"/var/tmp/${host}.status");
foreach(<$poweroff_fd>){
if($_ =~ "Server_Pool|OVM") {
next;
}
if($_ =~ /(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+(.*?)\s+(\S+)\s*$/){ #match every VM regardless of status; $1 is the VM name and $6 the server pool (assumed to be the last column)
$ssh->cmd("ovm -uadmin -ppassword vm stat -r -n $1 -s $6");
sleep 1;
}
}
} elsif($operation eq "pool") {
($stdoutp,$stderrp,$exitp) = $ssh->cmd("ovm -uadmin -ppassword svrp ls|grep Inactive");
open($host_fdp,'>',"/var/tmp/${host}-poolstatus");
select $host_fdp;
print $stdoutp;
close $host_fdp;
} elsif($operation eq "new") {
($stdoutp,$stderrp,$exitp) = $ssh->cmd("ovm -uadmin -ppassword tmpl ls -s $newpool | grep $newtmpl");
if($stdoutp =~ /$newtmpl/){
($stdoutp2,$stderrp2,$exitp2) = $ssh->cmd("ovm -uadmin -ppassword vm new -m template -s $newpool -t $newtmpl -n $newname -c password");
if($stdoutp2 =~ /is being created/){
print "Creating VM $newname in pool $newpool on OVMM $host now!"."\n";
while(1){
($stdoutp3,$stderrp3,$exitp3) = $ssh->cmd("ovm -uadmin -ppassword vm stat -n $newname -s $newpool");
if($stdoutp3 =~ /Powered Off/){
print "Done VM creation."."\n";
last;
}
sleep 300
}

print "Setting Cpu/Memory now."."\n";
($stdoutp32,$stderrp32,$exitp32) = $ssh->cmd("ovm -uadmin -ppassword vm conf -n $newname -s $newpool -x $newmemory -m $newmemory -c $newcpu -P");
sleep 2;

print "Creating NICs now."."\n";
($stdoutp4,$stderrp4,$exitp4) = $ssh->cmd("ovm -uadmin -ppassword vm nic conf -n $newname -s $newpool -N $newvif -i VIF0 -b $newbridge");
sleep 2;
($stdoutp5,$stderrp5,$exitp5) = $ssh->cmd("ovm -uadmin -ppassword vm nic add -n $newname -s $newpool -N $newvif2 -b $newbridge2");
sleep 2;

print "Powering on VM now."."\n";
($stdoutp6,$stderrp6,$exitp6) = $ssh->cmd("ovm -uadmin -ppassword vm poweron -n $newname -s $newpool");
sleep 30;

while(1){
($stdoutp7,$stderrp7,$exitp7) = $ssh->cmd("ovm -uadmin -ppassword vm info -n $newname -s $newpool");
if($stdoutp7 =~ /Running on: sl/){
print "VM is now Running, you can configure VM on hypervisor now:"."\n";
print $stdoutp7."\n";
last;
}
sleep 30;
}

#($stdoutp8,$stderrp8,$exitp8) = $ssh->cmd("ovm -uadmin -ppassword vm ls -l | grep $newname");
#print "You can configure VM on hypervisor now:"."\n";
#print $stdoutp8."\n";
} else {
print $stdoutp2."\n";
exit;
}
} else {
print "No template named $newtmpl in pool $newpool\n";
exit;
}
}

You can use the following to make the script run in parallel:

for i in <all OVMs>;do (./ovm_vm_operation.pl $i status &);done
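
A typical workflow is to refresh the cached status files first and then run the actual operation against them. A minimal sketch, assuming two OVM managers named ovmm1 and ovmm2:

for i in ovmm1 ovmm2;do ./ovm_vm_operation.pl $i status;done #writes /var/tmp/<host>.status for each manager
for i in ovmm1 ovmm2;do (./ovm_vm_operation.pl $i poweroff &);done #then powers off the VMs recorded in those status files, in parallel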

avoid putty ssh connection being severed or disconnected

January 17th, 2014 2 comments

After some time, an idle ssh session will disconnect itself. If you want to avoid this, you can try running the following command:

while [ 1 ];do echo hi;sleep 60;done &

This will print the message "hi" to standard output every 60 seconds, which keeps the session active.

PS:

You can also set some keepalive parameters in /etc/ssh/sshd_config; you can refer to http://www.doxer.org/make-ssh-on-linux-not-to-disconnect-after-some-certain-time/
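
For reference, a cleaner fix than the echo loop is to let sshd send keepalive probes itself. A minimal sketch of the relevant /etc/ssh/sshd_config entries (values are only an example; restart sshd afterwards with service sshd restart):

ClientAliveInterval 60 #send a keepalive probe after 60 seconds of inactivity
ClientAliveCountMax 5 #only drop the session after 5 unanswered probes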

“Include snapshots” made NFS shares from ZFS appliance shrinking

January 17th, 2014 No comments

Today I met one weird issue when checking an NFS share mounted from a ZFS appliance. The space on that filesystem was getting low, so I removed some files, but the NFS filesystem mounted on the client kept shrinking: what confused me was that the filesystem's total size was getting smaller. Shouldn't the free space get larger while the size stays unchanged?

After some debugging, I found that this was caused by the ZFS appliance share's "Include snapshots" setting. When I unchecked "Include snapshots", the issue was gone!

zfs-appliance

Categories: Hardware, NAS, Storage Tags:

resolved – ESXi Failed to lock the file

January 13th, 2014 No comments

When I was powering on one VM in ESXi, an error occurred:

An error was received from the ESX host while powering on VM doxer-test.
Cannot open the disk '/vmfs/volumes/4726d591-9c3bdf6c/doxer-test/doxer-test_1.vmdk' or one of the snapshot disks it depends on.
Failed to lock the file

And also:

unable to access file since it is locked

This was apparently caused by some storage issue. I googled first and found that most of the posts were explaining ESXi's locking mechanism; I tried some of their suggestions but with no luck.

Then I remembered that our datastore was using NFS on ZFS, and NFS has its well-known file locking issues. So I mounted the NFS share that the datastore was using and removed one file named lck-c30d000000000000. After this, the VM booted up successfully! (Alternatively, we can log on to the ESXi host and remove the lock file there.)
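
A minimal sketch of that cleanup from any linux box, assuming the datastore lives on an NFS export such as nfs-server:/export/datastore1 (the lck- file name will differ in your environment):

mount -t nfs nfs-server:/export/datastore1 /mnt
ls -a /mnt/doxer-test | grep lck #look for stale lock files such as lck-c30d000000000000
rm -f /mnt/doxer-test/lck-c30d000000000000
umount /mnt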

install java jdk on linux

January 7th, 2014 No comments

Here are the steps if you want to install java on linux:

wget <path to jre-7u25-linux-x64.rpm> -P /tmp
rpm -ivh /tmp/jre-7u25-linux-x64.rpm
mkdir -p /root/.mozilla/plugins
rm -f /root/.mozilla/plugins/libnpjp2.so
ln -s /usr/java/jre1.7.0_25/lib/amd64/libnpjp2.so /root/.mozilla/plugins/libnpjp2.so
ll /root/.mozilla/plugins/libnpjp2.so
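
To confirm the JRE is installed and usable, a quick check afterwards (paths assume the jre-7u25 rpm installed above):

/usr/java/jre1.7.0_25/bin/java -version
java -version #only if /usr/java/jre1.7.0_25/bin is already on your PATH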

PS:

  • You'll need to install the i386 version of the jre if your firefox browser is 32-bit. And you can install jre6 from here. You should download a package like "jre-6u33-linux-i586-rpm.bin" and run chmod +x jre-6u33-linux-i586-rpm.bin && ./jre-6u33-linux-i586-rpm.bin after that. You may locate /usr/java/jre1.6.0_33/bin/javaws for opening the remote console when prompted.
  • If you want to install the java plugin for firefox on linux, or even install firefox itself under linux, then you can refer to this article.