linux process accounting set up

July 8th, 2014

Ensure package psacct is installed and make it boot with system:

rpm -qa|grep -i psacct
chkconfig psacct on
service psacct start

Here're some useful commands

[root@qg-dc2-tas_sdi ~]# ac -p #Display time totals for each user
emcadm 0.00
test1 2.57
aime 37.04
oracle 32819.22
root 12886.86
testuser 1.47
total 45747.15

[root@qg-dc2-tas_sdi ~]# lastcomm testuser #Display command executed by user testuser
top testuser pts/5 0.02 secs Fri Jul 4 03:59
df testuser pts/5 0.00 secs Fri Jul 4 03:59

[root@qg-dc2-tas_sdi ~]# lastcomm top #Search the accounting logs by command name
top testuser pts/5 0.03 secs Fri Jul 4 04:02

[root@qg-dc2-tas_sdi ~]# lastcomm pts/5 #Search the accounting logs by terminal name pts/5
top testuser pts/5 0.03 secs Fri Jul 4 04:02
sleep X testuser pts/5 0.00 secs Fri Jul 4 04:02

[root@qg-dc2-tas_sdi ~]# sa |head #Use sa command to print summarizes information(e.g. the number of times the command was called and the system resources used) about previously executed commands.
332 73.36re 0.03cp 8022k
33 8.76re 0.02cp 7121k ***other*
14 0.02re 0.01cp 26025k perl
7 0.00re 0.00cp 16328k ps
49 0.00re 0.00cp 2620k find
42 0.00re 0.00cp 13982k grep
32 0.00re 0.00cp 952k tmpwatch
11 0.01re 0.00cp 13456k sh
11 0.00re 0.00cp 2179k makewhatis*
8 0.01re 0.00cp 2683k sort

[root@qg-dc2-tas_sdi ~]# sa -u |grep testuser #Display output per-user
testuser 0.00 cpu 14726k mem sleep
testuser 0.03 cpu 4248k mem top
testuser 0.00 cpu 22544k mem sshd *
testuser 0.00 cpu 4170k mem id
testuser 0.00 cpu 2586k mem hostname

[root@qg-dc2-tas_sdi ~]# sa -m | grep testuser #Display the number of processes and number of CPU minutes on a per-user basis
testuser 22 8.18re 0.00cp 7654k

Categories: IT Architecture, Linux, Systems, Unix Tags:

Enable NIS client on linux host

July 2nd, 2014

After you set up NIS server, you need set up NIS client. Here's the steps for enabling NIS client on linux box.

Ensure required packages are installed

rpm -qa|egrep 'yp-tools|ypbind|portmap'

Edit /etc/sysconfig/network

NISDOMAIN=example.com

Edit /etc/yp.conf
domain example.com server 10.229.169.88
domain example.com server 10.229.192.99

Set NIS domain-name

domainname example.com
ypdomainname example.com

Set /etc/nsswitch.conf

passwd: files nis
shadow: files nis
group: files nis
hosts: files dns nis
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: nisplus
publickey: nisplus
automount: files nisplus
aliases: files nisplus
sudoers: files nis

Make sure the portmap service is running:

service portmap start

chkconfig portmap on

Start ypbind service:

service ypbind start
chkconfig ypbind on

Test it out:

rpcinfo -u localhost ypbind

ypcat passwd|egrep 'username'

If you want to set up sudo privileges for NIS users, then you can refer to this article resolved – /etc/sudoers: syntax error near line 10

PS:

If there's firewall between Linux NIS clients and NIS servers, then you should not startup ypbind(chkconfig ypbind off; service ypbind stop), if you startup ypbind, then the box will try to connect to NIS servers without stopping. Your linux box will get stuck and will take a long time for you to log on even as root. This is rule of thumb.

Categories: IT Architecture, Linux, Systems, Unix Tags:

resolved – /etc/sudoers: syntax error near line 10

July 2nd, 2014

When using /usr/sbin/visudo, after modification, errors occurred:

>>> /etc/sudoers: syntax error near line 10 <<<

Here's line 10:

User_Alias Users_SDITAS = username1, username2

Then I changed it as following:

User_Alias USERS_SDITAS = username1, username2

And now everything is ok. So this means that the alias name must all be uppercase.

PS:
1. Here's the explanation about User_Alias Users_SDITAS = username1, username2

The first part is the user,
The second is the terminal from where the user can use sudo command,
The third part is which users he may act as,
The last one, is which commands he may run when using sudo.
For example, root ALL=(ALL) ALL, means the root user can execute from ALL terminals, acting as ALL (any) users, and run ALL (any) command. And USERS_SDITAS ALL=(oracle) NOPASSWD:SETENV: CMD_MIGRATIONDC1DC3 means users in group USERS_SDITAS can execute from ALL terminals, acting as oracle user, and run commands in group CMD_MIGRATIONDC1DC3. (sudo -E -u oracle <command>, -E will pass invoking users env variables to target user if SETENV tag is added to sudo commands in /etc/sudoers. You'll get error message "sudo: sorry, you are not allowed to preserve the environment" if you did not add SETENV tag in /etc/sudoers. You can run sudo -l or sudo -ll to get a list of privilege commands for you or for others if you run sudo -l -U <username> )

2. One sample of /etc/sudoers configuration in linux(use visudo to edit, as visudo can check for errors after modification. You may need set "echo 'export PATH=/usr/bin:$PATH' >> /etc/profile" in some circumstances so that sudo will be /usr/bin/sudo):

Defaults logfile=/var/log/sudo.log

Defaults always_set_home #switched to target user's home directory when running sudo. Note that HOME is already set when the the env_reset option is enabled, so always_set_home is only effective for configurations where either env_reset is disabled(Defaults !env_reset) or HOME is present in the env_keep list(Defaults env_keep += HOME). This flag is off by default.
Host_Alias HOSTS_MIGRATIONDC1DC3 = slcn06vmf0012, slcn06vmf0013
Cmnd_Alias CMD_MIGRATIONDC1DC3 = /u01/local/wls/user_projects/domains/base_domain/bin/tasctl, /u01/shared/wls/Oracle_SDI1/sdictl/sdictl.sh
User_Alias USERS_SDITAS =username1, username2
USERS_SDITAS ALL=(ALL) NOPASSWD: /bin/su - oracle #users in USERS_SDITAS group can now sudo su - oracle without asking for a password
oracle ALL=(ALL) NOPASSWD:SETENV: CMD_MIGRATIONDC1DC3 #oracle user can run all commands in commands group CMD_MIGRATIONDC1DC3.

3. To check  whether some NIS users are using/bin/false shell(means they can not log on the host by ssh), use the following commands:

ypcat passwd|awk -F: '{if($1 ~ /^username1$|^username2$/) print}'|grep false

Categories: IT Architecture, Linux, Systems, Unix Tags: ,

Resolved – Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT

June 12th, 2014

Today when I tried to install Oracle VM Server on one server, the following error occurred:

Your boot partition is on a disk using the GPT partitioning scheme but this machine cannot boot using GPT. This can happen if there is not enough space on your hard drive(s) for the installation.

So to went on with the installation, I had to think of a way to erase GPT partition table on the drive.

To do this, the first step is to fall into linux rescue mode when booting from CDROM:

rescue

Later, check with fdisk -l, I could see that /dev/sda was the only disk that needed erasing GPT label. So I used dd if=/dev/zero of=/dev/sda bs=512 count=1 to erase GPT table:

 

fdisk_dd

 

After this, run fdisk -l again, I saw that the partition table was now gone:

fdisk_dd_2

Later, re-initializing installation of OVS server. When the following message prompted, select "No":

select_no

And select "yes" when below message prompted so that we can make new partition table:

select_yes

The steps after this was normal ones, and the installation went smoothly.

Resolved – rm cannot remove some files with error message “Device or resource busy”

June 11th, 2014

If you meet problem when remove one file on linux with below error message:

[root@test-host ~]# rm -rf /u01/shared/*
rm: cannot remove `/u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001': Device or resource busy
rm: cannot remove `/u01/shared/WLS/oracle_common/modules/oracle.jrf_11.1.1/.nfs0000000000005c7a00000002': Device or resource busy
rm: cannot remove `/u01/shared/WLS/OracleHome/soa/modules/oracle.soa.fabric_11.1.1/.nfs0000000000006bcf00000003': Device or resource busy

Then it means that some progresses were still referring to these files. You have to stop these processes before remove these files. You can use linux command lsof to find the processes using specific files:

[root@test-host ~]# lsof |grep nfs0000000000004abf00000001
java 2956 emcadm mem REG 0,21 1095768 19135 /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001 (slce49sn-nas:/export/C9QA123_DC1/tas_central_shared)
java 2956 emcadm 88r REG 0,21 1095768 19135 /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001 (slce49sn-nas:/export/C9QA123_DC1/tas_central_shared)

So from here you can see that processe with PID 2956 is still using file /u01/shared/WLS/oracle_common/soa/modules/oracle.soa.mgmt_11.1.1/.nfs0000000000004abf00000001.

However, some systems have no lsof installed by default. Then you can install it or by using the alternative one "fuser":

[root@test-host ~]# fuser -cu /u01/shared/WLS/oracle_common
/u01/shared/WLS/oracle_common: 2956m(emcadm) 7358c(aime)

Then you can see also that progresses with PIDs 2956 and 7358 are referring to the directory /u01/shared/WLS/oracle_common.

so you'll need stop the process first by killing it(or stop it using the processes own stop() method if defined):

kill -9 2956

After that, you can try remove the files again, should be ok this time.

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

Resolved – failed Exception check srv hostname/IP failedException Invalid hostname/IP configuration ocfs2 config failed Obsolete nodes found

June 3rd, 2014

Today when I tried to add two OVS servers into one server pool, errors were met. The first one was like below:

2014-06-03 04:26:08.965 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:09.485 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:09.497 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:12.463 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:12.985 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:12.997 NOTIFICATION [Server Pool Management][Server][hostname1.example.com]:Check agent (hostname1.example.com) connectivity.
2014-06-03 04:26:13.004 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:13.522 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:13.535 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:13.980 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:16.307 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:16.831 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:16.844 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:17.284 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:17.290 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:17.814 NOTIFICATION Checking agent hostname1.example.com is active or not?
2014-06-03 04:26:17.827 NOTIFICATION Judging the server hostname1.example.com has been managed or not...
2014-06-03 04:26:18.272 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:26:18.279 NOTIFICATION Getting agent version for agent:hostname1.example.com ...
2014-06-03 04:26:18.799 NOTIFICATION Regisering server:hostname1.example.com...
2014-06-03 04:26:21.749 NOTIFICATION Register Server: hostname1.example.com success
2014-06-03 04:26:21.751 NOTIFICATION Getting host info for server:hostname1.example.com ...
2014-06-03 04:26:23.894 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>
2014-06-03 04:26:33.348 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1.example.com') hostname/IP failed! => <Exception: Invalid hostname/IP configuration: hostname=hostname1;ip=10.200.225.127>

Also there's error message like below:

2014-06-03 04:59:11.003 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:11.524 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:11.536 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:15.484 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.005 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.016 NOTIFICATION [Server Pool Management][Server][hostname1-fe.example.com]:Check agent (hostname1-fe.example.com) connectivity.
2014-06-03 04:59:16.025 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:16.546 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:16.559 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:17.014 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:18.950 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:19.470 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:19.483 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:19.926 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:19.955 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:20.476 NOTIFICATION Checking agent hostname1-fe.example.com is active or not?
2014-06-03 04:59:20.490 NOTIFICATION Judging the server hostname1-fe.example.com has been managed or not...
2014-06-03 04:59:20.943 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Check prerequisites to add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) succeed
2014-06-03 04:59:20.947 NOTIFICATION Getting agent version for agent:hostname1-fe.example.com ...
2014-06-03 04:59:21.471 NOTIFICATION Regisering server:hostname1-fe.example.com...
2014-06-03 04:59:24.439 NOTIFICATION Register Server: hostname1-fe.example.com success
2014-06-03 04:59:24.439 NOTIFICATION Getting host info for server:hostname1-fe.example.com ...
2014-06-03 04:59:26.577 NOTIFICATION [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:Add server (hostname1-fe.example.com) to server pool (DC1_DMZ_Service_Mid) starting.
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >
2014-06-03 04:59:37.100 ERROR [Server Pool Management][Server Pool][DC1_DMZ_Service_Mid]:During adding servers ([hostname1-fe.example.com]) to server pool (DC1_DMZ_Service_Mid), Cluster setup failed: (OVM-1011 OVM Manager communication with materhost for operation HA Setup for Oracle VM Agent 2.2.0 failed:
failed:<Exception: check srv('hostname1-fe.example.com') ocfs2 config failed! => <Exception: Obsolete nodes found: >

Then I was confused about "Obsolete nodes found" it complained. I could confirm that I've removed hostname1.example.com, and even after I checked OVM DB in OVS.OVS_SERVER, there's no record about hostname1.example.com.

Then after some searching, these errors were caused by obsolete info in OCFS2(Oracle Cluster File System). We should edit file /etc/ocfs2/cluster.conf and remove obsolete entries.

-bash-3.2# vi /etc/ocfs2/cluster.conf
node:
        ip_port     = 7777
        ip_address  = 10.200.169.190
        number      = 0
        name        = hostname1
        cluster     = ocfs2

node:
        ip_port     = 7777
        ip_address  = 10.200.169.191
        number      = 1
        name        = hostname2
        cluster     = ocfs2

cluster:
        node_count  = 2
        name        = ocfs2

So hostname2 was no longer needed or the IP address of hostname2 was changed, then you should remove entries related to hostname2, and modify node_count to 1. Later bounce ocfs2/o2cb services:

service ocfs2 restart

service o2cb restart

Later, I tried add OVS server again, and it worked! (Before adding that OVS server back, we need first remove its ovs-agent db: service ovs-agent stop; mv /etc/ovs-agent/db /var/tmp/db.bak.5; service ovs-agent start, and then configure ovs-agent service ovs-agent configure. You can also use /opt/ovs-agent-2.3/utils/cleanup.py to clean up too.)