Archive for the ‘Kernel’ Category

Solaris kernel bug – ACK replied before SYN/ACK, valid outbound packets dropped

April 9th, 2012 No comments

If you intermittently get the error “ldapserver.test.com:389; socket closed.”, capture the network traffic with tcpdump. Analysing it, you may find the following incorrect packet exchange:

testhost1 --> testhost2 (SYN)
testhost1 <-- testhost2 (ACK) – a SYN/ACK packet should have been sent at this point
testhost1 --> testhost2 (RST) – having received no SYN/ACK, the client resets the TCP connection
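
To narrow a capture down to this pattern, a filter along the following lines can be passed to tcpdump. This is only a sketch: the helper name handshake_filter is made up for illustration, and the host and port are simply the ones from the error message. It matches segments with SYN or RST set, so a failed handshake shows up as a SYN followed by an RST with no SYN/ACK in between (the bare ACK itself is not matched).

```shell
# Hypothetical helper: build a pcap filter for segments to or from one
# host and TCP port that have the SYN or RST flag set.
handshake_filter() {
    host=$1
    port=$2
    printf 'host %s and tcp port %s and tcp[tcpflags] & (tcp-syn|tcp-rst) != 0\n' \
        "$host" "$port"
}

# Usage (needs root; the capture interface is left to tcpdump's default):
#   tcpdump -n "$(handshake_filter ldapserver.test.com 389)"
```

Building the expression in a function keeps the quoting in one place, since the brackets, ampersand and parentheses would otherwise be interpreted by the shell.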

This is actually a Solaris kernel bug. The workaround is to run:
ndd -set /dev/ip ip_ire_arp_interval 999999999
After this, packet drops fell to 1 per week per host.

More info about this kernel bug can be found at http://wesunsolve.net/bugid/id/6942436

Categories: Kernel, Unix Tags:

hostname is different between linux and solaris

February 21st, 2012 No comments

1. On Linux, -a is an option of the hostname command:
-a, --alias
Display the alias name of the host (if used).
For example:
[root@linux ~]# hostname -a
linux localhost.localdomain localhost
[root@linux ~]# grep linux /etc/hosts
127.0.0.1 linux.doxer.org linux localhost.localdomain localhost

2. On Solaris:

Solaris has no -a option. If you run hostname -a on a Solaris box, you are actually setting the hostname to “-a”, which in turn causes many problems, especially with LDAP.
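
For scripts that run on both systems, one way to avoid the trap is to check the OS before passing any option to hostname. The sketch below is an assumption-laden example, not a standard tool: both function names are invented here, and only Linux is treated as safe.

```shell
# Succeed only for systems where hostname is known to accept option
# flags. Only Linux is listed; anything else is treated as unsafe.
hostname_takes_options() {
    case "$1" in
        Linux) return 0 ;;
        *)     return 1 ;;
    esac
}

# Print the host alias on Linux. Elsewhere (uname -s prints SunOS on
# Solaris) print the node name and never pass "-a" to hostname, where
# it could be taken as a new host name.
print_host_alias() {
    if hostname_takes_options "$(uname -s)"; then
        hostname -a
    else
        uname -n
    fi
}
```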

Categories: Kernel, Linux, Unix Tags:

Too many cron jobs and crond processes running

February 17th, 2012 No comments

I ran into a problem where a ton of crond processes (cron jobs, crontab) were running on the OS:

root@localhost# ps auxww|grep cron
vare 543 0.0 0.0 141148 5904 ? S 01:43 0:00 crond
root 4085 0.0 0.0 72944 976 ? Ss 2010 1:13 crond
vare 4522 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 5446 0.0 0.0 141148 5904 ? S 02:43 0:00 crond
vare 9202 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 10245 0.0 0.0 141148 5908 ? S 03:43 0:00 crond
vare 13989 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 15487 0.0 0.0 141148 5908 ? S 04:43 0:00 crond
vare 18796 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 20448 0.0 0.0 141148 5908 ? S 05:43 0:00 crond
root 23168 0.0 0.0 6024 596 pts/0 S+ 06:15 0:00 grep cron
vare 23474 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 27183 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 28358 0.0 0.0 141148 5904 ? S 00:43 0:00 crond
vare 32032 0.0 0.0 141148 5904 ? S Feb16 0:00 crond

…..(and more)
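
With a long listing like this, a per-user count makes the culprit obvious. A minimal sketch, assuming the ps aux column layout shown above (command name in the 11th field) and an exact command name of crond; the function name is made up for illustration:

```shell
# Read "ps auxww" output on standard input and print the number of
# crond processes per user, sorted by user name.
count_crond_by_user() {
    awk '$11 == "crond" { n[$1]++ } END { for (u in n) print u, n[u] }' | sort
}

# Usage:
#   ps auxww | count_crond_by_user
```

Matching on the command field rather than grepping the whole line also keeps the grep process itself out of the count.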

Now let’s see what cronjobs are running by user vare:
root@localhost# crontab -u vare -l
# run the VERA Deploy routine
43 * * * * cd /share/scripts > /dev/null 2>&1 ; sleep 5 ; /share/scripts/Application/VARE/Deploy > /dev/null 2>&1

After checking the script /share/scripts/Application/VARE/Deploy, I could see that the cron job changes directory to an NFS mount point (i.e. cd /share/scripts) and then runs some checks (i.e. /share/scripts/Application/VARE/Deploy). Because there was a problem with the NFS mount point, the job hung while changing directory and never exited normally. As a result, the number of crond processes kept increasing.

To solve this specific problem (specific meaning you have to check your own script), first kill the hung crond processes, then bounce autofs, and then restart crond.
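
To keep hung runs from piling up again, the job can be wrapped so that a new run is skipped while an old one is still alive, and so that a run gives up after a time limit. This is a sketch under assumptions, not what the original setup used: it relies on flock (util-linux) and timeout (GNU coreutils) being installed, and the function name, lock path and limit are placeholders.

```shell
# Hypothetical wrapper: run a command under a non-blocking lock and a
# time limit. Exit status is 1 if the lock is already held, 124 if the
# time limit was hit, otherwise the command's own status.
run_guarded() {
    lock=$1
    limit=$2
    shift 2
    flock -n "$lock" timeout "$limit" "$@"
}

# Usage from a script called by cron (lock path and 600s are examples):
#   run_guarded /var/tmp/vare-deploy.lock 600 /share/scripts/Application/VARE/Deploy
```

A process stuck in uninterruptible sleep on a hard NFS mount may not react to the signal sent by timeout, but the lock still stops later runs from stacking up behind it.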

 

Categories: Kernel, Linux, Unix Tags:

using timex to check whether performance degradation is caused by OS or VxVM

February 1st, 2012 No comments

To find out whether a performance degradation comes from the operating system or from Volume Manager, compare the time the operating system takes to access the disks with the time Volume Manager takes to access them. Both should be about the same, since both commands force a read of the disk header information. If one of them is markedly greater, it indicates a problem in that area.

# echo | timex /usr/sbin/format    # echo avoids the prompt for user input; use time instead of timex on Linux
real          13.03
user           0.10
sys            1.49
# timex vxdisk -o alldgs list
real           2.65
user           0.00
sys            0.00
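
The comparison can be scripted by pulling the "real" value out of each run and dividing one by the other. A rough sketch: both function names are invented for this example, and the usage lines assume timex writes its report to standard error.

```shell
# Extract the "real" value from timex-style output on standard input.
real_seconds() {
    awk '$1 == "real" { print $2 }'
}

# Print how many times larger the first value is than the second.
# The second value must not be zero.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Usage on Solaris (assumes the report goes to standard error):
#   os=$(echo | timex /usr/sbin/format 2>&1 >/dev/null | real_seconds)
#   vx=$(timex vxdisk -o alldgs list 2>&1 >/dev/null | real_seconds)
#   ratio "$os" "$vx"
```

With the figures above, ratio 13.03 2.65 prints 4.9, i.e. the operating system path took almost five times as long as the Volume Manager path.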

Categories: Kernel, Linux, Unix Tags:

Linux hostname domainname dnsdomainname nisdomainname ypdomainname

December 20th, 2011 No comments

Here’s an excerpt from the online man page for “domainname”:

NAME
hostname – show or set the system’s host name
domainname – show or set the system’s NIS/YP domain name
dnsdomainname – show the system’s DNS domain name
nisdomainname – show or set system’s NIS/YP domain name
ypdomainname – show or set the system’s NIS/YP domain name
hostname will print the name of the system as returned by the gethostname(2) function.

domainname, nisdomainname, ypdomainname will print the name of the system as returned by the getdomainname(2) function. This is also known as the YP/NIS domain name of the system.

dnsdomainname will print the domain part of the FQDN (Fully Qualified Domain Name). The complete FQDN of the system is returned with hostname --fqdn.

Sometimes you may find that you can log on to a client using LDAP authentication but cannot sudo to root. In that case, run domainname to check whether it is set to (none). If it is, set the domain name with the domainname command.
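
The check is easy to script. A minimal sketch, assuming an unset domain is reported either as (none) or as an empty string; the function name is made up, and example.com is a placeholder rather than a value to use:

```shell
# Succeed when the given value means "no NIS/YP domain name is set".
nis_domain_unset() {
    case "$1" in
        ''|'(none)') return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage, as root (example.com is only a placeholder):
#   if nis_domain_unset "$(domainname)"; then
#       domainname example.com
#   fi
```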

Categories: Kernel, Linux, Unix Tags:

Extending tmpfs’ed /tmp on Solaris 10(and linux) without reboot

November 3rd, 2011 No comments

Thanks to Eugene.

If you need to extend a tmpfs-backed /tmp in a Solaris 10 global zone (it works with zones too but needs adjustments) and don’t want to reboot, here’s a tried and working solution.

PLEASE BE CAREFUL, ONE ERROR HERE WILL KILL THE LIVE KERNEL!

echo "$(echo $(echo ::fsinfo | mdb -k | grep /tmp | head -1 | awk '{print $1}')::print vfs_t vfs_data \| ::print -ta struct tmount tm_anonmax | mdb -k | awk '{print $1}')/Z 0x20000" | mdb -kw

Note the 0x20000. This number sets the new size to 1GB. It is calculated like this: as an example, 0x10000 in hex is 65536, or 64k. The size is set in pages and each page is 8k (the SPARC page size; run pagesize to check yours), so the resulting allocation size is 64k pages * 8k = 512MB. 0x20000 is 1GB, 0x40000 is 2GB, etc.
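
The size arithmetic can be sketched as a small shell function that turns a target size into the hex page count. The function name is invented for this example; the page size is passed in explicitly rather than assumed.

```shell
# Print the number of pages, in hex, needed for a given size.
#   $1 = desired size in MB
#   $2 = page size in KB (8 in the calculation above)
pages_hex() {
    size_mb=$1
    page_kb=$2
    printf '0x%x\n' $(( size_mb * 1024 / page_kb ))
}

# Usage:
#   pages_hex 1024 8    # 1GB with 8k pages, prints 0x20000
```

Computing the value instead of typing it by hand removes one source of error from a command that writes directly into the live kernel.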

If the server has zones, you will see more than one entry in ::fsinfo, and you need to feed the exact struct address to mdb. This way you can change the /tmp size for individual zones, but this can only be done from the global zone.

The same approach can probably be applied to older Solaris releases but will definitely need adjustments. Oh, and in case you care, on Linux it’s as simple as “mount -o remount,size=1G /tmp” :)

 

Categories: Kernel, Unix Tags: