linux tips

April 10th, 2014
Linux Performance & Troubleshooting
For Linux performance and troubleshooting, please refer to another post - Linux tips – Performance and Troubleshooting
Linux system tips
ls -lu #sort/show by access time (updated by e.g. cat file); ls -lt #by modification time (updated by e.g. vi; ls -l uses this by default); ls -lc #by change time (updated by e.g. chmod); stat ./aa.txt shows all three timestamps <UTC>
ctrl+z #suspend the foreground job (stopped, in background)
%1 & #resume job 1 in the background
%1 #bring job 1 to the foreground
pgrep -flu oracle  # processes owned by the user oracle
watch free -m #refresh every 2 seconds
pmap -x 30420 #memory mapping.
openssl s_client -connect localhost:636 -showcerts #verify ssl certificates, or 443
openssl x509 -in cacert.pem -noout -text
openssl x509 -in cacert.pem -noout -dates
openssl x509 -in cacert.pem -noout -purpose
openssl req -in robots.req.pem -text -verify -noout
blockdev --getbsz /dev/xvda1 #get blocksize of FS
dumpe2fs /dev/xvda1 |grep 'Block size'
Strings
ovm svr ls|sort -rn -k 4 #sort by column 4
cat a1|sort|uniq -c |sort #SUS
ovm svr ls|uniq -f3 #skip the first three columns, this will list only 1 server per pool
for i in <all OVMs>;do (test.sh $i &);done #instead of using nohup &
ovm vm ls|egrep "`echo testhost{0\|,1\|,2\|,3\|,4}|tr -d '[:space:]'`"
cat a|awk '{print $5}'|tr '\n' ' '
getopt #getopts is builtin, more on http://wuliangxx.iteye.com/blog/750940
date -d '1970-1-1 1276059000 sec utc'
date -d '2010-09-11 23:20' +%s
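The two conversions above are inverses of each other; a quick sanity check with GNU date (-u so the result does not depend on the local timezone, and @N as a shorthand for seconds since the epoch):

```shell
# epoch seconds -> human-readable (UTC)
date -u -d @1276059000 +'%F %T'        # 2010-06-09 04:50:00
# human-readable (UTC) -> epoch seconds
date -u -d '2010-06-09 04:50:00' +%s   # 1276059000
```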
find . -name ‘*txt’|xargs tar cvvf a.tar
find . -maxdepth 1
for i in `find /usr/sbin/ -type f ! -perm -u+x`;do chmod +x $i;done #files that have no execute permission for their owner
find ./* -prune -print #-prune,do not cascade
find . -fprint file #put result to file
tar tvf a.tar --wildcards "*ipp*" #globbing patterns
tar xvf bfiles.tar --wildcards --no-anchored 'b*'
tar --show-defaults
tar cvf a.tar --totals *.txt #show total bytes written and throughput
tar --append --file=collection.tar rock #add rock to collection.tar
tar --update -v -f collection.tar blues folk rock classical #only append files that are new or newer than the archived copies, without replacing the rest
tar --delete --file=collection.tar blues #does not work on tape archives
tar -c -f archive.tar --mode='a+rw'
tar -C sourcedir -cf - . | tar -C targetdir -xf - #copy directories
tar -c -f jams.tar grape prune -C food cherry #-C changes directory: the file cherry is taken from the food directory
find . -size -400 -print > small-files
tar -c -v -z -T small-files -f little.tgz
tar -cf src.tar --exclude='*.o' src #multiple --exclude options can be specified
expr 5 - 1
rpm2cpio ./ash-1.0.1-1.x86_64.rpm |cpio -ivd
eval $cmd
exec menu.viewcards #replaces the current shell process with the command (no new process is forked; unlike sourcing a script with .)
ls . | xargs -0 -i cp ./{} /etc #-i (or -I {}) substitutes {} with each argument, like find -exec; -0 expects NUL-separated input so filenames containing spaces survive (pair it with find -print0, which separates names with NUL rather than newline)
ls | xargs -t -i mv {} {}.old #the mv source should not have a trailing /, or unexpected errors may occur
mv --strip-trailing-slashes source destination
ls |xargs file /dev/fd/0 #/dev/fd/0 reads from stdin, an alternative to -
ls -l -I "*out*" #ignore entries matching *out*
find . -type d |xargs -i du -sh {} |awk '$1 ~ /G/'
find . -type f -name "*20120606" -exec rm {} \; #no need for rm -rf on plain files. find . -type f -exec bash -c "ls -l '{}'" \;
ps -ef|grep init|sed -n '1p'
cut -d ‘ ‘ -f1,3 /etc/mtab #first and third
seq 15 21 #print 15 to 21, or echo {15..21}
seq -s" " 15 21 #use space as separator
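The sort | uniq -c | sort idiom above ("SUS") produces a frequency count of lines; a small self-contained example (awk at the end just normalizes uniq's whitespace padding):

```shell
# count occurrences of each line, most frequent first
printf '%s\n' b a b c b a | sort | uniq -c | sort -rn | awk '{print $1, $2}'
# 3 b
# 2 a
# 1 c
```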

Linux tips – Performance and Troubleshooting

April 10th, 2014

System CPU

top
procinfo #yum install procinfo
gnome-system-monitor #can also see network flow rate
mpstat
sar

System Memory

top
free
slabtop
sar
/proc/meminfo #provides the most complete view of system memory usage
procinfo
gnome-system-monitor #can also see network flow rate

Process-specific CPU

time
strace #traces the system calls that a program makes while executing
ltrace #traces the calls(functions) that an application makes to libraries rather than to the kernel. Then use ldd to display which libraries are used, and use objdump to search each of those libraries for the given function.
ps
ld.so #ld

Process-specific Memory

ps
/proc/<pid> #you can refer to http://www.doxer.org/proc-filesystem-day-1/ for more info.

/proc/<PID>/status #provides information about the status of a given process PID
/proc/<PID>/maps #how the process’s virtual address space is used

ipcs #more info on http://www.doxer.org/resolved-semget-failed-with-status-28-failed-oracle-database-starting-up/ and http://www.doxer.org/resolvedload-manager-shared-memory-error-is-28-no-space-left-on-devicefor-apache-pmserver-etc-running-on-linux-solaris-unix/

Disk I/O

vmstat #provides totals rather than the rate of change during the sample
sar
lsof
time sh -c “dd if=/dev/zero of=System2.img bs=1M count=10240 && sync” #10G
time dd if=ddfile of=/dev/null bs=8k
dd if=/dev/zero of=vm1disk bs=1M seek=10240 count=0 #10G

Network

ethtool
ifconfig
ip
iptraf
gkrellm
netstat
gnome-system-monitor #can also see network flow rate
sar #network statistics
/etc/cron.d/sysstat #/var/log/sa/

General Ideas & options & outputs

Run Queue Statistics
In Linux, a process can be either runnable or blocked waiting for an event to complete.

A blocked process may be waiting for data from an I/O device or the results of a system call.

When these processes are runnable, but waiting to use the processor, they form a line called the run queue.
The load on a system is the total number of running and runnable processes.

Context Switches
To create the illusion that a given single processor runs multiple tasks simultaneously, the Linux kernel constantly switches between different processes.
The switch between different processes is called a context switch.
To guarantee that each process receives a fair share of processor time, the kernel periodically interrupts the running process and, if appropriate, the kernel scheduler decides to start another process rather than let the current process continue executing. It is possible that your system will context switch every time this periodic interrupt or timer occurs. (cat /proc/interrupts | grep timer, and do this again after e.g. 10s interval)
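Besides diffing /proc/interrupts as suggested, the context-switch counter itself can be sampled; this is a sketch using the ctxt line of /proc/stat (Linux-specific), which holds the total number of context switches since boot:

```shell
# sample the total context-switch count twice and compute a per-second rate
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 2
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/sec: $(( (c2 - c1) / 2 ))"
```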

Interrupts
In addition, periodically, the processor receives an interrupt by hardware devices.
/proc/interrupts can be examined to show which interrupts are firing on which CPUs

CPU Utilization
At any given time, the CPU can be doing one of seven things:
Idle
user #running user (application) code
sys #executing code in the Linux kernel on behalf of the application
nice #executing user code that has been "nice"ed or set to run at a lower priority than normal processes
iowait #waiting for I/O (such as disk or network) to complete
irq #high-priority kernel code handling a hardware interrupt
softirq #executing kernel code that was also triggered by an interrupt, but running at a lower priority


Buffers and cache
Alternatively, if your system has much more physical memory than required by your applications, Linux will cache recently used files in physical memory so that subsequent accesses to that file do not require an access to the hard drive. This can greatly speed up applications that access the hard drive frequently, which, obviously, can prove especially useful for frequently launched applications. The first time the application is launched, it needs to be read from the disk; if the application remains in the cache, however, it needs to be read from the much quicker physical memory. This disk cache differs from the processor cache mentioned in the previous chapter. Other than oprofile, valgrind, and kcachegrind, most tools that report statistics about “cache” are actually referring to disk cache.

In addition to cache, Linux also uses extra memory as buffers. To further optimize applications, Linux sets aside memory to use for data that needs to be written to disk. These set-asides are called buffers. If an application has to write something to the disk, which would usually take a long time, Linux lets the application continue immediately but saves the file data into a memory buffer. At some point in the future, the buffer is flushed to disk, but the application can continue immediately.
Active Versus Inactive Memory
Active memory is currently being used by a process. Inactive memory is memory that is allocated but has not been used for a while. Nothing is essentially different between the two types of memory. When required, the Linux kernel takes a process’s least recently used memory pages and moves them from the active to the inactive list. When choosing which memory will be swapped to disk, the kernel chooses from the inactive memory list.
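The active/inactive split described above can be read directly from /proc/meminfo:

```shell
# kB of recently used pages vs. pages that are reclaim/swap candidates
grep -E '^(Active|Inactive):' /proc/meminfo
```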
Kernel Usage of Memory (Slabs)
In addition to the memory that applications allocate, the Linux kernel consumes a certain amount for bookkeeping purposes. This bookkeeping includes, for example, keeping track of data arriving from network and disk I/O devices, as well as keeping track of which processes are running and which are sleeping. To manage this bookkeeping, the kernel has a series of caches that contains one or more slabs of memory. Each slab consists of a set of one or more objects. The amount of slab memory consumed by the kernel depends on which parts of the Linux kernel are being used, and can change as the type of load on the machine changes.

slabtop

slabtop shows in real-time how the kernel is allocating its various caches and how full they are. Internally, the kernel has a series of caches that are made up of one or more slabs. Each slab consists of a set of one or more objects. These objects can be active (or used) or inactive (unused). slabtop shows you the status of the different slabs. It shows you how full they are and how much memory they are using.


time

time measures three types of time. First, it measures the real or elapsed time, which is the amount of time between when the program started and finished execution. Next, it measures the user time, which is the amount of time that the CPU spent executing application code on behalf of the program. Finally, time measures system time, which is the amount of time the CPU spent executing system or kernel code on behalf of the application.
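A minimal sketch of the three measurements, using bash's builtin time (TIMEFORMAT is bash-specific): a command that mostly sleeps accumulates real time but almost no user or system time.

```shell
# %R = real (elapsed), %U = user CPU, %S = system CPU, all in seconds
TIMEFORMAT='real=%R user=%U sys=%S'
{ time sleep 0.3; } 2>&1   # time writes its report to stderr
```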


Disk I/O

When an application does a read or write, the Linux kernel may have a copy of the file stored into its cache or buffers and returns the requested information without ever accessing the disk. If the Linux kernel does not have a copy of the data stored in memory, however, it adds a request to the disk’s I/O queue. If the Linux kernel notices that multiple requests are asking for contiguous locations on the disk, it merges them into a single big request. This merging increases overall disk performance by eliminating the seek time for the second request. When the request has been placed in the disk queue, if the disk is not currently busy, it starts to service the I/O request. If the disk is busy, the request waits in the queue until the drive is available, and then it is serviced.

iostat

iostat provides a per-device and per-partition breakdown of how many blocks are written to and from a particular disk. (Blocks in iostat are usually sized at 512 bytes.)

lsof
lsof can prove helpful when narrowing down which applications are generating I/O


 top output

S(or STAT) – This is the current status of a process, where the process is either sleeping (S), running (R), a zombie (terminated but not yet reaped by its parent) (Z), in an uninterruptible sleep (D), or being traced (T).

TIME – The total amount of CPU time (user and system) that this process has used since it started executing.

top options

-b Run in batch mode. Typically, top shows only a single screenful of information, and processes that don’t fit on the screen never display. This option shows all the processes and can be very useful if you are saving top’s output to a file or piping the output to another command for processing.

I This toggles whether top will divide the CPU usage by the number of CPUs on the system. For example, if a process was consuming all of both CPUs on a two-CPU system, this toggles whether top displays a CPU usage of 100% or 200%.

1 (numeral 1) This toggles whether the CPU usage will be broken down to the individual usage or shown as a total.

mpstat options

-P { cpu | ALL } This option tells mpstat which CPUs to monitor. cpu is the number between 0 and the total CPUs minus 1.

The biggest benefit of mpstat is that it shows the time next to the statistics, so you can look for a correlation between CPU usage and time of day.

mpstat can be used to determine whether the CPUs are fully utilized and relatively balanced. By observing the number of interrupts each CPU is handling, it is possible to find an imbalance.

 sar options

-I {irq | SUM | ALL | XALL} This reports the rates that interrupts have been occurring in the system.
-P {cpu | ALL} This option specifies which CPU the statistics should be gathered from. If this isn’t specified, the system totals are reported.
-q This reports information about the run queues and load averages of the machine.
-u This reports information about CPU utilization of the system. (This is the default output.)
-w This reports the number of context switches that occurred in the system.
-o filename This specifies the name of the binary output file that will store the performance statistics.
-f filename This specifies the filename of the performance statistics.

-B – This reports information about the number of blocks that the kernel swapped to and from disk. In addition, for kernel versions after v2.5, it reports information about the number of page faults.
-W – This reports the number of pages of swap that are brought in and out of the system.
-r – This reports information about the memory being used in the system. It includes information about the total free memory, swap, cache, and buffers being used.
-R Report memory statistics

-d –  reports disk activities

-n DEV – Shows statistics about the number of packets and bytes sent and received by each device.
-n EDEV – Shows information about the transmit and receive errors for each device.
-n SOCK – Shows information about the total number of sockets (TCP, UDP, and RAW) in use.
-n ALL – Shows all the network statistics.

sar output

runq-sz This is the size of the run queue when the sample was taken.
plist-sz This is the number of processes present (running, sleeping, or waiting for I/O) when the sample was taken.
proc/s This is the number of new processes created per second. (This is the same as the forks statistic from vmstat.)

tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
rd_sec/s – Number of disk sectors read per second.
wr_sec/s – Number of disk sectors written per second.


vmstat options

-n print header info only once

-a This changes the default output of memory statistics to indicate the active/inactive amount of memory rather than information about buffer and cache usage.
-s (procps 3.2 or greater) This prints out the vm table. This is a grab bag of different statistics about the system since it has booted. It cannot be run in sample mode. It contains both memory and CPU statistics.

-d – This option displays individual disk statistics at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.
-p partition – This displays performance statistics about the given partition at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.

vmstat output
si – The rate of memory (in KB/s) that has been swapped in from disk during the last sample.
so – The rate of memory (in KB/s) that has been swapped out to disk during the last sample.
pages paged in – The amount of memory (in pages) read from the disk(s) into the system buffers. (On most IA32 systems, a page is 4KB.)
pages paged out – The amount of memory (in pages) written to the disk(s) from the system cache. (On most IA32 systems, a page is 4KB.)
pages swapped in – The amount of memory (in pages) read from swap into system memory.
pages swapped out – The amount of memory (in pages) written from system memory to swap.

bo – This indicates the number of total blocks written to disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
bi – This shows the number of blocks read from the disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
wa – This indicates the amount of CPU time spent waiting for I/O to complete.
reads: ms – The amount of time (in ms) spent reading from the disk.
writes: ms – The amount of time (in ms) spent writing to the disk.
IO: cur – The total number of I/O that are currently in progress. Note that there is a bug in recent versions of vmstat in which this is incorrectly divided by 1,000, which almost always yields a 0.
IO: s – This is the number of seconds spent waiting for I/O to complete.

iostat options
-d – This displays only information about disk I/O rather than the default display, which includes information about CPU usage as well.
-k – This shows statistics in kilobytes rather than blocks.
-x – This shows extended-performance I/O statistics.
device – If a device is specified, iostat shows only information about that device.

iostat output
tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
Blk_read/s – The rate of disk blocks read per second.
Blk_wrtn/s – The rate of disk blocks written per second.
Blk_read – The total number of blocks read during the interval.
Blk_wrtn – The total number of blocks written during the interval.
rrqm/s – The number of reads merged before they were issued to the disk.
wrqm/s – The number of writes merged before they were issued to the disk.
r/s – The number of reads issued to the disk per second.
w/s – The number of writes issued to the disk per second.
rsec/s – Disk sectors read per second.
wsec/s – Disk sectors written per second.
avgrq-sz – The average size (in sectors) of disk requests.
avgqu-sz – The average size of the disk request queue.
await – The average time (in ms) for a request to be completely serviced. This average includes the time that the request was waiting in the disk’s queue plus the amount of time it was serviced by the disk.
svctm – The average service time (in ms) for requests submitted to the disk. This indicates how long on average the disk took to complete a request. Unlike await, it does not include the amount of time spent waiting in the queue.

lsof options
+D directory – This causes lsof to recursively search all the files in the given directory and report on which processes are using them.
+d directory – This causes lsof to report on which processes are using the files in the given directory.

lsof output
FD – The file descriptor of the file; txt denotes program text (the executable itself), mem a memory-mapped file.
TYPE – The type of file. REG for a regular file.
DEVICE – Device number in major, minor number.
SIZE – The size of the file.
NODE – The inode of the file.


free options

-s delay – This option causes free to print out new memory statistics every delay seconds.


 strace options

strace [-p <pid>] -s 200 <program> #-p attaches to a running process; -s 200 raises the maximum string size printed (default 32) to 200. Note that filenames are not considered strings and are always printed in full.

-c – This causes strace to print out a summary of statistics rather than an individual list of all the system calls that are made.

ltrace options
-c – This option causes ltrace to print a summary of all the calls after the command has completed.
-S – ltrace traces system calls in addition to library calls, which is identical to the functionality strace provides.
-p pid – This traces the process with the given PID.


ps options
vsz The virtual set size is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is using.
rss The resident set size is the amount of physical memory the application is currently using.
pmem The percentage of the system memory that the process is consuming.
command This is the command name.

/proc/<PID>/status output
VmSize This is the process’s virtual set size, which is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is actually using. This is the same as the vsz parameter provided by ps.
VmLck This is the amount of memory that has been locked by this process. Locked memory cannot be swapped to disk.
VmRSS This is the resident set size or amount of physical memory the application is currently using. This is the same as the rss statistic provided by ps.

ipcs
Because shared memory is used by multiple processes, it cannot be attributed to any particular process. ipcs provides enough information about the state of the system-wide shared memory to determine which processes allocated the shared memory, which processes are using it, and how often they are using it. This information proves useful when trying to reduce shared memory usage.

ipcs options

lsof -u oracle | grep <shmid> #shmid is from output of ipcs -m. lists the processes under the oracle user attached to the shared memory segment

-t – This shows the time when the shared memory was created, when a process last attached to it, and when a process last detached from it.
-u – This provides a summary about how much shared memory is being used and whether it has been swapped or is in memory.
-l – This shows the system-wide limits for shared memory usage.
-p – This shows the PIDs of the processes that created and last used the shared memory segments.
-c – creator


ifconfig output #more on http://www.thegeekscope.com/linux-ifconfig-command-output-explained/

Errors – Frames with errors (possibly because of a bad network cable or duplex mismatch).
Dropped – Frames that were discarded (most likely because of low amounts of memory or buffers).
Overruns – Frames that may have been discarded by the network card because the kernel or network card was overwhelmed with frames. This should not normally happen.
Frame – These frames were dropped as a result of problems on the physical level. This could be the result of cyclic redundancy check (CRC) errors or other low-level problems.
Compressed – Some lower-level interfaces, such as Point-to-Point Protocol (PPP) or Serial Line Internet Protocol (SLIP) devices compress frames before they are sent over the network. This value indicates the number of these compressed frames. (Compressed packets are usually present during SLIP or PPP connections)

carrier – The number of packets discarded because of link media failure (such as a faulty cable)

ip options
-s [-s] link – If the extra -s is provided to ip, it provides a more detailed list of low-level Ethernet statistics.

iptraf options
-d interface – Detailed statistics for an interface including receive, transmit, and error rates
-s interface – Statistics about which IP ports are being used on an interface and how many bytes are flowing through them
-t <minutes> – Number of minutes that iptraf runs before exiting
-z interface – shows packet counts by size on the specified interface

netstat options
-p – Displays the PID/program name responsible for opening each of the displayed sockets
-c – Continually updates the display of information every second
--interfaces=<name> – Displays network statistics for the given interface
--statistics|-s – IP/UDP/ICMP/TCP statistics
--tcp|-t – Shows only information about TCP sockets
--udp|-u – Shows only information about UDP sockets
--raw|-w – Shows only information about RAW sockets (IP and ICMP)
--listening|-l – Shows only listening sockets. (These are omitted by default.)
--all|-a – Shows both listening and non-listening (for TCP, this means established connections) sockets. With the --interfaces option, shows interfaces that are not up
--numeric|-n – Shows numerical addresses instead of trying to determine symbolic host, port, or user names
--extend|-e – Displays additional information. Use this option twice for maximum detail.

netstat output

Active Internet connections (w/o servers)
Proto - The protocol (tcp, udp, raw) used by the socket.
Recv-Q - The count of bytes not copied by the user program connected to this socket.
Send-Q - The count of bytes not acknowledged by the remote host.
Local Address - Address and port number of the local end of the socket. Unless the --numeric (-n) option is specified, the socket address is resolved to its canonical host name (FQDN), and the port number is translated into the corresponding service name.
Foreign Address - Address and port number of the remote end of the socket. Analogous to "Local Address."
State - The state of the socket. Since there are no states in raw mode and usually no states used in UDP, this column may be left blank. Normally this can be one of several values: #more on http://www.doxer.org/tcp-flags-explanation-in-details-syn-ack-fin-rst-urg-psh-and-iptables-for-sync-flood/
    ESTABLISHED
        The socket has an established connection.
    SYN_SENT
        The socket is actively attempting to establish a connection.
    SYN_RECV
        A connection request has been received from the network.
    FIN_WAIT1
        The socket is closed, and the connection is shutting down.
    FIN_WAIT2
        Connection is closed, and the socket is waiting for a shutdown from the remote end.
    TIME_WAIT
        The socket is waiting after close to handle packets still in the network.
    CLOSED
        The socket is not being used.
    CLOSE_WAIT
        The remote end has shut down, waiting for the socket to close.
    LAST_ACK
        The remote end has shut down, and the socket is closed. Waiting for acknowledgement.
    LISTEN
        The socket is listening for incoming connections. Such sockets are not included in the output unless you specify the --listening (-l) or --all (-a) option.
    CLOSING
        Both sockets are shut down but we still don't have all our data sent.
    UNKNOWN
        The state of the socket is unknown.
User - The username or the user id (UID) of the owner of the socket.
PID/Program name - Slash-separated pair of the process id (PID) and process name of the process that owns the socket. --program causes this column to be included. You will also need superuser privileges to see this information on sockets you don't own. This identification information is not yet available for IPX sockets.

Example

[ezolt@scrffy ~/edid]$ vmstat 1 | tee /tmp/output
procs -----------memory---------- ---swap-- -----io----  --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs  us sy id wa
0  1 201060  35832  26532 324112    0    0     3     2     6     2  5  1  94  0
0  0 201060  35888  26532 324112    0    0    16     0  1138   358  0  0  99  0
0  0 201060  35888  26540 324104    0    0     0    88  1163   371  0  0 100  0

The number of context switches looks good compared to the number of interrupts. The scheduler is switching processes less than the number of timer interrupts that are firing. This is most likely because the system is nearly idle, and most of the time when the timer interrupt fires, the scheduler does not have any work to do, so it does not switch from the idle process.

[ezolt@scrffy manuscript]$ sar -w -c -q 1 2
Linux 2.6.8-1.521smp (scrffy)   10/20/2004

08:23:29 PM    proc/s
08:23:30 PM      0.00

08:23:29 PM   cswch/s
08:23:30 PM    594.00

08:23:29 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:30 PM         0       163      1.12       1.17      1.17

08:23:30 PM    proc/s
08:23:31 PM      0.00

08:23:30 PM   cswch/s
08:23:31 PM    812.87

08:23:30 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:31 PM         0       163      1.12       1.17      1.17

Average:       proc/s
Average:         0.00

Average:      cswch/s
Average:       703.98

Average:      runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
Average:            0       163      1.12       1.17      1.17

In this case, we ask sar to show us the total number of context switches and process creations that occur every second. We also ask sar for information about the load average. We can see in this example that this machine has 163 processes that are in memory but not running. For the past minute, on average 1.12 processes have been ready to run.

bash-2.05b$ vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
 2  1 514004   5640 79816 1341208   33   31   204   247 1111  1548  8  5 73 14

The amount of inactive pages indicates how much of the memory could be swapped to disk and how much is currently being used. In this case, we can see that 1310MB of memory is active, and only 78MB is considered inactive. This machine has a large amount of memory, and much of it is being actively used.
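The MB figures quoted above come straight from the KB columns of vmstat; a quick arithmetic check (shell integer division rounds down, the text rounds to the nearest MB):

```shell
# vmstat reports the active/inact columns in KB; divide by 1024 for MB
echo "active: $((1341208 / 1024)) MB, inactive: $((79816 / 1024)) MB"
# active: 1309 MB, inactive: 77 MB
```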


bash-2.05b$ vmstat -s

      1552528  total memory
      1546692  used memory
      1410448  active memory
        11100  inactive memory
         5836  free memory
         2676  buffer memory
       645864  swap cache
      2097096  total swap
       526280  used swap
      1570816  free swap
     20293225 non-nice user cpu ticks
     18284715 nice user cpu ticks
     17687435 system cpu ticks
    357314699 idle cpu ticks
     67673539 IO-wait cpu ticks
       352225 IRQ cpu ticks
      4872449 softirq cpu ticks
    495248623 pages paged in
    600129070 pages paged out
     19877382 pages swapped in
     18874460 pages swapped out
   2702803833 interrupts
   3763550322 CPU context switches
   1094067854 boot time
     20158151 forks

It can be helpful to know the system totals when trying to figure out what percentage of the swap and memory is currently being used. Another interesting statistic is the pages paged in, which indicates the total number of pages that were read from the disk. This statistic includes the pages that are read starting an application and those that the application itself may be using.


[ezolt@wintermute tmp]$ ps -o etime,time,pcpu,cmd 10882
    ELAPSED     TIME %CPU CMD
      00:06 00:00:05 88.0 ./burn

This example shows a test application that is consuming 88 percent of the CPU and has been running for 6 seconds, but has only consumed 5 seconds of CPU time.


[ezolt@wintermute tmp]$ ps -o vsz,rss,tsiz,dsiz,majflt,minflt,cmd 10882
VSZ RSS TSIZ DSIZ MAJFLT MINFLT CMD
11124 10004 1 11122 66 2465 ./burn

The burn application has a very small text size (1KB), but a very large data size (11,122KB). Of the total virtual size (11,124KB), the process has a slightly smaller resident set size (10,004KB), which represents the total amount of physical memory that the process is actually using. In addition, most of the faults generated by burn were minor faults, so most of the memory faults were due to memory allocation rather than loading in a large amount of text or data from the program image on the disk.


[ezolt@wintermute tmp]$ cat /proc/4540/status
Name: burn
State: T (stopped)
Tgid: 4540
Pid: 4540
PPid: 1514
TracerPid: 0
Uid: 501 501 501 501
Gid: 501 501 501 501
FDSize: 256
Groups: 501 9 502
VmSize: 11124 kB
VmLck: 0 kB
VmRSS: 10004 kB
VmData: 9776 kB
VmStk: 8 kB
VmExe: 4 kB
VmLib: 1312 kB
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

The VmLck size of 0KB means that the process has not locked any pages into memory, making them unswappable. The VmRSS size of 10,004KB means that the application is currently using 10,004KB of physical memory, although it has allocated or mapped the full VmSize of 11,124KB. If the application begins to use memory that it has allocated but is not currently using, the VmRSS size increases but the VmSize remains unchanged.
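The same fields can be pulled out of any process's status file; a sketch reading the current shell's entry via $$ (values are in kB, and VmRSS can never exceed VmSize):

```shell
# extract virtual size and resident set size of the current shell
vmsize=$(awk '/^VmSize:/ {print $2}' /proc/$$/status)
vmrss=$(awk '/^VmRSS:/ {print $2}' /proc/$$/status)
echo "VmSize=${vmsize} kB VmRSS=${vmrss} kB"
```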

[ezolt@wintermute test_app]$ cat /proc/4540/maps
08048000-08049000 r-xp 00000000 21:03 393730 /tmp/burn
08049000-0804a000 rw-p 00000000 21:03 393730 /tmp/burn
0804a000-089d3000 rwxp 00000000 00:00 0
40000000-40015000 r-xp 00000000 21:03 1147263 /lib/ld-2.3.2.so
40015000-40016000 rw-p 00015000 21:03 1147263 /lib/ld-2.3.2.so
4002e000-4002f000 rw-p 00000000 00:00 0
4002f000-40162000 r-xp 00000000 21:03 2031811 /lib/tls/libc-2.3.2.so
40162000-40166000 rw-p 00132000 21:03 2031811 /lib/tls/libc-2.3.2.so
40166000-40168000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0

The burn application is using two libraries: ld and libc. The text section (denoted by the permission r-xp) of libc has a range of 0x4002f000 through 0x40162000, or a size of 0x133000, or 1,257,472 bytes.
The data section (denoted by permission rw-p) of libc has a range of 0x40162000 through 0x40166000, or a size of 0x4000, or 16,384 bytes. The text size of libc is bigger than ld's text size of 0x15000, or 86,016 bytes. The data size of libc is also bigger than ld's data size of 0x1000, or 4,096 bytes. libc is the big library that burn is linking in.
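Segment sizes like these can be computed directly from the hex address ranges; a small sketch run against two lines saved from the maps output above:

```shell
# size of each mapping = end - start of the hex address range
while read range rest; do
  start=${range%-*}; end=${range#*-}
  printf '%s %d bytes\n' "$range" $(( 0x$end - 0x$start ))
done <<'EOF'
4002f000-40162000 r-xp 00000000 21:03 2031811 /lib/tls/libc-2.3.2.so
40162000-40166000 rw-p 00132000 21:03 2031811 /lib/tls/libc-2.3.2.so
EOF
# -> 4002f000-40162000 1257472 bytes
# -> 40162000-40166000 16384 bytes
```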


[ezolt@wintermute tmp]$ ipcs -u

------ Shared Memory Status --------
segments allocated 21
pages allocated 1585
pages resident 720
pages swapped 412
Swap performance: 0 attempts 0 successes

------ Semaphore Status --------
used arrays = 0
allocated semaphores = 0

------ Messages: Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

In this case, we can see that 21 different segments or pieces of shared memory have been allocated. All these segments consume a total of 1,585 pages of memory; 720 of these exist in physical memory and 412 have been swapped to disk.
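To express those page counts as percentages of the allocated total, a quick awk sketch using the numbers above:

```shell
# share of allocated shared-memory pages that are resident vs swapped
awk 'BEGIN { alloc=1585; res=720; swap=412
  printf "resident %.0f%%, swapped %.0f%%\n", 100*res/alloc, 100*swap/alloc }'
# -> resident 45%, swapped 26%
```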

[ezolt@wintermute tmp]$ ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 root 777 49152 1
0x00000000 32769 root 777 16384 1
0x00000000 65538 ezolt 600 393216 2 dest

Here we ask ipcs for a general overview of all the shared memory segments in the system, which indicates who is using each memory segment. In this case, we see a list of all the shared segments. For one in particular, the one with a shared memory ID of 65538, the user (ezolt) is the owner. It has a permission of 600 (a typical UNIX permission), which in this case means that only ezolt can read and write to it. It is 393,216 bytes in size, and 2 processes are attached to it.

[ezolt@wintermute tmp]$ ipcs -p

------ Shared Memory Creator/Last-op --------
shmid owner cpid lpid
0 root 1224 11954
32769 root 1224 11954
65538 ezolt 1229 11954

Finally, we can figure out exactly which processes created the shared memory segments and which other processes are using them. For the segment with shmid 65538, we can see that PID 1229 created it and 11954 was the last to use it.
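The same table can be restated one line per segment with a one-pass awk over a saved copy of the ipcs -p output above (columns: shmid, owner, cpid, lpid):

```shell
awk '{ printf "shmid %s: created by pid %s, last used by pid %s\n", $1, $3, $4 }' <<'EOF'
0 root 1224 11954
32769 root 1224 11954
65538 ezolt 1229 11954
EOF
```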


[ezolt@wintermute procps-3.2.0]$ ./vmstat 1 3

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 197020 81804 29920 0 0 236 25 1017 67 1 1 93 4
1 1 0 172252 106252 29952 0 0 24448 0 1200 395 1 36 0 63
0 0 0 231068 50004 27924 0 0 19712 80 1179 345 1 34 15 49

During one of the samples, the system read 24,448 disk blocks. As mentioned previously, vmstat reports these in 1,024-byte blocks, so this means that the system is reading in data at about 23MB per second. We can also see that during this sample, the CPU spent a significant portion of time waiting for I/O to complete. The CPU waits on I/O 63 percent of the time during the sample in which the disk was reading at ~23MB per second, and 49 percent during the next sample, in which the disk was reading at ~19MB per second.
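That conversion is just a division; a one-liner for turning a bi value into MB/s:

```shell
# vmstat reports bi/bo in 1,024-byte blocks per second
echo 24448 | awk '{ printf "%.1f MB/s\n", $1/1024 }'
# -> 23.9 MB/s
```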

[ezolt@wintermute procps-3.2.0]$ ./vmstat -D
3 disks
5 partitions
53256 total reads
641233 merged reads
4787741 read sectors
343552 milli reading
14479 writes
17556 merged writes
257208 written sectors
7237771 milli writing
0 inprogress IO
342 milli spent IO

In this example, a large number of the reads issued to the system were merged before they were issued to the device. Although there were ~640,000 merged reads, only ~53,000 read commands were actually issued to the drives. The output also tells us that a total of 4,787,741 sectors have been read from the disk, and that since system boot, 343,552ms (or 344 seconds) were spent reading from the disk. The same statistics are available for write performance.
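From those cumulative counters you can derive a couple of useful ratios; a sketch using the numbers above:

```shell
# average latency per issued read, and how many merged reads each issued read represents
awk 'BEGIN { reads=53256; merged=641233; ms=343552
  printf "%.2f ms per issued read, %.1f merged reads per issued read\n",
         ms/reads, merged/reads }'
# -> 6.45 ms per issued read, 12.0 merged reads per issued read
```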

[ezolt@wintermute procps-3.2.0]$ ./vmstat -p hde3 1 3
hde3 reads read sectors writes requested writes
18999 191986 24701 197608
19059 192466 24795 198360
- 19161 193282 24795 198360

This shows that 60 (19,059 – 18,999) reads and 94 (24,795 – 24,701) writes were issued to partition hde3 between the first and second samples. This view can prove particularly useful if you are trying to determine which partition of a disk is seeing the most usage.
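Since the counters are cumulative, per-interval rates are just differences between consecutive samples; a sketch over the three samples above (columns: reads, read sectors, writes, requested writes):

```shell
# per-interval deltas from cumulative counters
awk 'NR > 1 { print "reads +" $1-pr ", writes +" $3-pw } { pr=$1; pw=$3 }' <<'EOF'
18999 191986 24701 197608
19059 192466 24795 198360
19161 193282 24795 198360
EOF
# -> reads +60, writes +94
# -> reads +102, writes +0
```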


 

[ezolt@localhost sysstat-5.0.2]$ ./iostat -x -dk 1 5 /dev/hda2
Linux 2.4.22-1.2188.nptl (localhost.localdomain) 05/01/2004
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 11.22 44.40 3.15 4.20 115.00 388.97 57.50 194.49
68.52 1.75 237.17 11.47 8.43

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1548.00 0.00 100.00 0.00 13240.00 0.00 6620.00
132.40 55.13 538.60 10.00 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1365.00 0.00 131.00 0.00 11672.00 0.00 5836.00
89.10 53.86 422.44 7.63 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1483.00 0.00 84.00 0.00 12688.00 0.00 6344.00
151.0 39.69 399.52 11.90 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 2067.00 0.00 123.00 0.00 17664.00 0.00 8832.00
143.61 58.59 508.54 8.13 100.00

you can see that the average queue size is pretty high (~40 to 59) and, as a result, the amount of time that a request must wait (~399.52ms to 538.60ms) is much greater than the amount of time it takes to service the request (7.63ms to 11.90ms). These high wait times, along with the fact that the utilization is 100 percent, show that the disk is completely saturated.
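A rough saturation check can be scripted over the (await, svctm, %util) triples from the samples above: flag any sample where the device is pegged and waiting dwarfs service time. The thresholds here are illustrative, not canonical:

```shell
# flag samples where util is pegged and wait time is >10x service time
awk '$3 >= 100 && $1 > 10 * $2 {
  printf "sample %d: saturated (await=%sms svctm=%sms util=%s%%)\n", NR, $1, $2, $3 }' <<'EOF'
237.17 11.47 8.43
538.60 10.00 100.00
422.44 7.63 100.00
399.52 11.90 100.00
508.54 8.13 100.00
EOF
```

Samples 2 through 5 are flagged; the first (util 8.43%) passes.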


[ezolt@wintermute sysstat-5.0.2]$ sar -n SOCK 1 2

Linux 2.4.22-1.2174.nptlsmp (wintermute.phil.org) 06/07/04
21:32:26 totsck tcpsck udpsck rawsck ip-frag
21:32:27 373 118 8 0 0
21:32:28 373 118 8 0 0
Average: 373 118 8 0 0

We can see the total number of open sockets and the TCP, RAW, and UDP sockets. sar also displays the number of fragmented IP packets.

PS:

resolved – /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

April 1st, 2014 No comments

When I ran perl command today, I met problem below:

[root@test01 bin]# /usr/local/bin/perl5.8
-bash: /usr/local/bin/perl5.8: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

Now let’s check which package /lib/ld-linux.so.2 belongs to on a good linux box:

[root@test02 ~]# rpm -qf /lib/ld-linux.so.2
glibc-2.5-118.el5_10.2

So here's the resolution: install the 32-bit glibc packages, which provide the /lib/ld-linux.so.2 loader on x86_64:

[root@test01 bin]# yum install -y glibc.x86_64 glibc.i686 glibc-devel.i686 glibc-devel.x86_64 glibc-headers.x86_64

Categories: Kernel, Linux, Systems Tags:

resolved – sudo: sorry, you must have a tty to run sudo

April 1st, 2014 2 comments

The error message below sometimes will occur when you run a sudo <command>:

sudo: sorry, you must have a tty to run sudo

To resolve this, you may comment out "Defaults requiretty" in /etc/sudoers (edit it by running visudo). Here is more info about this method: http://www.cyberciti.biz/faq/linux-unix-bsd-sudo-sorry-you-must-haveattytorun/

However, sometimes it’s not convenient or even not possible to modify /etc/sudoers, then you can consider the following:

echo -e "<password>\n"|sudo -S <sudo command>

For -S parameter of sudo, you may refer to sudo man page:

-S  The -S (stdin) option causes sudo to read the password from the standard input instead of the terminal device. The password must be followed by a newline character.

So here -S bypasses the tty (terminal device) and reads the password from standard input. This lets us pipe the password to sudo.

Categories: Linux, Programming, SHELL, Systems Tags: ,

set vnc not asking for OS account password

March 18th, 2014 No comments

As you may know, vncpasswd(belongs to package vnc-server) is used to set password for users when connecting to vnc using a vnc client(such as tightvnc). When you connect to vnc-server, it’ll ask for the password:

[screenshot: vnc password prompt]

After you connect to the host using VNC, you may also find that the remote server will ask again for the OS password (this is set by passwd):

[screenshot: OS login prompt]

In some cases you may not want this second prompt. Here's the way to disable this behavior:

[screenshots vnc-1, vnc-2: configuration steps]

 

 

Categories: Linux, Systems Tags: ,

stuck in PXE-E51: No DHCP or proxyDHCP offers were received, PXE-M0F: Exiting Intel Boot Agent, Network boot canceled by keystroke

March 17th, 2014 No comments

If you installed your OS and tried booting up it but stuck with the following messages:

[screenshot: PXE boot error messages quoted in the title]

Then one possibility is that, the configuration for your host’s storage array is not right. For instance, it should be JBOD but you had configured it to RAID6.

Please note that this is only one possibility for this error; you may search for the PXE error codes you encountered for more details.

PS:

  • Sometimes, DHCP snooping may prevent PXE functioning, you can read more http://en.wikipedia.org/wiki/DHCP_snooping.
  • STP (Spanning-Tree Protocol) makes each port wait up to 50 seconds before data is allowed to be sent on the port. This delay in turn can cause problems with some applications/protocols (PXE, Bootworks, etc.). To alleviate the problem, PortFast was implemented on Cisco devices; the terminology may differ between vendors. You can read more http://www.symantec.com/business/support/index?page=content&id=HOWTO6019
  • ARP caching http://www.networkers-online.com/blog/2009/02/arp-caching-and-timeout/
Categories: Hardware, Storage, Systems Tags:

wget and curl tips

March 14th, 2014 No comments

Imagine you want to download all files under http://www.example.com/2013/downloads, and not files under http://www.example.com/2013 except for directory ‘downloads’, then you can do this:

wget -r --level 100 -nd --no-proxy --no-parent --reject "index.htm*" --reject "*gif" 'http://www.example.com/2013/downloads/' #--level 100 is large enough, as I've seen no site with more than 100 levels of sub-directories so far.

wget -p -k --no-proxy --no-check-certificate --post-data 'id=username&passwd=password' <url> -O output.html

wget --no-proxy --no-check-certificate --save-cookies cookies.txt <url>

wget --no-proxy --no-check-certificate --load-cookies cookies.txt <url>

curl -k -u 'username:password' <url>

curl -k -L -d id=username -d passwd=password <url>

curl --data "loginform:id=username&loginform:passwd=password" -k -L <url>

 

Categories: Linux, Programming, SHELL Tags:

resolved – ssh Read from socket failed: Connection reset by peer and Write failed: Broken pipe

March 13th, 2014 No comments

If you met following errors when ssh to linux box:

Read from socket failed: Connection reset by peer

Write failed: Broken pipe

Then one possibility is that the linux box's filesystem is corrupted. In my case there was output like this on the console:

EXT3-fs error ext3_lookup: deleted inode referenced

To resolve this, you need to boot Linux into single-user mode and run fsck -y <filesystem>. You can get the names of the corrupted filesystems during boot:

[/sbin/fsck.ext3 (1) -- /usr] fsck.ext3 -a /dev/xvda2
/usr contains a file system with errors, check forced.
/usr: Directory inode 378101, block 0, offset 0: directory corrupted

/usr: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)

[/sbin/fsck.ext3 (1) -- /oem] fsck.ext3 -a /dev/xvda5
/oem: recovering journal
/oem: clean, 8253/1048576 files, 202701/1048233 blocks
[/sbin/fsck.ext3 (1) -- /u01] fsck.ext3 -a /dev/xvdb
u01: clean, 36575/14548992 files, 2122736/29081600 blocks
[FAILED]

So in this case, I ran fsck -y /dev/xvda2 && fsck -y /dev/xvda5, then rebooted the host, and everything went well.

PS:

If two VMs are booted up on two hypervisors and they share the same filesystem (like NFS), then after you fsck -y that FS and boot up one VM, the FS will soon become corrupted again, because other running copies of the VM are still using it. So you need to first make sure that only one copy of the VM is running across the hypervisors of the server pool.

Categories: Kernel, Linux Tags:

psftp through a proxy

March 5th, 2014 No comments

You may know that, we can set proxy in putty for ssh to remote host, as shown below:

[screenshot: putty proxy settings]

And if you want to scp files from a remote site to your local box, you can use putty's psftp.exe. There are many options for psftp.exe:

C:\Users\test>d:\PuTTY\psftp.exe -h
PuTTY Secure File Transfer (SFTP) client
Release 0.62
Usage: psftp [options] [user@]host
Options:
-V print version information and exit
-pgpfp print PGP key fingerprints and exit
-b file use specified batchfile
-bc output batchfile commands
-be don't stop batchfile processing if errors
-v show verbose messages
-load sessname Load settings from saved session
-l user connect with specified username
-P port connect to specified port
-pw passw login with specified password
-1 -2 force use of particular SSH protocol version
-4 -6 force use of IPv4 or IPv6
-C enable compression
-i key private key file for authentication
-noagent disable use of Pageant
-agent enable use of Pageant
-batch disable all interactive prompts

Although there's a proxy setting option for putty.exe, there's no proxy setting for psftp.exe! So what should you do if you want to copy files back to your local box, when a firewall blocks you from doing this directly and you must use a proxy?

As you may notice, there’s “-load sessname” option in psftp.exe:

-load sessname Load settings from saved session

This option means that if you have a session saved by putty.exe, you can use psftp.exe -load <session name> to copy files from the remote site. For example, suppose you saved a session named mysession in putty.exe in which you set the proxy; then you can use "psftp.exe -load mysession" to copy files from the remote site (no need for username/password, as you entered those in the putty.exe session):

C:\Users\test>d:\PuTTY\psftp.exe -load mysession
Using username “root”.
Remote working directory is /root
psftp> ls
Listing directory /root
drwx------ 3 ec2-user ec2-user 4096 Mar 4 09:27 .
drwxr-xr-x 3 root root 4096 Dec 10 23:47 ..
-rw------- 1 ec2-user ec2-user 388 Mar 5 05:07 .bash_history
-rw-r--r-- 1 ec2-user ec2-user 18 Sep 4 18:23 .bash_logout
-rw-r--r-- 1 ec2-user ec2-user 176 Sep 4 18:23 .bash_profile
-rw-r--r-- 1 ec2-user ec2-user 124 Sep 4 18:23 .bashrc
drwx------ 2 ec2-user ec2-user 4096 Mar 4 09:21 .ssh
psftp> help
! run a local command
bye finish your SFTP session
cd change your remote working directory
chmod change file permissions and modes
close finish your SFTP session but do not quit PSFTP
del delete files on the remote server
dir list remote files
exit finish your SFTP session
get download a file from the server to your local machine
help give help
lcd change local working directory
lpwd print local working directory
ls list remote files
mget download multiple files at once
mkdir create directories on the remote server
mput upload multiple files at once
mv move or rename file(s) on the remote server
open connect to a host
put upload a file from your local machine to the server
pwd print your remote working directory
quit finish your SFTP session
reget continue downloading files
ren move or rename file(s) on the remote server
reput continue uploading files
rm delete files on the remote server
rmdir remove directories on the remote server
psftp>

Now you can get/put files as usual.

PS:

If you do not need proxy connecting to remote site, then you can use psftp.exe CLI to get remote files directly. For example:

d:\PuTTY\psftp.exe [email protected] -i d:\PuTTY\aws.ppk -b d:\PuTTY\script.scr -bc -be -v

And in d:\PuTTY\script.scr is script for put/get files:

cd /backup
lcd c:\
mget *.tar.gz
close

Categories: Linux, Systems Tags: ,

avoid putty ssh connection sever or disconnect

January 17th, 2014 2 comments

After some idle time, an ssh session may disconnect by itself. If you want to avoid this, you can try running the following command:

while [ 1 ];do echo hi;sleep 60;done &

This will print message “hi” every 60 seconds on the standard output.

PS:

You can also set some parameters in /etc/ssh/sshd_config, you can refer to http://www.doxer.org/learn-linux/make-ssh-on-linux-not-to-disconnect-after-some-certain-time/
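Alternatively, for OpenSSH command-line clients, keepalives can be configured client-side; a sketch with example values (PuTTY has a similar "seconds between keepalives" setting in its Connection panel):

```
# ~/.ssh/config fragment (example values)
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 3
```

ServerAliveInterval makes the client send a keepalive probe every 60 seconds; ServerAliveCountMax drops the connection after 3 unanswered probes.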

Categories: Linux, SHELL, Unix Tags:

install java jdk on linux

January 7th, 2014 No comments

Here’s the steps if you want to install java on linux:

wget <path to jre-7u25-linux-x64.rpm> -P /tmp
rpm -ivh /tmp/jre-7u25-linux-x64.rpm
mkdir -p /root/.mozilla/plugins
rm -f /root/.mozilla/plugins/libnpjp2.so
ln -s /usr/java/jre1.7.0_25/lib/amd64/libnpjp2.so /root/.mozilla/plugins/libnpjp2.so
ll /root/.mozilla/plugins/libnpjp2.so

Categories: Java, Linux Tags: ,

add another root user and set password

January 7th, 2014 No comments

In linux, do the following to add another root user and set password:

mkdir -p /home/root2
useradd -u 0 -o -g root -G root -s /bin/bash -d /home/root2 root2
echo password | passwd --stdin root2
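You can verify that both accounts share uid 0 by scanning the passwd file; here against a sample snippet (on a real box, drop the here-document and read /etc/passwd itself):

```shell
# list every account whose uid (field 3) is 0
awk -F: '$3 == 0 { print $1 }' <<'EOF'
root:x:0:0:root:/root:/bin/bash
ec2-user:x:500:500::/home/ec2-user:/bin/bash
root2:x:0:0::/home/root2:/bin/bash
EOF
# -> root
# -> root2
```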

Categories: Linux Tags:

debugging nfs problem with snoop in solaris

December 3rd, 2013 No comments

Network analyzers are ultimately the most useful tools available when it comes to debugging NFS problems. The snoop network analyzer bundled with Solaris was introduced in Section 13.5. This section presents an example of how to use snoop to resolve NFS-related problems.

Consider the case where the NFS client rome attempts to access the contents of the filesystems exported by the server zeus through the /net automounter path:

rome% ls -la /net/zeus/export
total 5
dr-xr-xr-x   3 root     root           3 Jul 31 22:51 .
dr-xr-xr-x   2 root     root           2 Jul 31 22:40 ..
drwxr-xr-x   3 root     other        512 Jul 28 16:48 eng
dr-xr-xr-x   1 root     root           1 Jul 31 22:51 home
rome% ls /net/zeus/export/home
/net/zeus/export/home: Permission denied

 

The client is not able to open the contents of the directory /net/zeus/export/home, although the directory gives read and execute permissions to all users:

Code View: Scroll / Show All
rome% df -k /net/zeus/export/home
filesystem            kbytes    used   avail capacity  Mounted on
-hosts                     0       0       0     0%    /net/zeus/export/home

 

The df command shows the -hosts automap mounted on the path of interest. This means that the NFS filesystem zeus:/export/home has not yet been mounted. To investigate the problem further, snoop is invoked while the problematic ls command is rerun:

Code View: Scroll / Show All
 rome# snoop -i /tmp/snoop.cap rome zeus
  1   0.00000      rome -> zeus      PORTMAP C GETPORT prog=100003 (NFS) vers=3 
proto=UDP
  2   0.00314      zeus -> rome      PORTMAP R GETPORT port=2049
  3   0.00019      rome -> zeus      NFS C NULL3
  4   0.00110      zeus -> rome      NFS R NULL3 
  5   0.00124      rome -> zeus      PORTMAP C GETPORT prog=100005 (MOUNT) vers=1 
proto=TCP
  6   0.00283      zeus -> rome      PORTMAP R GETPORT port=33168
  7   0.00094      rome -> zeus      TCP D=33168 S=49659 Syn Seq=1331963017 Len=0 
Win=24820 Options=<nop,nop,sackOK,mss 1460>
  8   0.00142      zeus -> rome      TCP D=49659 S=33168 Syn Ack=1331963018 
Seq=4025012052 Len=0 Win=24820 Options=<nop,nop,sackOK,mss 1460>
  9   0.00003      rome -> zeus      TCP D=33168 S=49659     Ack=4025012053 
Seq=1331963018 Len=0 Win=24820
 10   0.00024      rome -> zeus      MOUNT1 C Get export list
 11   0.00073      zeus -> rome      TCP D=49659 S=33168     Ack=1331963062 
Seq=4025012053 Len=0 Win=24776
 12   0.00602      zeus -> rome      MOUNT1 R Get export list 2 entries
 13   0.00003      rome -> zeus      TCP D=33168 S=49659     Ack=4025012173 
Seq=1331963062 Len=0 Win=24820
 14   0.00026      rome -> zeus      TCP D=33168 S=49659 Fin Ack=4025012173 
Seq=1331963062 Len=0 Win=24820
 15   0.00065      zeus -> rome      TCP D=49659 S=33168     Ack=1331963063 
Seq=4025012173 Len=0 Win=24820
 16   0.00079      zeus -> rome      TCP D=49659 S=33168 Fin Ack=1331963063 
Seq=4025012173 Len=0 Win=24820
 17   0.00004      rome -> zeus      TCP D=33168 S=49659     Ack=4025012174 
Seq=1331963063 Len=0 Win=24820
 18   0.00058      rome -> zeus      PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 
proto=UDP
 19   0.00412      zeus -> rome      PORTMAP R GETPORT port=34582
 20   0.00018      rome -> zeus      MOUNT3 C Null
 21   0.00134      zeus -> rome      MOUNT3 R Null 
 22   0.00056      rome -> zeus      MOUNT3 C Mount /export/home
 23   0.23112      zeus -> rome      MOUNT3 R Mount Permission denied

 

Packet 1 shows the client rome requesting the port number of the NFS service (RPC program number 100003, Version 3, over the UDP protocol) from the server's rpcbind (portmapper). Packet 2 shows the server's reply indicating nfsd is running on port 2049. Packet 3 shows the automounter's call to the server's nfsd daemon to verify that it is indeed running. The server's successful reply is shown in packet 4. Packet 5 shows the client's request for the port number for RPC program number 100005, Version 1, over TCP (the RPC MOUNT program). The server replies with packet 6 with port=33168. Packets 7 through 9 are TCP handshaking between our NFS client and the server's mountd. Packet 10 shows the client's call to the server's mountd daemon (which implements the MOUNT program) currently running on port 33168. The client is requesting the list of exported entries. The server replies with packet 12 including the names of the two entries exported. Packets 18 and 19 are similar to packets 5 and 6, except that this time the client is asking for the port number of the MOUNT program version 3 running over UDP. Packets 20 and 21 show the client verifying that version 3 of the MOUNT service is up and running on the server. Finally, the client issues the Mount /export/home request to the server in packet 22, requesting the filehandle of the /export/home path. The server's mountd daemon checks its export list, determines that the host rome is not present in it, and replies to the client with a "Permission Denied" error in packet 23.

The analysis indicates that the “Permission Denied” error returned to the ls command came from the MOUNT request made to the server, not from problems with directory mode bits on the client. Having gathered this information, we study the exported list on the server and quickly notice that the filesystem /export/home is exported only to the host verona:

rome$ showmount -e zeus
export list for zeus:
/export/eng  (everyone)
/export/home verona

 

We could have obtained the same information by inspecting the contents of packet 12, which contains the export list requested during the transaction:

Code View: Scroll / Show All
rome# snoop -i /tmp/cap -v -p 10,12
...
      Packet 10 arrived at 3:32:47.73
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Record Mark: last fragment, length = 40
RPC:  Transaction id = 965581102
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100005 (MOUNT), version = 1, procedure = 5
RPC:  Credentials: Flavor = 0 (None), len = 0 bytes
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  
MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 5 (Return export list)
MOUNT:
...
       Packet 12 arrived at 3:32:47.74
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Record Mark: last fragment, length = 92
RPC:  Transaction id = 965581102
RPC:  Type = 1 (Reply)
RPC:  This is a reply to frame 10
RPC:  Status = 0 (Accepted)
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  Accept status = 0 (Success)
RPC:  
MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 5 (Return export list)
MOUNT:Directory = /export/eng
MOUNT:Directory = /export/home
MOUNT: Group = verona
MOUNT:

 

For simplicity, only the RPC and NFS Mount portions of the packets are shown. Packet 10 is the request for the export list; packet 12 is the reply. Notice that every RPC packet contains the transaction ID (XID), the message type (call or reply), the status of the call, and the credentials. Notice that the RPC header includes the string "This is a reply to frame 10". This is not part of the network packet. Snoop keeps track of the XIDs it has processed and attempts to match calls with replies and retransmissions. This feature comes in very handy during debugging. The Mount portion of packet 12 shows the list of directories exported and the group of hosts to which they are exported. In this case, we can see that /export/home was only exported with access rights to the host verona. The problem can be fixed by adding the host rome to the export list on the server.

PS:

explain solaris snoop network analyzer with example

December 2nd, 2013 No comments

Here’s the code:

# snoop -i /tmp/capture -v -p 3
ETHER:  ----- Ether Header -----
ETHER:  
ETHER:  Packet 3 arrived at 15:08:43.35
ETHER:  Packet size = 82 bytes
ETHER:  Destination = 0:0:c:7:ac:56, Cisco
ETHER:  Source      = 8:0:20:b9:2b:f6, Sun
ETHER:  Ethertype = 0800 (IP)
ETHER:  
IP:   ----- IP Header -----
IP:   
IP:   Version = 4
IP:   Header length = 20 bytes
IP:   Type of service = 0x00
IP:         xxx. .... = 0 (precedence)
IP:         ...0 .... = normal delay
IP:         .... 0... = normal throughput
IP:         .... .0.. = normal reliability
IP:   Total length = 68 bytes
IP:   Identification = 35462
IP:   Flags = 0x4
IP:         .1.. .... = do not fragment
IP:         ..0. .... = last fragment
IP:   Fragment offset = 0 bytes
IP:   Time to live = 255 seconds/hops
IP:   Protocol = 17 (UDP)
IP:   Header checksum = 4503
IP:   Source address = 131.40.52.223, caramba
IP:   Destination address = 131.40.52.27, mickey
IP:   No options
IP:   
UDP:  ----- UDP Header -----
UDP:  
UDP:  Source port = 55559
UDP:  Destination port = 2049 (Sun RPC)
UDP:  Length = 48 
UDP:  Checksum = 3685 
UDP:  
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Transaction id = 969440111
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100003 (NFS), version = 3, procedure = 0
RPC:  Credentials: Flavor = 0 (None), len = 0 bytes
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 0 (Null procedure)
NFS:

And let’s analyze this:

The Ethernet header displays the source and destination addresses as well as the type of information embedded in the packet. The IP layer displays the IP version number, flags, options, and address of the sender and recipient of the packet. The UDP header displays the source and destination ports, along with the length and checksum of the UDP portion of the packet. Embedded in the UDP frame is the RPC data. Every RPC packet has a transaction ID used by the sender to identify replies to its requests, and by the server to identify duplicate calls. The previous example shows a request from the host caramba to the server mickey. The RPC version = 2 refers to the version of the RPC protocol itself, the program number 100003 and Version 3 apply to the NFS service. NFS procedure 0 is always the NULL procedure, and is most commonly invoked with no authentication information. The NFS NULL procedure does not take any arguments, therefore none are listed in the NFS portion of the packet.

PS:

  1. Here’s more usage about snoop on solaris:

The amount of traffic on a busy network can be overwhelming, containing many irrelevant packets to the problem at hand. The use of filters reduces the amount of noise captured and displayed, allowing you to focus on relevant data. A filter can be applied at the time the data is captured, or at the time the data is displayed. Applying the filter at capture time reduces the amount of data that needs to be stored and processed during display. Applying the filter at display time allows you to further refine the previously captured information. You will find yourself applying different display filters to the same data set as you narrow the problem down, and isolate the network packets of interest.

Snoop uses the same syntax for capture and display filters. For example, the host filter instructs snoop to only capture packets with source or destination address matching the specified host:

Code View: Scroll / Show All
# snoop host caramba
Using device /dev/hme (promiscuous mode)
     caramba -> schooner     NFS C GETATTR3 FH=B083
    schooner -> caramba      NFS R GETATTR3 OK
     caramba -> schooner     TCP D=2049 S=1023     Ack=3647506101 Seq=2611574902 Len=0 Win=24820

 

In this example the host filter instructs snoop to capture packets originating at or addressed to the host caramba. You can specify the IP address or the hostname, and snoop will use the name service switch to do the conversion. Snoop assumes that the hostname specified is an IPv4 address. You can specify an IPv6 address by using the inet6 qualifier in front of the host filter:

Code View: Scroll / Show All
# snoop inet6 host caramba
Using device /dev/hme (promiscuous mode)
     caramba -> 2100::56:a00:20ff:fea0:3390    ICMPv6 Neighbor advertisement
2100::56:a00:20ff:fea0:3390 -> caramba         ICMPv6 Echo request (ID: 1294 Sequence number: 0)
     caramba -> 2100::56:a00:20ff:fea0:3390    ICMPv6 Echo reply (ID: 1294 Sequence number: 0)

 

You can restrict capture of traffic addressed to the specified host by using the to or dst qualifier in front of the host filter:

# snoop to host caramba
Using device /dev/hme (promiscuous mode)
    schooner -> caramba      RPC R XID=1493500696 Success
    schooner -> caramba      RPC R XID=1493500697 Success
    schooner -> caramba      RPC R XID=1493500698 Success

 

Similarly you can restrict captured traffic to only packets originating from the specified host by using the from or src qualifier:

Code View: Scroll / Show All
# snoop from host caramba
Using device /dev/hme (promiscuous mode)
     caramba -> schooner     NFS C GETATTR3 FH=B083
     caramba -> schooner     TCP D=2049 S=1023     Ack=3647527137 Seq=2611841034 Len=0 Win=24820

 

Note that the host keyword is not required when the specified hostname does not conflict with the name of another snoop primitive. The previous snoop from host caramba command could have been invoked without the host keyword and it would have generated the same output:

Code View: Scroll / Show All
 
					# snoop from caramba 
Using device /dev/hme (promiscuous mode)
     caramba -> schooner     NFS C GETATTR3 FH=B083
     caramba -> schooner     TCP D=2049 S=1023     Ack=3647527137 Seq=2611841034 Len=0 Win=24820

 

For clarity, we use the host keyword throughout this book. Two or more filters can be combined by using the logical operators and and or :

# snoop -o /tmp/capture -c 20 from host caramba and rpc nfs 3
Using device /dev/hme (promiscuous mode)
20 20 packets captured

 

Snoop captures all NFS Version 3 packets originating at the host caramba. Here, snoop is invoked with the -c and -o options to save 20 filtered packets into the /tmp/capture file. We can later apply other filters during display time to further analyze the captured information. For example, you may want to narrow the previous search even further by only listing TCP traffic by using the proto filter:

# snoop -i /tmp/capture proto tcp
Using device /dev/hme (promiscuous mode)
  1   0.00000     caramba -> schooner    NFS C GETATTR3 FH=B083
  2   2.91969     caramba -> schooner    NFS C GETATTR3 FH=0CAE
  9   0.37944     caramba -> rea         NFS C FSINFO3 FH=0156
 10   0.00430     caramba -> rea         NFS C GETATTR3 FH=0156
 11   0.00365     caramba -> rea         NFS C ACCESS3 FH=0156 (lookup)
 14   0.00256     caramba -> rea         NFS C LOOKUP3 FH=F244 libc.so.1
 15   0.00411     caramba -> rea         NFS C ACCESS3 FH=772D (lookup)

 

Snoop reads the previously filtered data from /tmp/capture, and applies the new filter to only display TCP traffic. The resulting output is NFS traffic originating at the host caramba over the TCP protocol. We can apply a UDP filter to the same NFS traffic in the /tmp/capture file and obtain the NFS Version 3 traffic over UDP from host caramba without affecting the information in the /tmp/capture file:

# snoop -i /tmp/capture proto udp
Using device /dev/hme (promiscuous mode)
  1   0.00000      caramba -> rea          NFS C NULL3

 

So far, we’ve presented filters that let you specify the information you are interested in. Use the not operator to specify the criteria of packets that you wish to have excluded during capture. For example, you can use the not operator to capture all network traffic, except that generated by the remote shell:

# snoop not port login
Using device /dev/hme (promiscuous mode)
      rt-086 -> BROADCAST        RIP R (25 destinations)
      rt-086 -> BROADCAST        RIP R (10 destinations)
     caramba -> schooner         NFS C GETATTR3 FH=B083
    schooner -> caramba          NFS R GETATTR3 OK
     caramba -> donald           NFS C GETATTR3 FH=00BD
    jamboree -> donald           NFS R GETATTR3 OK
     caramba -> donald           TCP D=2049 S=657     Ack=3855205229 Seq=2331839250 Len=0 Win=24820
     caramba -> schooner         TCP D=2049 S=1023    Ack=3647569565 Seq=2612134974 Len=0 Win=24820
     narwhal -> 224.2.127.254    UDP D=9875 S=32825 LEN=368

 

On multihomed hosts (systems with more than one network interface device), use the -d option to specify the particular network interface to snoop on:

snoop -d hme2

 

You can snoop on multiple network interfaces concurrently by invoking separate instances of snoop on each device. This is particularly useful when you don’t know what interface the host will use to generate or receive the requests. The -d option can be used in conjunction with any of the other options and filters previously described:

# snoop -o /tmp/capture-hme0 -d hme0 not port login &
# snoop -o /tmp/capture-hme1 -d hme1 not port login &

Note: this article is from book <Managing NFS and NIS, Second Edition>

rpc remote procedure call mechanism

December 2nd, 2013 No comments

The rpcbind daemon (also known as the portmapper),[8] exists to register RPC services and to provide their IP port numbers when given an RPC program number. rpcbind itself is an RPC service, but it resides at a well-known IP port (port 111) so that it may be contacted directly by remote hosts. For example, if host fred needs to mount a filesystem from host barney, it must send an RPC request to the mountd daemon on barney. The mechanics of making the RPC request are as follows:

[8] The rpcbind daemon and the old portmapper provide the same RPC service. The portmapper implements Version 2 of the portmap protocol (RPC program number 100000), where the rpcbind daemon implements Versions 3 and 4 of the protocol, in addition to Version 2. This means that the rpcbind daemon already implements the functionality provided by the old portmapper. Due to this overlap in functionality and to add to the confusion, many people refer to the rpcbind daemon as the portmapper.

  • fred gets the IP address for barney, using the ipnodes NIS map. fred also looks up the RPC program number for mountd in the rpc NIS map. The RPC program number for mountd is 100005.
  • Knowing that the portmapper lives at port 111, fred sends an RPC request to the portmapper on barney, asking for the IP port (on barney) of RPC program 100005. fred also specifies the particular protocol and version number for the RPC service. barney's portmapper responds to the request with port 704, the IP port at which mountd is listening for incoming mount RPC requests over the specified protocol. Note that it is possible for the portmapper to return an error, if the specified program does not exist or if it hasn't been registered on the remote host. barney, for example, might not be an NFS server and would therefore have no reason to run the mountd daemon.
  • fred sends a mount RPC request to barney, using the IP port number returned by the portmapper. This RPC request contains an RPC procedure number, which tells the mountd daemon what to do with the request. The RPC request also contains the parameters for the procedure, in this case, the name of the filesystem fred needs to mount.
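The first step of the lookup can be sketched in shell. The rpc table line below is an inline copy of the mountd entry from /etc/rpc so the sketch is self-contained; on a real client you would read /etc/rpc (or the rpc NIS map), then query the remote portmapper with rpcinfo — the host name barney and the rpcinfo invocation are illustrative:

```shell
# Resolve an RPC service name to its program number, as fred would.
# The table line is an inline copy of the mountd entry from /etc/rpc.
rpc_table='mountd 100005 mount showmount'
prog_num=$(echo "$rpc_table" | awk '$1 == "mountd" {print $2}')
echo "$prog_num"    # prints 100005

# With the program number in hand, the portmapper on the server can be
# queried directly (hypothetical invocation; requires rpcbind on barney):
#   rpcinfo -u barney 100005
```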

Note: this is from book <Managing NFS and NIS, Second Edition>

Categories: Kernel, Linux, Network Tags:

resolved – mount clntudp_create: RPC: Program not registered

December 2nd, 2013 No comments

When I did a showmount -e localhost, an error occurred:

[root@centos-doxer ~]# showmount -e localhost
mount clntudp_create: RPC: Program not registered

So I checked what RPC program number showmount was using:

[root@centos-doxer ~]# grep showmount /etc/rpc
mountd 100005 mount showmount

As this indicated, we should start up the mountd daemon to make showmount -e localhost work. And mountd is part of NFS, so I started up nfs:

[root@centos-doxer ~]# /etc/init.d/nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]

Now that mountd was running, showmount -e localhost worked.
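You can also confirm the registration directly from the portmapper's table. The sketch below parses a canned line in the format printed by rpcinfo -p (the port 704 sample is made up); in real use you would pipe `rpcinfo -p localhost` into the same awk:

```shell
# Hypothetical sample line in the format printed by `rpcinfo -p`:
#   program  vers proto  port  service
sample='100005    3   tcp    704  mountd'
echo "$sample" | awk '$5 == "mountd" {found=1} END {exit !found}' \
  && echo "mountd registered"
```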

 

Categories: Kernel, Linux, Network Tags:

Difference between Computer Configuration settings and User Configuration settings in Active Directory Policy Editor

November 22nd, 2013 No comments
  • Computer Configuration settings are applied to computer accounts at startup and during the background refresh interval.
  • User Configuration settings are applied to user accounts at logon and during the background refresh interval.
Categories: Windows Tags:

resolved – sshd: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost= user=

November 20th, 2013 No comments

Today when I tried to log on to one Linux server with a normal account, errors were found in /var/log/secure:

Nov 20 07:43:39 test_linux sshd[11200]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.182.120.188 user=testuser
Nov 20 07:43:39 test_linux sshd[11200]: pam_ldap: error trying to bind (Invalid credentials)
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldaps://test.com:7501: Invalid credentials
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldap://test.com: Invalid credentials
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: could not search LDAP server – Server is unavailable
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldaps://test.com:7501: Invalid credentials
Nov 20 07:43:43 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldap://test.com: Invalid credentials
Nov 20 07:43:43 test_linux sshd[11200]: nss_ldap: could not search LDAP server – Server is unavailable
Nov 20 07:43:55 test_linux sshd[11200]: pam_ldap: error trying to bind (Invalid credentials)
Nov 20 07:43:55 test_linux sshd[11200]: Failed password for testuser from 10.182.120.188 port 34243 ssh2
Nov 20 07:43:55 test_linux sshd[11201]: fatal: Access denied for user testuser by PAM account configuration

After some attempts with Linux PAM (sshd, system-auth), I still got nothing. Later, I compared /etc/ldap.conf with the one on another box, and found the configuration on the problematic host was not right.

I copied over the correct ldap.conf, tried logging on again, and the issue was resolved.
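The comparison itself was just a diff. Here is a minimal sketch of the idea with made-up file contents and a made-up binddn value; in practice you would copy the known-good /etc/ldap.conf from the other box and diff it against the local one:

```shell
# Two throwaway files standing in for the bad and the known-good ldap.conf
printf 'uri ldaps://test.com:7501\nbinddn cn=proxy,dc=test,dc=com\n' > /tmp/ldap.conf.bad
printf 'uri ldaps://test.com:7501\nbinddn cn=proxyagent,dc=test,dc=com\n' > /tmp/ldap.conf.good
# diff exits non-zero when the files differ, pinpointing the wrong setting
diff /tmp/ldap.conf.bad /tmp/ldap.conf.good || true
```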

PS:

You can read more about Linux PAM here: http://www.linux-pam.org/Linux-PAM-html/ (I recommend reading the System Administrators' Guide, as that may be the only part most Linux administrators need. You can also get detailed info on some commonly used PAM modules such as pam_tally2.so, pam_unix.so, pam_cracklib, etc.)

Categories: Linux, Security, Systems Tags:

Enabling NIS on client hosts

November 19th, 2013 No comments

Once you have one or more NIS servers running ypserv, you can set up NIS clients that query them. Make sure you do not enable NIS on any clients until you have at least one NIS server up and running. If no servers are available, the host that attempts to run as an NIS client will hang.

To enable NIS on a client host, first set up the nsswitch.conf file:

newclient# cp /etc/nsswitch.nis /etc/nsswitch.conf

 

Set up the domain name:

newclient# domainname bedrock
newclient# domainname > /etc/defaultdomain

 

Run ypinit:

newclient# /usr/sbin/ypinit -c

 

You will be prompted for a list of NIS servers. Enter the servers in order of proximity to the client.

Kill (if necessary) ypbind, and restart it:

newclient# ps -ef | grep ypbind
newclient# /usr/lib/netsvc/yp/ypstop
newclient# /usr/lib/netsvc/yp/ypstart

 

Once NIS is running, references to the basic administrative files are handled in two fundamentally different ways, depending on how nsswitch.conf is configured:

  • The NIS database replaces some files. Local copies of replaced files (ethers, hosts, netmasks, netgroups,[3] networks, protocols, rpc, and services) are ignored as soon as the ypbind daemon is started (to enable NIS).

    [3] The netgroups file is a special case. Netgroups are only meaningful when NIS is running, in which case the netgroups map (rather than the file) is consulted. The netgroups file is therefore only used to build the netgroups map; it is never “consulted” in its own right.

  • Some files are augmented, or appended to, by NIS. Files that are appended, or augmented, by NIS are consulted before the NIS maps are queried. The default /etc/nsswitch.conf file for NIS has these appended files: aliases, auto_*, group, passwd, services, and shadow. These files are read first, and if an appropriate entry isn't found in the local file, the corresponding NIS map is consulted. For example, when a user logs in, an NIS client will first look up the user's login name in the local passwd file; if it does not find anything that matches, it will refer to the NIS passwd map.
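The two behaviors map directly onto source ordering in nsswitch.conf. An illustrative excerpt under that assumption (the exact stock file varies by release, so treat the lines below as a sketch rather than the shipped defaults):

```
# replaced file: NIS is authoritative once ypbind is up
hosts:      nis [NOTFOUND=return] files
# augmented files: local file consulted first, then the NIS map
passwd:     files nis
group:      files nis
services:   files nis
```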

Although the replaced files aren’t consulted once NIS is running, they shouldn’t be deleted. In particular, the /etc/hosts file is used by an NIS client during the boot process, before it starts NIS, but is ignored as soon as NIS is running. The NIS client needs a “runt” hosts file during the boot process so that it can configure itself and get NIS running. Administrators usually truncate hosts to the absolute minimum: entries for the host itself and the “loopback” address. Diskless nodes need additional entries for the node’s boot server and the server for the diskless node’s /usr filesystem. Trimming the hosts file to these minimal entries is a good idea because, for historical reasons, many systems have extremely long host tables. Other files, like rpc, services, and protocols, could probably be eliminated, but it’s safest to leave the files distributed with your system untouched; these will certainly have enough information to get your system booted safely, particularly if NIS stops running for some reason. However, you should make any local additions to these files on the master server alone. You don’t need to bother keeping the slaves and clients up to date.

PS:

This is from book <Managing NFS and NIS, Second Edition>

Categories: IT Architecture, Network, Systems Tags:

resolved – kernel panic not syncing: Fatal exception Pid: comm: not Tainted

November 13th, 2013 No comments

We were installing IDM OAM today, and the Linux server panicked every time we ran the startup script. The server panic info was like this:

Pid: 4286, comm: emdctl Not tainted 2.6.32-300.29.1.el5uek #1
Process emdctl (pid: 4286, threadinfo ffff88075bf20000, task ffff88073d0ac480)
Stack:
ffff88075bf21958 ffffffffa02b1769 ffff88075bf21948 ffff8807cdcce500
<0> ffff88075bf95cc8 ffff88075bf95ee0 ffff88075bf21998 ffffffffa01fd5c6
<0> ffffffffa02b1732 ffff8807bc2543f0 ffff88075bf95cc8 ffff8807bc2543f0
Call Trace:
[<ffffffffa02b1769>] nfs3_xdr_writeargs+0x37/0x7a [nfs]
[<ffffffffa01fd5c6>] rpcauth_wrap_req+0x7f/0x8b [sunrpc]
[<ffffffffa02b1732>] ? nfs3_xdr_writeargs+0x0/0x7a [nfs]
[<ffffffffa01f612a>] call_transmit+0x199/0x21e [sunrpc]
[<ffffffffa01fc8ba>] __rpc_execute+0x85/0x270 [sunrpc]
[<ffffffffa01fcae2>] rpc_execute+0x26/0x2a [sunrpc]
[<ffffffffa01f5546>] rpc_run_task+0x57/0x5f [sunrpc]
[<ffffffffa02abd86>] nfs_write_rpcsetup+0x20b/0x22d [nfs]
[<ffffffffa02ad1e8>] nfs_flush_one+0x97/0xc3 [nfs]
[<ffffffffa02a86b4>] nfs_pageio_doio+0x37/0x60 [nfs]
[<ffffffffa02a87c5>] nfs_pageio_complete+0xe/0x10 [nfs]
[<ffffffffa02ac264>] nfs_writepages+0xa7/0xe4 [nfs]
[<ffffffffa02ad151>] ? nfs_flush_one+0x0/0xc3 [nfs]
[<ffffffffa02acd2e>] nfs_write_mapping+0x63/0x9e [nfs]
[<ffffffff810f02fe>] ? __pmd_alloc+0x5d/0xaf
[<ffffffffa02acd9c>] nfs_wb_all+0x17/0x19 [nfs]
[<ffffffffa029f6f7>] nfs_do_fsync+0x21/0x4a [nfs]
[<ffffffffa029fc9c>] nfs_file_flush+0x67/0x70 [nfs]
[<ffffffff81117025>] filp_close+0x46/0x77
[<ffffffff81059e6b>] put_files_struct+0x7c/0xd0
[<ffffffff81059ef9>] exit_files+0x3a/0x3f
[<ffffffff8105b240>] do_exit+0x248/0x699
[<ffffffff8100e6a1>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8106898a>] ? freezing+0x13/0x15
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8106bd03>] get_signal_to_deliver+0x303/0x328
[<ffffffff8101120a>] do_notify_resume+0x90/0x6d7
[<ffffffff81459f06>] ? kretprobe_table_unlock+0x1c/0x1e
[<ffffffff8145ac6f>] ? kprobe_flush_task+0x71/0x7c
[<ffffffff8103164c>] ? paravirt_end_context_switch+0x17/0x31
[<ffffffff81123e8f>] ? path_put+0x22/0x27
[<ffffffff8101207e>] int_signal+0x12/0x17
Code: 55 48 89 e5 0f 1f 44 00 00 48 8b 06 0f c8 89 07 48 8b 46 08 0f c8 89 47 04 c9 48 8d 47 08 c3 55 48 89 e5 0f 1f 44 00 00 48 0f ce <48> 89 37 c9 48 8d 47 08 c3 55 48 89 e5 53 0f 1f 44 00 00 f6 06
RIP [<ffffffffa02b03c3>] xdr_encode_hyper+0xc/0x15 [nfs]
RSP <ffff88075bf21928>
---[ end trace 04ad5382f19cf8ad ]---
Kernel panic - not syncing: Fatal exception
Pid: 4286, comm: emdctl Tainted: G D 2.6.32-300.29.1.el5uek #1
Call Trace:
[<ffffffff810579a2>] panic+0xa5/0x162
[<ffffffff81450075>] ? threshold_create_device+0x242/0x2cf
[<ffffffff8100ed2f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff814574b0>] ? _spin_unlock_irqrestore+0x16/0x18
[<ffffffff810580f5>] ? release_console_sem+0x194/0x19d
[<ffffffff810583be>] ? console_unblank+0x6a/0x6f
[<ffffffff8105766f>] ? print_oops_end_marker+0x23/0x25
[<ffffffff814583a6>] oops_end+0xb7/0xc7
[<ffffffff8101565d>] die+0x5a/0x63
[<ffffffff81457c7c>] do_trap+0x115/0x124
[<ffffffff81013731>] do_alignment_check+0x99/0xa2
[<ffffffff81012cb5>] alignment_check+0x25/0x30
[<ffffffffa02b03c3>] ? xdr_encode_hyper+0xc/0x15 [nfs]
[<ffffffffa02b06be>] ? xdr_encode_fhandle+0x15/0x17 [nfs]
[<ffffffffa02b1769>] nfs3_xdr_writeargs+0x37/0x7a [nfs]
[<ffffffffa01fd5c6>] rpcauth_wrap_req+0x7f/0x8b [sunrpc]
[<ffffffffa02b1732>] ? nfs3_xdr_writeargs+0x0/0x7a [nfs]
[<ffffffffa01f612a>] call_transmit+0x199/0x21e [sunrpc]
[<ffffffffa01fc8ba>] __rpc_execute+0x85/0x270 [sunrpc]
[<ffffffffa01fcae2>] rpc_execute+0x26/0x2a [sunrpc]
[<ffffffffa01f5546>] rpc_run_task+0x57/0x5f [sunrpc]
[<ffffffffa02abd86>] nfs_write_rpcsetup+0x20b/0x22d [nfs]
[<ffffffffa02ad1e8>] nfs_flush_one+0x97/0xc3 [nfs]
[<ffffffffa02a86b4>] nfs_pageio_doio+0x37/0x60 [nfs]
[<ffffffffa02a87c5>] nfs_pageio_complete+0xe/0x10 [nfs]
[<ffffffffa02ac264>] nfs_writepages+0xa7/0xe4 [nfs]
[<ffffffffa02ad151>] ? nfs_flush_one+0x0/0xc3 [nfs]
[<ffffffffa02acd2e>] nfs_write_mapping+0x63/0x9e [nfs]
[<ffffffff810f02fe>] ? __pmd_alloc+0x5d/0xaf
[<ffffffffa02acd9c>] nfs_wb_all+0x17/0x19 [nfs]
[<ffffffffa029f6f7>] nfs_do_fsync+0x21/0x4a [nfs]
[<ffffffffa029fc9c>] nfs_file_flush+0x67/0x70 [nfs]
[<ffffffff81117025>] filp_close+0x46/0x77
[<ffffffff81059e6b>] put_files_struct+0x7c/0xd0
[<ffffffff81059ef9>] exit_files+0x3a/0x3f
[<ffffffff8105b240>] do_exit+0x248/0x699
[<ffffffff8100e6a1>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8106898a>] ? freezing+0x13/0x15
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8106bd03>] get_signal_to_deliver+0x303/0x328
[<ffffffff8101120a>] do_notify_resume+0x90/0x6d7
[<ffffffff81459f06>] ? kretprobe_table_unlock+0x1c/0x1e
[<ffffffff8145ac6f>] ? kprobe_flush_task+0x71/0x7c
[<ffffffff8103164c>] ? paravirt_end_context_switch+0x17/0x31
[<ffffffff81123e8f>] ? path_put+0x22/0x27
[<ffffffff8101207e>] int_signal+0x12/0x17

We tried a lot (application coredump, kdump, etc.) but still got no solution, until we noticed that there were a lot of nfs-related messages in the kernel panic info (marked in red above).

As our Linux server was not using NFS or autofs, we tried upgrading the NFS client (nfs-utils) and disabled autofs:

yum update nfs-utils

chkconfig autofs off

After this, the startup for IDM succeeded, and no server panic was seen anymore!

Categories: Kernel, Linux Tags: ,

An Introduction to Active Directory Basics

November 12th, 2013 No comments

Before we get started covering Active Directory, we'll lay the foundation with some basics. These definitions aren't completely comprehensive but will give you the foundation you need to understand the topics in this chapter. Although there are a lot of terms to grasp, no term is that complex. We'll define them here with a short introduction and often expand on them later.

 

  • Workgroup

A workgroup is a group of users connected in a local area network (LAN) but with each computer having its own user accounts. A user who can log onto one computer will need a different user account to log onto a different computer, which can become a problem. A single user who needs to access several computers will have several different user accounts, often with different passwords.

Workgroups are often used in organizations with fewer than 10 computers. As more computers
are added, a decentralized workgroup becomes harder to manage and administer, requiring it
to be promoted to a domain.

  • Domain

When an organization becomes too big for a workgroup, a domain is created by
running the domain controller promotion wizard (DCPromo) on a server and promoting the
server to a domain controller. A domain controller is a server that hosts a copy of Active
Directory Domain Services.

  • Active Directory Domain Services

Active Directory Domain Services (AD DS) is used to
provide several services to an organization. At its core, it’s a big database of objects (such as
users, computers, and groups) and is used to centrally organize and manage all the objects
within an organization. A single user would have a single user account in Active Directory
and can use this single account to access multiple computers in the organization. This is often
referred to as single sign-on.
Additional services include the ability to easily search AD DS so that objects can easily be
located, as well as secure authentication using Kerberos.
Copies of Active Directory are kept on domain controllers. It’s very common to have at least two
domain controllers for redundancy purposes in case one goes down. Any changes to Active
Directory are passed to each of the domain controllers using a process called replication.

  • Replication

When any object (such as a user account) is added, deleted, or modified within
Active Directory, the change is sent to all other domain controllers (DCs) in the domain. When
a business is located in a single location, the changes are sent to all other DCs within a minute.
Modifications can be done on any DC. The initial change is sent from the DC where the change
was created to other DCs (designated as replication partners) within 15 seconds. If there are
more than four DCs in the organization, they are automatically organized in a logical circle,
and the change is replicated through the replication circle until all the DCs have the change.

  • Objects

Objects within AD are used to represent real-world items. Common objects are
user objects and computer objects that represent people and their computers. The objects can
be managed and administered using AD DS. For example, to represent a user named Sally,
a user account object is created. Sally can then use this account to log onto the domain and
access domain resources such as files, folders, printers, and email. Although we would often
say that we give Sally permission to access the resources, we actually give Sally’s user object
permission to access the resources. Similarly, a computer account object is created to represent
Sally's computer. All objects have properties that can be configured such as the user's
first name, last name, display name, logon name, and password for a user object.
The types of objects and their properties are predefined. You won’t find a kitchen-sink object
in AD DS, and you won’t find a favorite color property for users—at least not by default. All
objects that can be added to AD DS and the properties used to define these objects are specified
in the schema.

  • Schema

The schema is the definition of all the object types that Active Directory can
contain, and it includes a list of properties that can be used to describe the objects. You
can think of the schema as a set of blueprints for each of the objects. Just as a blueprint for
a house can be used to create a house, a schema definition for a user object can be used to
create a user object.

Only objects that are defined by the schema can be added to Active Directory, and these objects
can be described only by properties defined and identified by the schema. It’s common for
the schema to be modified a few times in the lifetime of an Active Directory enterprise. For
example, to install Exchange Server 2007 (for mail), the schema must be modified to accept the
different objects and properties required by Exchange. Modifying the schema is often referred
to as extending the schema.

  • Organizational units

Organizational units are used to organize objects within Active
Directory. You can think of an OU simply as a container for the objects. By placing the objects
in different containers, they are easier to manage. For example, you can create a Sales OU and
place all the objects representing users and computers in the sales department in the Sales OU.
OUs have two distinct benefits. You can delegate permissions to an OU, and you can link Group
Policy to an OU. As an example, Maria may be responsible for administration for all users and
computers in the sales department. If these objects were placed in the Sales OU, Maria could
be delegated permission to administer the OU, and it would include all the objects in the OU.
Similarly, you can use Group Policy to apply different settings and configurations to all the user
and computer objects in an OU by applying a single Group Policy object to the OU.

  • Group Policy

Group Policy allows you to configure a setting once and have it apply to
many user and/or computer objects. For example, if you want to ensure all the computers in
the sales department have their firewall enabled, you could place the computers in an OU
and call it Sales, configure a Group Policy object (GPO) that enables the firewall, and link the
policy to the Sales OU. It doesn’t matter if there are five computers in the OU or 5,000; a GPO
will apply the setting to all the computers in the OU.
You can link GPOs to OUs, entire domains, or sites. When linked, a GPO applies to all the
objects within the OU, domain, or site. For example, if you want all users in the entire domain
to have firewalls enabled, instead of linking the GPO to the site, you’d link it to the domain. Two
default GPOs are created when a domain is created: the default domain policy and the default
domain controllers policy.

  • Default domain policy

The default domain policy is a preconfigured GPO that is added
when a domain is created and linked at the domain level. Settings within the default domain
policy apply to all user and computer objects within the domain. This policy starts with some
basic security settings such as requirements for passwords but can be modified as desired.

  • Default domain controllers policy

The default domain controller policy is a preconfigured
GPO that is added when a domain is created and linked at the Domain Controllers
OU level. The Domain Controllers OU is created when a domain is created, and all domain
controllers are automatically placed in this OU when they are promoted to a DC. Since the
default domain controller policy is linked to the Domain Controllers OU, it applies to all
domain controllers.

  • Site

A site is a group of well-connected computers and is sometimes referred to as a group
of well-connected subnets. Small to medium-sized businesses often operate out of a single
location, and all the computers in this location are connected via a single LAN. This is a site.
If a remote office is created and connected via a slower connection, it could be configured as
a site. The remote office is well connected within the remote office but not well connected to
the main office. Sites are explored in much more depth in Chapter 21.

  • Forest

A forest is a group of one or more domains that share a common Active Directory.
A single forest will have only one schema (only one definition of objects that can be created)
and only one global catalog.

  • Global catalog

The global catalog (GC) is a listing of all the objects in the entire forest. It is
easily searchable and is often used by different applications to search AD DS for specific objects.
The global catalog is hosted on domain controllers that are designated as GC servers. Since there
is only one GC for a forest and a forest can include multiple domains, it can become quite large.
To limit its size, objects in the GC have only a subset of properties included. For example, a user
account may have 100 properties to describe it, but only about 10 are included in the GC.

  • Tree

A tree is a group of domains with a common namespace. That simply means the two-part
root domain name is common to other domains in the tree. The first domain in the forest
may be called Bigfirm.com. A child domain could be created named sales.bigfirm.com. Notice
the common name (Bigfirm.com). It is possible to create a separate tree within a forest. For
example, another domain could be created named littlefirm.com. It’s not the same namespace,
but since it is in the same forest, it would share a common schema and global catalog.

Note: this is from book Mastering Windows Server® 2008 R2

Categories: Windows Tags:

make ssh on linux not to disconnect after some certain time

November 1st, 2013 No comments

You connect to a Linux box through ssh, and sometimes you find that ssh just "hangs" there or gets disconnected. It's the ssh configuration on the server that makes this happen.

You can do the following to make the disconnection timeout long enough to get around this annoying issue:

cp /etc/ssh/sshd_config{,.bak30}
sed -i '/ClientAliveInterval/ s/^/# /' /etc/ssh/sshd_config
sed -i '/ClientAliveCountMax/ s/^/# /' /etc/ssh/sshd_config
echo 'ClientAliveInterval 30' >> /etc/ssh/sshd_config
echo 'TCPKeepAlive yes' >> /etc/ssh/sshd_config
echo 'ClientAliveCountMax 99999' >> /etc/ssh/sshd_config
/etc/init.d/sshd restart
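With these values the server only gives up after ClientAliveInterval × ClientAliveCountMax seconds of silence, which is effectively never for an interactive session:

```shell
# effective idle timeout implied by the settings above
interval=30      # ClientAliveInterval: seconds between keepalive probes
count=99999      # ClientAliveCountMax: probes tolerated without a reply
echo $(( interval * count ))   # 2999970 seconds, about 34 days
```

As a side effect, the 30-second probes also generate regular traffic, which keeps stateful firewalls and NAT devices from expiring the idle connection.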

Enjoy!

Categories: Linux Tags:

make sudo asking for no password on linux

November 1st, 2013 No comments

Assume you have a user named 'test' who belongs to the 'admin' group. You want user test to be able to sudo to root without Linux prompting for a password. Here's how you can do it (consider validating with visudo -c afterwards, since a syntax error in /etc/sudoers can lock you out):

cp /etc/sudoers{,.bak}
sed -i '/%admin/ s/^/# /' /etc/sudoers
echo '%admin ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers

Enjoy!

Categories: Linux, Security Tags:

disable linux strong password policy

November 1st, 2013 No comments

You may have enabled a strong password policy for Linux, and of course you can disable it. Here's the way to do it:

cp /etc/pam.d/system-auth{,.bak}
sed -i '/pam_cracklib.so/ s/^/# /' /etc/pam.d/system-auth
sed -i 's/use_authtok//' /etc/pam.d/system-auth
echo "password" | passwd --stdin username

PS:

  1. To enable strong password for linux, you can have a try on this http://goo.gl/uwdbN
  2. You can read more about linux pam here http://www.linux-pam.org/Linux-PAM-html/
Categories: Linux, Security Tags:

make tee to copy stdin as well as stderr & prevent ESC output of script

October 30th, 2013 No comments
  • Make tee to copy stdin as well as stderr

As the manpage of tee says:

read from standard input and write to standard output and files

So if your script writes error messages to stderr, those messages will not be copied and written to the file.

Here’s one workaround for this:

./aaa.sh 2>&1 | tee -a log

Or you can use the more complicated one:

command > >(tee stdout.log) 2> >(tee stderr.log >&2)

  • Prevent ESC output of script

script literally captures every type of output that was sent to the screen. If you have colored or bold output, this shows up as esc characters within the output file. These characters can significantly clutter the output and are not usually useful. If you set the TERM environmental variable to dumb (using setenv TERM dumb for csh-based shells and export TERM=dumb for sh-based shells), applications will not output the escape characters. This provides a more readable output.

In addition, the timing information provided by script clutters the output. Although it can be useful to have automatically generated timing information, it may be easier to not use script’s timing, and instead just time the important commands with the time command mentioned in the previous chapter.
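If you already have a typescript file full of escape characters, you can also strip the common ANSI color sequences after the fact. A small sketch using GNU sed (the sample line is fabricated for illustration):

```shell
# fabricate a line with color escapes, as script would capture it
printf '\033[1mbold\033[0m plain\n' > /tmp/raw.log
# strip CSI color/style sequences such as ESC[1m and ESC[0m
sed 's/\x1b\[[0-9;]*m//g' /tmp/raw.log
```

This prints the line with the bold markers removed, leaving plain readable text.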

PS:

  1. Here’s the full version http://stackoverflow.com/questions/692000/how-do-i-write-stderr-to-a-file-while-using-tee-with-a-pipe
  2. Some contents of this article is excerpted from <Optimizing Linux® Performance: A Hands-On Guide to Linux® Performance Tools>.
Categories: Linux, SHELL Tags:

use batch script to start up & shutdown Virtualbox VMs

October 28th, 2013 No comments

I wake up before 8 every weekday and want to power on two VirtualBox VMs named "xp" and "win2008". So I can write a script and put it in the "Startup" folder, and these two VMs will start up with the system automatically:

@echo off
date /t | find “Mon” && goto 1
date /t | find “Tue” && goto 1
date /t | find “Wed” && goto 1
date /t | find “Thu” && goto 1
date /t | find “Fri” && goto 1
exit

:1
if %time:~0,2% leq 8 (
c:\VirtualBox\VBoxManage startvm win2008 --type gui
c:\VirtualBox\VBoxManage startvm xp --type gui
) else exit
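For reference, the same weekday-plus-hour gate can be sketched in shell, which is handy if you keep the VMs on a Linux host (the echo stands in for the real VBoxManage call):

```shell
#!/bin/sh
# mirror of the batch logic above: only act before 08:00 on weekdays
day=$(date +%a)    # Mon..Sun in the C locale
hour=$(date +%H)
start=no
case "$day" in
  Mon|Tue|Wed|Thu|Fri)
    [ "$hour" -lt 8 ] && start=yes ;;
esac
if [ "$start" = yes ]; then
  echo "would run: VBoxManage startvm win2008 --type gui"
fi
```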

And I also want to shut down these two VMs in one run:

c:\VirtualBox\VBoxManage controlvm win2008 acpipowerbutton
c:\VirtualBox\VBoxManage controlvm xp acpipowerbutton

PS:

You may also consider group policy (gpedit.msc -> Computer Configuration -> Windows Settings -> Scripts -> Shutdown) in Windows, so that when you shut down your PC, all VMs will be turned off automatically if you have a shutdown script configured there.

 

Categories: Programming, Windows Tags: ,

make label for swap device using mkswap and blkid

August 6th, 2013 No comments

If you want to label a swap partition in Linux, you should not use e2label for this purpose, as e2label is for changing the label on an ext2/ext3/ext4 filesystem, which does not include swap.

If you use e2label for this, you will get the following error messages:

[root@node2 ~]# e2label /dev/xvda3 SWAP-VM
e2label: Bad magic number in super-block while trying to open /dev/xvda3
Couldn’t find valid filesystem superblock.

We should use mkswap instead, as mkswap has the option -L:

-L label    Specify a label, to allow swapon by label. (Only for new style swap areas.)

So let's see the example below:

[root@node2 ~]# mkswap -L SWAP-VM /dev/xvda3
Setting up swapspace version 1, size = 2335973 kB
LABEL=SWAP-VM, no uuid

[root@node2 ~]# blkid
/dev/xvda1: LABEL="/boot" UUID="6c5ad2ad-bdf5-4349-96a4-efc9c3a1213a" TYPE="ext3"
/dev/xvda2: LABEL="/" UUID="76bf0aaa-a58e-44cb-92d5-098357c9c397" TYPE="ext3"
/dev/xvdb1: LABEL="VOL1" TYPE="oracleasm"
/dev/xvdc1: LABEL="VOL2" TYPE="oracleasm"
/dev/xvdd1: LABEL="VOL3" TYPE="oracleasm"
/dev/xvde1: LABEL="VOL4" TYPE="oracleasm"
/dev/xvda3: LABEL="SWAP-VM" TYPE="swap"

[root@node2 ~]# swapon /dev/xvda3

[root@node2 ~]# swapon -s
Filename Type Size Used Priority
/dev/xvda3 partition 2281220 0 -1

So now we can add the swap entry to /etc/fstab using LABEL=SWAP-VM:

LABEL=SWAP-VM           swap                    swap    defaults        0 0
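Before rebooting, it is worth checking that every LABEL= entry in /etc/fstab actually resolves to a real device. A minimal sketch, assuming util-linux blkid is available:

```shell
# Extract each LABEL= value from /etc/fstab and ask blkid to resolve it;
# warn about any label that no block device carries.
awk '$1 ~ /^LABEL=/ {sub(/^LABEL=/, "", $1); print $1}' /etc/fstab |
while read -r label; do
    blkid -L "$label" >/dev/null 2>&1 || echo "unresolved label: $label"
done
```

If the check prints nothing, every labeled entry in fstab maps to a device and the machine should mount cleanly on the next boot.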

Categories: Linux, Storage Tags: ,

get linux server fingerprint through using ssh-keygen

July 31st, 2013 No comments

Run ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub on the server to get its fingerprint, which corresponds to the one shown when you first connect:

$ ssh <test-host>
The authenticity of host 'testhost[ip address]' can't be established.
RSA key fingerprint is <xx:xx:xx...>
Are you sure you want to continue connecting (yes/no)? yes
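Servers usually carry more than one host key type (RSA, DSA, and on newer systems ECDSA), and the client may be shown whichever type was negotiated. A small sketch, wrapped in a hypothetical helper function, prints the fingerprint of every host key the server has:

```shell
# fingerprint: print the fingerprint of one public key file via ssh-keygen -l
fingerprint() {
    ssh-keygen -lf "$1"
}

# On the server, run it against every host key that is present and readable
for f in /etc/ssh/ssh_host_*_key.pub; do
    if [ -r "$f" ]; then fingerprint "$f"; fi
done
```

Comparing the client prompt against this list covers the case where ssh shows a DSA or ECDSA fingerprint rather than the RSA one.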

Categories: Linux, Networking Security, Security Tags:

setup ntpd server on centos linux

July 31st, 2013 No comments

The first step of setting up ntp server on centos/redhat linux is to install ntp package:

yum install ntp

And here's the content of /etc/ntp.conf:

driftfile /etc/ntp/drift

keys /etc/ntp/keys
#restrict default kod nomodify notrap nopeer noquery
#restrict -6 default kod nomodify notrap nopeer noquery

#to allow all clients to get time

restrict default nomodify

#you can also allow only one client or one subnet
#restrict 10.176.120.178 nomodify

#restrict 192.168.18.0 mask 255.255.255.0 nomodify
server 10.172.24.1 prefer #this is the upper stratum ntp server, you can also use 0.centos.pool.ntp.org/1.centos.pool.ntp.org/2.centos.pool.ntp.org

After this, start up ntpd using service ntpd start. Then wait five minutes or so, and run ntpdate on an NTP client:

ntpdate <ip address of the ntp server>

PS:

You should wait several minutes before running ntpdate; otherwise you'll get the error "ntpdate[13703]: no server suitable for synchronization found", because ntpd needs some time to synchronize with its upstream servers first.

Categories: Linux, Systems Tags:

linux – how to find which process is doing the most io

July 30th, 2013 No comments

find /proc/ -maxdepth 3 -type f -name io -exec egrep -H 'read_bytes|write_bytes' {} \;

Then you can run ps auxww|grep <pid> to see which processes are doing most of the IO.
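The raw egrep output is hard to compare by eye, so here is a rough sketch that sums read_bytes and write_bytes per process and lists the heaviest consumers. Run it as root to see all processes; it assumes the kernel was built with task IO accounting, which is the case on stock RHEL/CentOS kernels:

```shell
# Sum read_bytes + write_bytes from each readable /proc/<pid>/io,
# then sort descending and show the top 10 consumers.
for d in /proc/[0-9]*; do
    if [ -r "$d/io" ]; then
        awk -v pid="${d#/proc/}" \
            '/^read_bytes|^write_bytes/ {total += $2}
             END {printf "%d %s\n", total, pid}' "$d/io"
    fi
done | sort -rn | head -10
```

Each output line is "<bytes> <pid>", so the pids at the top are the ones to feed into ps auxww|grep.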

Categories: Linux Tags: ,