Archive

Posts Tagged ‘linux’

linux tips

April 10th, 2014
Linux Performance & Troubleshooting
For Linux Performance & Troubleshooting, please refer to another post - Linux tips – Performance and Troubleshooting
Linux system tips
ls -lu #access time (updated by e.g. cat file); ls -lt #sort by modification time (updated by e.g. vi; ls -l shows this time by default); ls -lc #change time (updated by e.g. chmod); stat ./aa.txt #shows all three timestamps <UTC>
ctrl +z #bg and stopped
%1 & #bg and running
%1 #fg
pgrep -flu oracle  # processes owned by the user oracle
watch free -m #refresh every 2 seconds
pmap -x 30420 #memory mapping.
openssl s_client -connect localhost:636 -showcerts #verify ssl certificates, or 443
openssl x509 -in cacert.pem -noout -text
openssl x509 -in cacert.pem -noout -dates
openssl x509 -in cacert.pem -noout -purpose
openssl req -in robots.req.pem -text -verify -noout
blockdev --getbsz /dev/xvda1 #get block size of the FS
dumpe2fs /dev/xvda1 |grep 'Block size'
Strings
ovm svr ls|sort -rn -k 4 #sort by column 4
cat a1|sort|uniq -c |sort #SUS
ovm svr ls|uniq -f3 #skip the first three columns, this will list only 1 server per pool
for i in <all OVMs>;do (test.sh $i &);done #instead of using nohup &
ovm vm ls|egrep "`echo testhost{0\|,1\|,2\|,3\|,4}|tr -d '[:space:]'`"
cat a|awk '{print $5}'|tr '\n' ' '
getopt #getopts is builtin, more on http://wuliangxx.iteye.com/blog/750940
date -d '1970-1-1 1276059000 sec utc'
date -d '2010-09-11 23:20' +%s
find . -name '*txt'|xargs tar cvvf a.tar
find . -maxdepth 1
for i in `find /usr/sbin/ -type f ! -perm -u+x`;do chmod +x $i;done #files that have no execute permission for the owner
find ./* -prune -print #-prune: do not descend into directories
find . -fprint file #put result to file
tar tvf a.tar --wildcards "*ipp*" #globbing patterns
tar xvf bfiles.tar --wildcards --no-anchored 'b*'
tar --show-defaults
tar cvf a.tar --totals *.txt #print total bytes written and throughput
tar --append --file=collection.tar rock #add rock to collection.tar
tar --update -v -f collection.tar blues folk rock classical #append only files that are new or newer than the archived copy; existing members are not replaced
tar --delete --file=collection.tar blues #does not work on tapes
tar -c -f archive.tar --mode='a+rw'
tar -C sourcedir -cf - . | tar -C targetdir -xf - #copy directories
tar -c -f jams.tar grape prune -C food cherry #-C changes directory: adds the file cherry from the food directory
find . -size -400 -print > small-files
tar -c -v -z -T small-files -f little.tgz
tar -cf src.tar --exclude='*.o' src #multiple --exclude can be specified
expr 5 - 1
rpm2cpio ./ash-1.0.1-1.x86_64.rpm |cpio -ivd
eval $cmd
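exec menu.viewcards #replaces the current shell process with the command instead of forking; unlike . (source), control does not return to the shell
ls . | xargs -i cp ./{} /etc #-i (same as -I {}) reads one name per line and substitutes it for {}, so the filename can appear in the middle of the command, just like find -exec. For filenames containing unusual characters such as newlines or quotes, use find -print0 | xargs -0: -print0 separates names with NUL characters rather than newlines.
ls | xargs -t -i mv {} {}.old #the mv source should not end with /, or unexpected errors may occur
mv --strip-trailing-slashes source destination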
ls |xargs file /dev/fd/0 #replace -
ls -l -I "*out*" #exclude names matching *out*
find . -type d |xargs -i du -sh {} |awk '$1 ~ /G/'
find . -type f -name "*20120606" -exec rm {} \; #do not need rm -rf. find . -type f -exec bash -c "ls -l '{}'" \;
ps -ef|grep init|sed -n '1p'
cut -d ' ' -f1,3 /etc/mtab #first and third fields
seq 15 21 #print 15 to 21, or echo {15..21}
seq -s" " 15 21 #use space as separator
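The xargs tips above can be combined into a small helper that stays safe for filenames containing spaces. This is only a sketch: the function name rename_with_suffix is made up for illustration, and it assumes GNU find and xargs (for -maxdepth, -print0 and -0).

```sh
# Hypothetical helper: append a suffix to every regular file directly
# inside a directory. Names are passed NUL-separated, so spaces and
# newlines in filenames do not split an argument.
rename_with_suffix() {
    dir=$1
    suffix=$2
    find "$dir" -maxdepth 1 -type f -print0 |
        xargs -0 -I {} mv {} "{}$suffix"
}

# Usage (assumed directory name):
#   rename_with_suffix ./logs .old
```

Because find emits paths prefixed with the directory name, the mv source never ends with a slash, which avoids the problem noted for ls | xargs mv.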
Categories: Linux, tips Tags:

Linux tips – Performance and Troubleshooting

April 10th, 2014

System CPU

top
procinfo #yum install procinfo
gnome-system-monitor #can also see network flow rate
mpstat
sar

System Memory

top
free
slabtop
sar
/proc/meminfo #provides the most complete view of system memory usage
procinfo
gnome-system-monitor #can also see network flow rate

Process-specific CPU

time
strace #traces the system calls that a program makes while executing
ltrace #traces the calls(functions) that an application makes to libraries rather than to the kernel. Then use ldd to display which libraries are used, and use objdump to search each of those libraries for the given function.
ps
ld.so #ld

Process-specific Memory

ps
/proc/<pid> #you can refer to http://www.doxer.org/proc-filesystem-day-1/ for more info.

/proc/<PID>/status #provides information about the status of a given process PID
/proc/<PID>/maps #how the process’s virtual address space is used

ipcs #more info on http://www.doxer.org/resolved-semget-failed-with-status-28-failed-oracle-database-starting-up/ and http://www.doxer.org/resolvedload-manager-shared-memory-error-is-28-no-space-left-on-devicefor-apache-pmserver-etc-running-on-linux-solaris-unix/

Disk I/O

vmstat #provides totals rather than the rate of change during the sample
sar
lsof
time sh -c "dd if=/dev/zero of=System2.img bs=1M count=10240 && sync" #10G
time dd if=ddfile of=/dev/null bs=8k
dd if=/dev/zero of=vm1disk bs=1M seek=10240 count=0 #10G

Network

ethtool
ifconfig
ip
iptraf
gkrellm
netstat
gnome-system-monitor #can also see network flow rate
sar #network statistics
/etc/cron.d/sysstat #/var/log/sa/

General Ideas & options & outputs

Run Queue Statistics
In Linux, a process can be either runnable or blocked waiting for an event to complete.

A blocked process may be waiting for data from an I/O device or the results of a system call.

When processes are runnable but waiting to use the processor, they form a line called the run queue.
The load on a system is the total number of running and runnable processes.

Context Switches
To create the illusion that a given single processor runs multiple tasks simultaneously, the Linux kernel constantly switches between different processes.
The switch between different processes is called a context switch.
To guarantee that each process receives a fair share of processor time, the kernel periodically interrupts the running process and, if appropriate, the kernel scheduler decides to start another process rather than let the current process continue executing. It is possible that your system will context switch every time this periodic timer interrupt fires. (To estimate the timer rate, run cat /proc/interrupts | grep timer twice, e.g. 10 seconds apart, and compare the counts.)

Interrupts
In addition, the processor periodically receives interrupts from hardware devices.
/proc/interrupts can be examined to show which interrupts are firing on which CPUs
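Comparing two readings of the timer line in /proc/interrupts is easier if the per-CPU columns are summed first. A minimal sketch, assuming the usual layout in which the IRQ label is followed by one counter per CPU; irq_total is a made-up name.

```sh
# Hypothetical helper: read /proc/interrupts-style lines on stdin and
# print the IRQ label with the sum of its per-CPU counters.
# Summation stops at the first non-numeric field (the controller name).
irq_total() {
    awk '{
        total = 0
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            total += $i
        print $1, total
    }'
}

# Usage: take two samples 10 seconds apart, then subtract and divide by 10:
#   grep timer /proc/interrupts | irq_total; sleep 10
#   grep timer /proc/interrupts | irq_total
```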

CPU Utilization
At any given time, the CPU can be doing one of seven things:
Idle
Running user code #user time
System time #executing code in the Linux kernel on behalf of the application code
Executing user code that has been “nice”ed or set to run at a lower priority than normal processes
iowait #waiting for I/O (such as disk or network) to complete
irq #means it is in high-priority kernel code handling a hardware interrupt
softirq #executing kernel code that was also triggered by an interrupt, but it is running at a lower priority
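The seven states above correspond to the first seven counters on the cpu line of /proc/stat (user, nice, system, idle, iowait, irq, softirq). As a rough sketch, the following turns those counters into percentages; cpu_breakdown is a made-up name, and any later fields (such as steal) are ignored here.

```sh
# Hypothetical helper: read /proc/stat-style input on stdin and print the
# share of time spent in each of the seven CPU states, using the
# aggregate "cpu" line. Counters are totals since boot, not a recent rate.
cpu_breakdown() {
    awk '$1 == "cpu" {
        total = $2 + $3 + $4 + $5 + $6 + $7 + $8
        if (total == 0) exit
        printf "user=%.1f nice=%.1f system=%.1f idle=%.1f iowait=%.1f irq=%.1f softirq=%.1f\n",
            100 * $2 / total, 100 * $3 / total, 100 * $4 / total, 100 * $5 / total,
            100 * $6 / total, 100 * $7 / total, 100 * $8 / total
    }'
}

# Usage:
#   cpu_breakdown < /proc/stat
```

To see current utilization rather than the average since boot, sample the counters twice and apply the same arithmetic to the differences.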


Buffers and cache
If your system has much more physical memory than required by your applications, Linux will cache recently used files in physical memory so that subsequent accesses to those files do not require an access to the hard drive. This can greatly speed up applications that access the hard drive frequently, which, obviously, can prove especially useful for frequently launched applications. The first time the application is launched, it needs to be read from the disk; if the application remains in the cache, however, it only needs to be read from the much quicker physical memory. This disk cache differs from the processor cache. Other than oprofile, valgrind, and kcachegrind, most tools that report statistics about “cache” are actually referring to disk cache.

In addition to cache, Linux also uses extra memory as buffers. To further optimize applications, Linux sets aside memory to use for data that needs to be written to disk. These set-asides are called buffers. If an application has to write something to the disk, which would usually take a long time, Linux lets the application continue immediately but saves the file data into a memory buffer. At some point in the future, the buffer is flushed to disk, but the application can continue immediately.
Active Versus Inactive Memory
Active memory is currently being used by a process. Inactive memory is memory that is allocated but has not been used for a while. Nothing is essentially different between the two types of memory. When required, the Linux kernel takes a process’s least recently used memory pages and moves them from the active to the inactive list. When choosing which memory will be swapped to disk, the kernel chooses from the inactive memory list.
Kernel Usage of Memory (Slabs)
In addition to the memory that applications allocate, the Linux kernel consumes a certain amount for bookkeeping purposes. This bookkeeping includes, for example, keeping track of data arriving from network and disk I/O devices, as well as keeping track of which processes are running and which are sleeping. To manage this bookkeeping, the kernel has a series of caches that contains one or more slabs of memory. Each slab consists of a set of one or more objects. The amount of slab memory consumed by the kernel depends on which parts of the Linux kernel are being used, and can change as the type of load on the machine changes.

slabtop

slabtop shows in real-time how the kernel is allocating its various caches and how full they are. Internally, the kernel has a series of caches that are made up of one or more slabs. Each slab consists of a set of one or more objects. These objects can be active (or used) or inactive (unused). slabtop shows you the status of the different slabs. It shows you how full they are and how much memory they are using.


time

time measures three types of time. First, it measures the real or elapsed time, which is the amount of time between when the program started and finished execution. Next, it measures the user time, which is the amount of time that the CPU spent executing application code on behalf of the program. Finally, time measures system time, which is the amount of time the CPU spent executing system or kernel code on behalf of the application.


Disk I/O

When an application does a read or write, the Linux kernel may have a copy of the file stored in its cache or buffers and return the requested information without ever accessing the disk. If the Linux kernel does not have a copy of the data stored in memory, however, it adds a request to the disk’s I/O queue. If the Linux kernel notices that multiple requests are asking for contiguous locations on the disk, it merges them into a single big request. This merging increases overall disk performance by eliminating the seek time for the second request. When the request has been placed in the disk queue, if the disk is not currently busy, it starts to service the I/O request. If the disk is busy, the request waits in the queue until the drive is available, and then it is serviced.

iostat

iostat provides a per-device and per-partition breakdown of how many blocks are read from and written to a particular disk. (Blocks in iostat are usually sized at 512 bytes.)

lsof
lsof can prove helpful when narrowing down which applications are generating I/O


top output

S (or STAT) – This is the current status of a process, where the process is either sleeping (S), running (R), zombied (terminated but not yet reaped by its parent) (Z), in an uninterruptible sleep (D), or stopped or being traced (T).

TIME – The total amount of CPU time (user and system) that this process has used since it started executing.

top options

-b Run in batch mode. Typically, top shows only a single screenful of information, and processes that don’t fit on the screen never display. This option shows all the processes and can be very useful if you are saving top’s output to a file or piping the output to another command for processing.

I This toggles whether top will divide the CPU usage by the number of CPUs on the system. For example, if a process was consuming all of both CPUs on a two-CPU system, this toggles whether top displays a CPU usage of 100% or 200%.

1 (numeral 1) This toggles whether the CPU usage will be broken down to the individual usage or shown as a total.

mpstat options

-P { cpu | ALL } This option tells mpstat which CPUs to monitor. cpu is the number between 0 and the total CPUs minus 1.

The biggest benefit of mpstat is that it shows the time next to the statistics, so you can look for a correlation between CPU usage and time of day.

mpstat can be used to determine whether the CPUs are fully utilized and relatively balanced. By observing the number of interrupts each CPU is handling, it is possible to find an imbalance.

 sar options

-I {irq | SUM | ALL | XALL} This reports the rates that interrupts have been occurring in the system.
-P {cpu | ALL} This option specifies which CPU the statistics should be gathered from. If this isn’t specified, the system totals are reported.
-q This reports information about the run queues and load averages of the machine.
-u This reports information about CPU utilization of the system. (This is the default output.)
-w This reports the number of context switches that occurred in the system.
-o filename This specifies the name of the binary output file that will store the performance statistics.
-f filename This specifies the file from which previously recorded performance statistics are read.

-B – This reports information about the number of blocks that the kernel swapped to and from disk. In addition, for kernel versions after v2.5, it reports information about the number of page faults.
-W – This reports the number of pages of swap that are brought in and out of the system.
-r – This reports information about the memory being used in the system. It includes information about the total free memory, swap, cache, and buffers being used.
-R Report memory statistics

-d –  reports disk activities

-n DEV – Shows statistics about the number of packets and bytes sent and received by each device.
-n EDEV – Shows information about the transmit and receive errors for each device.
-n SOCK – Shows information about the total number of sockets (TCP, UDP, and RAW) in use.
-n ALL – Shows all the network statistics.

sar output

runq-sz This is the size of the run queue when the sample was taken.
plist-sz This is the number of processes present (running, sleeping, or waiting for I/O) when the sample was taken.
proc/s This is the number of new processes created per second. (This is the same as the forks statistic from vmstat.)

tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
rd_sec/s – Number of disk sectors read per second.
wr_sec/s – Number of disk sectors written per second.


vmstat options

-n print header info only once

-a This changes the default output of memory statistics to indicate the active/inactive amount of memory rather than information about buffer and cache usage.
-s (procps 3.2 or greater) This prints out the vm table. This is a grab bag of different statistics about the system since it has booted. It cannot be run in sample mode. It contains both memory and CPU statistics.

-d – This option displays individual disk statistics at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.
-p partition – This displays performance statistics about the given partition at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.

vmstat output
si – The rate of memory (in KB/s) that has been swapped in from disk during the last sample.
so – The rate of memory (in KB/s) that has been swapped out to disk during the last sample.
pages paged in – The amount of memory (in pages) read from the disk(s) into the system buffers. (On most IA32 systems, a page is 4KB.)
pages paged out – The amount of memory (in pages) written to the disk(s) from the system cache. (On most IA32 systems, a page is 4KB.)
pages swapped in – The amount of memory (in pages) read from swap into system memory.
pages swapped out – The amount of memory (in pages) written from system memory to the swap.

bo – This indicates the number of total blocks written to disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
bi – This shows the number of blocks read from the disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
wa – This indicates the amount of CPU time spent waiting for I/O to complete.
reads: ms – The amount of time (in ms) spent reading from the disk.
writes: ms – The amount of time (in ms) spent writing to the disk.
IO: cur – The total number of I/O operations that are currently in progress. Note that there is a bug in recent versions of vmstat in which this is incorrectly divided by 1,000, which almost always yields a 0.
IO: s – This is the number of seconds spent waiting for I/O to complete.

iostat options
-d – This displays only information about disk I/O rather than the default display, which includes information about CPU usage as well.
-k – This shows statistics in kilobytes rather than blocks.
-x – This shows extended-performance I/O statistics.
device – If a device is specified, iostat shows only information about that device.

iostat output
tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
Blk_read/s – The rate of disk blocks read per second.
Blk_wrtn/s – The rate of disk blocks written per second.
Blk_read – The total number of blocks read during the interval.
Blk_wrtn – The total number of blocks written during the interval.
rrqm/s – The number of reads merged before they were issued to the disk.
wrqm/s – The number of writes merged before they were issued to the disk.
r/s – The number of reads issued to the disk per second.
w/s – The number of writes issued to the disk per second.
rsec/s – Disk sectors read per second.
wsec/s – Disk sectors written per second.
avgrq-sz – The average size (in sectors) of disk requests.
avgqu-sz – The average size of the disk request queue.
await – The average time (in ms) for a request to be completely serviced. This average includes the time that the request was waiting in the disk’s queue plus the amount of time it was serviced by the disk.
svctm – The average service time (in ms) for requests submitted to the disk. This indicates how long on average the disk took to complete a request. Unlike await, it does not include the amount of time spent waiting in the queue.

lsof options
+D directory – This causes lsof to recursively search all the files in the given directory and report on which processes are using them.
+d directory – This causes lsof to report on which processes are using the files in the given directory.

lsof output
FD – The file descriptor of the file, or txt for an executable, mem for a memory-mapped file.
TYPE – The type of file. REG for a regular file.
DEVICE – The device number, given as major,minor.
SIZE – The size of the file.
NODE – The inode of the file.


free options

-s delay – This option causes free to print out new memory statistics every delay seconds.


strace options

strace [-p <pid>] -s 200 <program> #-p attaches to a running process; -s 200 raises the maximum printed string size from the default of 32 to 200. Note that filenames are not considered strings and are always printed in full.

-c – This causes strace to print out a summary of statistics rather than an individual list of all the system calls that are made.

ltrace options
-c – This option causes ltrace to print a summary of all the calls after the command has completed.
-S – ltrace traces system calls in addition to library calls, which is identical to the functionality strace provides.
-p pid – This traces the process with the given PID.


ps options
vsz The virtual set size is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is using.
rss The resident set size is the amount of physical memory the application is currently using.
pmem The percentage of the system memory that the process is consuming.
command This is the command name.

/proc/<PID>/status output
VmSize This is the process’s virtual set size, which is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is actually using. This is the same as the vsz parameter provided by ps.
VmLck This is the amount of memory that has been locked by this process. Locked memory cannot be swapped to disk.
VmRSS This is the resident set size or amount of physical memory the application is currently using. This is the same as the rss statistic provided by ps.

ipcs
Because shared memory is used by multiple processes, it cannot be attributed to any particular process. ipcs provides enough information about the state of the system-wide shared memory to determine which processes allocated the shared memory, which processes are using it, and how often they are using it. This information proves useful when trying to reduce shared memory usage.

ipcs options

lsof -u oracle | grep <shmid> #shmid is from the output of ipcs -m. Lists the processes under the oracle user attached to the shared memory segment

-t – This shows the time when the shared memory was created, when a process last attached to it, and when a process last detached from it.
-u – This provides a summary about how much shared memory is being used and whether it has been swapped or is in memory.
-l – This shows the system-wide limits for shared memory usage.
-p – This shows the PIDs of the processes that created and last used the shared memory segments.
-c – This shows the creator and owner of each segment.


ifconfig output #more on http://www.thegeekscope.com/linux-ifconfig-command-output-explained/

Errors – Frames with errors (possibly because of a bad network cable or duplex mismatch).
Dropped – Frames that were discarded (most likely because of low amounts of memory or buffers).
Overruns – Frames that may have been discarded by the network card because the kernel or network card was overwhelmed with frames. This should not normally happen.
Frame – These frames were dropped as a result of problems on the physical level. This could be the result of cyclic redundancy check (CRC) errors or other low-level problems.
Compressed – Some lower-level interfaces, such as Point-to-Point Protocol (PPP) or Serial Line Internet Protocol (SLIP) devices compress frames before they are sent over the network. This value indicates the number of these compressed frames. (Compressed packets are usually present during SLIP or PPP connections)

carrier – The number of packets discarded because of link media failure (such as a faulty cable)

ip options
-s [-s] link – If the extra -s is provided to ip, it provides a more detailed list of low-level Ethernet statistics.

iptraf options
-d interface – Detailed statistics for an interface including receive, transmit, and error rates
-s interface – Statistics about which IP ports are being used on an interface and how many bytes are flowing through them
-t <minutes> – Number of minutes that iptraf runs before exiting
-z interface – shows packet counts by size on the specified interface

netstat options
-p – Displays the PID/program name responsible for opening each of the displayed sockets
-c – Continually updates the display of information every second
--interfaces=<name> – Displays network statistics for the given interface
--statistics|-s – IP/UDP/ICMP/TCP statistics
--tcp|-t – Shows only information about TCP sockets
--udp|-u – Shows only information about UDP sockets.
--raw|-w – Shows only information about RAW sockets (IP and ICMP)
--listening|-l – Show only listening sockets. (These are omitted by default.)
--all|-a – Show both listening and non-listening (for TCP this means established connections) sockets. With the --interfaces option, show interfaces that are not marked
--numeric|-n – Show numerical addresses instead of trying to determine symbolic host, port or user names.
--extend|-e – Display additional information. Use this option twice for maximum detail.

netstat output

Active Internet connections (w/o servers)
Proto - The protocol (tcp, udp, raw) used by the socket.
Recv-Q - The count of bytes not copied by the user program connected to this socket.
Send-Q - The count of bytes not acknowledged by the remote host.
Local Address - Address and port number of the local end of the socket. Unless the --numeric (-n) option is specified, the socket address is resolved to its canonical host name (FQDN), and the port number is translated into the corresponding service name.
Foreign Address - Address and port number of the remote end of the socket. Analogous to "Local Address."
State - The state of the socket. Since there are no states in raw mode and usually no states used in UDP, this column may be left blank. Normally this can be one of several values: #more on http://www.doxer.org/tcp-flags-explanation-in-details-syn-ack-fin-rst-urg-psh-and-iptables-for-sync-flood/
    ESTABLISHED
        The socket has an established connection.
    SYN_SENT
        The socket is actively attempting to establish a connection.
    SYN_RECV
        A connection request has been received from the network.
    FIN_WAIT1
        The socket is closed, and the connection is shutting down.
    FIN_WAIT2
        Connection is closed, and the socket is waiting for a shutdown from the remote end.
    TIME_WAIT
        The socket is waiting after close to handle packets still in the network.
    CLOSED
        The socket is not being used.
    CLOSE_WAIT
        The remote end has shut down, waiting for the socket to close.
    LAST_ACK
        The remote end has shut down, and the socket is closed. Waiting for acknowledgement.
    LISTEN
        The socket is listening for incoming connections. Such sockets are not included in the output unless you specify the --listening (-l) or --all (-a) option.
    CLOSING
        Both sockets are shut down but we still don't have all our data sent.
    UNKNOWN
        The state of the socket is unknown.
User - The username or the user id (UID) of the owner of the socket.
PID/Program name - Slash-separated pair of the process id (PID) and process name of the process that owns the socket. --program causes this column to be included. You will also need superuser privileges to see this information on sockets you don't own. This identification information is not yet available for IPX sockets.
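A quick way to put the State column to use is to count TCP sockets per state. The sketch below assumes the netstat -ant column layout described above, where State is the sixth field; tcp_state_counts is a made-up name.

```sh
# Hypothetical helper: read `netstat -ant` output on stdin and print
# how many TCP sockets are in each state, sorted by state name.
# Header lines are skipped because their first field is not tcp/tcp6.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Usage:
#   netstat -ant | tcp_state_counts
```

A large number of sockets in TIME_WAIT or CLOSE_WAIT is often a useful starting point when investigating connection problems.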

Example

[ezolt@scrffy ~/edid]$ vmstat 1 | tee /tmp/output
procs -----------memory---------- ---swap-- -----io----  --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs  us sy id wa
0  1 201060  35832  26532 324112    0    0     3     2     6     2  5  1  94  0
0  0 201060  35888  26532 324112    0    0    16     0  1138   358  0  0  99  0
0  0 201060  35888  26540 324104    0    0     0    88  1163   371  0  0 100  0

The number of context switches looks good compared to the number of interrupts. The scheduler is switching processes less than the number of timer interrupts that are firing. This is most likely because the system is nearly idle, and most of the time when the timer interrupt fires, the scheduler does not have any work to do, so it does not switch from the idle process.

[ezolt@scrffy manuscript]$ sar -w -c -q 1 2
Linux 2.6.8-1.521smp (scrffy)   10/20/2004

08:23:29 PM    proc/s
08:23:30 PM      0.00

08:23:29 PM   cswch/s
08:23:30 PM    594.00

08:23:29 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:30 PM         0       163      1.12       1.17      1.17

08:23:30 PM    proc/s
08:23:31 PM      0.00

08:23:30 PM   cswch/s
08:23:31 PM    812.87

08:23:30 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:31 PM         0       163      1.12       1.17      1.17

Average:       proc/s
Average:         0.00

Average:      cswch/s
Average:       703.98

Average:      runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
Average:            0       163      1.12       1.17      1.17

In this case, we ask sar to show us the total number of context switches and process creations that occur every second. We also ask sar for information about the load average. We can see in this example that this machine has 163 processes that are in memory but not running. For the past minute, on average 1.12 processes have been ready to run.

bash-2.05b$ vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
 2  1 514004   5640 79816 1341208   33   31   204   247 1111  1548  8  5 73 14

The amount of inactive pages indicates how much of the memory could be swapped to disk and how much is currently being used. In this case, we can see that 1310MB of memory is active, and only 78MB is considered inactive. This machine has a large amount of memory, and much of it is being actively used.
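The megabyte figures quoted above come from dividing the KB columns by 1024. A small sketch of that conversion, assuming the vmstat -a column order shown in this example (inact is the fifth field, active the sixth); active_inactive_mb is a made-up name.

```sh
# Hypothetical helper: read `vmstat -a` output on stdin and print the
# inactive and active memory in MB. Header lines are skipped because
# their first field is not a number.
active_inactive_mb() {
    awk '$1 ~ /^[0-9]+$/ {
        printf "inactive=%.0fMB active=%.0fMB\n", $5 / 1024, $6 / 1024
    }'
}

# Usage:
#   vmstat -a | active_inactive_mb
```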


bash-2.05b$ vmstat -s

      1552528  total memory
      1546692  used memory
      1410448  active memory
        11100  inactive memory
         5836  free memory
         2676  buffer memory
       645864  swap cache
      2097096  total swap
       526280  used swap
      1570816  free swap
     20293225 non-nice user cpu ticks
     18284715 nice user cpu ticks
     17687435 system cpu ticks
    357314699 idle cpu ticks
     67673539 IO-wait cpu ticks
       352225 IRQ cpu ticks
      4872449 softirq cpu ticks
    495248623 pages paged in
    600129070 pages paged out
     19877382 pages swapped in
     18874460 pages swapped out
   2702803833 interrupts
   3763550322 CPU context switches
   1094067854 boot time
     20158151 forks

It can be helpful to know the system totals when trying to figure out what percentage of the swap and memory is currently being used. Another interesting statistic is the pages paged in, which indicates the total number of pages that were read from the disk. This statistic includes the pages that are read when starting an application and those that the application itself may be using.
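As one way to get such a percentage, the sketch below reads vmstat -s output and reports how much of the swap is in use. It assumes the "total swap" and "used swap" labels shown above, with the value in the first field; swap_used_pct is a made-up name.

```sh
# Hypothetical helper: read `vmstat -s` output on stdin and print the
# percentage of swap currently in use. Prints nothing if no swap exists.
swap_used_pct() {
    awk '/total swap/ { total = $1 }
         /used swap/  { used = $1 }
         END { if (total > 0) printf "%.1f\n", 100 * used / total }'
}

# Usage:
#   vmstat -s | swap_used_pct
```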


[ezolt@wintermute tmp]$ ps -o etime,time,pcpu,cmd 10882
    ELAPSED     TIME %CPU CMD
      00:06 00:00:05 88.0 ./burn

This example shows a test application that is consuming 88 percent of the CPU and has been running for 6 seconds, but has only consumed 5 seconds of CPU time.


[ezolt@wintermute tmp]$ ps -o vsz,rss,tsiz,dsiz,majflt,minflt,cmd 10882
VSZ RSS TSIZ DSIZ MAJFLT MINFLT CMD
11124 10004 1 11122 66 2465 ./burn

The burn application has a very small text size (1KB), but a very large data size (11,122KB). Of the total virtual size (11,124KB), the process has a slightly smaller resident set size (10,004KB), which represents the total amount of physical memory that the process is actually using. In addition, most of the faults generated by burn were minor faults, so most of the memory faults were due to memory allocation rather than loading in a large amount of text or data from the program image on the disk.


[ezolt@wintermute tmp]$ cat /proc/4540/status
Name: burn
State: T (stopped)
Tgid: 4540
Pid: 4540
PPid: 1514
TracerPid: 0
Uid: 501 501 501 501
Gid: 501 501 501 501
FDSize: 256
Groups: 501 9 502
VmSize: 11124 kB
VmLck: 0 kB
VmRSS: 10004 kB
VmData: 9776 kB
VmStk: 8 kB
VmExe: 4 kB
VmLib: 1312 kB
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

The VmLck size of 0KB means that the process has not locked any pages into memory, making them unswappable. The VmRSS size of 10,004KB means that the application is currently using 10,004KB of physical memory, although it has either allocated or mapped the VmSize of 11,124KB. If the application begins to use the memory that it has allocated but is not currently using, the VmRSS size increases but leaves the VmSize unchanged.
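The relationship between VmRSS and VmSize can be expressed as a single figure. A rough sketch, assuming the VmSize: and VmRSS: labels shown in the status output above; rss_share is a made-up name.

```sh
# Hypothetical helper: read /proc/<PID>/status on stdin and print how much
# of the process's virtual size is resident in physical memory.
# Prints nothing for processes without Vm* lines (e.g. kernel threads).
rss_share() {
    awk '$1 == "VmSize:" { size = $2 }
         $1 == "VmRSS:"  { rss = $2 }
         END { if (size > 0) printf "VmRSS is %.1f%% of VmSize\n", 100 * rss / size }'
}

# Usage:
#   rss_share < /proc/4540/status
```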

[ezolt@wintermute test_app]$ cat /proc/4540/maps
08048000-08049000 r-xp 00000000 21:03 393730 /tmp/burn
08049000-0804a000 rw-p 00000000 21:03 393730 /tmp/burn
0804a000-089d3000 rwxp 00000000 00:00 0
40000000-40015000 r-xp 00000000 21:03 1147263 /lib/ld-2.3.2.so
40015000-40016000 rw-p 00015000 21:03 1147263 /lib/ld-2.3.2.so
4002e000-4002f000 rw-p 00000000 00:00 0
4002f000-40162000 r-xp 00000000 21:03 2031811 /lib/tls/libc-2.3.2.so
40162000-40166000 rw-p 00132000 21:03 2031811 /lib/tls/libc-2.3.2.so
40166000-40168000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0

The burn application is using two libraries: ld and libc. The text section (denoted by the permission r-xp) of libc has a range of 0x4002f000 through 0x40162000, or a size of 0x133000 or 1,257,472 bytes.
The data section (denoted by permission rw-p) of libc has a range of 0x40162000 through 0x40166000, or a size of 0x4000 or 16,384 bytes. The text size of libc is bigger than ld's text size of 0x15000 or 86,016 bytes. The data size of libc is also bigger than ld's data size of 0x1000 or 4,096 bytes. libc is the big library that burn is linking in.
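The sizes quoted above are just the end address minus the start address of each mapping. A minimal sketch of that arithmetic, where map_size is a hypothetical helper name and the argument is the first column of a /proc/<pid>/maps line:

```shell
# Print the size in bytes of a "start-end" range as shown in /proc/<pid>/maps.
# map_size is a hypothetical helper name.
map_size() {
    start=${1%-*}   # hex text before the dash
    end=${1#*-}     # hex text after the dash
    echo $(( 0x$end - 0x$start ))
}

map_size 4002f000-40162000   # libc text mapping
map_size 40162000-40166000   # libc data mapping
```

The two calls print 1257472 and 16384, matching the libc text and data sizes discussed above.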


[ezolt@wintermute tmp]$ ipcs -u

------ Shared Memory Status --------
segments allocated 21
pages allocated 1585
pages resident 720
pages swapped 412
Swap performance: 0 attempts 0 successes

------ Semaphore Status --------
used arrays = 0
allocated semaphores = 0

------ Messages: Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

In this case, we can see that 21 different segments or pieces of shared memory have been allocated. All these segments consume a total of 1,585 pages of memory; 720 of these exist in physical memory and 412 have been swapped to disk.

[ezolt@wintermute tmp]$ ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 root 777 49152 1
0x00000000 32769 root 777 16384 1
0x00000000 65538 ezolt 600 393216 2 dest

Here we ask ipcs for a general overview of all the shared memory segments in the system, which indicates who owns each segment. For one in particular, the one with a shared memory ID of 65538, the user ezolt is the owner. It has a permission of 600 (a typical UNIX permission), which in this case means that only ezolt can read and write to it. It has 393,216 bytes, and 2 processes are attached to it.

[ezolt@wintermute tmp]$ ipcs -p

------ Shared Memory Creator/Last-op --------
shmid owner cpid lpid
0 root 1224 11954
32769 root 1224 11954
65538 ezolt 1229 11954

Finally, we can figure out which process created each shared memory segment (cpid) and which process last operated on it (lpid). For the segment with shmid 32769, we can see that PID 1224 created it and PID 11954 was the last to use it.


[ezolt@wintermute procps-3.2.0]$ ./vmstat 1 3

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 197020 81804 29920 0 0 236 25 1017 67 1 1 93 4
1 1 0 172252 106252 29952 0 0 24448 0 1200 395 1 36 0 63
0 0 0 231068 50004 27924 0 0 19712 80 1179 345 1 34 15 49

During one of the samples, the system read 24,448 disk blocks. As mentioned previously, the block size for a disk is 1,024 bytes (or 4,096 bytes); assuming 1,024-byte blocks, this means that the system is reading in data at about 23MB per second. We can also see that during this sample, the CPU was spending a significant portion of time waiting for I/O to complete. The CPU waits on I/O 63 percent of the time during the sample in which the disk was reading at ~23MB per second, and it waits on I/O 49 percent for the next sample, in which the disk was reading at ~19MB per second.
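The conversion from the bi column to MB per second can be sketched as follows. The helper name blocks_to_mb is hypothetical, and the 1,024-byte default block size is an assumption; pass a different size as the second argument if your system differs.

```shell
# Convert a blocks-per-second figure (vmstat "bi"/"bo") to MB per second.
# blocks_to_mb is a hypothetical helper; the block size defaults to 1024 bytes.
blocks_to_mb() {
    awk -v blocks="$1" -v bsize="${2:-1024}" \
        'BEGIN { printf "%.3f\n", blocks * bsize / 1048576 }'
}

blocks_to_mb 24448   # second sample above
blocks_to_mb 19712   # third sample above
```

With 1,024-byte blocks the two samples come out at 23.875 and 19.250 MB per second, which is where the ~23MB and ~19MB figures come from.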

[ezolt@wintermute procps-3.2.0]$ ./vmstat -D
3 disks
5 partitions
53256 total reads
641233 merged reads
4787741 read sectors
343552 milli reading
14479 writes
17556 merged writes
257208 written sectors
7237771 milli writing
0 inprogress IO
342 milli spent IO

In this example, a large number of the reads issued to the system were merged before they were issued to the device. Although there were ~640,000 merged reads, only ~53,000 read commands were actually issued to the drives. The output also tells us that a total of 4,787,741 sectors have been read from the disk, and that since system boot, 343,552ms (or 344 seconds) were spent reading from the disk. The same statistics are available for write performance.

[ezolt@wintermute procps-3.2.0]$ ./vmstat -p hde3 1 3
hde3 reads read sectors writes requested writes
18999 191986 24701 197608
19059 192466 24795 198360
19161 193282 24795 198360

This shows that 60 (19,059 - 18,999) reads and 94 (24,795 - 24,701) writes were issued to partition hde3 between the first two samples. This view can prove particularly useful if you are trying to determine which partition of a disk is seeing the most usage.
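Because the vmstat -p counters are cumulative, the interesting numbers are the differences between successive samples. A minimal sketch that computes them with awk; sample_deltas is a hypothetical helper name, and it assumes each input line holds only the numeric columns:

```shell
# Print, for every sample after the first, the change in each column
# relative to the previous sample. sample_deltas is a hypothetical helper.
sample_deltas() {
    awk '
        NR > 1 {
            for (i = 1; i <= NF; i++)
                printf "%s%s", (i > 1 ? " " : ""), $i - prev[i]
            print ""
        }
        { for (i = 1; i <= NF; i++) prev[i] = $i }'
}

printf '%s\n' \
    '18999 191986 24701 197608' \
    '19059 192466 24795 198360' \
    '19161 193282 24795 198360' | sample_deltas
```

For the three samples above this prints "60 480 94 752" and "102 816 0 0": reads, read sectors, writes and requested writes per interval.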


 

[ezolt@localhost sysstat-5.0.2]$ ./iostat -x -dk 1 5 /dev/hda2
Linux 2.4.22-1.2188.nptl (localhost.localdomain) 05/01/2004
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 11.22 44.40 3.15 4.20 115.00 388.97 57.50 194.49
68.52 1.75 237.17 11.47 8.43

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1548.00 0.00 100.00 0.00 13240.00 0.00 6620.00
132.40 55.13 538.60 10.00 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1365.00 0.00 131.00 0.00 11672.00 0.00 5836.00
89.10 53.86 422.44 7.63 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1483.00 0.00 84.00 0.00 12688.00 0.00 6344.00
151.0 39.69 399.52 11.90 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 2067.00 0.00 123.00 0.00 17664.00 0.00 8832.00
143.61 58.59 508.54 8.13 100.00

You can see that the average queue size is pretty high (~40 to 59 requests once the disk is busy) and, as a result, the amount of time that a request must wait (~400ms to 539ms) is much greater than the amount of time it takes to service the request (7.63ms to 11.90ms). These high average wait times, along with the fact that the utilization is 100 percent, show that the disk is completely saturated.


[ezolt@wintermute sysstat-5.0.2]$ sar -n SOCK 1 2

Linux 2.4.22-1.2174.nptlsmp (wintermute.phil.org) 06/07/04
21:32:26 totsck tcpsck udpsck rawsck ip-frag
21:32:27 373 118 8 0 0
21:32:28 373 118 8 0 0
Average: 373 118 8 0 0

We can see the total number of open sockets and the TCP, RAW, and UDP sockets. sar also displays the number of fragmented IP packets.


resolved – /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

April 1st, 2014 No comments

When I ran a perl command today, I met the problem below:

[root@test01 bin]# /usr/local/bin/perl5.8
-bash: /usr/local/bin/perl5.8: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

Now let’s check which package /lib/ld-linux.so.2 belongs to on a good linux box:

[root@test02 ~]# rpm -qf /lib/ld-linux.so.2
glibc-2.5-118.el5_10.2

So here’s the resolution to the issue:

[root@test01 bin]# yum install -y glibc.x86_64 glibc.i686 glibc-devel.i686 glibc-devel.x86_64 glibc-headers.x86_64
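/lib/ld-linux.so.2 is the 32-bit x86 dynamic loader, so this error usually points to a 32-bit binary on a 64-bit system that lacks the 32-bit glibc. A minimal sketch for confirming that before installing anything: it reads the ELF class byte (offset 4 of the file header). The helper name elf_class is hypothetical.

```shell
# Print 32 or 64 according to the ELF class byte (EI_CLASS, offset 4).
# elf_class is a hypothetical helper name.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not ELF"; return 1; }
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown; return 1 ;;
    esac
}

# Example: elf_class /usr/local/bin/perl5.8
```

If this prints 32 for the failing binary while the system is x86_64, installing the i686 glibc packages as above is the matching fix.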

Categories: Kernel, Linux, Systems Tags:

resolved – sudo: sorry, you must have a tty to run sudo

April 1st, 2014 2 comments

The error message below sometimes occurs when you run sudo <command>:

sudo: sorry, you must have a tty to run sudo

To resolve this, you may comment out "Defaults requiretty" in /etc/sudoers (edit the file by running visudo). Here is more info about this method: http://www.cyberciti.biz/faq/linux-unix-bsd-sudo-sorry-you-must-haveattytorun/

However, sometimes it's inconvenient or even impossible to modify /etc/sudoers. In that case, you can consider the following:

echo -e "<password>\n"|sudo -S <sudo command>

For the -S parameter of sudo, you may refer to the sudo man page:

-S  The -S (stdin) option causes sudo to read the password from the standard input instead of the terminal device. The password must be followed by a newline character.

So here -S makes sudo read the password from the standard input instead of the tty (terminal device), which means we can pipe the password to sudo.

Categories: Linux, Programming, SHELL, Systems Tags: ,

set vnc not asking for OS account password

March 18th, 2014 No comments

As you may know, vncpasswd (from the vnc-server package) sets the password that users must enter when connecting with a VNC client (such as tightvnc). When you connect to vnc-server, it asks for this password:

[screenshot: vnc-0]

After you connect to the host using VNC, you may find that the remote server asks again, this time for the OS account password (the one set by passwd):

[screenshot: vnc-01]

In some cases you may not want the second prompt. Here's the way to cancel this behavior:

[screenshots: vnc-1, vnc-2]

Categories: Linux, Systems Tags: ,

resolved – ssh Read from socket failed: Connection reset by peer and Write failed: Broken pipe

March 13th, 2014 No comments

If you meet the following errors when you ssh to a linux box:

Read from socket failed: Connection reset by peer

Write failed: Broken pipe

Then one possibility is that the linux box's filesystem is corrupted. In my case, the following was printed to stdout:

EXT3-fs error ext3_lookup: deleted inode referenced

To resolve this, you need to boot linux into single user mode and run fsck -y <filesystem>. You can get the names of the corrupted filesystems from the boot messages:

[/sbin/fsck.ext3 (1) -- /usr] fsck.ext3 -a /dev/xvda2
/usr contains a file system with errors, check forced.
/usr: Directory inode 378101, block 0, offset 0: directory corrupted

/usr: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)

[/sbin/fsck.ext3 (1) -- /oem] fsck.ext3 -a /dev/xvda5
/oem: recovering journal
/oem: clean, 8253/1048576 files, 202701/1048233 blocks
[/sbin/fsck.ext3 (1) -- /u01] fsck.ext3 -a /dev/xvdb
u01: clean, 36575/14548992 files, 2122736/29081600 blocks
[FAILED]

So in this case, I ran fsck -y /dev/xvda2 && fsck -y /dev/xvda5, then rebooted the host, and everything went well.

PS:

If two copies of a VM are booted on two hypervisors and share the same filesystem (e.g. over NFS), then even after you fsck -y the FS and boot the VM, the FS will soon be corrupted again, because the other copy is still using it. So first make sure that only one copy of the VM is running on the hypervisors of the same server pool.

Categories: Kernel, Linux Tags:

psftp through a proxy

March 5th, 2014 No comments

As you may know, we can set a proxy in putty for ssh connections to a remote host, as shown below:

[screenshot: putty_proxy]

And if you want to copy files from the remote site to your local box, you can use putty's psftp.exe. psftp.exe has many options:

C:\Users\test>d:\PuTTY\psftp.exe -h
PuTTY Secure File Transfer (SFTP) client
Release 0.62
Usage: psftp [options] [user@]host
Options:
-V print version information and exit
-pgpfp print PGP key fingerprints and exit
-b file use specified batchfile
-bc output batchfile commands
-be don’t stop batchfile processing if errors
-v show verbose messages
-load sessname Load settings from saved session
-l user connect with specified username
-P port connect to specified port
-pw passw login with specified password
-1 -2 force use of particular SSH protocol version
-4 -6 force use of IPv4 or IPv6
-C enable compression
-i key private key file for authentication
-noagent disable use of Pageant
-agent enable use of Pageant
-batch disable all interactive prompts

Although putty.exe has a proxy setting, psftp.exe has no proxy option of its own! So what should you do if you want to copy files back to your local box, a firewall blocks you from doing this directly, and you must use a proxy?

As you may notice, psftp.exe has a "-load sessname" option:

-load sessname Load settings from saved session

This option means that if you have a session saved in putty.exe, you can use psftp.exe -load <session name> to copy files from the remote site. For example, suppose you saved a session named mysession in putty.exe with the proxy configured; then you can use "psftp.exe -load mysession" to copy files from the remote site (no need for username/password, as you must have entered them in the putty.exe session):

C:\Users\test>d:\PuTTY\psftp.exe -load mysession
Using username "root".
Remote working directory is /root
psftp> ls
Listing directory /root
drwx------ 3 ec2-user ec2-user 4096 Mar 4 09:27 .
drwxr-xr-x 3 root root 4096 Dec 10 23:47 ..
-rw------- 1 ec2-user ec2-user 388 Mar 5 05:07 .bash_history
-rw-r--r-- 1 ec2-user ec2-user 18 Sep 4 18:23 .bash_logout
-rw-r--r-- 1 ec2-user ec2-user 176 Sep 4 18:23 .bash_profile
-rw-r--r-- 1 ec2-user ec2-user 124 Sep 4 18:23 .bashrc
drwx------ 2 ec2-user ec2-user 4096 Mar 4 09:21 .ssh
psftp> help
! run a local command
bye finish your SFTP session
cd change your remote working directory
chmod change file permissions and modes
close finish your SFTP session but do not quit PSFTP
del delete files on the remote server
dir list remote files
exit finish your SFTP session
get download a file from the server to your local machine
help give help
lcd change local working directory
lpwd print local working directory
ls list remote files
mget download multiple files at once
mkdir create directories on the remote server
mput upload multiple files at once
mv move or rename file(s) on the remote server
open connect to a host
put upload a file from your local machine to the server
pwd print your remote working directory
quit finish your SFTP session
reget continue downloading files
ren move or rename file(s) on the remote server
reput continue uploading files
rm delete files on the remote server
rmdir remove directories on the remote server
psftp>

Now you can get/put files as usual.

PS:

If you do not need a proxy to connect to the remote site, you can use the psftp.exe CLI to get remote files directly. For example:

d:\PuTTY\psftp.exe [email protected] -i d:\PuTTY\aws.ppk -b d:\PuTTY\script.scr -bc -be -v

And d:\PuTTY\script.scr is the script that gets/puts the files:

cd /backup
lcd c:\
mget *.tar.gz
close

Categories: Linux, Systems Tags: ,

avoid putty ssh connection sever or disconnect

January 17th, 2014 2 comments

After some idle time, the ssh connection will be disconnected. If you want to avoid this, you can try running the following command:

while [ 1 ];do echo hi;sleep 60;done &

This will print message “hi” every 60 seconds on the standard output.

PS:

You can also set some parameters in /etc/ssh/sshd_config; for details, refer to http://www.doxer.org/learn-linux/make-ssh-on-linux-not-to-disconnect-after-some-certain-time/

Categories: Linux, SHELL, Unix Tags:

make sudo asking for no password on linux

November 1st, 2013 No comments

Assume that you have a user named 'test' who belongs to the 'admin' group, and you want test to be able to sudo to root without linux prompting for a password. Here's how you can do it:

cp /etc/sudoers{,.bak}
sed -i '/%admin/ s/^/# /' /etc/sudoers
echo '%admin ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
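Since a broken /etc/sudoers can lock you out of sudo, it may be worth trying the edit on a copy first. Here is a minimal sketch that applies the same three steps to any file you pass in. The function name grant_admin_nopasswd is hypothetical, and sed -i assumes GNU sed.

```shell
# Apply the edit to a sudoers-format file given as $1, keeping a .bak copy.
# grant_admin_nopasswd is a hypothetical helper; sed -i assumes GNU sed.
grant_admin_nopasswd() {
    file=$1
    cp "$file" "$file.bak"
    # Comment out existing %admin rules that are not commented yet.
    sed -i '/^[[:space:]]*%admin/ s/^/# /' "$file"
    # Append the passwordless rule for the admin group.
    echo '%admin ALL=(ALL) NOPASSWD: ALL' >> "$file"
}

# Example: work on a copy, then inspect it before installing.
# cp /etc/sudoers /tmp/sudoers.new && grant_admin_nopasswd /tmp/sudoers.new
```

You can then check the copy with visudo -c -f /tmp/sudoers.new before replacing the real file.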

Enjoy!

Categories: Linux, Security Tags:

make tee to copy stdin as well as stderr & prevent ESC output of script

October 30th, 2013 No comments
  • Make tee to copy stdin as well as stderr

As said by manpage of tee:

read from standard input and write to standard output and files

So if your script prints error messages, they go to standard error, which the pipe does not carry, and they will not be copied to the file.

Here’s one workaround for this:

./aaa.sh 2>&1 | tee -a log

Or you can use the more complicated one:

command > >(tee stdout.log) 2> >(tee stderr.log >&2)
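A small self-contained demonstration of the first workaround, using a stand-in function instead of a real script. The name noisy and the temporary log file are assumptions made for this example only.

```shell
# A stand-in for a script that writes to both streams.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

log=$(mktemp)
# 2>&1 merges stderr into stdout before the pipe, so tee sees both lines.
noisy 2>&1 | tee -a "$log"
# Show what ended up in the log file.
cat "$log"
```

Without the 2>&1, only "normal output" would be written to the log; the error line would still appear on the terminal but bypass tee.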

  • Prevent ESC output of script

script literally captures every type of output that was sent to the screen. If you have colored or bold output, this shows up as esc characters within the output file. These characters can significantly clutter the output and are not usually useful. If you set the TERM environmental variable to dumb (using setenv TERM dumb for csh-based shells and export TERM=dumb for sh-based shells), applications will not output the escape characters. This provides a more readable output.

In addition, the timing information provided by script clutters the output. Although it can be useful to have automatically generated timing information, it may be easier to not use script's timing, and instead just time the important commands with the time command.

PS:

  1. Here’s the full version http://stackoverflow.com/questions/692000/how-do-i-write-stderr-to-a-file-while-using-tee-with-a-pipe
  2. Some content of this article is excerpted from <Optimizing Linux® Performance: A Hands-On Guide to Linux® Performance Tools>.
Categories: Linux, SHELL Tags:

make label for swap device using mkswap and blkid

August 6th, 2013 No comments

If you want to label a swap partition in linux, you should not use e2label. e2label changes the label on an ext2/ext3/ext4 filesystem, and swap is not one of these.

If you use e2label for this, you will get the following error messages:

[root@node2 ~]# e2label /dev/xvda3 SWAP-VM
e2label: Bad magic number in super-block while trying to open /dev/xvda3
Couldn’t find valid filesystem superblock.

We should use mkswap instead, as it has an -L option (note that mkswap reinitializes the swap area, so run it while the partition is not in use):

-L label  Specify a label, to allow swapon by label. (Only for new style swap areas.)

So let’s see example below:

[root@node2 ~]# mkswap -L SWAP-VM /dev/xvda3
Setting up swapspace version 1, size = 2335973 kB
LABEL=SWAP-VM, no uuid

[root@node2 ~]# blkid
/dev/xvda1: LABEL="/boot" UUID="6c5ad2ad-bdf5-4349-96a4-efc9c3a1213a" TYPE="ext3"
/dev/xvda2: LABEL="/" UUID="76bf0aaa-a58e-44cb-92d5-098357c9c397" TYPE="ext3"
/dev/xvdb1: LABEL="VOL1" TYPE="oracleasm"
/dev/xvdc1: LABEL="VOL2" TYPE="oracleasm"
/dev/xvdd1: LABEL="VOL3" TYPE="oracleasm"
/dev/xvde1: LABEL="VOL4" TYPE="oracleasm"
/dev/xvda3: LABEL="SWAP-VM" TYPE="swap"

[root@node2 ~]# swapon /dev/xvda3

[root@node2 ~]# swapon -s
Filename Type Size Used Priority
/dev/xvda3 partition 2281220 0 -1

So now we can add swap to /etc/fstab using LABEL=SWAP-VM:

LABEL=SWAP-VM           swap                    swap    defaults        0 0

Categories: Linux, Storage Tags: ,

linux – how to find which process is doing the most io

July 30th, 2013 No comments

find /proc/ -maxdepth 3 -type f -name io -exec egrep -H 'read_bytes|write_bytes' {} \;

The PID is part of each path in the output. Once you spot the largest read_bytes/write_bytes values, you can ps auxww|grep <pid> to see which processes are doing most of the IO.
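Picking the largest values out of that output by eye gets tedious, so here is a minimal sketch that adds read_bytes and write_bytes per process and sorts the result. The function name top_io and its two optional arguments (proc root, number of rows) are hypothetical. The counters are cumulative since each process started, and reading other users' io files normally requires root.

```shell
# List "total_bytes pid" pairs, largest first, from <root>/<pid>/io files.
# top_io is a hypothetical helper; $1 = proc root (default /proc),
# $2 = number of rows to show (default 5).
top_io() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/io; do
        [ -r "$f" ] || continue          # skip unreadable or vanished entries
        pid=${f%/io}
        pid=${pid##*/}
        awk -v pid="$pid" '
            /^read_bytes:/  { r = $2 }   # anchored, so cancelled_write_bytes is ignored
            /^write_bytes:/ { w = $2 }
            END { printf "%.0f %s\n", r + w, pid }' "$f" 2>/dev/null
    done | sort -rn | head -n "${2:-5}"
}

# Example: top_io /proc 10
```

Each output line is the byte total followed by the PID, which you can pass to ps as described above.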

Categories: Linux Tags: ,

dd seek and sparse file

July 17th, 2013 No comments

This is from http://unixadministrator.blogspot.hu/2010/02/how-to-detect-sparse-files.html

How to detect sparse files?

Sparse files are files with lots of holes :) and can become painful when copying them to another location/filesystem/tape. In this example, I create a sparse file. The question is how to detect one. In most cases, a sparse file shows different sizes in the output of different commands. Here is how:

$ dd if=/dev/zero of=FILE1 bs=8k count=1 seek=1000
1+0 records in
1+0 records out

$ ls -lh
-rw-r--r--  1 vikass htt 7.9M Feb 23 09:59 FILE1

$ du -sh FILE1
12K     FILE1

$ stat FILE1
  File: `FILE1'
  Size: 8200192         Blocks: 24         IO Block: 32768  regular file
Device: 16h/22d Inode: 316723      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1020/  vikass)   Gid: (  103/     htt)
Access: 2010-02-23 09:59:53.109548000 +0800
Modify: 2010-02-23 09:59:53.111551000 +0800
Change: 2010-02-23 09:59:53.111551000 +0800

Now compare the size reported by "ls -lh" (7.9M, the apparent size) with the size reported by "du -sh" (12K, the space actually allocated) for the same file. If du reports much less than ls, the file is a sparse file.

This example was from a Linux system (RHEL 4U7) but will work for Solaris also, apart from the stat command. Enjoy! :)
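The same comparison can be scripted by checking the Size and Blocks values that stat reports. A minimal sketch, assuming GNU coreutils stat (the -c format option); is_sparse is a hypothetical helper name. Note that filesystems with transparent compression can also allocate fewer bytes than the apparent size, so treat a positive result as a hint.

```shell
# Succeed if FILE occupies fewer bytes on disk than its apparent size.
# is_sparse is a hypothetical helper; stat -c assumes GNU coreutils.
is_sparse() {
    out=$(stat -c '%s %b %B' "$1") || return 2
    set -- $out                      # $1 = size, $2 = blocks, $3 = bytes per block
    [ $(( $2 * $3 )) -lt "$1" ]
}

# Example:
# is_sparse FILE1 && echo "FILE1 is sparse"
```

For FILE1 above, stat shows Size 8200192 but only 24 blocks allocated, so the check succeeds.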

Categories: Linux Tags: ,

resolved – yum returned Segmentation fault error on centos

January 6th, 2013 No comments

The following error messages occurred while running yum list or yum update on a centos/rhel host:

[root@test-centos ~]# yum list
Loaded plugins: rhnplugin, security
This system is not registered with ULN.
ULN support will be disabled.
Segmentation fault

As always, I did a strace on this:

[root@test-centos ~]# strace yum list
open("/var/cache/yum/el5_ga_base/primary.xml.gz", O_RDONLY) = 6
lseek(6, 0, SEEK_CUR) = 0
read(6, "\37\213\10\10\0\0\0\0\2\377/u01/basecamp/www/el5_"..., 8192) = 8192
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

And here’s the error messages from /var/log/messages:

[root@test-centos ~]# tail /var/log/messages
Jan 6 07:07:44 test-centos kernel: yum[5951]: segfault at 3500000000 ip 000000350cc79e0a sp 00007fff05633b78 error 4 in libc-2.5.so[350cc00000+14e000]

After some googling, I found that the yum "Segmentation fault" was caused by a conflict between zlib and yum. To resolve this problem, we need to use the older version of zlib. Here are the detailed steps:

[root@test-centos ~]# cd /usr/lib
[root@test-centos lib]# ls -l libz*
-rw-r--r-- 1 root root 125206 Jul 9 07:40 libz.a
lrwxrwxrwx 1 root root 22 Aug 2 07:10 libz.so -> /usr/lib/libz.so.1.2.7
lrwxrwxrwx 1 root root 22 Aug 2 07:10 libz.so.1 -> /usr/lib/libz.so.1.2.7
-rwxr-xr-x 1 root root 75028 Jun 7 2007 libz.so.1.2.3
-rwxr-xr-x 1 root root 99161 Jul 9 07:40 libz.so.1.2.7

[root@test-centos lib]# rm libz.so libz.so.1
rm: remove symbolic link `libz.so'? y
rm: remove symbolic link `libz.so.1'? y
[root@test-centos lib]# ln -s libz.so.1.2.3 libz.so
[root@test-centos lib]# ln -s libz.so.1.2.3 libz.so.1

After these steps, you should now be able to run yum commands without any issue.

Also, after using yum, you should switch zlib back to the newer version. Here are the steps:

[root@test-centos ~]# cd /usr/lib
[root@test-centos lib]# rm libz.so libz.so.1
rm: remove symbolic link `libz.so'? y
rm: remove symbolic link `libz.so.1'? y
[root@test-centos lib]# ln -s libz.so.1.2.7 libz.so
[root@test-centos lib]# ln -s libz.so.1.2.7 libz.so.1

Categories: Linux, Systems Tags:

ldap auto_home error – Could not chdir to home directory /home/xxx: No such file or directory

October 10th, 2012 No comments

If you can log on to the host but the home directory fails to mount, with the following error message:

Could not chdir to home directory /home/xxx: No such file or directory

Then one method you can try is this:

  1. Ensure the home directory for your username exists on the NFS server that exports it
  2. Append a line like the following to /etc/auto_home on the host (this assumes the exported home directories are under /export/home; your environment may vary): <username> <NFS server>:/export/home/&
  3. At last, ensure automount is running on the host and then try to log on again. You should now be able to mount your home directory.
Categories: Linux Tags:

resolved – bnx2i dev eth0 does not support iscsi

September 19th, 2012 No comments

A weird incident occurred on a linux box. The box became unresponsive to ping and ssh, although according to ifconfig and the /proc/net/bonding/bond0 file the system was running ok. After some googling, I found that the issue might be related to the NIC driver. I tried bringing the NICs down and up one by one, but got errors:

Bringing up loopback interface bond0: bnx2i: dev eth0 does not support iscsi

bnx2i: iSCSI not supported, dev=eth0

bonding: no command found in slaves file for bond bond0. Use +ifname or -ifname

At last, I tried restarting the whole network, i.e. /etc/init.d/network restart. That did the trick: networking was then running ok and I could ping/ssh to the box without problem.

resolved – passwd permission denied even for root on solaris

July 14th, 2012 No comments

When I tried resetting a local user’s password on a solaris host, I met the following error message:

root@doxer # passwd <username>
New Password:
Re-enter new Password:
Permission denied

This was very weird as I was logged on as root when doing this operation:

root@doxer # id
uid=0(root) gid=1(other)

After some searching I found the cause: by default, passwd tries to reset the LDAP password if the host uses LDAP for authentication. Here's an excerpt from /etc/nsswitch.conf:

passwd: compat
passwd_compat: ldap

To resolve this, you need to designate which repository passwd should use when resetting the password (here we should use files, as this user was a local one):

passwd -r files <username>

PS:

Here's more about the NIS passwd map (from the book Managing NFS and NIS, Second Edition):

Earlier, we introduced the concept of replaced files and appended files. Now, we’ll discuss how to work with these files. First, let’s review: these are important concepts, so repetition is helpful. If a map replaces the local file, the file is ignored once NIS is running. Aside from making sure that misplaced optimism doesn’t lead you to delete the files that were distributed with your system, there’s nothing interesting that you can do with these replaced files. We won’t have anything further to say about them.

Conversely, local files that are appended to by NIS maps are always consulted first, even if NIS is running. The password file is a good example of a file augmented by NIS. You may want to give some users access to one or two machines, and not include them in the NIS password map. The solution to this problem is to put these users into the local passwd file, but not into the master passwd file on the master server. The local password file is always read before getpwuid( ) goes to an NIS server. Password-file reading routines find locally defined users as well as those in the NIS map, and the search order of “local, then NIS” allows local password file entries to override values in the NIS map. Similarly, the local aliases file can be used to override entries in the NIS mail aliases map, setting up machine-specific expansion of one or more aliases.

Categories: Linux Tags:

Resolved – bash /usr/bin/find Arg list too long

July 3rd, 2012 No comments

Have you ever met an error like the following?

root@doxer# find /PRD/*/connectors/A01/QP*/*/logFiles/* -prune -name "*.log" -mtime +7 -type f |wc -l

bash: /usr/bin/find: Arg list too long

0

The cause of the issue is a kernel limit on the total size of the arguments that can be passed to find (as well as ls and other utilities); the shell expands the wildcards into one very long argument list. ARG_MAX defines the maximum length of arguments for a new process. You can get its value using this command:

root@doxer# getconf ARG_MAX
1048320

As a quick fix, you can move into the directory first so that find is given a short argument list (replace * with subdir_NAME):

cd /PRD/subdir_NAME/connectors/A01/QP*/*/logFiles/;find . -prune -name "*.log" -mtime +7 -type f |wc -l

11382
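Another way around the limit is to let the shell loop over the directories instead of passing every expanded path to one find process. The glob in a for loop is expanded inside the shell and never goes through exec, so ARG_MAX does not apply, and each find call receives a single directory. A minimal sketch, assuming GNU find (-mindepth/-maxdepth); count_old_logs is a hypothetical helper name and its argument stands in for /PRD.

```shell
# Count *.log files older than 7 days directly under every logFiles directory.
# count_old_logs is a hypothetical helper; $1 is the base directory (default /PRD).
count_old_logs() {
    base=${1:-/PRD}
    for d in "$base"/*/connectors/A01/QP*/*/logFiles/; do
        [ -d "$d" ] || continue      # glob matched nothing
        # -maxdepth 1 plays the role of -prune: do not descend into subdirectories.
        find "$d" -mindepth 1 -maxdepth 1 -type f -name '*.log' -mtime +7
    done | wc -l
}

# Example: count_old_logs /PRD
```

This starts one find per directory, which is slower than a single call but keeps every argument list short.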

PS:

  1. you can get all configuration values with getconf -a.
  2. For more solutions about the error “bash: /usr/bin/find: Arg list too long”, you can refer to http://www.in-ulm.de/~mascheck/various/argmax/
Categories: Kernel, Linux Tags:

trap bash shell script explanation and example

July 2nd, 2012 No comments

If you want to print some information on standard output when the user presses ctrl+c while a bash script is running, or you want to print something when the script completes, then you should consider using trap to implement this.

Here's an example which prints a message when the user presses ctrl+c (SIGINT is signal number 2):

#!/bin/bash
trap "echo 'you typed ctrl+c'" 2
sleep 5

And if you want to print something when the script ends, you can use the following as an example (0 is the EXIT pseudo-signal):

#!/bin/bash
trap "echo 'script finished'" 0
sleep 5
sleep 5
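A common use of the two traps together is cleaning up temporary files no matter how the script ends. Here is a minimal sketch using signal names (EXIT, INT) instead of numbers. The function name run_job and the messages are hypothetical; the function body runs in a subshell so the traps do not leak into the calling shell.

```shell
# run_job is a hypothetical function; the ( ... ) body is a subshell,
# so its traps fire when the subshell exits and do not affect the caller.
run_job() (
    tmp=$(mktemp)
    trap 'rm -f "$tmp"; echo "cleaned up"' EXIT    # runs on any exit
    trap 'echo "interrupted"; exit 130' INT        # ctrl+c, then the EXIT trap runs
    echo "working"
    # ... real work using "$tmp" would go here ...
)

run_job
```

On a normal run this prints "working" followed by "cleaned up"; the temporary file is removed in both the normal and the ctrl+c case.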

useful sed single line examples when clearing embedded trojans or embedded links

June 7th, 2012 No comments

When somebody maliciously embeds links/trojans in your site, the first thing you will want to do is clear them out. sed is a useful line-based stream editor, and it is a natural choice for the cleaning job.

Usually, the embedded code is several lines of html like the following:

<div class="trojans">
<a href="http://www.malicous-site-url.com">malicous site's name</a>
blablabla…
</div>

To clear this html code, you can use the following sed line:
sed '/<div class="trojans">/,/<\/div>/d' injected.htm

But usually the injected files are spread across several directories or even your whole website's directory. You can combine find and sed to clean these annoying trojans (the -name patterns are quoted so that the shell does not expand them):

find /var/www/html/yoursite.com/ -type f \( -name '*.htm' -o -name '*.html' -o -name '*.php' \) -exec sed -i.bak '/<div class="trojans">/,/<\/div>/d' {} \;

Please note that I use -i.bak to back up each file before doing the replacement. (You should also back up your data before cleaning trojans!)
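To see what the range delete does before running it across a whole site, you can try it on a throwaway file. A minimal sketch; strip_injected is a hypothetical helper name and the sample page is made up for this example.

```shell
# Delete every block from the opening trojans div through the next </div>,
# keeping the original as FILE.bak. strip_injected is a hypothetical helper.
strip_injected() {
    sed -i.bak '/<div class="trojans">/,/<\/div>/d' "$1"
}

page=$(mktemp)
cat > "$page" <<'EOF'
<p>keep me</p>
<div class="trojans">
<a href="http://www.malicous-site-url.com">malicous site's name</a>
</div>
<p>keep me too</p>
EOF

strip_injected "$page"
cat "$page"    # only the two <p> lines remain
```

The range ends at the first line containing </div>, so an injected block with nested divs would be only partly removed; check a few .bak files against the cleaned versions before deleting the backups.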

PS:

For more info about sed examples/tutorials, you may refer to the following two resources:

1.http://sed.sourceforge.net/sed1line.txt

2.http://www.grymoire.com/Unix/Sed.html

requiretty in sudoers file will break functioning of accounts without tty

May 2nd, 2012 No comments

Excerpted from /etc/sudoers:

Defaults requiretty

#
# Refuse to run if unable to disable echo on the tty. This setting should also be
# changed in order to be able to use sudo without a tty. See requiretty above.
#

This means that if you have created an account without a tty, and you want that user to have the privileges to run some sudo commands, this setting (Defaults requiretty) will prevent the account from executing those sudo commands.

To fix this, you can do the following:

  1. disable “Defaults requiretty” in /etc/sudoers file
  2. Change nsswitch.conf to be ldap files rather than files ldap
  3. Better yet don’t enable local sudoers