general networking tips

April 18th, 2014

How Autonegotiation Works

First, let’s cover what autonegotiation does not do: when autonegotiation is enabled on a port, it does not automatically determine the configuration of the port on the other side of the Ethernet cable and then match it. This is a common misconception that often leads to problems.

Autonegotiation is a protocol and, as with any protocol, it only works if it’s running on both sides of the link. In other words, if one side of a link is running autonegotiation and the other side of the link is not, autonegotiation cannot determine the speed and duplex configuration of the other side. If autonegotiation is running on the other side of the link, the two devices decide together on the best speed and duplex mode. Each interface advertises the speeds and duplex modes at which it can operate, and the best match is selected (higher speeds and full duplex are preferred).
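
On a Cisco IOS switch, for example, the two choices look something like this (a sketch; interface numbers are placeholders):

```
! Let the port autonegotiate (the usual default)
interface FastEthernet0/1
 speed auto
 duplex auto
!
! Or hard-set speed and duplex -- remember to do the same on the far end
interface FastEthernet0/2
 speed 100
 duplex full
```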

The confusion exists primarily because autonegotiation always seems to work. This is because of a feature called parallel detection, which kicks in when the autonegotiation process fails to find autonegotiation running on the other end of the link. Parallel detection works by sending the signal being received to the local 10Base-T, 100Base-TX, and 100Base-T4 drivers. If any one of these drivers detects the signal, the interface is set to that speed.

Parallel detection determines only the link speed, not the supported duplex modes. This is an important consideration because the common modes of Ethernet have differing levels of duplex support:

10Base-T
10Base-T was originally designed without full-duplex support. Some implementations of 10Base-T support full duplex, but many do not.

100Base-T
100Base-T has long supported full duplex, which has been the preferred method for connecting 100 Mbps links for as long as the technology has existed. However, the default behavior of 100Base-T is usually half duplex, and full-duplex support must be set manually.

1000Base-T
Gigabit Ethernet has a much more robust autonegotiation protocol than 10M or 100M Ethernet. Gigabit interfaces should be left to autonegotiate in most situations.

10 Gigabit
10 Gigabit (10G) connections are generally dependent on fiber transceivers or special copper connections that differ from the RJ-45 connections seen on other Ethernet types. The hardware usually dictates how 10G connects. On a 6500, 10G interfaces usually require XENPAKs, which only run at 10G. On a Nexus 5000 switch, some of the ports are 1G/10G and can be changed with the speed command.

Because of the lack of widespread full-duplex support on 10Base-T and the typical default behavior of 100Base-T, when autonegotiation falls through to the parallel detection phase (which only detects speed), the safest thing for the driver to do is to choose half-duplex mode for the link.

As networks and networking hardware evolve, higher-speed links with more robust negotiation protocols will likely make negotiation problems a thing of the past. That being said, I still see 20-year-old routers in service, so knowing how autonegotiation works will be a valuable skill for years to come.

 When Autonegotiation Fails

In a half-duplex environment, the receiving (RX) line is monitored. If a frame is present on the RX link, no frames are sent until the RX line is clear. If a frame is received on the RX line while a frame is being sent on the transmitting (TX) line, a collision occurs. Collisions cause the collision error counter to be incremented—and the sending frame to be retransmitted—after a random back-off delay. This may seem counterintuitive in a modern switched environment, but remember that Ethernet was originally designed to work over a single wire. Switches and twisted pair came along later.

half-duplex

In full-duplex operation, the RX line is not monitored, and the TX line is always considered available. Collisions do not occur in full-duplex mode because the RX and TX lines are completely independent.

When one side of the link is full duplex and the other side is half duplex, a large number of collisions will occur on the half-duplex side. The issue may not be obvious, because a half-duplex interface normally shows collisions, while a full-duplex interface does not. Since full duplex means never having to test for a clear-to-send condition, a full-duplex interface will not record any errors in this situation. The problem should present itself as excessive collisions, but only on the half-duplex side.
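
If you suspect a duplex mismatch, a reasonable first step is to check the negotiated duplex setting and the collision counters on each side. On Cisco IOS, something like this (interface numbers are assumptions):

```
Switch# show interfaces FastEthernet0/1 status
Switch# show interfaces FastEthernet0/1 | include duplex|collisions
```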

Gigabit Ethernet uses a substantially more robust autonegotiation mechanism than the one described in this chapter. Gigabit Ethernet should thus always be set to autonegotiation, unless there is a compelling reason not to do so (such as an interface that will not properly negotiate). Even then, this should be considered a temporary workaround until the misbehaving part can be replaced.

 VLANs

A port assigned to one VLAN carries traffic only for that VLAN, and when expanding a network using VLANs, you face the same limitation. If you connect another switch to a port that is configured for VLAN 20, the new switch will be able to forward frames only to or from VLAN 20. If you wanted to connect two switches, each containing four VLANs, you would need four links between the switches: one for each VLAN. A solution to this problem is to deploy trunks between switches. Trunks are links that carry frames for more than one VLAN.
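
On Cisco IOS switches, configuring a port as an 802.1Q trunk might look like this (a sketch; the allowed-VLAN list is an assumption, and the encapsulation command is unnecessary on platforms that support only dot1q):

```
interface GigabitEthernet0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30,40
```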

img1

img3
Another way to route between VLANs is commonly known as the router-on-a-stick configuration. Instead of running a link from each VLAN to a router interface, you can run a single trunk from the switch to the router. All the VLANs will then pass over a single link.

Deploying a router on a stick saves a lot of interfaces on both the switch and the router. The downside is that the trunk is only one link, so the total bandwidth available is only that of a single link (10 Mbps in this example). In contrast, when each VLAN has its own link, each VLAN has 10 Mbps to itself. Also, don’t forget that the router is passing traffic between VLANs, so chances are each frame will be seen twice on the same link: once to get to the router, and once to get back to the destination VLAN.
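
On the router side, a router-on-a-stick is built from one subinterface per VLAN on the trunked link. A minimal IOS sketch (addresses and interface numbers are assumptions):

```
interface FastEthernet0/0
 no shutdown
!
interface FastEthernet0/0.20
 encapsulation dot1Q 20
 ip address 192.168.20.1 255.255.255.0
!
interface FastEthernet0/0.30
 encapsulation dot1Q 30
 ip address 192.168.30.1 255.255.255.0
```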
img4

Jack is connected to VLAN 20 on Switch B, and Diane is connected to VLAN 20 on Switch A. Because there is a trunk connecting these two switches together, assuming the trunk is allowed to carry traffic for all configured VLANs, Jack will be able to communicate with Diane. Notice that the ports to which the trunk is connected are not assigned VLANs. These ports are trunk ports and, as such, do not belong to a single VLAN.

img2

Trunking

 Possible switch port modes related to trunking

img1

 

 VTP (VLAN Trunking Protocol)

img9

VTP allows VLAN configurations to be managed on a single switch. Those changes are then propagated to every switch in the VTP domain. A VTP domain is a group of connected switches with the same VTP domain string configured. Interconnected switches with differently configured VTP domains will not share VLAN information. A switch can only be in one VTP domain; the VTP domain is null by default. Switches with mismatched VTP domains will not negotiate trunk protocols. If you wish to establish a trunk between switches with mismatched VTP domains, you must have their trunk ports set to mode trunk.
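
Setting the VTP domain and mode on a Cisco IOS switch is a one-liner each; a sketch (the domain name and password are placeholders):

```
Switch(config)# vtp domain MYDOMAIN
Switch(config)# vtp mode server
Switch(config)# vtp password MyVtpPassword
Switch# show vtp status
```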

img2

The main idea of VTP is that changes are made on VTP servers. These changes are then propagated to VTP clients, and any other VTP servers in the domain. Switches can be configured manually as VTP servers, VTP clients, or the third possibility, VTP transparent. A VTP transparent switch receives and forwards VTP updates but does not update its configuration to reflect the changes they contain. Some switches default to VTP server, while others default to VTP transparent. VLANs cannot be locally configured on a switch in client mode.

There is actually a fourth state for a VTP switch: off. A switch in VTP mode off will not accept VTP packets, and therefore will not forward them either. This can be handy if you want to stop the forwarding of VTP updates at some point in the network.

img3

SW1 and SW2 are both VTP servers. SW3 is set to VTP transparent, and SW4 is a VTP client. Any changes to the VLAN information on SW1 will be propagated to SW2 and SW4. The changes will be passed through SW3 but will not be acted upon by that switch. Because the switch does not act on VTP updates, its VLANs must be configured manually if users on that switch are to interact with the rest of the network.

When a switch receives a VTP update, the first thing it does is compare the VTP domain name in the update to its own. If the domains are different, the update is ignored. If they are the same, the switch compares the update’s configuration revision number to its own. If the revision number of the update is lower than or equal to the switch’s own revision number, the update is ignored. If the update has a higher revision number, the switch sends an advertisement request. The response to this request is another summary advertisement, followed by subset advertisements. Once it has received the subset advertisements, the switch has all the information necessary to implement the required changes in the VLAN configuration.

When a switch’s VTP domain is null, if it receives a VTP advertisement over a trunk link, it will inherit the VTP domain and VLAN configuration from the switch on the other end of the trunk. This will happen only over manually configured trunks, as DTP negotiations cannot take place unless a VTP domain is configured. Be careful of this behavior, as it can cause serious heartache, nausea, and potential job loss if you’re not careful (or the person before you wasn’t).

VTP Pruning
On large or congested networks, VTP can create a problem when excess traffic is sent across trunks needlessly. The switches in the gray box all have ports assigned to VLAN 100, while the rest of the switches do not. With VTP active, all of the switches will have VLAN 100 configured, and as such will receive broadcasts initiated on that VLAN. However, those without ports assigned to VLAN 100 have no use for the broadcasts.

img1
On a busy VLAN, broadcasts can amount to a significant percentage of traffic. In this case, all that traffic is being needlessly sent over the entire network, and is taking up valuable bandwidth on the interswitch trunks.
VTP pruning prevents traffic originating from a particular VLAN from being sent to switches on which that VLAN is not active (i.e., switches that do not have ports connected and configured for that VLAN). With VTP pruning enabled, the VLAN 100 broadcasts will be restricted to switches on which VLAN 100 is actively in use.

VTP pruning must be enabled or disabled throughout the entire VTP domain. Failure to configure VTP pruning properly can result in instability in the network. By default, all VLANs up to VLAN 1001 are eligible for pruning, except VLAN 1, which can never be pruned. VTP does not support the extended VLANs above VLAN 1001, so VLANs higher than 1001 cannot be pruned. If you enable VTP pruning on a VTP server, VTP pruning will automatically be enabled for the entire domain.
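
Enabling pruning from a VTP server, and verifying it, might look like this on IOS:

```
Switch(config)# vtp pruning
Switch# show vtp status | include Pruning
```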
img4

Dangers of VTP
Remember that many switches are VTP servers by default. Remember, also, that when a switch participating in VTP receives an update that has a higher revision number than its own configuration’s revision number, the switch will implement the new scheme. In our scenario, the lab’s 3750s had been functioning as a standalone network with the same VTP domain as the regular network. Multiple changes were made to their VLAN configurations, resulting in a high configuration revision number. When these switches, which were VTP servers, were connected to the more stable production network, they automatically sent out updates. Each switch on the main network, including the core 6509s, received an update with a higher revision number than its current configuration. Consequently, they all requested the VLAN configuration from the rogue 3750s and implemented that design.

 Link Aggregation

EtherChannel is the Cisco term for the technology that enables the bonding of up to eight physical Ethernet links into a single logical link. Outside the Cisco world, the generic term is link aggregation, or LAG for short.

img1

The default behavior is to assign one of the physical links to each packet that traverses the EtherChannel, based on the packet’s destination MAC address. This means that if one workstation talks to one server over an EtherChannel, only one of the physical links will be used. In fact, all of the traffic destined for that server will traverse a single physical link in the EtherChannel. This means that a single user will only ever get the bandwidth of one physical link (1 Gbps on a Gigabit EtherChannel) at a time. This behavior can be changed to send each packet over a different physical link, but as you’ll see, there are limits to how well this works for applications like VoIP. The benefit arises when there are multiple destinations, which can each use a different path.

You can change the method the switch uses to determine which path to assign. The default behavior is to use the destination MAC address. However, depending on the version of the software and hardware in use, the options may include:

The source MAC address
The destination MAC address
The source and destination MAC addresses
The source IP address
The destination IP address
The source and destination IP addresses
The source port
The destination port
The source and destination ports

There is another terminology problem that can create many headaches for network administrators. While a group of physical Ethernet links bonded together is called an EtherChannel in Cisco parlance, Unix admins sometimes refer to the same configuration as a trunk. Of course, in the Cisco world the term “trunk” refers to something completely different: a link that labels frames with VLAN information so that multiple VLANs can traverse it. Some modern Unixes create a bond interface when performing link aggregation, and Windows admins often use the term teaming when combining links.
EtherChannel protocols

img2

EtherChannel can negotiate with the device on the other side of the link. Two protocols are supported on Cisco devices. The first is the Link Aggregation Control Protocol (LACP), which is defined in IEEE specification 802.3ad. Because LACP is an open standard, it is the protocol to use when you’re connecting to non-Cisco devices, such as servers. The other protocol used in negotiating EtherChannel links is the Port Aggregation Protocol (PAgP). Since PAgP is Cisco-proprietary, it is used only when you’re connecting two Cisco devices via an EtherChannel. Each protocol supports two modes: a passive mode (auto in PAgP and passive in LACP), and an active mode (desirable in PAgP and active in LACP). Alternatively, you can set the mode to on, thus forcing the creation of the EtherChannel.
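
Putting it together, a two-port LACP EtherChannel on IOS might be configured like this (the port-channel number and interfaces are assumptions):

```
interface range GigabitEthernet0/1 - 2
 channel-protocol lacp
 channel-group 1 mode active
!
interface Port-channel1
 switchport mode trunk
```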

 Spanning Tree

The Spanning Tree Protocol (STP) is used to ensure that no Layer-2 loops exist in a LAN. Spanning tree is designed to prevent loops among bridges. A bridge is a device that connects network segments, forwarding frames between them while keeping each segment a separate collision domain. Switches are considered bridges (essentially multiport bridges); hubs are not.

img1

When a switch receives a broadcast, it repeats the broadcast on every port (except the one on which it was received). In a looped environment, the broadcasts are repeated forever. The result is called a broadcast storm, and it will quickly bring a network to a halt. Spanning tree is an automated mechanism used to discover and break loops of this kind.

A useful tool when you’re troubleshooting a broadcast storm is the show processes cpu history command.
Here is the output from the show processes cpu history command on switch B, which shows 0–3 percent CPU utilization over the course of the last minute:

img2

The numbers on the left side of the graph are the CPU utilization percentages. The numbers on the bottom are seconds in the past (0 = the time of command execution). The numbers on the top of the graph show the integer values of CPU utilization for that time period on the graph. For example, according to the preceding graph, CPU utilization was normally 0 percent, but increased to 1 percent 5 seconds ago and to 3 percent 20 seconds ago. When the values exceed 10 percent, you’ll see visual peaks in the graph itself.

Another problem caused by a looped environment is MAC address tables (CAM tables in CatOS) being constantly updated:

3550-IOS#sho mac-address-table | include 0030.1904.da60

Spanning tree elects a root bridge (switch) in the network. The root bridge is the bridge that all other bridges need to reach via the shortest path possible. Spanning tree calculates the cost for each path from each bridge in the network to the root bridge. The path with the lowest cost is kept intact, while all others are broken. Spanning tree breaks paths by putting ports into a blocking state.


Q: What’s the difference between bandwidth and speed?

A: Bandwidth is a capacity; speed is a rate. Bandwidth tells you the maximum amount of data that your network can transmit. Speed tells you the rate at which the data actually travels. A CAT-5 cable, for example, has enough bandwidth for 10/100Base-T Ethernet; the speed you get over that cable changes depending on conditions.

Q: What is Base-T?
A: Base-T refers to the family of Ethernet standards that run over twisted-pair cabling; the leading number gives the transmission rate. The 10Base-T standard transfers data at 10 megabits per second (Mbps). The 100Base-T standard transfers data at 100 Mbps. The 1000Base-T standard transfers data at a massive 1000 Mbps.

Q: What is a crossover cable used for?
A: Suppose you want to connect a laptop to a desktop computer. One way of doing this is to use a switch or a hub to connect the two devices; another way is to use a crossover cable, a cable in which the transmit and receive wire pairs are swapped, so that each device’s transmit pins connect to the other device’s receive pins. A straight-through cable keeps the pairs in the same positions on both ends and is used to connect unlike devices, such as a computer to a switch.

Q: Aren’t packets and frames really the same thing?
A: No. Data transmitted over Ethernet travels in frames. Inside those frames, in the data field, are packets. Generally, frames have to do with the transmission protocol, i.e., Ethernet, ATM, Token Ring, etc. But as you read more about networking, you will see that there is some confusion on this.

Q: A guy in my office calls packets datagrams. Are they the same?
A: Not really. Packet is the general term for a chunk of data sent across a network, whereas datagram usually refers to a packet sent by an unreliable protocol such as UDP or ICMP.

Q: What’s the difference between megabits per second (Mbps) and megabytes per second (MBps)?
A: Megabits per second (Mbps) is a bandwidth rate used in the telecommunications and computer networking field. One megabit equals one million bits (binary digits). Megabytes per second (MBps) is a data transfer rate used in computing. One megabyte equals 1,048,576 bytes, and one byte equals 8 bits.
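
For example, to see why a “100 megabit” link tops out around 12 megabytes per second, divide the bit rate by 8 (a quick shell sketch; it ignores the decimal-vs-binary megabyte wrinkle):

```shell
# 100 Mbps link: how many MBps is that?
mbps=100
echo "$mbps" | awk '{ printf "%.1f MBps\n", $1 / 8 }'   # -> 12.5 MBps
```
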
The order of the wires in an RJ-45 connector conforms to one of two standards. These standards are 568A and 568B.

 

568B

568B

568A

568A

We can convert to ASCII using hex

Once you learn to use hexadecimal, you realize just how cool it is. Hex and binary make great partners, which simplifies conversions between binary and ASCII. Hex is like a bridge between the weird world of binary and our world (the human, readable world).
Here’s what we do:

  • Break the byte in half.

Each half-byte is called a nibble. [Note from Editor: you’re kidding, right?]

binary-hex-ascii-convert

  • Convert each half into its hexadecimal equivalent.

Because the binary number is broken into halves, the highest number you can get is 15 (which is “F” in hex).

  • Concatenate the two numbers.

Concatenate is a programmer’s word that simply means “put them beside each other from left to right.”

  • Look the number up in an ASCII table (you can run man ascii to see the full ASCII hex character table).

binary-hex-ascii-convert-2
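
The steps above can be sketched in bash (the example byte 01001000 is the letter H; this assumes bash for the base-2 arithmetic and \x escapes):

```shell
# Convert the byte 01001000 to ASCII by way of hex
b=01001000
hi=$(( 2#${b:0:4} ))              # high nibble: 0100 -> 4
lo=$(( 2#${b:4:4} ))              # low nibble:  1000 -> 8
hex=$(printf '%X%X' "$hi" "$lo")  # concatenate:  48
printf "\x$hex\n"                 # ASCII 0x48 -> H
```
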

 

Hubs – Switches – Routers

hubs

A hub receives incoming signals and sends them out on all the other ports. When several devices start sending signals, the hub’s incessant repetition creates heavy traffic and collisions. A collision happens when two signals run into one another, creating an error. The sending network device has to back off and wait to send the signal again.

A hub contains no processors, and this means that a hub has no real understanding of network data. It doesn’t understand MAC addresses or frames. It sees an incoming networking signal as a purely electrical signal, and passes it on.

A hub is really just an electrical repeater. It takes whatever signal comes in, and sends it out on all the other ports.

switches

  • The source workstation sends a frame.

A frame carries the payload of data and keeps track of the time sent, as well as the MAC address of the source and the MAC address of the target.

  • The switch updates its MAC address table with the MAC address and the port it’s on.

Switches maintain MAC address tables. As frames come in, the switch’s knowledge of the traffic becomes more complete: the switch matches ports with MAC addresses.

  • The switch forwards the frame to its target MAC address using information from its table.

It does this by sending the frame out the port where that MAC address is located, as indicated by its MAC address table.

Switches avoid collisions by storing and forwarding frames on the network. They are able to do this by using the MAC address of the frame. Instead of repeating the signal on all ports, a switch sends it on to the device that needs it.

A switch reads the signal as a frame and uses the frame’s information to send it where it’s supposed to go.

routers

 

How the router moves data across networks

  • The sending device sends an ARP request for the MAC address of its default gateway.

router-1

  • The router responds with its MAC address.

router-2

  • The sending device sends its traffic to the router.

router-3

  • The router sends an ARP request for the device with the destination IP address, which lives on a different IP network.

router-5

  • The receiving device responds with its MAC address.

router-5

  • The router changes the MAC address in the frame and sends the data to the receiving device.

router-6

  • The source workstation sends a frame to the router.

It sends it to the router since the workstation the traffic is meant for is behind the router.

  • The router changes the source MAC address to its MAC address and changes the destination MAC address to the workstation the traffic is meant for.

If network traffic comes from a router, we can only see the router’s MAC address. All the workstations behind that router make up what we call an IP subnet. All a switch needs to look at to get frames to their destination is the MAC address. A router looks at the IP address from the incoming packet and forwards it if it is intended for a workstation located on the other network. Routers have far fewer network ports because they tend to connect to other routers or to switches. Computers are generally not connected directly to a router.

The switch decides where to send traffic based on the MAC address, whereas the router decides based on the IP address.

Q: But I have a DSL router at home, and my computer is directly connected to it. What is that all about?
A: Good observation. There are switches that have routing capability and routers that have switched ports. There is not a real clear line between the two devices; it is more about their primary function. Now, in large networks, there are switching routers. These have software that allows them to work as routers on switched ports. They are great to use and make building large, sophisticated networks straightforward, but they are very expensive.

Q: So the difference between my home DSL router and an enterprise switching router is the software?
A: The big difference is the hardware horsepower. Your home DSL router probably uses a small embedded processor or microcontroller which does all the processing. Switching routers and heavy-duty routers have specialized processors with individual processors on each port. The name of the game is the speed at which it can move packets. Your home DSL router probably has a throughput of about 20 Mbps (megabits per second), whereas a high-end switching router can have a throughput of hundreds of Gbps (gigabits per second) or more.

Hook Wireshark up to the switch

  • Connect your computer to the switch with a serial cable.

You will use this to communicate with the switch.

  • Open a terminal program such as HyperTerminal and get to the command prompt of the switch. Type in the commands below.

switch-port-monitor
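
The commands vary by model; on a Cisco IOS switch, mirroring traffic from the port under test to the port where Wireshark listens (a SPAN session) might look like this (interface numbers are assumptions):

```
Switch(config)# monitor session 1 source interface fastEthernet0/2
Switch(config)# monitor session 1 destination interface fastEthernet0/1
Switch# show monitor session 1
```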

  • Hook up your computer to port 1 on the switch with an Ethernet cable.

You will use this to capture network traffic.

  • Start up Wireshark and capture some network traffic.

wireshark

wireshark-2

 

 

PS:

SELinux security context and its elements – user, role, type identifiers

April 17th, 2014

All operating system access control is based on some type of access control attribute associated with objects and subjects. In SELinux, the access control attribute is called a security context. All objects (files, interprocess communication channels, sockets, network hosts, and so on) and subjects (processes) have a single security context associated with them. A security context has three elements: user, role, and type identifiers. The usual format for specifying or displaying a security context is as follows:

user:role:type

A valid security context must have one valid user, one role, and one type identifier. The identifiers are defined by the policy writer, and the string identifiers for each element are defined in the SELinux policy language.

Here’s the relationship between unix/linux users -> SELinux identifiers -> roles -> domain:

selinux-context

And here’s one example of SELinux transitions:

selinux-transitions

And here are the commands you can use to label web content for the Apache httpd server when SELinux runs in enforcing mode:

chcon -t httpd_sys_content_t /var/www/html     # label the directory itself
chcon -t httpd_sys_content_t /var/www/html -R  # -R relabels recursively
ls -Z /var/www/html                            # verify the security contexts
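
Note that chcon changes can be lost when the filesystem is relabeled. To make the labeling persistent, you can record it in the policy’s file-context database and then restore it (same path as above):

```
semanage fcontext -a -t httpd_sys_content_t "/var/www/html(/.*)?"
restorecon -Rv /var/www/html
```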

PS:
Some contents of this article are from the book <SELinux by Example: Using Security Enhanced Linux>.

Categories: Linux, Security Tags:

resolved – show kitchen sink buttons when wordpress goes to fullscreen mode

April 11th, 2014

When you click the full-screen button of the WordPress TinyMCE editor, WordPress goes into “Distraction-Free Writing mode”, which works as the name suggests. However, you’ll also find that the TinyMCE toolbar shows only a limited number of buttons, and the second row of the toolbar (the kitchen sink) does not show at all. (I tried installing plugins such as Ultimate TinyMCE and Advanced TinyMCE, but the issue remained.)

full-screen

Previously, you could type ALT+SHIFT+G to go to another type of fullscreen mode, which had all the buttons, including the kitchen sink ones. However, it seems the updated version of WordPress has disabled this feature.

To resolve this issue, we can insert the following code in functions.php of your theme:

function my_mce_fullscreen($buttons) {
    $buttons[] = 'fullscreen'; // add a second full-screen button to the first toolbar row
    return $buttons;
}
add_filter('mce_buttons', 'my_mce_fullscreen');

Afterward, TinyMCE will have two full-screen buttons:

full-screen buttons

Make sure to click the SECOND full-screen button. When you do so, the editor will transform to the following appearance:

full-screen with kitchen sink

I assume this is what you’re trying for, right?

 

 

Categories: Life Tags:

add horizontal line button in wordpress

April 11th, 2014

There are three methods for adding a horizontal line button in WordPress:

Firstly, switch to “Text” mode and enter <hr />.

Secondly, add the following in functions.php of your wordpress theme:

function enable_more_buttons($buttons) {
    $buttons[] = 'hr'; // add the horizontal-rule button
    return $buttons;
}
add_filter('mce_buttons', 'enable_more_buttons');

horizontal line

Thirdly, you can install the plugin “Ultimate TinyMCE”; in its settings, you can enable the horizontal line button in one click! This is my recommendation.

ultimate tinymce

Categories: Life Tags: ,

linux tips

April 10th, 2014
Linux Performance & Troubleshooting
For Linux Performance & Troubleshooting, please refer to another post – Linux tips – Performance and Troubleshooting
Linux system tips
ls -lu(access time, like cat file) -lt(modification time, like vi; ls -l defaults to this) -lc(change time, chmod), stat ./aa.txt <UTC>
ctrl+z #bg and stopped
%1 & #bg and running
%1 #fg
pgrep -flu oracle  # processes owned by the user oracle
watch free -m #refresh every 2 seconds
pmap -x 30420 #memory mapping
openssl s_client -connect localhost:636 -showcerts #verify ssl certificates, or 443
openssl x509 -in cacert.pem -noout -text
openssl x509 -in cacert.pem -noout -dates
openssl x509 -in cacert.pem -noout -purpose
openssl req -in robots.req.pem -text -verify -noout
blockdev --getbsz /dev/xvda1 #get blocksize of FS
dumpe2fs /dev/xvda1 |grep 'Block size'
Strings
ovm svr ls|sort -rn -k 4 #sort by column 4
cat a1|sort|uniq -c |sort #SUS
ovm svr ls|uniq -f3 #skip the first three columns, this will list only 1 server per pool
for i in <all OVMs>;do (test.sh $i &);done #instead of using nohup &
ovm vm ls|egrep "`echo testhost{0\|,1\|,2\|,3\|,4}|tr -d '[:space:]'`"
cat a|awk '{print $5}'|tr '\n' ' '
getopt #getopts is builtin, more on http://wuliangxx.iteye.com/blog/750940
date -d '1970-1-1 1276059000 sec utc'
date -d '2010-09-11 23:20' +%s
find . -name '*txt'|xargs tar cvvf a.tar
find . -maxdepth 1
for i in `find /usr/sbin/ -type f ! -perm -u+x`;do chmod +x $i;done #files that have no execute permission for owner
find ./* -prune -print #-prune, do not cascade
find . -fprint file #put result to file
tar tvf a.tar --wildcards "*ipp*" #globbing patterns
tar xvf bfiles.tar --wildcards --no-anchored 'b*'
tar --show-defaults
tar cvf a.tar --totals *.txt #show speed
tar --append --file=collection.tar rock #add rock to collection.tar
tar --update -v -f collection.tar blues folk rock classical #only append new or updated ones, not replace
tar --delete --file=collection.tar blues #not on tapes
tar -c -f archive.tar --mode='a+rw'
tar -C sourcedir -cf - . | tar -C targetdir -xf - #copy directories
tar -c -f jams.tar grape prune -C food cherry #-C changes dir; file cherry is under the food directory
find . -size -400 -print > small-files
tar -c -v -z -T small-files -f little.tgz
tar -cf src.tar --exclude='*.o' src #multiple --exclude can be specified
expr 5 - 1
rpm2cpio ./ash-1.0.1-1.x86_64.rpm |cpio -ivd
eval $cmd
exec menu.viewcards #same as .
ls . | xargs -0 -i cp ./{} /etc #-i: use \n as separator, just like find -exec; -0 handles spaces in filenames (find -print0 separates with NUL, not newline); -i or -I {} places filenames in the middle
ls | xargs -t -i mv {} {}.old #mv source should exclude trailing /, or unexpected errors may occur
mv --strip-trailing-slashes source destination
ls |xargs file /dev/fd/0 #replace -
ls -l -I "*out*" #do not include out
find . -type d |xargs -i du -sh {} |awk '$1 ~ /G/'
find . -type f -name "*20120606" -exec rm {} \; #no need for rm -rf. find . -type f -exec bash -c "ls -l '{}'" \;
ps -ef|grep init|sed -n '1p'
cut -d ' ' -f1,3 /etc/mtab #first and third fields
seq 15 21 #print 15 to 21, or echo {15..21}
seq -s" " 15 21 #use space as separator
Categories: Linux, tips Tags:

Linux tips – Performance and Troubleshooting

April 10th, 2014

System CPU

top
procinfo #yum install procinfo
gnome-system-monitor #can also see network flow rate
mpstat
sar

System Memory

top
free
slabtop
sar
/proc/meminfo #provides the most complete view of system memory usage
procinfo
gnome-system-monitor #can also see network flow rate

Process-specific CPU

time
strace #traces the system calls that a program makes while executing
ltrace #traces the calls(functions) that an application makes to libraries rather than to the kernel. Then use ldd to display which libraries are used, and use objdump to search each of those libraries for the given function.
ps
ld.so #ld

Process-specific Memory

ps
/proc/<pid> #you can refer to http://www.doxer.org/proc-filesystem-day-1/ for more info.

/proc/<PID>/status #provides information about the status of a given process PID
/proc/<PID>/maps #how the process’s virtual address space is used

ipcs #more info on http://www.doxer.org/resolved-semget-failed-with-status-28-failed-oracle-database-starting-up/ and http://www.doxer.org/resolvedload-manager-shared-memory-error-is-28-no-space-left-on-devicefor-apache-pmserver-etc-running-on-linux-solaris-unix/

Disk I/O

vmstat #provides totals rather than the rate of change during the sample
sar
lsof
time sh -c "dd if=/dev/zero of=System2.img bs=1M count=10240 && sync" #10G
time dd if=ddfile of=/dev/null bs=8k
dd if=/dev/zero of=vm1disk bs=1M seek=10240 count=0 #10G

Network

ethtool
ifconfig
ip
iptraf
gkrellm
netstat
gnome-system-monitor #can also see network flow rate
sar #network statistics
/etc/cron.d/sysstat #/var/log/sa/

General Ideas & options & outputs

Run Queue Statistics
In Linux, a process can be either runnable or blocked waiting for an event to complete.

A blocked process may be waiting for data from an I/O device or the results of a system call.

When these processes are runnable, but waiting to use the processor, they form a line called the run queue.
The load on a system is the total number of running and runnable processes.
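A sketch of reading these numbers directly, assuming a Linux /proc filesystem: the first three fields of /proc/loadavg are the 1/5/15-minute load averages, and the fourth is runnable entities over total scheduling entities.

```shell
# Read the load average and run-queue summary straight from /proc.
# Field 4 ("running/total") counts currently runnable entities over
# the total number of kernel scheduling entities.
read one five fifteen sched lastpid < /proc/loadavg
echo "load averages: $one $five $fifteen"
echo "runnable/total: $sched"
```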

Context Switches
To create the illusion that a given single processor runs multiple tasks simultaneously, the Linux kernel constantly switches between different processes.
The switch between different processes is called a context switch.
To guarantee that each process receives a fair share of processor time, the kernel periodically interrupts the running process and, if appropriate, the scheduler starts a different process rather than letting the current one continue executing. It is possible that your system will context switch every time this periodic interrupt or timer occurs. (Run cat /proc/interrupts | grep timer, then run it again after, e.g., a 10s interval and compare the counts.)
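A minimal sketch of measuring the context-switch rate directly, using the cumulative ctxt counter in /proc/stat (the same counter vmstat and sar report from):

```shell
# Sample the cumulative context-switch counter twice, 2s apart,
# and report the approximate per-second rate.
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 2
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/sec: $(( (c2 - c1) / 2 ))"
```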

Interrupts
In addition, periodically, the processor receives an interrupt by hardware devices.
/proc/interrupts can be examined to show which interrupts are firing on which CPUs
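For example, summing the per-CPU columns of /proc/interrupts shows which interrupt sources are busiest (a rough sketch; the exact column layout varies by kernel):

```shell
# Sum the per-CPU interrupt counts on each line and list the five
# busiest sources; the first field is the IRQ number or name.
awk 'NR > 1 {
  n = 0
  for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) n += $i
  printf "%12d  %s\n", n, $1
}' /proc/interrupts | sort -rn | head -5
```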

CPU Utilization
At any given time, the CPU can be doing one of seven things:
Idle
Running user code #user time
System time #executing code in the Linux kernel on behalf of the application code
Executing user code that has been “nice”ed or set to run at a lower priority than normal processes
iowait #waiting for I/O (such as disk or network) to complete
irq #means it is in high-priority kernel code handling a hardware interrupt
softirq #executing kernel code that was also triggered by an interrupt, but it is running at a lower priority


Buffers and cache
Alternatively, if your system has much more physical memory than required by your applications, Linux will cache recently used files in physical memory so that subsequent accesses to that file do not require an access to the hard drive. This can greatly speed up applications that access the hard drive frequently, which, obviously, can prove especially useful for frequently launched applications. The first time the application is launched, it needs to be read from the disk; if the application remains in the cache, however, it needs to be read from the much quicker physical memory. This disk cache differs from the processor cache mentioned in the previous chapter. Other than oprofile, valgrind, and kcachegrind, most tools that report statistics about “cache” are actually referring to disk cache.
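A quick way to see the disk cache at work (a sketch; the /tmp path and 64MB size are arbitrary choices): read a freshly written file twice and compare the elapsed times; the second read is normally served from the page cache.

```shell
# Write a 64MB test file, then time two sequential reads of it.
# The second read normally hits the page cache and is much faster.
dd if=/dev/zero of=/tmp/cachetest bs=1M count=64 2>/dev/null
time cat /tmp/cachetest > /dev/null   # first read: may touch the disk
time cat /tmp/cachetest > /dev/null   # second read: served from cache
rm -f /tmp/cachetest
```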

In addition to cache, Linux also uses extra memory as buffers. To further optimize applications, Linux sets aside memory to use for data that needs to be written to disk. These set-asides are called buffers. If an application has to write something to the disk, which would usually take a long time, Linux lets the application continue immediately but saves the file data into a memory buffer. At some point in the future, the buffer is flushed to disk, but the application can continue immediately.
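Both pools are visible in /proc/meminfo, the same source that free reads:

```shell
# Show the current size of the write buffers and the file cache.
grep -E '^(Buffers|Cached):' /proc/meminfo
```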
Active Versus Inactive Memory
Active memory is currently being used by a process. Inactive memory is memory that is allocated but has not been used for a while. Nothing is essentially different between the two types of memory. When required, the Linux kernel takes a process’s least recently used memory pages and moves them from the active to the inactive list. When choosing which memory will be swapped to disk, the kernel chooses from the inactive memory list.
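The kernel's current split between the two lists is also reported in /proc/meminfo (vmstat -a shows the same numbers):

```shell
# Active vs. inactive page totals, as tracked by the kernel's LRU lists.
grep -E '^(Active|Inactive):' /proc/meminfo
```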
Kernel Usage of Memory (Slabs)
In addition to the memory that applications allocate, the Linux kernel consumes a certain amount for bookkeeping purposes. This bookkeeping includes, for example, keeping track of data arriving from network and disk I/O devices, as well as keeping track of which processes are running and which are sleeping. To manage this bookkeeping, the kernel has a series of caches that contains one or more slabs of memory. Each slab consists of a set of one or more objects. The amount of slab memory consumed by the kernel depends on which parts of the Linux kernel are being used, and can change as the type of load on the machine changes.

slabtop

slabtop shows in real-time how the kernel is allocating its various caches and how full they are. Internally, the kernel has a series of caches that are made up of one or more slabs. Each slab consists of a set of one or more objects. These objects can be active (or used) or inactive (unused). slabtop shows you the status of the different slabs. It shows you how full they are and how much memory they are using.


time

time measures three types of time. First, it measures the real or elapsed time, which is the amount of time between when the program started and finished execution. Next, it measures the user time, which is the amount of time that the CPU spent executing application code on behalf of the program. Finally, time measures system time, which is the amount of time the CPU spent executing system or kernel code on behalf of the application.
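A sketch that makes the three times visibly different: an arithmetic loop that burns user CPU, followed by a sleep that accrues only real (elapsed) time.

```shell
# The loop accumulates user time; sleep adds only real time;
# neither does much work in the kernel, so sys time stays near zero.
time sh -c 'i=0; while [ $i -lt 200000 ]; do i=$((i+1)); done; sleep 1'
```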


Disk I/O

When an application does a read or write, the Linux kernel may have a copy of the file stored into its cache or buffers and returns the requested information without ever accessing the disk. If the Linux kernel does not have a copy of the data stored in memory, however, it adds a request to the disk’s I/O queue. If the Linux kernel notices that multiple requests are asking for contiguous locations on the disk, it merges them into a single big request. This merging increases overall disk performance by eliminating the seek time for the second request. When the request has been placed in the disk queue, if the disk is not currently busy, it starts to service the I/O request. If the disk is busy, the request waits in the queue until the drive is available, and then it is serviced.
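The merging and queueing policy is governed by the per-device I/O scheduler, which can be inspected through sysfs (a sketch; device names vary, and some systems expose no scheduler file):

```shell
# Print the available I/O schedulers for each block device that
# exposes one; the currently active scheduler appears in brackets.
for q in /sys/block/*/queue/scheduler; do
  [ -e "$q" ] || continue
  dev=$(basename "$(dirname "$(dirname "$q")")")
  printf '%s: %s\n' "$dev" "$(cat "$q")"
done
```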

iostat

iostat provides a per-device and per-partition breakdown of how many blocks are written to and from a particular disk. (Blocks in iostat are usually sized at 512 bytes.)

lsof
lsof can prove helpful when narrowing down which applications are generating I/O


 top output

S (or STAT) – This is the current status of a process: sleeping (S), running (R), zombie (terminated but not yet reaped by its parent) (Z), in an uninterruptible sleep (D), or stopped/being traced (T).

TIME – The total amount of CPU time (user and system) that this process has used since it started executing.

top options

-b Run in batch mode. Typically, top shows only a single screenful of information, and processes that don’t fit on the screen never display. This option shows all the processes and can be very useful if you are saving top’s output to a file or piping the output to another command for processing.

I This toggles whether top will divide the CPU usage by the number of CPUs on the system. For example, if a process was consuming all of both CPUs on a two-CPU system, this toggles whether top displays a CPU usage of 100% or 200%.

1 (numeral 1) This toggles whether the CPU usage will be broken down to the individual usage or shown as a total.

mpstat options

-P { cpu | ALL } This option tells mpstat which CPUs to monitor. cpu is a number between 0 and the total number of CPUs minus 1.

The biggest benefit of mpstat is that it shows the time next to the statistics, so you can look for a correlation between CPU usage and time of day.

mpstat can be used to determine whether the CPUs are fully utilized and relatively balanced. By observing the number of interrupts each CPU is handling, it is possible to find an imbalance.

 sar options

-I {irq | SUM | ALL | XALL} This reports the rates that interrupts have been occurring in the system.
-P {cpu | ALL} This option specifies which CPU the statistics should be gathered from. If this isn’t specified, the system totals are reported.
-q This reports information about the run queues and load averages of the machine.
-u This reports information about CPU utilization of the system. (This is the default output.)
-w This reports the number of context switches that occurred in the system.
-o filename This specifies the name of the binary output file that will store the performance statistics.
-f filename This specifies the filename of the performance statistics.

-B – This reports information about the number of blocks that the kernel swapped to and from disk. In addition, for kernel versions after v2.5, it reports information about the number of page faults.
-W – This reports the number of pages of swap that are brought in and out of the system.
-r – This reports information about the memory being used in the system. It includes information about the total free memory, swap, cache, and buffers being used.
-R Report memory statistics

-d –  reports disk activities

-n DEV – Shows statistics about the number of packets and bytes sent and received by each device.
-n EDEV – Shows information about the transmit and receive errors for each device.
-n SOCK – Shows information about the total number of sockets (TCP, UDP, and RAW) in use.
-n ALL – Shows all the network statistics.

sar output

runq-sz This is the size of the run queue when the sample was taken.
plist-sz This is the number of processes present (running, sleeping, or waiting for I/O) when the sample was taken.
proc/s This is the number of new processes created per second. (This is the same as the forks statistic from vmstat.)

tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
rd_sec/s – Number of disk sectors read per second.
wr_sec/s – Number of disk sectors written per second.


vmstat options

-n print header info only once

-a This changes the default output of memory statistics to indicate the active/inactive amount of memory rather than information about buffer and cache usage.
-s (procps 3.2 or greater) This prints out the vm table. This is a grab bag of different statistics about the system since it has booted. It cannot be run in sample mode. It contains both memory and CPU statistics.

-d – This option displays individual disk statistics at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.
-p partition – This displays performance statistics about the given partition at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.

vmstat output
si – The rate of memory (in KB/s) that has been swapped in from disk during the last sample.
so – The rate of memory (in KB/s) that has been swapped out to disk during the last sample.
pages paged in – The amount of memory (in pages) read from the disk(s) into the system buffers. (On most IA32 systems, a page is 4KB.)
pages paged out – The amount of memory (in pages) written to the disk(s) from the system cache. (On most IA32 systems, a page is 4KB.)
pages swapped in – The amount of memory (in pages) read from swap into system memory.
pages swapped out – The amount of memory (in pages) written from system memory to the swap.

bo – This indicates the number of total blocks written to disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
bi – This shows the number of blocks read from the disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
wa – This indicates the amount of CPU time spent waiting for I/O to complete.
reads: ms – The amount of time (in ms) spent reading from the disk.
writes: ms – The amount of time (in ms) spent writing to the disk.
IO: cur – The total number of I/O operations currently in progress. Note that there is a bug in recent versions of vmstat in which this is incorrectly divided by 1,000, which almost always yields a 0.
IO: s – This is the number of seconds spent waiting for I/O to complete.

iostat options
-d – This displays only information about disk I/O rather than the default display, which includes information about CPU usage as well.
-k – This shows statistics in kilobytes rather than blocks.
-x – This shows extended-performance I/O statistics.
device – If a device is specified, iostat shows only information about that device.

iostat output
tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
Blk_read/s – The rate of disk blocks read per second.
Blk_wrtn/s – The rate of disk blocks written per second.
Blk_read – The total number of blocks read during the interval.
Blk_wrtn – The total number of blocks written during the interval.
rrqm/s – The number of reads merged before they were issued to the disk.
wrqm/s – The number of writes merged before they were issued to the disk.
r/s – The number of reads issued to the disk per second.
w/s – The number of writes issued to the disk per second.
rsec/s – Disk sectors read per second.
wsec/s – Disk sectors written per second.
avgrq-sz – The average size (in sectors) of disk requests.
avgqu-sz – The average size of the disk request queue.
await – The average time (in ms) for a request to be completely serviced. This average includes the time that the request was waiting in the disk’s queue plus the amount of time it was serviced by the disk.
svctm – The average service time (in ms) for requests submitted to the disk. This indicates how long on average the disk took to complete a request. Unlike await, it does not include the amount of time spent waiting in the queue.

lsof options
+D directory – This causes lsof to recursively search all the files in the given directory and report on which processes are using them.
+d directory – This causes lsof to report on which processes are using the files in the given directory.

lsof output
FD – The file descriptor number, or txt for program text (the executable), mem for a memory-mapped file.
TYPE – The type of file (REG for a regular file).
DEVICE – The device number, as major,minor.
SIZE – The size of the file.
NODE – The inode of the file.


free options

-s delay – This option causes free to print out new memory statistics every delay seconds.


 strace options

strace [-p <pid>] -s 200 <program> #attach to a running process with -p; -s 200 raises the maximum string size printed (the default is 32) to 200. Note that filenames are not considered strings and are always printed in full.

-c – This causes strace to print out a summary of statistics rather than an individual list of all the system calls that are made.

ltrace options
-c – This option causes ltrace to print a summary of all the calls after the command has completed.
-S – ltrace traces system calls in addition to library calls, which is identical to the functionality strace provides.
-p pid – This traces the process with the given PID.


ps options
vsz The virtual set size is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is using.
rss The resident set size is the amount of physical memory the application is currently using.
pmem The percentage of the system memory that the process is consuming.
command This is the command name.

/proc/<PID>/status output
VmSize This is the process’s virtual set size, which is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is actually using. This is the same as the vsz parameter provided by ps.
VmLck This is the amount of memory that has been locked by this process. Locked memory cannot be swapped to disk.
VmRSS This is the resident set size or amount of physical memory the application is currently using. This is the same as the rss statistic provided by ps.

ipcs
Because shared memory is used by multiple processes, it cannot be attributed to any particular process. ipcs provides enough information about the state of the system-wide shared memory to determine which processes allocated the shared memory, which processes are using it, and how often they are using it. This information proves useful when trying to reduce shared memory usage.

ipcs options

lsof -u oracle | grep <shmid> #shmid is from output of ipcs -m. lists the processes under the oracle user attached to the shared memory segment

-t – This shows the time when the shared memory was created, when a process last attached to it, and when a process last detached from it.
-u – This provides a summary about how much shared memory is being used and whether it has been swapped or is in memory.
-l – This shows the system-wide limits for shared memory usage.
-p – This shows the PIDs of the processes that created and last used the shared memory segments.
-c – creator


ifconfig output #more on http://www.thegeekscope.com/linux-ifconfig-command-output-explained/

Errors – Frames with errors (possibly because of a bad network cable or duplex mismatch).
Dropped – Frames that were discarded (most likely because of low amounts of memory or buffers).
Overruns – Frames that may have been discarded by the network card because the kernel or network card was overwhelmed with frames. This should not normally happen.
Frame – These frames were dropped as a result of problems on the physical level. This could be the result of cyclic redundancy check (CRC) errors or other low-level problems.
Compressed – Some lower-level interfaces, such as Point-to-Point Protocol (PPP) or Serial Line Internet Protocol (SLIP) devices compress frames before they are sent over the network. This value indicates the number of these compressed frames. (Compressed packets are usually present during SLIP or PPP connections)

carrier – The number of packets discarded because of link media failure (such as a faulty cable)

ip options
-s [-s] link – If the extra -s is provided to ip, it provides a more detailed list of low-level Ethernet statistics.

iptraf options
-d interface – Detailed statistics for an interface including receive, transmit, and error rates
-s interface – Statistics about which IP ports are being used on an interface and how many bytes are flowing through them
-t <minutes> – Number of minutes that iptraf runs before exiting
-z interface – shows packet counts by size on the specified interface

netstat options
-p – Displays the PID/program name responsible for opening each of the displayed sockets
-c – Continually updates the display of information every second
--interfaces=<name> – Displays network statistics for the given interface
--statistics|-s – IP/UDP/ICMP/TCP statistics
--tcp|-t – Shows only information about TCP sockets
--udp|-u – Shows only information about UDP sockets
--raw|-w – Shows only information about RAW sockets (IP and ICMP)
--listening|-l – Shows only listening sockets. (These are omitted by default.)
--all|-a – Shows both listening and non-listening (for TCP this means established connections) sockets. With the --interfaces option, shows interfaces that are not up
--numeric|-n – Shows numerical addresses instead of trying to determine symbolic host, port, or user names
--extend|-e – Displays additional information. Use this option twice for maximum detail.

netstat output

Active Internet connections (w/o servers)
Proto - The protocol (tcp, udp, raw) used by the socket.
Recv-Q - The count of bytes not copied by the user program connected to this socket.
Send-Q - The count of bytes not acknowledged by the remote host.
Local Address - Address and port number of the local end of the socket. Unless the --numeric (-n) option is specified, the socket address is resolved to its canonical host name (FQDN), and the port number is translated into the corresponding service name.
Foreign Address - Address and port number of the remote end of the socket. Analogous to "Local Address."
State - The state of the socket. Since there are no states in raw mode and usually no states used in UDP, this column may be left blank. Normally this can be one of several values: #more on http://www.doxer.org/tcp-flags-explanation-in-details-syn-ack-fin-rst-urg-psh-and-iptables-for-sync-flood/
    ESTABLISHED
        The socket has an established connection.
    SYN_SENT
        The socket is actively attempting to establish a connection.
    SYN_RECV
        A connection request has been received from the network.
    FIN_WAIT1
        The socket is closed, and the connection is shutting down.
    FIN_WAIT2
        Connection is closed, and the socket is waiting for a shutdown from the remote end.
    TIME_WAIT
        The socket is waiting after close to handle packets still in the network.
    CLOSED
        The socket is not being used.
    CLOSE_WAIT
        The remote end has shut down, waiting for the socket to close.
    LAST_ACK
        The remote end has shut down, and the socket is closed. Waiting for acknowledgement.
    LISTEN
        The socket is listening for incoming connections. Such sockets are not included in the output unless you specify the --listening (-l) or --all (-a) option.
    CLOSING
        Both sockets are shut down but we still don't have all our data sent.
    UNKNOWN
        The state of the socket is unknown.
User - The username or the user id (UID) of the owner of the socket.
PID/Program name - Slash-separated pair of the process id (PID) and process name of the process that owns the socket. --program causes this column to be included. You will also need superuser privileges to see this information on sockets you don't own. This identification information is not yet available for IPX sockets.

Example

[ezolt@scrffy ~/edid]$ vmstat 1 | tee /tmp/output
procs -----------memory---------- ---swap-- -----io----  --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs  us sy id wa
0  1 201060  35832  26532 324112    0    0     3     2     6     2  5  1  94  0
0  0 201060  35888  26532 324112    0    0    16     0  1138   358  0  0  99  0
0  0 201060  35888  26540 324104    0    0     0    88  1163   371  0  0 100  0

The number of context switches looks good compared to the number of interrupts. The scheduler is switching processes less than the number of timer interrupts that are firing. This is most likely because the system is nearly idle, and most of the time when the timer interrupt fires, the scheduler does not have any work to do, so it does not switch from the idle process.

[ezolt@scrffy manuscript]$ sar -w -c -q 1 2
Linux 2.6.8-1.521smp (scrffy)   10/20/2004

08:23:29 PM    proc/s
08:23:30 PM      0.00

08:23:29 PM   cswch/s
08:23:30 PM    594.00

08:23:29 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:30 PM         0       163      1.12       1.17      1.17

08:23:30 PM    proc/s
08:23:31 PM      0.00

08:23:30 PM   cswch/s
08:23:31 PM    812.87

08:23:30 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:31 PM         0       163      1.12       1.17      1.17

Average:       proc/s
Average:         0.00

Average:      cswch/s
Average:       703.98

Average:      runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
Average:            0       163      1.12       1.17      1.17

In this case, we ask sar to show us the total number of context switches and process creations that occur every second. We also ask sar for information about the load average. We can see in this example that this machine has 163 processes that are in memory but not running. For the past minute, on average 1.12 processes have been ready to run.

bash-2.05b$ vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
 2  1 514004   5640 79816 1341208   33   31   204   247 1111  1548  8  5 73 14

The amount of inactive pages indicates how much of the memory could be swapped to disk and how much is currently being used. In this case, we can see that 1310MB of memory is active, and only 78MB is considered inactive. This machine has a large amount of memory, and much of it is being actively used.


bash-2.05b$ vmstat -s

      1552528  total memory
      1546692  used memory
      1410448  active memory
        11100  inactive memory
         5836  free memory
         2676  buffer memory
       645864  swap cache
      2097096  total swap
       526280  used swap
      1570816  free swap
     20293225 non-nice user cpu ticks
     18284715 nice user cpu ticks
     17687435 system cpu ticks
    357314699 idle cpu ticks
     67673539 IO-wait cpu ticks
       352225 IRQ cpu ticks
      4872449 softirq cpu ticks
    495248623 pages paged in
    600129070 pages paged out
     19877382 pages swapped in
     18874460 pages swapped out
   2702803833 interrupts
   3763550322 CPU context switches
   1094067854 boot time
     20158151 forks

It can be helpful to know the system totals when trying to figure out what percentage of the swap and memory is currently being used. Another interesting statistic is the pages paged in, which indicates the total number of pages that were read from the disk. This statistic includes the pages read when starting an application and those that the application itself may be using.


[ezolt@wintermute tmp]$ ps -o etime,time,pcpu,cmd 10882
    ELAPSED     TIME %CPU CMD
      00:06 00:00:05 88.0 ./burn

This example shows a test application that is consuming 88 percent of the CPU and has been running for 6 seconds, but has only consumed 5 seconds of CPU time.


[ezolt@wintermute tmp]$ ps -o vsz,rss,tsiz,dsiz,majflt,minflt,cmd 10882
VSZ RSS TSIZ DSIZ MAJFLT MINFLT CMD
11124 10004 1 11122 66 2465 ./burn

The burn application has a very small text size (1KB), but a very large data size (11,122KB). Of the total virtual size (11,124KB), the process has a slightly smaller resident set size (10,004KB), which represents the total amount of physical memory that the process is actually using. In addition, most of the faults generated by burn were minor faults, so most of the memory faults were due to memory allocation rather than loading in a large amount of text or data from the program image on the disk.


[ezolt@wintermute tmp]$ cat /proc/4540/status
Name: burn
State: T (stopped)
Tgid: 4540
Pid: 4540
PPid: 1514
TracerPid: 0
Uid: 501 501 501 501
Gid: 501 501 501 501
FDSize: 256
Groups: 501 9 502
VmSize: 11124 kB
VmLck: 0 kB
VmRSS: 10004 kB
VmData: 9776 kB
VmStk: 8 kB
VmExe: 4 kB
VmLib: 1312 kB
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

The VmLck size of 0KB means that the process has not locked any pages into memory, making them unswappable. The VmRSS size of 10,004KB means that the application is currently using 10,004KB of physical memory, although it has allocated or mapped a total of VmSize, or 11,124KB. If the application begins to use the memory that it has allocated but is not currently using, the VmRSS size increases but leaves VmSize unchanged.
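You can watch these two numbers for any process; for instance, a process can read its own values from /proc/self/status (assuming a Linux /proc):

```shell
# VmSize is everything mapped; VmRSS is what is resident in RAM now.
grep -E '^Vm(Size|RSS):' /proc/self/status
```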

[ezolt@wintermute test_app]$ cat /proc/4540/maps
08048000-08049000 r-xp 00000000 21:03 393730 /tmp/burn
08049000-0804a000 rw-p 00000000 21:03 393730 /tmp/burn
0804a000-089d3000 rwxp 00000000 00:00 0
40000000-40015000 r-xp 00000000 21:03 1147263 /lib/ld-2.3.2.so
40015000-40016000 rw-p 00015000 21:03 1147263 /lib/ld-2.3.2.so
4002e000-4002f000 rw-p 00000000 00:00 0
4002f000-40162000 r-xp 00000000 21:03 2031811 /lib/tls/libc-2.3.2.so
40162000-40166000 rw-p 00132000 21:03 2031811 /lib/tls/libc-2.3.2.so
40166000-40168000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0

The burn application is using two libraries: ld and libc. The text section (denoted by the permission r-xp) of libc has a range of 0x4002f000 through 0x40162000, a size of 0x133000 or 1,257,472 bytes.
The data section (denoted by permission rw-p) of libc has a range of 0x40162000 through 0x40166000, a size of 0x4000 or 16,384 bytes. The text size of libc is bigger than ld's text size of 0x15000 or 86,016 bytes. The data size of libc is also bigger than ld's data size of 0x1000 or 4,096 bytes. libc is the big library that burn is linking in.
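Shell arithmetic can verify these segment sizes directly from the map boundaries shown above:

```shell
# libc text: 0x40162000 - 0x4002f000 = 0x133000 = 1,257,472 bytes
printf 'libc text: 0x%x (%d bytes)\n' \
  $((0x40162000 - 0x4002f000)) $((0x40162000 - 0x4002f000))
# libc data: 0x40166000 - 0x40162000 = 0x4000 = 16,384 bytes
printf 'libc data: 0x%x (%d bytes)\n' \
  $((0x40166000 - 0x40162000)) $((0x40166000 - 0x40162000))
```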


[ezolt@wintermute tmp]$ ipcs -u

------ Shared Memory Status --------
segments allocated 21
pages allocated 1585
pages resident 720
pages swapped 412
Swap performance: 0 attempts 0 successes

------ Semaphore Status --------
used arrays = 0
allocated semaphores = 0

------ Messages: Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

In this case, we can see that 21 different segments or pieces of shared memory have been allocated. All these segments consume a total of 1,585 pages of memory; 720 of these exist in physical memory and 412 have been swapped to disk.

[ezolt@wintermute tmp]$ ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 root 777 49152 1
0x00000000 32769 root 777 16384 1
0x00000000 65538 ezolt 600 393216 2 dest

Here we ask ipcs for a general overview of all the shared memory segments in the system, which indicates who is using each memory segment. In this case, we see a list of all the shared segments. For one in particular, the one with a shared memory ID of 65538, the user (ezolt) is the owner. It has a permission of 600 (a typical UNIX permission), which in this case means that only ezolt can read and write to it. It has 393,216 bytes, and 2 processes are attached to it.

[ezolt@wintermute tmp]$ ipcs -p

------ Shared Memory Creator/Last-op --------
shmid owner cpid lpid
0 root 1224 11954
32769 root 1224 11954
65538 ezolt 1229 11954

Finally, we can figure out exactly which processes created the shared memory segments and which other processes are using them. For the segment with shmid 65538, we can see that PID 1229 created it and PID 11954 was the last to use it.


[ezolt@wintermute procps-3.2.0]$ ./vmstat 1 3

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 197020 81804 29920 0 0 236 25 1017 67 1 1 93 4
1 1 0 172252 106252 29952 0 0 24448 0 1200 395 1 36 0 63
0 0 0 231068 50004 27924 0 0 19712 80 1179 345 1 34 15 49

During one of the samples, the system read 24,448 disk blocks. As mentioned previously, the block size for a disk in vmstat is 1,024 bytes, so this means that the system is reading in data at about 23MB per second. We can also see that during this sample, the CPU was spending a significant portion of time waiting for I/O to complete. The CPU waits on I/O 63 percent of the time during the sample in which the disk was reading at ~23MB per second, and it waits on I/O 49 percent for the next sample, in which the disk was reading at ~19MB per second.

[ezolt@wintermute procps-3.2.0]$ ./vmstat -D
3 disks
5 partitions
53256 total reads
641233 merged reads
4787741 read sectors
343552 milli reading
14479 writes
17556 merged writes
257208 written sectors
7237771 milli writing
0 inprogress IO
342 milli spent IO

In this example, a large number of the reads issued to the system were merged before they were sent to the device. Although there were ~641,000 merged reads, only ~53,000 read commands were actually issued to the drives. The output also tells us that a total of 4,787,741 sectors have been read from the disk and that, since system boot, 343,552ms (or about 344 seconds) were spent reading from it. The same statistics are available for write performance.
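Two derived figures make these raw counters easier to interpret: the average number of merged requests absorbed by each issued read, and the total read time in seconds. A small sketch using the numbers above:

```python
# Derived statistics from the `vmstat -D` counters above.
total_reads = 53256       # read commands actually issued to the drives
merged_reads = 641233     # read requests merged before being issued
milli_reading = 343552    # total milliseconds spent reading since boot

print(round(milli_reading / 1000))             # ~344 seconds spent reading
print(round(merged_reads / total_reads))       # ~12 merged reads per issued read
```

A high merge ratio like this usually indicates a sequential access pattern, since adjacent requests are what the I/O scheduler can combine.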

[ezolt@wintermute procps-3.2.0]$ ./vmstat -p hde3 1 3
hde3 reads read sectors writes requested writes
18999 191986 24701 197608
19059 192466 24795 198360
19161 193282 24795 198360

This shows that 60 reads (19,059 – 18,999) and 94 writes (24,795 – 24,701) were issued to partition hde3 during the first one-second interval. This view can prove particularly useful if you are trying to determine which partition of a disk is seeing the most usage.
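Because these counters are cumulative, per-interval activity is just the difference between successive samples. A minimal sketch using the first two samples above:

```python
# Per-interval activity for partition hde3, from cumulative vmstat -p samples.
sample1 = {"reads": 18999, "writes": 24701}
sample2 = {"reads": 19059, "writes": 24795}

reads_delta = sample2["reads"] - sample1["reads"]
writes_delta = sample2["writes"] - sample1["writes"]
print(reads_delta, writes_delta)  # 60 94
```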


 

[ezolt@localhost sysstat-5.0.2]$ ./iostat -x -dk 1 5 /dev/hda2
Linux 2.4.22-1.2188.nptl (localhost.localdomain) 05/01/2004
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 11.22 44.40 3.15 4.20 115.00 388.97 57.50 194.49
68.52 1.75 237.17 11.47 8.43

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1548.00 0.00 100.00 0.00 13240.00 0.00 6620.00
132.40 55.13 538.60 10.00 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1365.00 0.00 131.00 0.00 11672.00 0.00 5836.00
89.10 53.86 422.44 7.63 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1483.00 0.00 84.00 0.00 12688.00 0.00 6344.00
151.0 39.69 399.52 11.90 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 2067.00 0.00 123.00 0.00 17664.00 0.00 8832.00
143.61 58.59 508.54 8.13 100.00

You can see that the average queue size is quite high (~40 to 58 requests) and, as a result, the average time a request must wait (~399.52ms to 538.60ms) is much greater than the time it takes to service a request (7.63ms to 11.90ms). These high wait times, along with the fact that the utilization is 100 percent, show that the disk is completely saturated.
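That reasoning can be written down as a rough rule of thumb. The following heuristic is an assumption for illustration, not an iostat feature: flag a device as saturated when utilization is pegged at 100 percent and requests spend far longer queued (await) than being serviced (svctm).

```python
# Rough saturation heuristic (an assumption, not part of iostat itself):
# pegged utilization plus wait time dwarfing service time suggests saturation.
def looks_saturated(await_ms, svctm_ms, util_pct):
    return util_pct >= 100.0 and await_ms > 10 * svctm_ms

# Values from the second iostat sample above:
print(looks_saturated(await_ms=538.60, svctm_ms=10.00, util_pct=100.0))  # True
# The first (lightly loaded) sample does not trip the heuristic:
print(looks_saturated(await_ms=237.17, svctm_ms=11.47, util_pct=8.43))   # False
```

The 10x threshold is arbitrary; the point is that await should be compared against svctm rather than judged in isolation.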


[ezolt@wintermute sysstat-5.0.2]$ sar -n SOCK 1 2

Linux 2.4.22-1.2174.nptlsmp (wintermute.phil.org) 06/07/04
21:32:26 totsck tcpsck udpsck rawsck ip-frag
21:32:27 373 118 8 0 0
21:32:28 373 118 8 0 0
Average: 373 118 8 0 0

We can see the total number of open sockets as well as the number of TCP, UDP, and raw sockets in use. sar also displays the number of fragmented IP packets.
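The per-protocol counts are most useful relative to the total. As a quick sketch using the sample above, TCP sockets make up about a third of all open sockets on this system:

```python
# Socket counts from the `sar -n SOCK` sample above.
totsck, tcpsck, udpsck, rawsck = 373, 118, 8, 0
print(round(100 * tcpsck / totsck))  # ~32 percent of open sockets are TCP
```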

PS: