Author Archive

general networking tips

April 18th, 2014

How Autonegotiation Works

First, let’s cover what autonegotiation does not do: when autonegotiation is enabled on a port, it does not automatically determine the configuration of the port on the other side of the Ethernet cable and then match it. This is a common misconception that often leads to problems.

Autonegotiation is a protocol and, as with any protocol, it only works if it’s running on both sides of the link. In other words, if one side of a link is running autonegotiation and the other side of the link is not, autonegotiation cannot determine the speed and duplex configuration of the other side. If autonegotiation is running on the other side of the link, the two devices decide together on the best speed and duplex mode. Each interface advertises the speeds and duplex modes at which it can operate, and the best match is selected (higher speeds and full duplex are preferred).
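The "best match" rule can be sketched as a simple priority walk over the advertised abilities. This is only an illustration of the selection logic, not the actual Fast Link Pulse exchange; the function name and data layout are made up for the example:

```python
# IEEE priority order: higher speed beats lower speed,
# and full duplex beats half duplex at the same speed.
PRIORITY = [
    (1000, "full"), (1000, "half"),
    (100, "full"), (100, "half"),
    (10, "full"), (10, "half"),
]

def negotiate(local_abilities, remote_abilities):
    """Return the best (speed, duplex) advertised by both sides, or None."""
    common = local_abilities & remote_abilities
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None

# A gigabit-capable port against a port that only advertises
# 10/100 full duplex settles on 100 Mbps full duplex:
print(negotiate(
    {(1000, "full"), (100, "full"), (100, "half"), (10, "full")},
    {(100, "full"), (10, "full")},
))  # -> (100, 'full')
```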

The confusion exists primarily because autonegotiation always seems to work. This is because of a feature called parallel detection, which kicks in when the autonegotiation process fails to find autonegotiation running on the other end of the link. Parallel detection works by sending the signal being received to the local 10Base-T, 100Base-TX, and 100Base-T4 drivers. If any one of these drivers detects the signal, the interface is set to that speed.

Parallel detection determines only the link speed, not the supported duplex modes. This is an important consideration because the common modes of Ethernet have differing levels of duplex support:

10Base-T was originally designed without full-duplex support. Some implementations of 10Base-T support full duplex, but many do not.

100Base-T has long supported full duplex, which has been the preferred method for connecting 100 Mbps links for as long as the technology has existed. However, the default behavior of 100Base-T is usually half duplex, and full-duplex support must be set manually.

Gigabit Ethernet has a much more robust autonegotiation protocol than 10M or 100M Ethernet. Gigabit interfaces should be left to autonegotiate in most situations.

10 Gigabit
10 Gigabit (10G) connections are generally dependent on fiber transceivers or special copper connections that differ from the RJ-45 connections seen on other Ethernet types. The hardware usually dictates how 10G connects. On a 6500, 10G interfaces usually require XENPAKs, which only run at 10G. On a Nexus 5000 switch, some of the ports are 1G/10G and can be changed with the speed command.

Because of the lack of widespread full-duplex support on 10Base-T and the typical default behavior of 100Base-T, when autonegotiation falls through to the parallel detection phase (which only detects speed), the safest thing for the driver to do is to choose half-duplex mode for the link.

As networks and networking hardware evolve, higher-speed links with more robust negotiation protocols will likely make negotiation problems a thing of the past. That being said, I still see 20-year-old routers in service, so knowing how autonegotiation works will be a valuable skill for years to come.

 When Autonegotiation Fails

In a half-duplex environment, the receiving (RX) line is monitored. If a frame is present on the RX link, no frames are sent until the RX line is clear. If a frame is received on the RX line while a frame is being sent on the transmitting (TX) line, a collision occurs. Collisions cause the collision error counter to be incremented—and the sending frame to be retransmitted—after a random back-off delay. This may seem counterintuitive in a modern switched environment, but remember that Ethernet was originally designed to work over a single wire. Switches and twisted pair came along later.


In full-duplex operation, the RX line is not monitored, and the TX line is always considered available. Collisions do not occur in full-duplex mode because the RX and TX lines are completely independent.

When one side of the link is full duplex and the other side is half duplex, a large number of collisions will occur on the half-duplex side. The issue may not be obvious, because a half-duplex interface normally shows collisions, while a full-duplex interface does not. Since full duplex means never having to test for a clear-to-send condition, a full-duplex interface will not record any errors in this situation. The problem should present itself as excessive collisions, but only on the half-duplex side.

Gigabit Ethernet uses a substantially more robust autonegotiation mechanism than the one described in this chapter. Gigabit Ethernet should thus always be set to autonegotiation, unless there is a compelling reason not to do so (such as an interface that will not properly negotiate). Even then, this should be considered a temporary workaround until the misbehaving part can be replaced.


When expanding a network using VLANs, you face the same limitations. If you connect another switch to a port that is configured for VLAN 20, the new switch will be able to forward frames only to or from VLAN 20. If you wanted to connect two switches, each containing four VLANs, you would need four links between the switches: one for each VLAN. A solution to this problem is to deploy trunks between switches. Trunks are links that carry frames for more than one VLAN.


Another way to route between VLANs is commonly known as the router-on-a-stick configuration. Instead of running a link from each VLAN to a router interface, you can run a single trunk from the switch to the router. All the VLANs will then pass over a single link.

Deploying a router on a stick saves a lot of interfaces on both the switch and the router. The downside is that the trunk is only one link, so all the VLANs must share that single link's bandwidth; in contrast, when each VLAN has its own link, each VLAN gets that link's full bandwidth to itself. Also, don't forget that the router is passing traffic between VLANs, so chances are each frame will be seen twice on the same link: once to get to the router, and once to get back to the destination VLAN.

Jack is connected to VLAN 20 on Switch B, and Diane is connected to VLAN 20 on Switch A. Because there is a trunk connecting these two switches together, assuming the trunk is allowed to carry traffic for all configured VLANs, Jack will be able to communicate with Diane. Notice that the ports to which the trunk is connected are not assigned VLANs. These ports are trunk ports and, as such, do not belong to a single VLAN.



 Possible switch port modes related to trunking



 VTP (VLAN Trunking Protocol)


VTP allows VLAN configurations to be managed on a single switch. Those changes are then propagated to every switch in the VTP domain. A VTP domain is a group of connected switches with the same VTP domain string configured. Interconnected switches with differently configured VTP domains will not share VLAN information. A switch can only be in one VTP domain; the VTP domain is null by default. Switches with mismatched VTP domains will not negotiate trunk protocols. If you wish to establish a trunk between switches with mismatched VTP domains, you must have their trunk ports set to mode trunk.


The main idea of VTP is that changes are made on VTP servers. These changes are then propagated to VTP clients, and any other VTP servers in the domain. Switches can be configured manually as VTP servers, VTP clients, or the third possibility, VTP transparent. A VTP transparent switch receives and forwards VTP updates but does not update its configuration to reflect the changes they contain. Some switches default to VTP server, while others default to VTP transparent. VLANs cannot be locally configured on a switch in client mode.

There is actually a fourth state for a VTP switch: off. A switch in VTP mode off will not accept VTP packets, and therefore will not forward them either. This can be handy if you want to stop the forwarding of VTP updates at some point in the network.


SW1 and SW2 are both VTP servers. SW3 is set to VTP transparent, and SW4 is a VTP client. Any changes to the VLAN information on SW1 will be propagated to SW2 and SW4. The changes will be passed through SW3 but will not be acted upon by that switch. Because the switch does not act on VTP updates, its VLANs must be configured manually if users on that switch are to interact with the rest of the network.

When a switch receives a VTP update, the first thing it does is compare the VTP domain name in the update to its own. If the domains are different, the update is ignored. If they are the same, the switch compares the update’s configuration revision number to its own. If the revision number of the update is lower than or equal to the switch’s own revision number, the update is ignored. If the update has a higher revision number, the switch sends an advertisement request. The response to this request is another summary advertisement, followed by subset advertisements. Once it has received the subset advertisements, the switch has all the information necessary to implement the required changes in the VLAN configuration.
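The decision sequence above can be sketched in a few lines. This is just the comparison logic as described, with hypothetical names, not Cisco's implementation:

```python
def handle_vtp_summary(my_domain, my_revision, adv_domain, adv_revision):
    """Process a received VTP summary advertisement (sketch)."""
    if adv_domain != my_domain:
        return "ignore"                      # different VTP domain
    if adv_revision <= my_revision:
        return "ignore"                      # nothing newer in the update
    # A higher revision number: ask for the full VLAN data.
    return "send advertisement request"

print(handle_vtp_summary("CORP", 5, "LAB", 9))    # -> ignore
print(handle_vtp_summary("CORP", 5, "CORP", 5))   # -> ignore
print(handle_vtp_summary("CORP", 5, "CORP", 9))   # -> send advertisement request
```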

When a switch’s VTP domain is null, if it receives a VTP advertisement over a trunk link, it will inherit the VTP domain and VLAN configuration from the switch on the other end of the trunk. This will happen only over manually configured trunks, as DTP negotiations cannot take place unless a VTP domain is configured. Be careful of this behavior, as it can cause serious heartache, nausea, and potential job loss if you're not careful (or the person before you wasn't).

VTP Pruning
On large or congested networks, VTP can create a problem when excess traffic is sent across trunks needlessly. The switches in the gray box all have ports assigned to VLAN 100, while the rest of the switches do not. With VTP active, all of the switches will have VLAN 100 configured, and as such will receive broadcasts initiated on that VLAN. However, those without ports assigned to VLAN 100 have no use for the broadcasts.

On a busy VLAN, broadcasts can amount to a significant percentage of traffic. In this case, all that traffic is being needlessly sent over the entire network, and is taking up valuable bandwidth on the interswitch trunks.
VTP pruning prevents traffic originating from a particular VLAN from being sent to switches on which that VLAN is not active (i.e., switches that do not have ports connected and configured for that VLAN). With VTP pruning enabled, the VLAN 100 broadcasts will be restricted to switches on which VLAN 100 is actively in use.

VTP pruning must be enabled or disabled throughout the entire VTP domain. Failure to configure VTP pruning properly can result in instability in the network. By default, all VLANs up to VLAN 1001 are eligible for pruning, except VLAN 1, which can never be pruned. VTP does not support the extended VLANs above VLAN 1001, so VLANs higher than 1001 cannot be pruned. If you enable VTP pruning on a VTP server, VTP pruning will automatically be enabled for the entire domain.

Dangers of VTP
Remember that many switches are VTP servers by default. Remember, also, that when a switch participating in VTP receives an update that has a higher revision number than its own configuration’s revision number, the switch will implement the new scheme. In our scenario, the lab’s 3750s had been functioning as a standalone network with the same VTP domain as the regular network. Multiple changes were made to their VLAN configurations, resulting in a high configuration revision number. When these switches, which were VTP servers, were connected to the more stable production network, they automatically sent out updates. Each switch on the main network, including the core 6509s, received an update with a higher revision number than its current configuration. Consequently, they all requested the VLAN configuration from the rogue 3750s and implemented that design.

 Link Aggregation

EtherChannel is the Cisco term for the technology that enables the bonding of up to eight physical Ethernet links into a single logical link. Outside the Cisco world, the same technique is generally called link aggregation, or LAG for short.


The default behavior is to assign one of the physical links to each packet that traverses the EtherChannel, based on the packet’s destination MAC address. This means that if one workstation talks to one server over an EtherChannel, only one of the physical links will be used. In fact, all of the traffic destined for that server will traverse a single physical link in the EtherChannel. This means that a single user will only ever get the bandwidth of one physical link (1 Gbps on a gigabit bundle) from the EtherChannel at a time. This behavior can be changed to send each packet over a different physical link, but as you’ll see, there are limits to how well this works for applications like VoIP. The benefit arises when there are multiple destinations, which can each use a different path.

You can change the method the switch uses to determine which path to assign. The default behavior is to use the destination MAC address. However, depending on the version of the software and hardware in use, the options may include:

The source MAC address
The destination MAC address
The source and destination MAC addresses
The source IP address
The destination IP address
The source and destination IP addresses
The source port
The destination port
The source and destination ports
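All of these methods are hashes over header fields. As a hypothetical sketch of the idea (not the actual Catalyst hardware hash), XORing the low-order bits of the chosen fields picks one member link, which is why a given source/destination pair always rides the same link:

```python
def etherchannel_link(src_mac, dst_mac, links=8):
    """Pick a member link from the low bits of src XOR dst.

    'links' must be a power of two (2, 4, or 8), mirroring how the
    low 1-3 bits of the hash select a link in an EtherChannel.
    """
    return (src_mac ^ dst_mac) & (links - 1)

# The same host pair always hashes to the same member link:
a, b = 0x003019_04DA60, 0x00000C_07AC01
print(etherchannel_link(a, b))  # -> 1, every time for this pair
```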

There is another terminology problem that can create many headaches for network administrators. While a group of physical Ethernet links bonded together is called an EtherChannel in Cisco parlance, Unix admins sometimes refer to the same configuration as a trunk. Of course, in the Cisco world the term “trunk” refers to something completely different: a link that labels frames with VLAN information so that multiple VLANs can traverse it. Modern Unixes often create a bond interface when performing link aggregation, and Windows admins often use the term teaming when combining links.
EtherChannel protocols


EtherChannel can negotiate with the device on the other side of the link. Two protocols are supported on Cisco devices. The first is the Link Aggregation Control Protocol (LACP), which is defined in IEEE specification 802.3ad. LACP is the standards-based option, and the one to use when you’re connecting to non-Cisco devices, such as servers. The other protocol used in negotiating EtherChannel links is the Port Aggregation Control Protocol (PAgP). Since PAgP is Cisco-proprietary, it is used only when you’re connecting two Cisco devices via an EtherChannel. Each protocol supports two modes: a passive mode (auto in PAgP and passive in LACP), and an active mode (desirable in PAgP and active in LACP). Alternatively, you can set the mode to on, thus forcing the creation of the EtherChannel.
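The mode combinations determine whether a channel comes up at all. Here is a sketch of the rule for LACP (the same shape applies to PAgP's desirable/auto); the function is illustrative, not an API:

```python
def lacp_channel_forms(mode_a, mode_b):
    """True if an LACP channel forms between two ends.

    'active' initiates negotiation, 'passive' only responds, and 'on'
    forces the bundle with no LACP at all, so 'on' only works
    against another 'on'.
    """
    if "on" in (mode_a, mode_b):
        return mode_a == mode_b == "on"
    return "active" in (mode_a, mode_b)   # someone must initiate

print(lacp_channel_forms("active", "passive"))   # -> True
print(lacp_channel_forms("passive", "passive"))  # -> False
print(lacp_channel_forms("on", "active"))        # -> False
```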

 Spanning Tree

The Spanning Tree Protocol (STP) is used to ensure that no Layer-2 loops exist in a LAN. Spanning tree is designed to prevent loops among bridges. A bridge is a device that connects network segments, keeping each segment in its own collision domain while sharing one broadcast domain. Switches are multiport bridges; hubs are not bridges at all.


When a switch receives a broadcast, it repeats the broadcast on every port (except the one on which it was received). In a looped environment, the broadcasts are repeated forever. The result is called a broadcast storm, and it will quickly bring a network to a halt. Spanning tree is an automated mechanism used to discover and break loops of this kind.

A useful tool when you’re troubleshooting a broadcast storm is the show processes cpu history command.
Here is the output from the show process cpu history command on switch B, which shows 0–3 percent CPU utilization over the course of the last minute:


The numbers on the left side of the graph are the CPU utilization percentages. The numbers on the bottom are seconds in the past (0 = the time of command execution). The numbers on the top of the graph show the integer values of CPU utilization for that time period on the graph. For example, according to the preceding graph, CPU utilization was normally 0 percent, but increased to 1 percent 5 seconds ago and to 3 percent 20 seconds ago. When the values exceed 10 percent, you’ll see visual peaks in the graph itself.

Another problem caused by a looped environment is MAC address tables (CAM tables in CatOS) being constantly updated:

3550-IOS#sho mac-address-table | include 0030.1904.da60

Spanning tree elects a root bridge (switch) in the network. The root bridge is the bridge that all other bridges need to reach via the shortest path possible. Spanning tree calculates the cost for each path from each bridge in the network to the root bridge. The path with the lowest cost is kept intact, while all others are broken. Spanning tree breaks paths by putting ports into a blocking state.
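The root bridge election itself reduces to picking the lowest bridge ID, which compares the bridge priority first and falls back to the MAC address as a tiebreaker. A minimal sketch with made-up values:

```python
def elect_root(bridges):
    """bridges: iterable of (priority, mac) bridge IDs; the lowest wins.

    Python tuple comparison mirrors the election exactly: compare the
    priority first, then the MAC address as the tiebreaker. This is why
    an ancient switch with a low MAC can become root when every
    priority is left at its default.
    """
    return min(bridges)

bridges = [
    (32768, 0x003019_04DA60),   # default priority
    (32768, 0x00000C_07AC01),   # default priority, lower MAC
    (4096,  0x00D0BC_123456),   # manually lowered priority
]
# The lowered priority wins regardless of MAC address:
print(elect_root(bridges))
```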

Routing and Routers

In a Cisco router, the routing table is called the Routing Information Base (RIB).

Administrative Distance


Administrative distance is a value assigned to every routing protocol. In the event of two protocols reporting the same route, the routing protocol with the lowest administrative distance will win, and its version of the route will be inserted into the RIB.


You can see that RIP has an administrative distance of 120, while OSPF has an administrative distance of 110. This means that even though the RIP route has a better metric in Figure 10-7, the route inserted into the routing table will be the one provided by OSPF.
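The selection can be sketched as a two-level comparison: administrative distance first, with the metric mattering only among routes from the same protocol. The AD table below uses Cisco's well-known defaults; the function itself is just an illustration:

```python
# Cisco default administrative distances (lower = more trusted).
AD = {"connected": 0, "static": 1, "eigrp": 90, "ospf": 110, "rip": 120}

def best_route(candidates):
    """candidates: list of (protocol, metric) offers for the same prefix."""
    return min(candidates, key=lambda route: (AD[route[0]], route[1]))

# RIP offers a better metric, but OSPF's lower AD wins the RIB slot:
print(best_route([("rip", 2), ("ospf", 65)]))  # -> ('ospf', 65)
```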


A tunnel is a means whereby a local device can communicate with a remote device as if the remote device were local as well. There are many types of tunnels. Virtual Private Networks (VPNs) are tunnels. Generic Routing Encapsulation (GRE) creates tunnels. Secure Shell (SSH) is also a form of tunnel, though different from the other two.

Tunnels can encrypt data so that only the other side can see it, as with SSH; or they can make a remote network appear local, as with GRE; or they can do both, as is the case with VPN.

GRE tunnels allow remote networks to appear to be locally connected. GRE offers no encryption, but it does forward broadcasts and multicasts. If you want a routing protocol to establish a neighbor adjacency or exchange routes through a tunnel, you’ll probably need to configure GRE. GRE tunnels are often built within VPN tunnels to take advantage of encryption. GRE is described in RFCs 1701 and 2784.

VPN tunnels also allow remote networks to appear as if they were locally connected. VPN encrypts all information before sending it across the network, but it will not usually forward multicasts and broadcasts. Consequently, GRE tunnels are often built within VPNs to allow routing protocols to function. VPNs are often used for remote access to secure networks.
There are two main types of VPNs: point-to-point and remote access. Point-to-point VPNs offer connectivity between two remote routers, creating a virtual link between them. Remote access VPNs are single-user tunnels between a user and a router, firewall, or VPN concentrator (a specialized VPN-only device).
Remote access VPNs usually require VPN client software to be installed on a personal computer. The client communicates with the VPN device to establish a personal virtual link.

SSH is a client/server application that allows secure connectivity to servers. In practice, it is usually used just like Telnet. The advantage of SSH over Telnet is that it encrypts all data before sending it. While not originally designed to be a tunnel in the sense that VPN or GRE would be considered a tunnel, SSH can be used to access remote devices in addition to the one to which you have connected. While this does not have a direct application on Cisco routers, the concept is similar to that of VPN and GRE tunnels, and thus worth mentioning. I use SSH to access my home network instead of a VPN.

Multilayer Switches

Switches, in the traditional sense, operate at Layer 2 of the OSI stack. The first multilayer switches were called Layer-3 switches because they added the capability to route between VLANs. These days, switches can do just about anything a router can do, including protocol testing and manipulation all the way up to Layer 7. Thus, we now refer to switches that operate above Layer 2 as multilayer switches.

The core benefit of the multilayer switch is the capability to route between VLANs, which is made possible through the addition of virtual interfaces within the switch. These virtual interfaces are tied to VLANs, and are called switched virtual interfaces (SVIs).

Another option for some switches is to change a switch port into a router port—that is, to make a port directly addressable with a Layer-3 protocol such as IP. To do this, you must have a switch that is natively IOS or running in IOS native mode. Nexus 7000 switch ports default to router mode.





Q: What’s the difference between bandwidth and speed?

A: Bandwidth is a capacity; speed is a rate. Bandwidth tells you the maximum amount of data that your network can transmit per second. Speed tells you the rate at which the data actually travels. A CAT-5 cable running 100Base-T has a bandwidth of 100 Mbps; the actual speed over the cable changes depending on conditions.

Q: What is Base-T?
A: Base-T refers to the family of Ethernet standards that run over twisted-pair cabling: “Base” means baseband signaling, and “T” means twisted pair. The 10Base-T standard transfers data at 10 megabits per second (Mbps). The 100Base-T standard transfers data at 100 Mbps. The 1000Base-T standard transfers data at a massive 1000 Mbps.

Q: What is a crossover cable used for?
A: Suppose you want to connect a laptop to a desktop computer. One way of doing this is to use a switch or a hub to connect the two devices; another way is to use a crossover cable, a cable in which the transmit wires on one end are crossed over to the receive wires on the other end. That crossing lets two like devices (computer to computer, or switch to switch) talk directly. A straight-through cable does not cross the pairs, which is why it is used between unlike devices, such as a computer and a switch.

Q: Aren’t packets and frames really the same thing?
A: No. We call the data transmitted over Ethernet frames. Inside those frames, in the data field, are packets. Generally, frames have to do with the link-layer transmission protocol, i.e., Ethernet, ATM, Token Ring, etc. But, as you read more about networking, you will see that there is some confusion on this.

Q: A guy in my office calls packets datagrams. Are they the same?
A: Not really. Packet is the general term for any unit of data sent across a network, whereas datagram usually refers to a packet sent by an unreliable, connectionless protocol such as UDP or ICMP.

Q: What’s the difference between megabits per second (Mbps) and megabytes per second (MBps)?
A: Megabits per second (Mbps) is a bandwidth rate used in the telecommunications and computer networking field. One megabit equals one million bits (aka binary pulses). Megabytes per second (MBps) is a data transfer rate used in computing. One megabyte equals 1,048,576 bytes, and one byte equals 8 binary digits (aka bits).
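The conversion between the two units trips people up because line rates use decimal megabits while file sizes use binary megabytes. A quick sketch of the arithmetic:

```python
def mbps_to_MBps(mbps):
    """Convert megabits/s (decimal: 10^6 bits) to megabytes/s
    (binary: 2^20 bytes)."""
    bits_per_second = mbps * 1_000_000
    return bits_per_second / 8 / 1_048_576

# A fully loaded 100 Mbps link moves roughly 11.9 MBps of data:
print(round(mbps_to_MBps(100), 1))  # -> 11.9
```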
The order of the wires in an RJ-45 connector conforms to one of two standards. These standards are 568A and 568B.






We can convert to ASCII using hex

Once you learn to use hexadecimal, you realize just how cool it is. Hex and binary make great partners, which simplifies conversions between binary and ASCII. Hex is like a bridge between the weird world of binary and our world (the human, readable world).
Here’s what we do:

  • Break the byte in half.

Each half-byte is called a nibble. [Note from Editor: you’re kidding, right?]


  • Convert each half into its hexadecimal equivalent.

Because the binary number is broken into halves, the highest number you can get is 15 (which is “F” in hex).

  • Concatenate the two numbers.

Concatenate is a programmer’s word that simply means “put them beside each other from left to right.”

  • Look the number up in an ASCII table (you can run man ascii to see the full ASCII hex character table).
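The four steps above can be sketched directly (a toy illustration; the byte 01000001 is the letter A):

```python
def byte_to_hex_and_ascii(byte):
    """Follow the recipe: break the byte into nibbles, convert each to
    hex, concatenate, then look the value up in the ASCII table."""
    high, low = byte >> 4, byte & 0x0F    # break the byte in half
    hex_str = f"{high:X}{low:X}"          # convert and concatenate
    return hex_str, chr(byte)             # the ASCII lookup

print(byte_to_hex_and_ascii(0b01000001))  # -> ('41', 'A')
```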



Hubs – Switches – Routers


A hub receives incoming signals and sends them out on all the other ports. When several devices start sending signals, the hub’s incessant repetition creates heavy traffic and collisions. A collision happens when two signals run into one another, creating an error. The sending network device has to back off and wait to send the signal again.

A hub contains no processors, and this means that a hub has no real understanding of network data. It doesn’t understand MAC addresses or frames. It sees an incoming networking signal as a purely electrical signal, and passes it on.

A hub is really just an electrical repeater. It takes whatever signal comes in, and sends it out on all the other ports.


  • The source workstation sends a frame.

A frame carries the payload of data along with the MAC address of the source and the MAC address of the target.

  • The switch updates its MAC address table with the MAC address and the port it’s on.

Switches maintain MAC address tables. As frames come in, the switch’s picture of the traffic gets more complete. The switch matches ports with MAC addresses.

  • The switch forwards the frame to its target MAC address using information from its table.

It does this by sending the frame out the port where the MAC address table indicates that address is located.

Switches avoid collisions by storing and forwarding frames on the network. Switches are able to do this by using the MAC address of the frame. Instead of repeating the signal on all ports, the switch sends it on only to the device that needs it.

A switch reads the signal as a frame and uses the frame’s information to send it where it’s supposed to go.
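The learn-then-forward behavior described above can be sketched as a tiny table. This is purely illustrative (real switches keep these tables in hardware), but the flood-when-unknown and forward-when-known logic is the real thing:

```python
class Switch:
    """Minimal sketch of a learning switch."""
    def __init__(self, num_ports):
        self.ports = range(num_ports)
        self.mac_table = {}               # MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame goes out on."""
        self.mac_table[src_mac] = in_port          # learn the source
        if dst_mac in self.mac_table:              # known destination:
            return [self.mac_table[dst_mac]]       # one port only
        return [p for p in self.ports if p != in_port]  # unknown: flood

sw = Switch(4)
print(sw.receive(0, "aa:aa", "bb:bb"))  # unknown dest, flood -> [1, 2, 3]
print(sw.receive(1, "bb:bb", "aa:aa"))  # "aa:aa" was learned -> [0]
```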



How the router moves data across networks

  • The sending device sends an ARP request for the MAC address of its default gateway.


  • The router responds with its MAC address.


  • The sending device sends its traffic to the router.


  • The router sends an ARP request on the destination IP network for the device with the target IP address.


  • The receiving device responds with its MAC address.


  • The router changes the MAC address in the frame and sends the data to the receiving device.


  • The source workstation sends a frame to the router.

It sends it to the router since the workstation the traffic is meant for is behind the router.

  • The router changes the source MAC address to its MAC address and changes the destination MAC address to the workstation the traffic is meant for.

If network traffic comes from a router, we can only see the router’s MAC address. All the workstations behind that router make up what we call an IP subnet. All a switch needs to look at to get frames to their destination is the MAC address. A router looks at the IP address of the incoming packet and forwards it if it is intended for a workstation located on the other network. Routers have far fewer network ports because they tend to connect to other routers or to switches. Computers are generally not connected directly to a router.

The switch decides where to send traffic based on the MAC address, whereas the router decides based on the IP address.

Q: But I have a DSL router at home, and my computer is directly connected to it. What is that all about?
A: Good observation. There are switches that have routing capability and routers that have switched ports. There is not a real clear line between the two devices. It is more about their primary function. Now, in large networks, there are switching routers. These have software that allows them to work as routers on switched ports. They are great to use and make building large, sophisticated networks straightforward, but they are very expensive.

Q: So the difference between my home DSL router and an enterprise switching router is the software?
A: The big difference is the hardware horsepower. Your home DSL router probably uses a small embedded processor or microcontroller that does all the processing. Switching routers and heavy-duty routers have specialized processors, with individual processors on each port. The name of the game is the speed at which it can move packets. Your home DSL router probably has a throughput of about 20 Mbps (megabits per second), whereas a high-end switching router can have a throughput of hundreds of Gbps (gigabits per second) or more.

Hook Wireshark up to the switch

  • Connect your computer to the switch with a serial cable.

You will use this to communicate with the switch.

  • Open a terminal program such as HyperTerminal and get to the command prompt of the switch. Type in the commands below.
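The exact switch commands depend on your model, but a SPAN (port mirroring) setup on a Catalyst switch typically looks something like the following. Treat the session number and interface names as placeholders for your own ports (port 1 for the Wireshark machine, port 2 as the port being watched):

```
Switch> enable
Switch# configure terminal
! Mirror everything seen on Fa0/2 out to Fa0/1, where Wireshark listens
Switch(config)# monitor session 1 source interface FastEthernet0/2 both
Switch(config)# monitor session 1 destination interface FastEthernet0/1
Switch(config)# end
```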


  • Hook up your computer to port 1 on the switch with an Ethernet cable.

You will use this to capture network traffic.

  • Start up Wireshark and capture some network traffic.






SELinux security context and its elements – user, role, type identifiers

April 17th, 2014

All operating system access control is based on some type of access control attribute associated with objects and subjects. In SELinux, the access control attribute is called a security context. All objects (files, interprocess communication channels, sockets, network hosts, and so on) and subjects (processes) have a single security context associated with them. A security context has three elements: user, role, and type identifiers. The usual format for specifying or displaying a security context is as follows:


A valid security context must have one valid user, one role, and one type identifier. The identifiers are defined by the policy writer, and the string identifiers for each element are defined in the SELinux policy language.

Here’s the relationship between unix/linux users -> SELinux identifiers -> roles -> domain:


And here’s one example of SELinux transitions:


And here’s the code you can set to apache httpd server when SELinux runs in enforcing mode:

chcon -t httpd_sys_content_t /var/www/html     #relabel the directory itself
chcon -t httpd_sys_content_t /var/www/html -R  #relabel the content recursively
ls -Z /var/www/html                            #verify the security contexts

Some content in this article is from the book SELinux by Example: Using Security Enhanced Linux.

Categories: Linux, Security

resolved – show kitchen sink buttons when wordpress goes to fullscreen mode

April 11th, 2014

When you click the full-screen button in the WordPress TinyMCE editor, WordPress goes into “Distraction-Free Writing” mode, which is helpful, as the name suggests. However, you’ll also find that the TinyMCE toolbox shows only a limited number of buttons, and the second row of the toolbox (the kitchen sink) will not show at all (I tried installing plugins such as Ultimate TinyMCE and Advanced TinyMCE, but the issue remained):

Previously, you could press ALT+SHIFT+G to go to another type of fullscreen mode, which has all the buttons, including the kitchen sink ones. However, it seems newer versions of WordPress have disabled this feature.

To resolve this issue, we can insert the following code in functions.php of your theme:

function my_mce_fullscreen($buttons) {
    $buttons[] = 'fullscreen';
    return $buttons;
}
add_filter('mce_buttons', 'my_mce_fullscreen');

Later, the TinyMCE toolbar will have two full-screen buttons:

Make sure to click the SECOND full-screen button. When you do so, the editor will transform to the following appearance:

I assume this is what you’re trying for, right?



Categories: Life

add horizontal line button in wordpress

April 11th, 2014

There are three methods for you to add a horizontal line button in WordPress:

Firstly, switch to “Text” mode and enter <hr />.

Secondly, add the following in functions.php of your wordpress theme:

function enable_more_buttons($buttons) {
    $buttons[] = 'hr';
    return $buttons;
}
add_filter('mce_buttons', 'enable_more_buttons');

horizontal line

Thirdly, you can install the plugin “Ultimate TinyMCE”; in its settings, you can enable the horizontal line button in one click! This is my recommendation.

ultimate tinymce

Categories: Life

linux tips

April 10th, 2014
Linux Performance & Troubleshooting
For Linux Performance & Troubleshooting, please refer to another post: Linux tips – Performance and Troubleshooting
Linux system tips
ls -lu (access time, like cat file); -lt (modification time, like vi; ls -l defaults to this); -lc (change time, chmod); stat ./aa.txt <UTC>
ctrl+z #bg and stopped
%1 & #bg and running
%1 #fg
pgrep -flu oracle #processes owned by the user oracle
watch free -m #refresh every 2 seconds
pmap -x 30420 #memory mapping
openssl s_client -connect localhost:636 -showcerts #verify ssl certificates, or 443
openssl x509 -in cacert.pem -noout -text
openssl x509 -in cacert.pem -noout -dates
openssl x509 -in cacert.pem -noout -purpose
openssl req -in robots.req.pem -text -verify -noout
blockdev --getbsz /dev/xvda1 #get blocksize of FS
dumpe2fs /dev/xvda1 | grep 'Block size'
ovm svr ls | sort -rn -k 4 #sort by column 4
cat a1 | sort | uniq -c | sort #SUS
ovm svr ls | uniq -f3 #skip the first three columns; this will list only 1 server per pool
for i in <all OVMs>; do ($i &); done #instead of using nohup &
ovm vm ls | egrep "`echo testhost{0\|,1\|,2\|,3\|,4} | tr -d '[:space:]'`"
cat a | awk '{print $5}' | tr '\n' ' '
getopt #getopts is builtin, more on
date -d '1970-1-1 1276059000 sec utc'
date -d '2010-09-11 23:20' +%s
find . -name '*txt' | xargs tar cvvf a.tar
find . -maxdepth 1
for i in `find /usr/sbin/ -type f ! -perm -u+x`; do chmod +x $i; done #files that have no execute permission for owner
find ./* -prune -print #-prune: do not descend into subdirectories
find . -fprint file #put result to file
tar tvf a.tar --wildcards "*ipp*" #globbing patterns
tar xvf bfiles.tar --wildcards --no-anchored 'b*'
tar --show-defaults
tar cvf a.tar --totals *.txt #show speed
tar --append --file=collection.tar rock #add rock to collection.tar
tar --update -v -f collection.tar blues folk rock classical #only append new or updated ones, not replace
tar --delete --file=collection.tar blues #not on tapes
tar -c -f archive.tar --mode='a+rw'
tar -C sourcedir -cf - . | tar -C targetdir -xf - #copy directories
tar -c -f jams.tar grape prune -C food cherry #-C: change dir; puts file cherry under the food directory
find . -size -400 -print > small-files
tar -c -v -z -T small-files -f little.tgz
tar -cf src.tar --exclude='*.o' src #multiple --exclude can be specified
expr 5 - 1
rpm2cpio ./ash-1.0.1-1.x86_64.rpm | cpio -ivd
eval $cmd
exec menu.viewcards #same as .
ls . | xargs -0 -i cp ./{} /etc #-i uses {} as the replacement token, like find -exec; -0 handles spaces in filenames (find -print0 separates entries with NUL, not newline). Use -i or -I {} to place filenames in the middle of the command.
ls | xargs -t -i mv {} {}.old #mv source should exclude the trailing /, or unexpected errors may occur
mv --strip-trailing-slashes source destination
ls | xargs file /dev/fd/0 #replace -
ls -l -I "*out*" #do not include out
find . -type d | xargs -i du -sh {} | awk '$1 ~ /G/'
find . -type f -name "*20120606" -exec rm {} \; #no need for rm -rf. find . -type f -exec bash -c "ls -l '{}'" \;
ps -ef | grep init | sed -n '1p'
cut -d ' ' -f1,3 /etc/mtab #first and third fields
seq 15 21 #print 15 to 21, or echo {15..21}
seq -s" " 15 21 #use space as separator
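To illustrate the two date conversions above, here is a round trip between a human-readable date and epoch seconds (this assumes GNU date, as on most Linux systems; BSD date uses different flags):

```shell
# Date -> epoch seconds -> date again (GNU date syntax).
epoch=$(date -u -d '2010-09-11 23:20' +%s)
echo "$epoch"
date -u -d "@$epoch" '+%F %R'   # prints 2010-09-11 23:20
```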
Categories: Linux, tips Tags:

Linux tips – Performance and Troubleshooting

April 10th, 2014 No comments

System CPU

procinfo #yum install procinfo
gnome-system-monitor #can also see network flow rate

System Memory

/proc/meminfo #provides the most complete view of system memory usage
gnome-system-monitor #can also see network flow rate

Process-specific CPU

strace #traces the system calls that a program makes while executing
ltrace #traces the calls (functions) that an application makes to libraries rather than to the kernel. Then use ldd to display which libraries are used, and use objdump to search each of those libraries for the given function.
ps #ld

Process-specific Memory

/proc/<pid> #you can refer to for more info.

/proc/<PID>/status #provides information about the status of a given process PID
/proc/<PID>/maps #how the process’s virtual address space is used

ipcs #more info on and

Disk I/O

vmstat #provides totals rather than the rate of change during the sample
time sh -c “dd if=/dev/zero of=System2.img bs=1M count=10240 && sync” #10G
time dd if=ddfile of=/dev/null bs=8k
dd if=/dev/zero of=vm1disk bs=1M seek=10240 count=0 #10G


Network

gnome-system-monitor #can also see network flow rate
sar #network statistics
/etc/cron.d/sysstat #/var/log/sa/

General Ideas & options & outputs

Run Queue Statistics
In Linux, a process can be either runnable or blocked waiting for an event to complete.

A blocked process may be waiting for data from an I/O device or the results of a system call.

When these processes are runnable, but waiting to use the processor, they form a line called the run queue.
The load on a system is the total number of running and runnable processes.

Context Switches
To create the illusion that a given single processor runs multiple tasks simultaneously, the Linux kernel constantly switches between different processes.
The switch between different processes is called a context switch.
To guarantee that each process receives a fair share of processor time, the kernel periodically interrupts the running process and, if appropriate, the kernel scheduler decides to start another process rather than let the current process continue executing. It is possible that your system will context switch every time this periodic interrupt or timer occurs. (cat /proc/interrupts | grep timer, and do this again after e.g. 10s interval)
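A minimal sketch of that check, using the kernel's cumulative context-switch counter in /proc/stat (standard on Linux; the "timer" row of /proc/interrupts can be diffed over the same interval in the same way):

```shell
# Context switches per second over a short interval.
interval=2
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)   # total context switches since boot
sleep "$interval"
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/s: $(( (c2 - c1) / interval ))"
```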

In addition, periodically, the processor receives an interrupt by hardware devices.
/proc/interrupts can be examined to show which interrupts are firing on which CPUs

CPU Utilization
At any given time, the CPU can be doing one of seven things:
user #running user code
system #executing code in the Linux kernel on behalf of the application
nice #executing user code that has been "nice"ed or set to run at a lower priority than normal processes
idle #doing nothing, waiting for work
iowait #waiting for I/O (such as disk or network) to complete
irq #executing high-priority kernel code handling a hardware interrupt
softirq #executing kernel code that was also triggered by an interrupt, but running at a lower priority

Buffers and cache
Alternatively, if your system has much more physical memory than required by your applications, Linux will cache recently used files in physical memory so that subsequent accesses to that file do not require an access to the hard drive. This can greatly speed up applications that access the hard drive frequently, which, obviously, can prove especially useful for frequently launched applications. The first time the application is launched, it needs to be read from the disk; if the application remains in the cache, however, it needs to be read from the much quicker physical memory. This disk cache differs from the processor cache mentioned in the previous chapter. Other than oprofile, valgrind, and kcachegrind, most tools that report statistics about “cache” are actually referring to disk cache.

In addition to cache, Linux also uses extra memory as buffers. To further optimize applications, Linux sets aside memory to use for data that needs to be written to disk. These set-asides are called buffers. If an application has to write something to the disk, which would usually take a long time, Linux lets the application continue immediately but saves the file data into a memory buffer. At some point in the future, the buffer is flushed to disk, but the application can continue immediately.
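The current size of those buffers and of the disk cache can be read straight out of /proc/meminfo (field names as found on stock Linux kernels):

```shell
# Show total/free memory plus buffer and cache usage, converted to MB.
awk '/^(MemTotal|MemFree|Buffers|Cached):/ {printf "%-9s %8.0f MB\n", $1, $2/1024}' /proc/meminfo
```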
Active Versus Inactive Memory
Active memory is currently being used by a process. Inactive memory is memory that is allocated but has not been used for a while. Nothing is essentially different between the two types of memory. When required, the Linux kernel takes a process’s least recently used memory pages and moves them from the active to the inactive list. When choosing which memory will be swapped to disk, the kernel chooses from the inactive memory list.
Kernel Usage of Memory (Slabs)
In addition to the memory that applications allocate, the Linux kernel consumes a certain amount for bookkeeping purposes. This bookkeeping includes, for example, keeping track of data arriving from network and disk I/O devices, as well as keeping track of which processes are running and which are sleeping. To manage this bookkeeping, the kernel has a series of caches that contains one or more slabs of memory. Each slab consists of a set of one or more objects. The amount of slab memory consumed by the kernel depends on which parts of the Linux kernel are being used, and can change as the type of load on the machine changes.


slabtop shows in real-time how the kernel is allocating its various caches and how full they are. Internally, the kernel has a series of caches that are made up of one or more slabs. Each slab consists of a set of one or more objects. These objects can be active (or used) or inactive (unused). slabtop shows you the status of the different slabs. It shows you how full they are and how much memory they are using.


time measures three types of time. First, it measures the real or elapsed time, which is the amount of time between when the program started and finished execution. Next, it measures the user time, which is the amount of time that the CPU spent executing application code on behalf of the program. Finally, time measures system time, which is the amount of time the CPU spent executing system or kernel code on behalf of the application.
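For example, timing one of the dd runs from earlier shows all three figures (exact numbers will differ per machine; time here is the bash builtin):

```shell
# real = wall-clock time; user = CPU time in application code; sys = CPU time in kernel code.
time sh -c 'dd if=/dev/zero of=/dev/null bs=1M count=256'
```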

Disk I/O

When an application does a read or write, the Linux kernel may have a copy of the file stored into its cache or buffers and returns the requested information without ever accessing the disk. If the Linux kernel does not have a copy of the data stored in memory, however, it adds a request to the disk’s I/O queue. If the Linux kernel notices that multiple requests are asking for contiguous locations on the disk, it merges them into a single big request. This merging increases overall disk performance by eliminating the seek time for the second request. When the request has been placed in the disk queue, if the disk is not currently busy, it starts to service the I/O request. If the disk is busy, the request waits in the queue until the drive is available, and then it is serviced.


iostat provides a per-device and per-partition breakdown of how many blocks are written to and from a particular disk. (Blocks in iostat are usually sized at 512 bytes.)

lsof can prove helpful when narrowing down which applications are generating I/O

 top output

S (or STAT) – This is the current status of a process: sleeping (S), running (R), zombie (exited but not yet reaped by its parent) (Z), in an uninterruptible sleep (D), or being traced (T).

TIME – The total amount of CPU time (user and system) that this process has used since it started executing.

top options

-b Run in batch mode. Typically, top shows only a single screenful of information, and processes that don’t fit on the screen never display. This option shows all the processes and can be very useful if you are saving top’s output to a file or piping the output to another command for processing.

I This toggles whether top will divide the CPU usage by the number of CPUs on the system. For example, if a process was consuming all of both CPUs on a two-CPU system, this toggles whether top displays a CPU usage of 100% or 200%.

1 (numeral 1) This toggles whether the CPU usage will be broken down to the individual usage or shown as a total.

mpstat options

-P { cpu | ALL } This option tells mpstat which CPUs to monitor. cpu is the number between 0 and the total CPUs minus 1.

The biggest benefit of mpstat is that it shows the time next to the statistics, so you can look for a correlation between CPU usage and time of day.

mpstat can be used to determine whether the CPUs are fully utilized and relatively balanced. By observing the number of interrupts each CPU is handling, it is possible to find an imbalance.

 sar options

-I {irq | SUM | ALL | XALL} This reports the rates that interrupts have been occurring in the system.
-P {cpu | ALL} This option specifies which CPU the statistics should be gathered from. If this isn’t specified, the system totals are reported.
-q This reports information about the run queues and load averages of the machine.
-u This reports information about CPU utilization of the system. (This is the default output.)
-w This reports the number of context switches that occurred in the system.
-o filename This specifies the name of the binary output file that will store the performance statistics.
-f filename This specifies the filename of the performance statistics.

-B – This reports information about the number of blocks that the kernel swapped to and from disk. In addition, for kernel versions after v2.5, it reports information about the number of page faults.
-W – This reports the number of pages of swap that are brought in and out of the system.
-r – This reports information about the memory being used in the system. It includes information about the total free memory, swap, cache, and buffers being used.
-R Report memory statistics

-d –  reports disk activities

-n DEV – Shows statistics about the number of packets and bytes sent and received by each device.
-n EDEV – Shows information about the transmit and receive errors for each device.
-n SOCK – Shows information about the total number of sockets (TCP, UDP, and RAW) in use.
-n ALL – Shows all the network statistics.

sar output

runq-sz This is the size of the run queue when the sample was taken.
plist-sz This is the number of processes present (running, sleeping, or waiting for I/O) when the sample was taken.
proc/s This is the number of new processes created per second. (This is the same as the forks statistic from vmstat.)

tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
rd_sec/s – Number of disk sectors read per second.
wr_sec/s – Number of disk sectors written per second.

vmstat options

-n print header info only once

-a This changes the default output of memory statistics to indicate the active/inactive amount of memory rather than information about buffer and cache usage.
-s (procps 3.2 or greater) This prints out the vm table. This is a grab bag of different statistics about the system since it has booted. It cannot be run in sample mode. It contains both memory and CPU statistics.

-d – This option displays individual disk statistics at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.
-p partition – This displays performance statistics about the given partition at a rate of one sample per interval. The statistics are the totals since system boot, rather than just those that occurred between this sample and the previous sample.

vmstat output
si – The rate of memory (in KB/s) that has been swapped in from disk during the last sample.
so – The rate of memory (in KB/s) that has been swapped out to disk during the last sample.
pages paged in – The amount of memory (in pages) read from the disk(s) into the system buffers. (On most IA32 systems, a page is 4KB.)
pages paged out – The amount of memory (in pages) written to the disk(s) from the system cache. (On most IA32 systems, a page is 4KB.)
pages swapped in – The amount of memory (in pages) read from swap into system memory.
pages swapped out – The amount of memory (in pages) written from system memory to swap.

bo – This indicates the number of total blocks written to disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
bi – This shows the number of blocks read from the disk in the previous interval. (In vmstat, block size for a disk is typically 1,024 bytes.)
wa – This indicates the amount of CPU time spent waiting for I/O to complete.
reads: ms – The amount of time (in ms) spent reading from the disk.
writes: ms – The amount of time (in ms) spent writing to the disk.
IO: cur – The total number of I/O that are currently in progress. Note that there is a bug in recent versions of vmstat in which this is incorrectly divided by 1,000, which almost always yields a 0.
IO: s – This is the number of seconds spent waiting for I/O to complete.

iostat options
-d – This displays only information about disk I/O rather than the default display, which includes information about CPU usage as well.
-k – This shows statistics in kilobytes rather than blocks.
-x – This shows extended-performance I/O statistics.
device – If a device is specified, iostat shows only information about that device.

iostat output
tps – Transfers per second. This is the number of reads and writes to the drive/partition per second.
Blk_read/s – The rate of disk blocks read per second.
Blk_wrtn/s – The rate of disk blocks written per second.
Blk_read – The total number of blocks read during the interval.
Blk_wrtn – The total number of blocks written during the interval.
rrqm/s – The number of reads merged before they were issued to the disk.
wrqm/s – The number of writes merged before they were issued to the disk.
r/s – The number of reads issued to the disk per second.
w/s – The number of writes issued to the disk per second.
rsec/s – Disk sectors read per second.
wsec/s – Disk sectors written per second.
avgrq-sz – The average size (in sectors) of disk requests.
avgqu-sz – The average size of the disk request queue.
await – The average time (in ms) for a request to be completely serviced. This average includes the time that the request was waiting in the disk’s queue plus the amount of time it was serviced by the disk.
svctm – The average service time (in ms) for requests submitted to the disk. This indicates how long on average the disk took to complete a request. Unlike await, it does not include the amount of time spent waiting in the queue.

lsof options
+D directory – This causes lsof to recursively search all the files in the given directory and report on which processes are using them.
+d directory – This causes lsof to report on which processes are using the files in the given directory.

lsof output
FD – The file descriptor of the file, or txt for program text (an executable), mem for a memory-mapped file.
TYPE – The type of file. REG for a regular file.
DEVICE – Device number in major, minor number.
SIZE – The size of the file.
NODE – The inode of the file.

free options

-s delay – This option causes free to print out new memory statistics every delay seconds.

 strace options

strace [-p <pid>] -s 200 <program> #attach to a process. -s 200 raises the maximum string size to print from the default of 32 to 200. Note that filenames are not considered strings and are always printed in full.

-c – This causes strace to print out a summary of statistics rather than an individual list of all the system calls that are made.

ltrace options
-c – This option causes ltrace to print a summary of all the calls after the command has completed.
-S – ltrace traces system calls in addition to library calls, which is identical to the functionality strace provides.
-p pid – This traces the process with the given PID.

ps options
vsz The virtual set size is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is using.
rss The resident set size is the amount of physical memory the application is currently using.
pmem The percentage of the system memory that the process is consuming.
command This is the command name.

/proc/<PID>/status output
VmSize This is the process’s virtual set size, which is the amount of virtual memory that the application is using. Because Linux only allocates physical memory when an application tries to use it, this value may be much greater than the amount of physical memory the application is actually using. This is the same as the vsz parameter provided by ps.
VmLck This is the amount of memory that has been locked by this process. Locked memory cannot be swapped to disk.
VmRSS This is the resident set size or amount of physical memory the application is currently using. This is the same as the rss statistic provided by ps.

Because shared memory is used by multiple processes, it cannot be attributed to any particular process. ipcs provides enough information about the state of the system-wide shared memory to determine which processes allocated the shared memory, which processes are using it, and how often they are using it. This information proves useful when trying to reduce shared memory usage.
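As a small experiment (ipcmk and ipcrm are util-linux tools; availability can vary), you can create a throwaway segment and watch it appear in ipcs:

```shell
# Create a 64KB shared memory segment, inspect it, then remove it.
id=$(ipcmk -M 65536 | awk '{print $NF}')   # ipcmk prints "Shared memory id: <N>"
ipcs -m -i "$id"                           # details for just this segment
ipcrm -m "$id"                             # clean up
```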

ipcs options

lsof -u oracle | grep <shmid> #shmid is from the output of ipcs -m. Lists the processes under the oracle user attached to the shared memory segment

-t – This shows the time when the shared memory was created, when a process last attached to it, and when a process last detached from it.
-u – This provides a summary about how much shared memory is being used and whether it has been swapped or is in memory.
-l – This shows the system-wide limits for shared memory usage.
-p – This shows the PIDs of the processes that created and last used the shared memory segments.
-c – creator

ifconfig output #more on

Errors – Frames with errors (possibly because of a bad network cable or duplex mismatch).
Dropped – Frames that were discarded (most likely because of low amounts of memory or buffers).
Overruns – Frames that may have been discarded by the network card because the kernel or network card was overwhelmed with frames. This should not normally happen.
Frame – These frames were dropped as a result of problems on the physical level. This could be the result of cyclic redundancy check (CRC) errors or other low-level problems.
Compressed – Some lower-level interfaces, such as Point-to-Point Protocol (PPP) or Serial Line Internet Protocol (SLIP) devices compress frames before they are sent over the network. This value indicates the number of these compressed frames. (Compressed packets are usually present during SLIP or PPP connections)

carrier – The number of packets discarded because of link media failure (such as a faulty cable)

ip options
-s [-s] link – If the extra -s is provided to ip, it provides a more detailed list of low-level Ethernet statistics.

iptraf options
-d interface – Detailed statistics for an interface including receive, transmit, and error rates
-s interface – Statistics about which IP ports are being used on an interface and how many bytes are flowing through them
-t <minutes> – Number of minutes that iptraf runs before exiting
-z interface – shows packet counts by size on the specified interface

netstat options
-p – Displays the PID/program name responsible for opening each of the displayed sockets
-c – Continually updates the display of information every second
–interfaces=<name> – Displays network statistics for the given interface
–statistics|-s – IP/UDP/ICMP/TCP statistics
–tcp|-t – Shows only information about TCP sockets
–udp|-u – Shows only information about UDP sockets.
–raw|-w – Shows only information about RAW sockets (IP and ICMP)
–listening|-l – Show only listening sockets. (These are omitted by default.)
–all|-a – Show both listening and non-listening (for TCP this means established connections) sockets. With the –interfaces option, show interfaces that are not marked
–numeric|-n – Show numerical addresses instead of trying to determine symbolic host, port or user names.
–extend|-e – Display additional information. Use this option twice for maximum detail.

netstat output

Active Internet connections (w/o servers)
Proto - The protocol (tcp, udp, raw) used by the socket.
Recv-Q - The count of bytes not copied by the user program connected to this socket.
Send-Q - The count of bytes not acknowledged by the remote host.
Local Address - Address and port number of the local end of the socket. Unless the --numeric (-n) option is specified, the socket address is resolved to its canonical host name (FQDN), and the port number is translated into the corresponding service name.
Foreign Address - Address and port number of the remote end of the socket. Analogous to "Local Address."
State - The state of the socket. Since there are no states in raw mode and usually no states used in UDP, this column may be left blank. Normally this can be one of several values: #more on
        ESTABLISHED - The socket has an established connection.
        SYN_SENT - The socket is actively attempting to establish a connection.
        SYN_RECV - A connection request has been received from the network.
        FIN_WAIT1 - The socket is closed, and the connection is shutting down.
        FIN_WAIT2 - Connection is closed, and the socket is waiting for a shutdown from the remote end.
        TIME_WAIT - The socket is waiting after close to handle packets still in the network.
        CLOSE - The socket is not being used.
        CLOSE_WAIT - The remote end has shut down, waiting for the socket to close.
        LAST_ACK - The remote end has shut down, and the socket is closed. Waiting for acknowledgement.
        LISTEN - The socket is listening for incoming connections. Such sockets are not included in the output unless you specify the --listening (-l) or --all (-a) option.
        CLOSING - Both sockets are shut down but we still don't have all our data sent.
        UNKNOWN - The state of the socket is unknown.
User - The username or the user id (UID) of the owner of the socket.
PID/Program name - Slash-separated pair of the process id (PID) and process name of the process that owns the socket. --program causes this column to be included. You will also need superuser privileges to see this information on sockets you don't own. This identification information is not yet available for IPX sockets.


[ezolt@scrffy ~/edid]$ vmstat 1 | tee /tmp/output
procs -----------memory---------- ---swap-- -----io----  --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs  us sy id wa
0  1 201060  35832  26532 324112    0    0     3     2     6     2  5  1  94  0
0  0 201060  35888  26532 324112    0    0    16     0  1138   358  0  0  99  0
0  0 201060  35888  26540 324104    0    0     0    88  1163   371  0  0 100  0

The number of context switches looks good compared to the number of interrupts. The scheduler is switching processes less than the number of timer interrupts that are firing. This is most likely because the system is nearly idle, and most of the time when the timer interrupt fires, the scheduler does not have any work to do, so it does not switch from the idle process.

[ezolt@scrffy manuscript]$ sar -w -c -q 1 2
Linux 2.6.8-1.521smp (scrffy)   10/20/2004

08:23:29 PM    proc/s
08:23:30 PM      0.00

08:23:29 PM   cswch/s
08:23:30 PM    594.00

08:23:29 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:30 PM         0       163      1.12       1.17      1.17

08:23:30 PM    proc/s
08:23:31 PM      0.00

08:23:30 PM   cswch/s
08:23:31 PM    812.87

08:23:30 PM   runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
08:23:31 PM         0       163      1.12       1.17      1.17

Average:       proc/s
Average:         0.00

Average:      cswch/s
Average:       703.98

Average:      runq-sz  plist-sz   ldavg-1    ldavg-5  ldavg-15
Average:            0       163      1.12       1.17      1.17

In this case, we ask sar to show us the total number of context switches and process creations that occur every second. We also ask sar for information about the load average. We can see in this example that this machine has 163 processes that are in memory but not running. For the past minute, on average 1.12 processes have been ready to run.

bash-2.05b$ vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
 2  1 514004   5640 79816 1341208   33   31   204   247 1111  1548  8  5 73 14

The amount of inactive pages indicates how much of the memory could be swapped to disk and how much is currently being used. In this case, we can see that 1310MB of memory is active, and only 78MB is considered inactive. This machine has a large amount of memory, and much of it is being actively used.

bash-2.05b$ vmstat -s

      1552528  total memory
      1546692  used memory
      1410448  active memory
        11100  inactive memory
         5836  free memory
         2676  buffer memory
       645864  swap cache
      2097096  total swap
       526280  used swap
      1570816  free swap
     20293225 non-nice user cpu ticks
     18284715 nice user cpu ticks
     17687435 system cpu ticks
    357314699 idle cpu ticks
     67673539 IO-wait cpu ticks
       352225 IRQ cpu ticks
      4872449 softirq cpu ticks
    495248623 pages paged in
    600129070 pages paged out
     19877382 pages swapped in
     18874460 pages swapped out
   2702803833 interrupts
   3763550322 CPU context switches
   1094067854 boot time
     20158151 forks

It can be helpful to know the system totals when trying to figure out what percentage of the swap and memory is currently being used. Another interesting statistic is the pages paged in, which indicates the total number of pages that were read from the disk. This statistic includes the pages that are read starting an application and those that the application itself may be using.

[ezolt@wintermute tmp]$ ps -o etime,time,pcpu,cmd 10882
      00:06 00:00:05 88.0 ./burn

This example shows a test application that is consuming 88 percent of the CPU and has been running for 6 seconds, but has only consumed 5 seconds of CPU time.

[ezolt@wintermute tmp]$ ps -o vsz,rss,tsiz,dsiz,majflt,minflt,cmd 10882
11124 10004 1 11122 66 2465 ./burn

The burn application has a very small text size (1KB), but a very large data size (11,122KB). Of the total virtual size (11,124KB), the process has a slightly smaller resident set size (10,004KB), which represents the total amount of physical memory that the process is actually using. In addition, most of the faults generated by burn were minor faults, so most of the memory faults were due to memory allocation rather than loading in a large amount of text or data from the program image on the disk.

[ezolt@wintermute tmp]$ cat /proc/4540/status
Name: burn
State: T (stopped)
Tgid: 4540
Pid: 4540
PPid: 1514
TracerPid: 0
Uid: 501 501 501 501
Gid: 501 501 501 501
FDSize: 256
Groups: 501 9 502
VmSize: 11124 kB
VmLck: 0 kB
VmRSS: 10004 kB
VmData: 9776 kB
VmStk: 8 kB
VmExe: 4 kB
VmLib: 1312 kB
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

The VmLck size of 0KB means that the process has not locked any pages into memory, making them unswappable. The VmRSS size of 10,004KB means that the application is currently using 10,004KB of physical memory, although it has allocated or mapped the full VmSize of 11,124KB. If the application begins to use the memory that it has allocated but is not currently using, the VmRSS size increases but the VmSize stays unchanged.

[ezolt@wintermute test_app]$ cat /proc/4540/maps
08048000-08049000 r-xp 00000000 21:03 393730 /tmp/burn
08049000-0804a000 rw-p 00000000 21:03 393730 /tmp/burn
0804a000-089d3000 rwxp 00000000 00:00 0
40000000-40015000 r-xp 00000000 21:03 1147263 /lib/
40015000-40016000 rw-p 00015000 21:03 1147263 /lib/
4002e000-4002f000 rw-p 00000000 00:00 0
4002f000-40162000 r-xp 00000000 21:03 2031811 /lib/tls/
40162000-40166000 rw-p 00132000 21:03 2031811 /lib/tls/
40166000-40168000 rw-p 00000000 00:00 0
bfffe000-c0000000 rwxp fffff000 00:00 0

The burn application is using two libraries: ld and libc. The text section (denoted by the permission r-xp) of libc occupies the range 0x4002f000 through 0x40162000, a size of 0x133000 or 1,257,472 bytes.
The data section (denoted by permission rw-p) of libc occupies the range 0x40162000 through 0x40166000, a size of 0x4000 or 16,384 bytes. The text size of libc is bigger than ld's text size of 0x15000 or 86,016 bytes, and the data size of libc is also bigger than ld's data size of 0x1000 or 4,096 bytes. libc is the big library that burn is linking in.
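The sizes quoted above are just the differences between the start and end addresses that /proc/<pid>/maps prints. As a quick check:

```python
# Segment sizes computed from the address ranges in the maps listing above.
libc_text = 0x40162000 - 0x4002f000   # r-xp segment of libc
libc_data = 0x40166000 - 0x40162000   # rw-p segment of libc
ld_text   = 0x40015000 - 0x40000000   # r-xp segment of ld
ld_data   = 0x40016000 - 0x40015000   # rw-p segment of ld

print(hex(libc_text), libc_text)  # 0x133000 1257472
print(hex(libc_data), libc_data)  # 0x4000 16384
print(hex(ld_text), ld_text)      # 0x15000 86016
print(hex(ld_data), ld_data)      # 0x1000 4096
```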

[ezolt@wintermute tmp]$ ipcs -u

------ Shared Memory Status --------
segments allocated 21
pages allocated 1585
pages resident 720
pages swapped 412
Swap performance: 0 attempts 0 successes

------ Semaphore Status --------
used arrays = 0
allocated semaphores = 0

------ Messages: Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

In this case, we can see that 21 different segments or pieces of shared memory have been allocated. All these segments consume a total of 1,585 pages of memory; 720 of these exist in physical memory and 412 have been swapped to disk.

[ezolt@wintermute tmp]$ ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 root 777 49152 1
0x00000000 32769 root 777 16384 1
0x00000000 65538 ezolt 600 393216 2 dest

Here we ask ipcs for a general overview of all the shared memory segments in the system, which indicates who is using each segment. For one segment in particular, the one with a shared memory ID of 65538, the user (ezolt) is the owner. It has a permission of 600 (a typical UNIX permission), which in this case means that only ezolt can read and write to it. It is 393,216 bytes in size, and two processes are attached to it.

[ezolt@wintermute tmp]$ ipcs -p

------ Shared Memory Creator/Last-op --------
shmid owner cpid lpid
0 root 1224 11954
32769 root 1224 11954
65538 ezolt 1229 11954

Finally, we can figure out exactly which processes created the shared memory segments and which processes last operated on them. For the segment with shmid 65538, we can see that PID 1229 created it and PID 11954 was the last to use it.

[ezolt@wintermute procps-3.2.0]$ ./vmstat 1 3

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 197020 81804 29920 0 0 236 25 1017 67 1 1 93 4
1 1 0 172252 106252 29952 0 0 24448 0 1200 395 1 36 0 63
0 0 0 231068 50004 27924 0 0 19712 80 1179 345 1 34 15 49

During one of the samples, the system read 24,448 disk blocks. As mentioned previously, the block size for a disk is 1,024 bytes (or 4,096 bytes), so this means that the system is reading in data at about 23MB per second. We can also see that during this sample, the CPU was spending a significant portion of its time waiting for I/O to complete. The CPU waits on I/O 63 percent of the time during the sample in which the disk was reading at ~23MB per second, and it waits on I/O 49 percent of the time for the next sample, in which the disk was reading at ~19MB per second.
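As a quick sanity check of the ~23MB per second figure (assuming vmstat's 1,024-byte block size for the bi column):

```python
# vmstat reports "bi" in 1,024-byte blocks per second.
blocks = 24448
rate_mb = blocks * 1024 / (1024 * 1024)  # convert blocks/s to MB/s
print(round(rate_mb, 1))  # 23.9
```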

[ezolt@wintermute procps-3.2.0]$ ./vmstat -D
3 disks
5 partitions
53256 total reads
641233 merged reads
4787741 read sectors
343552 milli reading
14479 writes
17556 merged writes
257208 written sectors
7237771 milli writing
0 inprogress IO
342 milli spent IO

In this example, a large number of the reads issued to the system were merged before they were issued to the device. Although there were ~640,000 merged reads, only ~53,000 read commands were actually issued to the drives. The output also tells us that a total of 4,787,741 sectors have been read from the disk, and that since system boot, 343,552ms (or 344 seconds) were spent reading from the disk. The same statistics are available for write performance.
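The cumulative counters also let you derive averages, for example the mean time and size per issued read. A small sketch using the numbers above:

```python
# Cumulative read statistics from the vmstat -D output above.
total_reads   = 53256
read_sectors  = 4787741
milli_reading = 343552

avg_ms_per_read  = milli_reading / total_reads   # ms spent per issued read
sectors_per_read = read_sectors / total_reads    # sectors per issued read

print(round(avg_ms_per_read, 2))   # ~6.45 ms per read
print(round(sectors_per_read, 1))  # ~89.9 sectors per read
```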

[ezolt@wintermute procps-3.2.0]$ ./vmstat -p hde3 1 3
hde3 reads read sectors writes requested writes
18999 191986 24701 197608
19059 192466 24795 198360
- 19161 193282 24795 198360

This shows that 60 (19,059 – 18,999) reads and 94 (24,795 – 24,701) writes were issued to partition hde3 during the first one-second interval. This view can prove particularly useful if you are trying to determine which partition of a disk is seeing the most usage.
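Because the per-partition counters are cumulative, per-interval activity is just the difference between consecutive samples:

```python
# (reads, read sectors, writes, requested writes) for hde3, one row per sample,
# taken from the vmstat -p output above.
samples = [
    (18999, 191986, 24701, 197608),
    (19059, 192466, 24795, 198360),
    (19161, 193282, 24795, 198360),
]

for prev, cur in zip(samples, samples[1:]):
    deltas = tuple(c - p for p, c in zip(prev, cur))
    print(deltas)
# (60, 480, 94, 752)  <- first interval: 60 reads, 94 writes
# (102, 816, 0, 0)    <- second interval: reads only
```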


[ezolt@localhost sysstat-5.0.2]$ ./iostat -x -dk 1 5 /dev/hda2
Linux 2.4.22-1.2188.nptl (localhost.localdomain) 05/01/2004
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 11.22 44.40 3.15 4.20 115.00 388.97 57.50 194.49
68.52 1.75 237.17 11.47 8.43

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1548.00 0.00 100.00 0.00 13240.00 0.00 6620.00
132.40 55.13 538.60 10.00 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1365.00 0.00 131.00 0.00 11672.00 0.00 5836.00
89.10 53.86 422.44 7.63 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 1483.00 0.00 84.00 0.00 12688.00 0.00 6344.00
151.0 39.69 399.52 11.90 100.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
hda2 0.00 2067.00 0.00 123.00 0.00 17664.00 0.00 8832.00
143.61 58.59 508.54 8.13 100.00

You can see that the average queue size is pretty high (~40 to 59 requests) and, as a result, the amount of time that a request must wait (~399.52ms to 538.60ms) is much greater than the amount of time it takes to service the request (7.63ms to 11.90ms). These high wait times, along with the fact that the utilization is 100 percent, show that the disk is completely saturated.

[ezolt@wintermute sysstat-5.0.2]$ sar -n SOCK 1 2

Linux 2.4.22-1.2174.nptlsmp ( 06/07/04
21:32:26 totsck tcpsck udpsck rawsck ip-frag
21:32:27 373 118 8 0 0
21:32:28 373 118 8 0 0
Average: 373 118 8 0 0

We can see the total number of open sockets as well as the number of TCP, UDP, and RAW sockets. sar also displays the number of fragmented IP packets.


perl tips

April 2nd, 2014 No comments
use strict;
use warnings;

## arrays
my @animals = ("dog", "pig", "cat");
print "The last element of array \@animals is : ".$animals[$#animals]."\n";
if (@animals > 2) {
    print "more than 2 animals found\n";
} else {
    print "less than 2 animals found\n";
}
foreach (@animals) {
    print $_."\n";
}

## hashes
my %fruit_color = ("apple", "red", "banana", "yellow");
print "Color of banana is : ".$fruit_color{"banana"}."\n";

for my $char (keys %fruit_color) {
    print("$char => $fruit_color{$char}\n");
}

## references
my $variables = {
    scalar => {
        description => "single item",
        sigil => '$',
    },
    array => {
        description => "ordered list of items",
        sigil => '@',
    },
    hash => {
        description => "key/value pairs",
        sigil => '%',
    },
};
print "Scalars begin with a $variables->{'scalar'}->{'sigil'}\n";

## Files and I/O
## regular expressions
open(my $passwd, "<", "/etc/passwd2") or die("can not open");
while (<$passwd>) {
    print $_ if $_ =~ /test/;
}
close $passwd or die "$passwd: $!";

my $next = "doing a first";
$next =~ s/first/second/;
print $next."\n";

my $email = 'testaccount@example.com'; # hypothetical address; the domain was elided in the original
if ($email =~ /([^@]+)@(.+)/) {
    print "Username is : $1\n";
    print "Hostname is : $2\n";
}

## subroutines
sub multiply {
    my ($num1, $num2) = @_;
    my $result = $num1 * $num2;
    return $result;
}

my $result2 = multiply(3, 5);
print "3 * 5 = $result2\n";

! system('date') or die("failed it"); # system() returns 0 when the command succeeds
Categories: Perl, Programming, tips Tags:

resolved – /lib/ bad ELF interpreter: No such file or directory

April 1st, 2014 No comments

When I ran perl command today, I met problem below:

[root@test01 bin]# /usr/local/bin/perl5.8
-bash: /usr/local/bin/perl5.8: /lib/ bad ELF interpreter: No such file or directory

Now let’s check which package /lib/ belongs to on a good linux box:

[root@test02 ~]# rpm -qf /lib/

So here’s the resolution to the issue:

[root@test01 bin]# yum install -y glibc.x86_64 glibc.i686 glibc-devel.i686 glibc-devel.x86_64 glibc-headers.x86_64

Categories: Kernel, Linux, Systems Tags:

resolved – sudo: sorry, you must have a tty to run sudo

April 1st, 2014 2 comments

The error message below sometimes will occur when you run a sudo <command>:

sudo: sorry, you must have a tty to run sudo

To resolve this, you may comment out "Defaults requiretty" in /etc/sudoers (edited by running visudo). Here is more info about this method:

However, sometimes it’s not convenient or even not possible to modify /etc/sudoers, then you can consider the following:

echo -e "<password>\n" | sudo -S <sudo command>

For -S parameter of sudo, you may refer to sudo man page:

-S: The -S (stdin) option causes sudo to read the password from the standard input instead of the terminal device. The password must be followed by a newline character.

So here -S bypasses the tty (terminal device) and reads the password from the standard input. This lets us pipe the password to sudo.

Categories: Linux, Programming, SHELL, Systems Tags: ,

Document databases and Graph databases

March 27th, 2014 No comments
  • Document databases

Document databases are not document management systems. More often than not, developers starting out with NoSQL confuse document databases with document and content management systems. The word document in document databases connotes loosely structured sets of key/value pairs in documents, typically JSON (JavaScript Object Notation), and not documents or spreadsheets (though these could be stored too).

Document databases treat a document as a whole and avoid splitting a document into its constituent name/value pairs. At a collection level, this allows for putting together a diverse set of documents into a single collection. Document databases allow indexing of documents on the basis of not only their primary identifiers but also their properties. A few different open-source document databases are available today, but the most prominent among the available options are MongoDB and CouchDB.


  • Official Online Resources —
  • History — Created at 10gen.
  • Technologies and Language — Implemented in C++.
  • Access Methods — A JavaScript command-line interface. Drivers exist for a number of languages including C, C#, C++, Erlang. Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, and Scala.
  • Query Language — SQL-like query language.
  • Open-Source License — GNU Affero GPL (
  • Who Uses It — FourSquare, Shutterfly, Intuit, Github, and more.


  • Official Online Resources — and www.couchbase. Most of the authors are part of Couchbase, Inc.
  • History — Work started in 2005 and it was incubated into Apache in 2008.
  • Technologies and Language — Implemented in Erlang with some C and a JavaScript execution environment.
  • Access Methods — Upholds REST above every other mechanism. Use standard web tools and clients to access the database, the same way as you access web resources.
  • Open-Source License — Apache License version 2.
  • Who Uses It — Apple, BBC, Canonical, Cern, and more at

A lot of details on document databases are covered starting in the next chapter.

  • Graph Databases

So far I have listed most of the mainstream open-source NoSQL products. A few other products like Graph databases and XML data stores could also qualify as NoSQL databases. This book does not cover Graph and XML databases. However, I list the two Graph databases that may be of interest and something you may want to explore beyond this book: Neo4j and FlockDB:

Neo4J is an ACID-compliant graph database. It facilitates rapid traversal of graphs.


  • Official Online Resources —
  • History — Created at Neo Technologies in 2003. (Yes, this database has been around before the term NoSQL was known popularly.)
  • Technologies and Language — Implemented in Java.
  • Access Methods — A command-line access to the store is provided. REST interface also available. Client libraries for Java, Python, Ruby, Clojure, Scala, and PHP exist.
  • Query Language — Supports SPARQL protocol and RDF Query Language.
  • Open-Source License — AGPL.
  • Who Uses It —


  • Official Online Resources —
  • History — Created at Twitter and open sourced in 2010. Designed to store the adjacency lists for followers on Twitter.
  • Technologies and Language — Implemented in Scala.
  • Access Methods — A Thrift and Ruby client.
  • Open-Source License — Apache License version 2.
  • Who Uses It — Twitter.

A number of NoSQL products have been covered so far. Hopefully, it has warmed you up to learn more about these products and to get ready to understand how you can leverage and use them effectively in your stack.


This article is from book <Professional NoSQL>.


Categories: Clouding, Databases, IT Architecture Tags:

sorted ordered column-oriented stores VS key/value stores in NoSQL

March 27th, 2014 No comments
  • sorted ordered column-oriented stores

Google’s Bigtable espouses a model where data is stored in a column-oriented way. This contrasts with the row-oriented format in RDBMS. The column-oriented storage allows data to be stored effectively. It avoids consuming space when storing nulls by simply not storing a column when a value doesn’t exist for that column.

Each unit of data can be thought of as a set of key/value pairs, where the unit itself is identified with the help of a primary identifier, often referred to as the primary key. Bigtable and its clones tend to call this primary key the row-key. Also, as the title of this subsection suggests, units are stored in an ordered-sorted manner. The units of data are sorted and ordered on the basis of the row-key. To explain sorted ordered column-oriented stores, an example serves better than a lot of text, so let me present an example to you. Consider a simple table of values that keeps information about a set of people. Such a table could have columns like first_name, last_name, occupation, zip_code, and gender. A person’s information in this table could be as follows:

first_name: John
last_name: Doe
zip_code: 10001
gender: male

Another set of data in the same table could be as follows:

first_name: Jane
zip_code: 94303

The row-key of the first data point could be 1 and the second could be 2. Then data would be stored in a sorted ordered column-oriented store in a way that the data point with row-key 1 will be stored before a data point with row-key 2 and also that the two data points will be adjacent to each other.

Next, only the valid key/value pairs would be stored for each data point. So, a possible column-family for the example could be name with columns first_name and last_name being its members. Another column-family could be location with zip_code as its member. A third column-family could be profile. The gender column could be a member of the profile column-family. In column-oriented stores similar to Bigtable, data is stored on a column-family basis. Column-families are typically defined at configuration or startup time. Columns themselves need no a-priori definition or declaration. Also, columns are capable of storing any data types as far as the data can be persisted to an array of bytes.

So the underlying logical storage for this simple example consists of three storage buckets: name, location, and profile. Within each bucket, only key/value pairs with valid values are stored. Therefore, the name column-family bucket stores the following values:

For row-key: 1

first_name: John
last_name: Doe

For row-key: 2

first_name: Jane

The location column-family stores the following:

For row-key: 1

zip_code: 10001

For row-key: 2

zip_code: 94303

The profile column-family has values only for the data point with row-key 1 so it stores only the following:

For row-key: 1

gender: male

In real storage terms, the column-families are not physically isolated for a given row. All data pertaining to a row-key is stored together. The column-family acts as a key for the columns it contains and the row-key acts as the key for the whole data set.
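The example above can be modeled with a few dictionaries. This is only a toy in-memory sketch of the logical layout (the `put`/`get` helpers are illustrative, not a real Bigtable API); each column-family bucket stores, per row-key, only the columns that actually have values.

```python
# Column-family buckets from the example: name, location, profile.
store = {"name": {}, "location": {}, "profile": {}}

def put(family, row_key, column, value):
    """Store one key/value pair under a column-family and row-key."""
    store[family].setdefault(row_key, {})[column] = value

def get(family, row_key):
    """Return the stored columns for a row-key (empty if none)."""
    return store[family].get(row_key, {})

put("name", 1, "first_name", "John")
put("name", 1, "last_name", "Doe")
put("location", 1, "zip_code", "10001")
put("profile", 1, "gender", "male")
put("name", 2, "first_name", "Jane")
put("location", 2, "zip_code", "94303")

print(get("name", 2))     # {'first_name': 'Jane'}
print(get("profile", 2))  # {} -- no nulls stored for missing columns
```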

Data in Bigtable and its clones is stored in a contiguous sequenced manner. As data grows to fill up one node, it is spilt into multiple nodes. The data is sorted and ordered not only on each node but also across nodes providing one large continuously sequenced set. The data is persisted in a fault-tolerant manner where three copies of each data set are maintained. Most Bigtable clones leverage a distributed filesystem to persist data to disk. Distributed filesystems allow data to be stored among a cluster of machines.

The sorted ordered structure makes data seek by row-key extremely efficient. Data access is less random and ad hoc, and lookup is as simple as finding the node in the sequence that holds the data. Data is inserted at the end of the list. Updates are in place, but often a newer version of the data is added to the specific cell rather than overwriting it. This means a few versions of each cell are maintained at all times. The versioning property is usually configurable.
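The two ideas in that paragraph, sorted row-key seeks and versioned cells, can be sketched as follows. All names here are illustrative; real Bigtable clones implement this with on-disk structures, not Python lists.

```python
import bisect

row_keys = []   # row-keys kept in sorted order
cells = {}      # row_key -> list of (version, value), newest last

def write(row_key, value, version):
    """Insert or update a cell; updates append a newer version."""
    if row_key not in cells:
        bisect.insort(row_keys, row_key)   # keep the sequence sorted
        cells[row_key] = []
    cells[row_key].append((version, value))

def read_latest(row_key):
    """Binary-search the sorted sequence, then return the newest version."""
    i = bisect.bisect_left(row_keys, row_key)  # O(log n) seek
    if i < len(row_keys) and row_keys[i] == row_key:
        return cells[row_key][-1][1]
    return None

write(2, "Jane", version=1)
write(1, "John", version=1)
write(1, "Johnny", version=2)   # newer version of the same cell
print(row_keys)        # [1, 2]
print(read_latest(1))  # 'Johnny'
```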

A bullet-point enumeration of some of the Bigtable open-source clones’ properties is listed next.

  • HBase

Official Online Resources —
History — Created at Powerset (now part of Microsoft) in 2007. Donated to the Apache foundation before Powerset was acquired by Microsoft.
Technologies and Language — Implemented in Java.
Access Methods — A JRuby shell allows command-line access to the store. Thrift, Avro, REST, and protobuf clients exist. A few language bindings are also available. A Java API is available with the distribution.
Query Language — No native querying language. Hive ( provides a SQL-like interface for HBase.
Open-Source License — Apache License version 2.
Who Uses It — Facebook, StumbleUpon, Hulu, Ning, Mahalo, Yahoo!, and others.

  • Hypertable

Official Online Resources —
History — Created at Zvents in 2007. Now an independent open-source project.
Technologies and Language — Implemented in C++, uses Google RE2 regular expression library. RE2 provides a fast and efficient implementation. Hypertable promises performance boost over HBase, potentially serving to reduce time and cost when dealing with large amounts of data.
Access Methods — A command-line shell is available. In addition, a Thrift interface is supported. Language bindings have been created based on the Thrift interface. A creative developer has even created a JDBC-compliant interface for Hypertable.
Query Language — HQL (Hypertable Query Language) is a SQL-like abstraction for querying Hypertable data. Hypertable also has an adapter for Hive.
Open-Source License — GNU GPL version 2.
Who Uses It — Zvents, Baidu (China’s biggest search engine), Rediff (India’s biggest portal).

  • Cloudata

Official Online Resources —
History — Created by a Korean developer named YK Kwon ( Not much is publicly known about its origins.
Technologies and Language — Implemented in Java.
Access Methods — A command-line access is available. Thrift, REST, and Java API are available.
Query Language — CQL (Cloudata Query Language) defines a SQL-like query language.
Open-Source License — Apache License version 2.
Who Uses It — Not known.

Sorted ordered column-family stores form a very popular NoSQL option. However, NoSQL consists of a lot more variants of key/value stores and document databases. Next, I introduce the key/value stores.

  • key/value stores

A HashMap or an associative array is the simplest data structure that can hold a set of key/value pairs. Such data structures are extremely popular because they provide a very efficient, O(1) average running time for accessing data. The key of a key/value pair is a unique value in the set and can be easily looked up to access the data.

Key/value pairs are of varied types: some keep the data in memory and some provide the capability to persist the data to disk. Key/value pairs can be distributed and held in a cluster of nodes.

A simple, yet powerful, key/value store is Oracle’s Berkeley DB. Berkeley DB is a pure storage engine where both key and value are an array of bytes. The core storage engine of Berkeley DB doesn’t attach meaning to the key or the value. It takes byte array pairs in and returns the same back to the calling client. Berkeley DB allows data to be cached in memory and flushed to disk as it grows. There is also a notion of indexing the keys for faster lookup and access. Berkeley DB has existed since the mid-1990s. It was created to replace AT&T’s NDBM as a part of migrating from BSD 4.3 to 4.4. In 1996, Sleepycat Software was formed to maintain and provide support for Berkeley DB.

Another type of key/value store in common use is a cache. A cache provides an in-memory snapshot of the most-used data in an application. The purpose of cache is to reduce disk I/O. Cache systems could be rudimentary map structures or robust systems with a cache expiration policy. Caching is a popular strategy employed at all levels of a computer software stack to boost performance. Operating systems, databases, middleware components, and applications use caching.
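A cache with an expiration policy can be rudimentary indeed. Below is a minimal sketch of a map-based cache with a time-to-live; the `TTLCache` name is illustrative, and real systems like Memcached are far more sophisticated (slab allocation, LRU eviction, network protocol).

```python
import time

class TTLCache:
    """A map that forgets entries after a fixed time-to-live."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.data = {}  # key -> (expiry_time, value)

    def set(self, key, value):
        self.data[key] = (time.monotonic() + self.ttl, value)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        expiry, value = entry
        if time.monotonic() > expiry:
            del self.data[key]  # lazily evict expired entries
            return None
        return value

cache = TTLCache(ttl_seconds=60)
cache.set("user:42", {"name": "Jane"})
print(cache.get("user:42"))  # {'name': 'Jane'}
print(cache.get("user:99"))  # None -- never cached
```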

Robust open-source distributed cache systems like EHCache ( are widely used in Java applications. EHCache could be considered as a NoSQL solution. Another caching system popularly used in web applications is Memcached (, which is an open-source, high-performance object caching system. Brad Fitzpatrick created Memcached for LiveJournal in 2003. Apart from being a caching system, Memcached also helps effective memory management by creating a large virtual pool and distributing memory among nodes as required. This prevents fragmented zones where one node could have excess but unused memory and another node could be starved for memory.

As the NoSQL movement has gathered momentum, a number of key/value pair data stores have emerged. Some of these newer stores build on the Memcached API, some use Berkeley DB as the underlying storage, and a few others provide alternative solutions built from scratch.

Many of these key/value stores have APIs that provide simple get-and-set mechanisms. A few, like Redis (, provide richer abstractions and powerful APIs. Redis could be considered a data structure server because it provides data structures like strings (character sequences), lists, and sets, apart from maps. Also, Redis provides a very rich set of operations to access data from these different types of data structures.

This book covers key/value stores in a lot of detail. For now, I list a few important ones along with their important attributes. Again, the presentation resorts to a bullet-point-style enumeration of a few important characteristics.

  • Membase (Proposed to be merged into Couchbase, gaining features from CouchDB after the creation of Couchbase, Inc.)

Official Online Resources —
History — Project started in 2009 by NorthScale, Inc. (later renamed Membase). Zynga and NHN have been contributors since the beginning. Membase builds on Memcached and supports Memcached’s text and binary protocol. Membase adds a lot of additional features on top of Memcached. It adds disk persistence, data replication, live cluster reconfiguration, and data rebalancing. A number of core Membase creators are also Memcached contributors.
Technologies and Language — Implemented in Erlang, C, and C++.
Access Methods — Memcached-compliant API with some extensions. Can be a drop-in replacement for Memcached.
Open-Source License — Apache License version 2.
Who Uses It — Zynga, NHN, and others.

  • Kyoto Cabinet

Official Online Resources —
History — Kyoto Cabinet is a successor of Tokyo Cabinet ( The database is a simple data file containing records; each is a pair of a key and a value. Every key and value are serial bytes with variable length.
Technologies and Language — Implemented in C++.
Access Methods — Provides APIs for C, C++, Java, C#, Python, Ruby, Perl, Erlang, OCaml, and Lua. The protocol simplicity means there are many, many clients.
Open-Source License — GNU GPL and GNU LGPL.
Who Uses It — Mixi, Inc. sponsored much of its original work before the author left Mixi to join Google. Blog posts and mailing lists suggest that there are many users but no public list is available.

  • Redis

Official Online Resources —
History — Project started in 2009 by Salvatore Sanfilippo. Salvatore created it for his startup LLOOGG ( Though still an independent project, Redis’s primary author is employed by VMware, which sponsors its development.
Technologies and Language — Implemented in C.
Access Methods — Rich set of methods and operations. Can access via Redis command-line interface and a set of well-maintained client libraries for languages like Java, Python, Ruby, C, C++, Lua, Haskell, AS3, and more.
Open-Source License — BSD.
Who Uses It — Craigslist.

The three key/value stores listed here are nimble, fast implementations that provide storage for real-time data, temporary frequently used data, or even full-scale persistence.

The key/value stores listed so far provide a strong consistency model for the data they store. However, a few other key/value stores emphasize availability over consistency in distributed deployments. Many of these are inspired by Amazon’s Dynamo, which is also a key/value store. Amazon’s Dynamo promises exceptional availability and scalability, and forms the backbone for Amazon’s distributed, fault-tolerant, and highly available systems. Apache Cassandra, Basho Riak, and Voldemort are open-source implementations of the ideas proposed by Amazon Dynamo.

Amazon Dynamo brings a lot of key high-availability ideas to the forefront. The most important of the ideas is that of eventual consistency. Eventual consistency implies that there could be small intervals of inconsistency between replicated nodes as data gets updated among peer-to-peer nodes. Eventual consistency does not mean inconsistency. It just implies a weaker form of consistency than the typical ACID type consistency found in RDBMS.

For now I will list the Amazon Dynamo clones and introduce you to a few important characteristics of these data stores.

  • Cassandra

Official Online Resources —
History — Developed at Facebook and open sourced in 2008, Apache Cassandra was donated to the Apache foundation.
Technologies and Language — Implemented in Java.
Access Methods — A command-line access to the store. Thrift interface and an internal Java API exist. Clients for multiple languages including Java, Python, Grails, PHP, .NET. and Ruby are available. Hadoop integration is also supported.
Query Language — A query language specification is in the making.
Open-Source License — Apache License version 2.
Who Uses It — Facebook, Digg, Reddit, Twitter, and others.

  • Voldemort

Official Online Resources —
History — Created by the data and analytics team at LinkedIn in 2008.
Technologies and Language — Implemented in Java. Provides for pluggable storage using either Berkeley DB or MySQL.
Access Methods — Integrates with Thrift, Avro, and protobuf ( interfaces. Can be used in conjunction with Hadoop.
Open-Source License — Apache License version 2.
Who Uses It — LinkedIn.

  • Riak

Official Online Resources —
History — Created at Basho, a company formed in 2008.
Technologies and Language — Implemented in Erlang. Also, uses a bit of C and JavaScript.
Access Methods — Interfaces for JSON (over HTTP) and protobuf clients exist. Libraries for Erlang, Java, Ruby, Python, PHP, and JavaScript exist.
Open-Source License — Apache License version 2.
Who Uses It — Comcast and Mochi Media.

All three — Cassandra, Riak and Voldemort — provide open-source Amazon Dynamo capabilities. Cassandra and Riak demonstrate a dual nature as far as their behavior and properties go. Cassandra has properties of both Google Bigtable and Amazon Dynamo. Riak acts both as a key/value store and a document database.


This article is from book <Professional NoSQL>.

Categories: Clouding, Databases, IT Architecture Tags:

map/reduce framework definition and introduction

March 27th, 2014 No comments

MapReduce is a parallel programming model that allows distributed processing on large data sets on a cluster of computers. The MapReduce framework is patented (U.S. Patent 7,650,331) by Google, but the ideas are freely shared and adopted in a number of open-source implementations.

MapReduce derives its ideas and inspiration from concepts in the world of functional programming. Map and reduce are commonly used functions in the world of functional programming. In functional programming, a map function applies an operation or a function to each element in a list. For example, a multiply-by-two function on a list [1, 2, 3, 4] would generate another list as follows: [2, 4, 6, 8]. When such functions are applied, the original list is not altered. Functional programming believes in keeping data immutable and avoids sharing data among multiple processes or threads. This means the map function that was just illustrated, trivial as it may be, could be run via two or more multiple threads on the list and these threads would not step on each other, because the list itself is not altered.

Like the map function, functional programming has a concept of a reduce function. Actually, a reduce function in functional programming is more commonly known as a fold function. A reduce or a fold function is also sometimes called an accumulate, compress, or inject function. A reduce or fold function applies a function on all elements of a data structure, such as a list, and produces a single result or output. So applying a reduce function-like summation on the list generated out of the map function, that is, [2, 4, 6, 8], would generate an output equal to 20.

So map and reduce functions could be used in conjunction to process lists of data, where a function is first applied to each member of a list and then an aggregate function is applied to the transformed and generated list.

This same simple idea of map and reduce has been extended to work on large data sets. The idea is slightly modified to work on collections of tuples or key/value pairs. The map function applies a function on every key/value pair in the collection and generates a new collection. Then the reduce function works on the new generated collection and applies an aggregate function to compute a final output. This is better understood through an example, so let me present a trivial one to explain the flow. Say you have a collection of key/value pairs as follows:

[{ "94303": "Tom"}, {"94303": "Jane"}, {"94301": "Arun"}, {"94302": "Chen"}]

This is a collection of key/value pairs where the key is the zip code and the value is the name of a person who resides within that zip code. A simple map function on this collection could get the names of all those who reside in a particular zip code. The output of such a map function is as follows:

[{"94303": ["Tom", "Jane"]}, {"94301": ["Arun"]}, {"94302": ["Chen"]}]

Now a reduce function could work on this output to simply count the number of people who belong to particular zip code. The final output then would be as follows:

[{"94303": 2}, {"94301": 1}, {"94302": 1}]

This example is extremely simple and a MapReduce mechanism seems too complex for such a manipulation, but I hope you get the core idea behind the concepts and the flow.
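The whole flow above fits in a few lines of plain Python, which makes the two phases easy to see; the grouping and counting code here is just a sketch of the idea, not a distributed implementation.

```python
from collections import defaultdict
from functools import reduce

# The zip-code collection from the example above.
records = [{"94303": "Tom"}, {"94303": "Jane"},
           {"94301": "Arun"}, {"94302": "Chen"}]

# Map phase: emit (zip_code, name) pairs and group them by key.
grouped = defaultdict(list)
for record in records:
    for zip_code, name in record.items():
        grouped[zip_code].append(name)
print(dict(grouped))
# {'94303': ['Tom', 'Jane'], '94301': ['Arun'], '94302': ['Chen']}

# Reduce phase: collapse each group to a single value -- here, a count.
counts = {zip_code: reduce(lambda acc, _name: acc + 1, names, 0)
          for zip_code, names in grouped.items()}
print(counts)
# {'94303': 2, '94301': 1, '94302': 1}
```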


This article is from book <Professional NoSQL>.

AWS – relationship between Elastic Load Balancing, CloudWatch, and Auto Scale

March 20th, 2014 No comments


The monitoring, auto scaling, and elastic load balancing features of the Amazon EC2 services give you easy on-demand access to capabilities that once required a complicated system architecture and a large hardware investment.
Any real-world web application must have the ability to scale. This can take the form of vertical scaling, where larger and higher capacity servers are rolled in to replace the existing ones, or horizontal scaling, where additional servers are placed side-by-side (architecturally speaking) with the existing resources. Vertical scaling is sometimes called a scale-up model, and horizontal scaling is sometimes called a scale-out model.

Vertical Scaling

At first, vertical scaling appears to be the easiest way to add capacity. You start out with a server of modest means and use it until it no longer meets your needs. You purchase a bigger one, move your code and data over to it, and abandon the old one. Performance is good until the newer, larger system reaches its capacity. You purchase again, repeating the process until your hardware supplier informs you that you’re running on the largest hardware that they have, and that you’ve no more room to grow. At this point you’ve effectively painted yourself into a corner.
Vertical scaling can be expensive. Each time you upgrade to a bigger system you also make a correspondingly larger investment. If you’re actually buying hardware, your first step-ups cost you thousands of dollars; your later ones cost you tens or even hundreds of thousands of dollars. At some point you may have to invest in a similarly expensive backup system, which will remain idle unless the unthinkable happens and you need to use it to continue operations.

Horizontal Scaling

Horizontal scaling is slightly more complex, but far more flexible and scalable in the long term. Instead of upgrading to a bigger server, you obtain another one (presumably of the same size, although there’s no requirement for this to be the case) and arrange to share the storage and processing load across the two servers. When two servers no longer meet your needs, you add a third, a fourth, and so on. This scale-out model allows you to add resources incrementally and economically. As your fleet of servers grows, you can actually increase the reliability of your system by eliminating dependencies on any particular server.
Of course, sharing the storage and processing load across a fleet of servers is sometimes easier said than done. Loosely coupled systems tied together with SQS message queues, like those we saw and built in the previous chapter, can usually scale easily. Systems that rely on a traditional relational database or other centralized storage can be more difficult to scale.

Monitoring, Scaling, and Load Balancing

We’ll need several services in order to build a horizontally scaled system that automatically scales to handle load.
First, we need to know how hard each server is working. We have to establish how much data is moving in and out across the network, how many disk reads and writes are taking place, and how much of the time the CPU (Central Processing Unit) is busy. This functionality is provided by Amazon CloudWatch. After CloudWatch has been enabled for an EC2 instance or an elastic load balancer, it captures and stores this information so that it can be used to control scaling decisions.
Second, we require a way to observe the system performance, using it to make decisions to add more EC2 instances (because the system is too busy) or to remove some running instances (because there’s too little work for them to do). This functionality is provided by the EC2 auto scaling feature. The auto scaling feature uses a rule-driven system to encode the logic needed to add and remove EC2 instances.
Third, we need a method for routing traffic to each of the running instances. This is handled by the EC2 elastic load balancing feature. Working in conjunction with auto scaling, elastic load balancing distributes traffic to EC2 instances located in one or more Availability Zones within an EC2 region. It also uses configurable health checks to detect failing instances and to route traffic away from them.
Figure 7-1 depicts how these features relate to each other.
An incoming HTTP load is balanced across a collection of EC2 instances. CloudWatch captures and stores system performance data from the instances. This data is used by auto scaling to regulate the number of EC2 instances in the collection.
As you’ll soon see, you can use each of these features on its own, or you can use them together. This modular model gives you a lot of flexibility and also allows you to learn about the features in an incremental fashion.


This article is from book <Host Your Web Site In The Cloud: Amazon Web Services Made Easy>.


Categories: Clouding Tags:

Resolved – print() on closed filehandle $fh at ./ line 6.

March 19th, 2014 No comments

You may find that print sometimes doesn’t work as expected in Perl. For example:

[root@centos-doxer test]# cat
use warnings;
open my $fh, '>', '/tmp/test.out' or die $!; # example file, so $fh is a real handle
select $fh;
close $fh;
print "test";

You may expect "test" to be printed, but actually you get an error message:

print() on closed filehandle $fh at ./ line 6.

So how did this happen? Here’s the explanation:

[root@centos-doxer test]# cat
use warnings;
open my $fh, '>', '/tmp/test.out' or die $!; # example file, so $fh is a real handle
select $fh;
close $fh; # here you closed the $fh filehandle, but you should now reset the default output filehandle to STDOUT
print "test";

Now here’s the updated script:

use warnings;
open my $fh, '>', '/tmp/test.out' or die $!; # example file, so $fh is a real handle
select $fh;
close $fh;
select STDOUT;
print "test";

This way, you’ll get "test" printed as expected!


Categories: Perl, Programming Tags:

set vnc not asking for OS account password

March 18th, 2014 No comments

As you may know, vncpasswd (part of the vnc-server package) is used to set the password users enter when connecting to VNC with a VNC client (such as TightVNC). When you connect to vnc-server, it’ll ask for that password:

After you connect to the host using VNC, you may also find that the remote server asks again for the OS account password (the one set by passwd):

In some cases, you may not want the second prompt. Here’s the way to cancel this behavior:




Categories: Linux, Systems Tags: ,

stuck in PXE-E51: No DHCP or proxyDHCP offers were received, PXE-M0F: Exiting Intel Boot Agent, Network boot canceled by keystroke

March 17th, 2014 No comments

If you installed your OS and tried booting it up, but got stuck with the following messages:


Then one possibility is that the configuration of your host’s storage array is not right. For instance, it should be JBOD but you had configured it as RAID6.

Please note that this is only one possibility for this error; you may search for the PXE error codes you encountered for more details.


  • Sometimes DHCP snooping may prevent PXE from functioning; you can read more
  • STP (Spanning Tree Protocol) makes each port wait up to 50 seconds before data is allowed to be sent on the port. This delay can in turn cause problems with some applications/protocols (PXE, Bootworks, etc.). To alleviate the problem, PortFast was implemented on Cisco devices; the terminology may differ between vendors. You can read more
  • ARP caching
Categories: Hardware, Storage, Systems Tags:

Oracle BI Publisher reports – send mail when filesystems getting full

March 17th, 2014 No comments

Let’s assume you have one Oracle BI Publisher report for filesystem checking, and now you want to write a script that checks that report page and sends mail to system admins when filesystems are getting full. The default output of an Oracle BI Publisher report needs JavaScript to work, and as you may know, wget/curl cannot execute JavaScript. So after logging on, the next step is to find the HTML version’s URL of that report to use in your script (the HTML page has all the records, while the JavaScript one shows only part of them):




Let’s assume that the HTML version’s URL is "", and it displays like the following:

bi report

Then here goes the script, which checks this page for hosts that have less than 10% available space and sends mail to system admins:

use warnings;
use HTML::Strip;

system("rm -f spacereport.html");
system("wget -q --no-proxy --no-check-certificate --post-data 'id=admin&passwd=password' '' -O spacereport.html");

open my $fh, '<', 'spacereport.html' or die $!;
my @spacereport = <$fh>;
close $fh;

#change array to hash: line number => line content
my %pos;
my $index = 0;
map {$pos{$index++}=$_} @spacereport;

#get locations of <table> and </table>
my ($table_start, $table_end);
#sort numerically ascending
for my $char (sort {$a<=>$b} (keys %pos)){
    if($pos{$char} =~ /<table class="c27">/){
        $table_start = $char;
    }
    if($pos{$char} =~ /<\/table>/){
        $table_end = $char;
    }
}
die "table not found in report page" unless defined $table_start and defined $table_end;

#get contents between <table> and </table>
my $table_htmlstr = join '', @pos{$table_start .. $table_end};

#get clear text between <table> and </table>
my $hs = HTML::Strip->new();
my $clean_text = $hs->parse($table_htmlstr);

my @array_filtered = split /\n/, $clean_text;
#remove empty array elements
@array_filtered = grep { !/^\s+$/ } @array_filtered;

#records are assumed to repeat every 4 fields: host, partition, free(%), free(GB)
system("rm -f space_mail_warning.txt");
open my $fh_mail_warning, '>', 'space_mail_warning.txt' or die $!;
select $fh_mail_warning;
#put hosts that have free space lower than 10% into space_mail_warning.txt
for(my $j = 0; $j + 3 <= $#array_filtered; $j += 4){
    if($array_filtered[$j+2] <= 10){
        print "Host: ".$array_filtered[$j]."\n";
        print "Part: ".$array_filtered[$j+1]."\n";
        print "Free(%): ".$array_filtered[$j+2]."\n";
        print "Free(GB): ".$array_filtered[$j+3]."\n";
        print "============\n\n";
    }
}
close $fh_mail_warning;

system("rm -f space_mail_info.txt");
open my $fh_mail_info, '>', 'space_mail_info.txt' or die $!;
select $fh_mail_info;
#put hosts that have free space lower than 15% into space_mail_info.txt
for(my $j = 0; $j + 3 <= $#array_filtered; $j += 4){
    if($array_filtered[$j+2] <= 15){
        print "Host: ".$array_filtered[$j]."\n";
        print "Part: ".$array_filtered[$j+1]."\n";
        print "Free(%): ".$array_filtered[$j+2]."\n";
        print "Free(GB): ".$array_filtered[$j+3]."\n";
        print "============\n\n";
    }
}
close $fh_mail_info;

#send mail
select STDOUT;
if(-s "space_mail_warning.txt"){
    system('cat space_mail_warning.txt | /bin/mailx -s "Space Warning - please work with component owners to free space" [email protected]');
} elsif(-s "space_mail_info.txt"){
    system('cat space_mail_info.txt | /bin/mailx -s "Space Info - Space checking mail" [email protected]');
}

Categories: Perl, Programming Tags:

wget and curl tips

March 14th, 2014 No comments

Imagine you want to download all files under, and not files under except for directory ‘downloads’, then you can do this:

wget -r --level 100 -nd --no-proxy --no-parent --reject "index.htm*" --reject "*gif" '' #--level 100 is large enough; I've seen no site with more than 100 levels of sub-directories so far.

wget -p -k --no-proxy --no-check-certificate --post-data 'id=username&passwd=password' <url> -O output.html

wget --no-proxy --no-check-certificate --save-cookies cookies.txt <url>

wget --no-proxy --no-check-certificate --load-cookies cookies.txt <url>

curl -k -u 'username:password' <url>

curl -k -L -d id=username -d passwd=password <url>

curl --data "loginform:id=username&loginform:passwd=password" -k -L <url>


Categories: Linux, Programming, SHELL Tags:

resolved – ssh Read from socket failed: Connection reset by peer and Write failed: Broken pipe

March 13th, 2014 No comments

If you meet the following errors when you ssh to a Linux box:

Read from socket failed: Connection reset by peer

Write failed: Broken pipe

then one possibility is that the Linux box’s filesystem is corrupted. In my case there was also output like this on the console:

EXT3-fs error ext3_lookup: deleted inode referenced

To resolve this, you need to boot Linux into single-user mode and run fsck -y <filesystem>. You can get the names of the corrupted filesystems during boot:

[/sbin/fsck.ext3 (1) -- /usr] fsck.ext3 -a /dev/xvda2
/usr contains a file system with errors, check forced.
/usr: Directory inode 378101, block 0, offset 0: directory corrupted

(i.e., you must run fsck manually, without the -a or -p options)

[/sbin/fsck.ext3 (1) -- /oem] fsck.ext3 -a /dev/xvda5
/oem: recovering journal
/oem: clean, 8253/1048576 files, 202701/1048233 blocks
[/sbin/fsck.ext3 (1) -- /u01] fsck.ext3 -a /dev/xvdb
u01: clean, 36575/14548992 files, 2122736/29081600 blocks

So in this case, I ran fsck -y /dev/xvda2 && fsck -y /dev/xvda5, then rebooted the host, and everything went well.


If two VMs are booted up on two hypervisors and share the same filesystem (e.g. on NFS), then after you fsck -y that FS and boot one VM up, the FS will soon become corrupted again, because other running copies are still using it. So you need to first make sure that only one copy of the VM is running across the hypervisors of the same server pool.

Categories: Kernel, Linux Tags:

tcpdump tips

March 13th, 2014 No comments

tcpdump [ -AdDefIKlLnNOpqRStuUvxX ] [ -B buffer_size ] [ -c count ]

[ -C file_size ] [ -G rotate_seconds ] [ -F file ]
[ -i interface ] [ -m module ] [ -M secret ]
[ -r file ] [ -s snaplen ] [ -T type ] [ -w file ]
[ -W filecount ]
[ -E spi@ipaddr algo:secret,... ]
[ -y datalinktype ] [ -z postrotate-command ] [ -Z user ] [ expression ]

#general format of a tcp protocol line

src > dst: flags data-seqno ack window urgent options
Src and dst are the source and destination IP addresses and ports.
Flags are some combination of S (SYN), F (FIN), P (PUSH), R (RST), W (ECN CWR) or E (ECN-Echo), or a single ‘.’ (meaning no flags were set).
Data-seqno describes the portion of the sequence space covered by the data in this packet.
Ack is the sequence number of the next data expected in the other direction on this connection.
Window is the number of bytes of receive buffer space available in the other direction on this connection.
Urg indicates there is ‘urgent’ data in the packet.
Options are TCP options enclosed in angle brackets (e.g., <mss 1024>).
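As a sketch, the fields of a line in this classic format can be pulled apart with awk. The capture line below is hypothetical (the hosts host1/host2, ports, and all numbers are made up for illustration):

```shell
#!/bin/sh
# A made-up line in the "src > dst: flags data-seqno ack window options" format.
line='host1.1234 > host2.80: S 1000:1044(44) ack 2000 win 29200 <mss 1460>'

# Field 1 is src, field 3 is "dst:" (strip the trailing colon), field 4 is the flags.
printf '%s\n' "$line" \
  | awk '{src=$1; dst=$3; sub(/:$/, "", dst); flags=$4;
          print "src=" src " dst=" dst " flags=" flags}'
# prints: src=host1.1234 dst=host2.80 flags=S
```

The same idea extends to pulling out the sequence range or window size when eyeballing large captures.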

tcpdump -D #list of the network interfaces available
tcpdump -e #Print the link-level header on each dump line
tcpdump -S #Print absolute, rather than relative, TCP sequence numbers
tcpdump -s <snaplen> #Snarf snaplen bytes of data from each packet rather than the default of 65535 bytes
tcpdump -i eth0 -nn -XX vlan
tcpdump -i eth0 -nn -XX arp
tcpdump -i bond0 -nn -vvv udp dst port 53
tcpdump -i bond0 -nn -vvv host testhost
tcpdump -nn -vvv “dst host and (dst port 1521 or dst port 6200)”

Categories: Life Tags:

psftp through a proxy

March 5th, 2014 No comments

You may know that we can set a proxy in PuTTY for ssh to a remote host, as shown below:

And if you want to copy files from a remote site to your local box, you can use PuTTY’s psftp.exe. There are many options for psftp.exe:

C:\Users\test>d:\PuTTY\psftp.exe -h
PuTTY Secure File Transfer (SFTP) client
Release 0.62
Usage: psftp [options] [user@]host
-V print version information and exit
-pgpfp print PGP key fingerprints and exit
-b file use specified batchfile
-bc output batchfile commands
-be don't stop batchfile processing if errors
-v show verbose messages
-load sessname Load settings from saved session
-l user connect with specified username
-P port connect to specified port
-pw passw login with specified password
-1 -2 force use of particular SSH protocol version
-4 -6 force use of IPv4 or IPv6
-C enable compression
-i key private key file for authentication
-noagent disable use of Pageant
-agent enable use of Pageant
-batch disable all interactive prompts

Although there’s a proxy setting option for putty.exe, there’s no proxy setting for psftp.exe! So what should you do if you want to copy files back to your local box, when a firewall blocks you from doing this directly and you must use a proxy?

As you may notice, there’s a “-load sessname” option in psftp.exe:

-load sessname Load settings from saved session

This option means that if you have a session saved by putty.exe, you can use psftp.exe -load <session name> to copy files from the remote site. For example, suppose you saved a session named mysession in putty.exe, in which you set the proxy; then you can use “psftp.exe -load mysession” to copy files from the remote site (no need for a username/password, as you entered those when saving the putty.exe session):

C:\Users\test>d:\PuTTY\psftp.exe -load mysession
Using username “root”.
Remote working directory is /root
psftp> ls
Listing directory /root
drwx—— 3 ec2-user ec2-user 4096 Mar 4 09:27 .
drwxr-xr-x 3 root root 4096 Dec 10 23:47 ..
-rw——- 1 ec2-user ec2-user 388 Mar 5 05:07 .bash_history
-rw-r–r– 1 ec2-user ec2-user 18 Sep 4 18:23 .bash_logout
-rw-r–r– 1 ec2-user ec2-user 176 Sep 4 18:23 .bash_profile
-rw-r–r– 1 ec2-user ec2-user 124 Sep 4 18:23 .bashrc
drwx—— 2 ec2-user ec2-user 4096 Mar 4 09:21 .ssh
psftp> help
! run a local command
bye finish your SFTP session
cd change your remote working directory
chmod change file permissions and modes
close finish your SFTP session but do not quit PSFTP
del delete files on the remote server
dir list remote files
exit finish your SFTP session
get download a file from the server to your local machine
help give help
lcd change local working directory
lpwd print local working directory
ls list remote files
mget download multiple files at once
mkdir create directories on the remote server
mput upload multiple files at once
mv move or rename file(s) on the remote server
open connect to a host
put upload a file from your local machine to the server
pwd print your remote working directory
quit finish your SFTP session
reget continue downloading files
ren move or rename file(s) on the remote server
reput continue uploading files
rm delete files on the remote server
rmdir remove directories on the remote server

Now you can get/put files as usual.


If you do not need a proxy to connect to the remote site, you can use the psftp.exe CLI to get remote files directly. For example:

d:\PuTTY\psftp.exe [email protected] -i d:\PuTTY\aws.ppk -b d:\PuTTY\script.scr -bc -be -v

And d:\PuTTY\script.scr contains the script for putting/getting files:

cd /backup
lcd c:\
mget *.tar.gz

Categories: Linux, Systems Tags: ,

notes on Ten Steps to ITSM Success

February 25th, 2014 No comments
  • Step 1 – Setting the stage

1.   Draft a credible Business Plan, complete with:

1.1   Clear executive sponsorship

1.2   Rudimentary financial analysis

1.3   Risk analysis

1.4   Organizational impact

1.5   Analysis of alternatives

1.6   Assumptions and constraints

1.7   Recommended implementation approach.

2.   Offer a proposed execution plan.

3.   Identify required resources.

4.   Execute a training and awareness campaign.

  • Step 2 - Inventory the current service offering

1. Gain agreement on current service offerings.
2. Develop cost types and categories.
3. Quantify the cost of each service.
4. Interview key stakeholders.
5. Validate findings with the business sponsor and stakeholders.

  • Step 3 - Validate the current service model

1. Identify and engage with key stakeholders across functional areas.
2. Develop a needs/services questionnaire jointly with customer representatives.
3. Decide on a “best means to an end” – i.e. conduct one-on-one interviews, or facilitate group workshops.
4. Agree and document business value-based, rank-ordered IT service requirements using a tool such as CTQ Tree.
5. Analyze results and develop Heat Maps and service maps.
6. Discuss results with the business, highlighting cost/trade-off areas.

  • Step 4 - Establish an ITSM Steering Committee

1. Assemble an ITSM Steering Committee with cross-organizational representation.
2. Draft a charter outlining the Committee’s role, responsibilities, and scope of authority.
3. Educate the Committee members on their duties and areas of responsibility.
4. Formalize a standardized, repeatable communication strategy, and use it consistently.
5. Create a repository for housing and maintaining a historical record of Committee decisions and issues.
6. Ensure the ITSM Steering Committee is properly aligned with the organization’s enterprise governance model, including other groups with which it must interact.

  • Step 5 - Define the ideal target state

1. Articulate the company’s vision and mission statements. If they do not exist, engage your senior leadership and create them.
2. List the organization’s strategic goals.
3. If one does not already exist, create an organizational strategic plan that incorporates the ITSM Transformation effort.
4. Define specific, measurable, achievable, realistic and time-bound objectives that will achieve the articulated goals.
5. Plan the tasks necessary to achieve the objectives.
6. Create an IT Ecosystem detailing the interactions and relationships in the target state you wish to achieve.
7. Validate that your service management system:
— a. Supports delivery of target state services and agreed service levels
— b. Conforms to architectural policies, principles and guidelines

—c. Defines interfaces and integration points (people, processes, tools and information).

  • Step 6 - Create the IT strategic and tactical plans

1. Negotiate the order in which prioritized capabilities will be developed.
2. Achieve an optimal outcome for your ITSM Transformation by advising on trade-offs and emphasizing shared services, infrastructure and processes.
3. Issue a Notice of Decision when negotiations are complete.
4. Produce an IT Strategic Plan that addresses how IT will build, operate and sustain the capabilities required to deliver customer requirements.
5. Build a Goal Linking matrix.
6. Develop a Program Management Plan that defines the portfolio of projects required to execute the ITSM Steering Committee’s priorities.
7. Generate and publish the ITSM Transformation Roadmap to project staff and all relevant stakeholders, as well as to the Business Sponsor.
8. Construct, approve and publish tactical project plans.

  • Step 7 - Define organizational roles and responsibilities

1. Assess staff skills, functions, authority, accountability, roles, responsibilities and required level of supervision.
2. Schedule executive off-site session(s) with clearly established and enforced “rules of the game.”
3. Build out a top-level enterprise RACI that aligns to enterprise governance.
4. Validate and continue Organizational Change Management activities.

  • Step 8 - Standardized development approach

1. Decide upon and implement a standard process design framework.
2. Agree upon – and then publicize – enterprise standards.
3. Reach consensus on the broad activities each service and underlying process must execute.
4. Charter and staff integrated development teams.
5. Document tool automation requirements and procure licenses for the selected suite of tools.
6. Incorporate project management practices into your design plans.

  • Step 9 - Strategy and planning

1. Assess the planned capability development and prioritization.
2. Update previous assumptions and constraints.
3. Validate the scope of each planned capability.
4. Review and validate stipulated timelines.
5. Combine development activities, where applicable.
6. Create working project plan with milestones and agreed-upon deliverables.
7. Ensure proper allocation and utilization of planned resources.
8. Build a fully loaded operational project plan.

  • Step 10 - Logical and physical design

1. Construct a business-specific logical design for business users, processing systems and data.
2. Define an enterprise data classification scheme and governance model aligned to security policies.
3. Create policies controlling how and under what circumstances data may be accessed (by people, systems and other data elements).
4. Convene an Administrative Review Session with stakeholders to validate the logical model.
5. Initiate physical design activities.
6. Draft initial transition readiness plan.

  • Step 11 - Build and test

1. Prepare the test facility (development and test environments).
2. Configure, integrate and test the selected tool suite.
3. Create a representative bed of test data.
4. Build the unit and integration test plans.
5. Create an initial draft of user and operator training plans.
6. Design the deployment plan.
7. Conduct formal Acceptance Testing against the agreed criteria.

  • Step 12 - Conduct service and process health assessment

1. Pause development activities to take the pulse of your ITSM Improvement Initiative.
2. Approach the Business Sponsor and request an independent Third-Party Service and Process Health Assessment.
3. Validate/refine the list of pertinent questions and oversee data collection.
4. Analyze Assessment results and remediate any identified gaps.

  • Step 13 - Analysis and deployment

1. Remediate discrepancies discovered during Service and Process health assessment.
2. Conduct simulation exercises (if feasible).
3. Execute approved training plans.
4. Prepare the environment for the upcoming change.
5. Schedule the deployment date.
6. Review and validate the back-out plan.
7. Deploy the approved Release Package.

  • Step 14 - Operation and sustainment

1. Ensure operational staff can sustain the new or modified environment.
2. Monitor activities to validate services are performing as desired.
3. Address and correct unforeseen capacity and availability issues.
4. Add or update service and operational documentation to the enterprise Knowledge Repository.

  • Step 15 - Balanced scorecard and continual improvement

1. Verify the target operational steady state has been achieved.
2. Develop and implement an IT Balanced Scorecard (IT BSC):
— Create measures that show IT’s tangible and intangible value to the business.
— Define a balanced set of metrics across the four key areas.
3. Position Continual Improvement as an overall Quality Management approach focused on the voice of the customer (VOC).
4. Create an enterprise Continual Improvement (CSI) Strategy:
— Define a standardized, data-driven, cross-organizational approach to continual improvement across the enterprise.
— Define key goals and objectives in alignment with enterprise strategy.
— Establish strong executive sponsorship and stakeholder buy-in.
— Anticipate argument against the CSI effort and develop countermeasures.
5. Create an enterprise Continual Improvement (CSI) Plan:
— Define the detailed Scope, Stakeholders, and Standards (three S’s).
— Develop an integrated CSI model combining “best of breed” frameworks.
— Design an organizational construct for a dedicated CSI Office/Team.
— Define and staff key CSI roles to execute the Plan.
— Define the Maturity Model, Process Model, Project Selection, Metrics, Auditing and Tools.
6. Use standardized templates to document, analyze, prioritize and manage enterprise improvement opportunities.

  • Step 16 - Putting it all together


This article is excerpted from the book <Ten Steps to ITSM Success>.

Categories: IT Architecture Tags: ,

ITIL – Looking at an Example of a Service Design Project

February 18th, 2014 No comments

Dummy Co. is a commercial IT service provider. Dummy Co. has implemented ITIL and allocated many of the service management roles to members of the technical teams, as Figure 12-4 illustrates.

figure 12-4

An account manager has found a new customer. The account manager is Henry, and one of his roles is to act as the business relationship manager. The customer requires an invoicing service and doesn’t want to acquire and run the system itself. It wants to utilise it as part of a cloud (see the earlier section ‘Considering system architecture’ for an explanation of a cloud) from a software as a service (SaaS) provider. This means that the users will access the invoicing service from their Internet browsers over a private Internet connection, and use Dummy Co’s software application and hardware.

Henry has established the customer’s business requirements. These have been reviewed using the service portfolio management process (see Chapter 4). Dummy Co. has an existing application that can be used to deliver the service, and this will fulfil most of the utility requirements. However, the warranty requirements are quite demanding and exceed any service levels offered to existing customers. A business case has been established and approved via service portfolio management with a little help from the financial management and demand management processes (see Chapter 4). There is a considerable return on investment (ROI) to be gained from the venture. The design project is initiated. Charlie is the service delivery manager, and she has been allocated the role of design coordination manager, so she takes charge of the project.

Henry has invited Charlie, who also acts as the service level manager, to a meeting with the customer to identify the service level requirements. The service level requirements are as follows:

  • The service will be used at several locations worldwide, so it must be available 24 hours a day, five days a week. Any single failure leading to loss of service at a site must be fixed within 30 minutes, and it must not break more than twice in any one week.
  • The maximum number of people using the service at any one time is likely to be around 3,000. Because the users are spread across various sites, demand will not fluctuate much throughout the day. The response time of the system must be less than two seconds.
  • As the service provider, you need to have appropriate disaster recovery facilities in place in the event that you have a disaster. As the customer, in the event that we have a disaster and want to relocate our staff, we require the ability for the service to be accessible at alternative locations in less than one hour.
  • The data protection act requires us to protect our client’s information. All invoice data must be protected and must be restorable in the event of a cyber-attack. Our data must be backed up and recoverable in the event that you suffer a security breach.

Charlie takes these requirements and documents them as the SLRs (Service Level Requirements). She follows a process flow like the one in Figure 12-2. Charlie reviews the new requirements and compares them with the targets in the existing OLA (Operational Level Agreement) and UC (Underpinning Contract) to see whether there are other services that are delivered to these service levels. If it turns out that these service levels have not been offered before, or there is doubt as to whether they can be achieved, then the technical designers must be consulted.

Well, Dummy Co.’s IT department has just one technical architect, and he is called Fred. Fred does what he always does when he has a new design project to get to grips with: he looks at the architecture of the infrastructure. He looks to see whether there is enough network equipment to cope with the additional load that the new service will need. He looks at the servers in the data centre to see whether there are enough of the right type. He looks at the data storage to see how much is in use. Pretty much what any technical architect would do.

However, Fred has many roles. For the projects that he undertakes, he performs the roles of availability manager, capacity manager, security manager and IT service continuity manager. So when he is reviewing the infrastructure, he takes the SLRs and looks at the requirements from the four warranty points of view. In fact, Fred takes the SLRs and converts them into detailed requirements for availability, capacity, continuity and security, and then considers the requirements in his musings. The work that Fred is doing is sometimes called a capability review. The results of this are that Fred produces an outline design and proposal of how the new requirements will be met. This of course comes at a cost, so Fred will obtain costs and quotes for any additional resources that are needed.

The main content of Fred’s proposal is as follows:

  • To fulfil the 24/7 availability requirement, additional server equipment will be required to provide the necessary resilience. The network currently operates at this level, and no additional resilience is required.
  • To meet the capacity and performance requirement, additional data storage will be added to the existing storage area network. This will also contribute towards the security requirement.
  • In addition, to meet the security requirement, additional data backups are required, and an additional off-site storage facility is needed.
  • The continuity requirement can be met with the existing recovery facility, but the recovery plans must be updated to ensure that the new service is recovered within the target time.

Fred sends the proposal back to Charlie. The next step is for Dummy Co. to decide whether it wants to spend the money required by Fred’s proposal in order to upgrade the infrastructure, agree to the customer’s requirements and win the business. This involves Charlie, Henry and Dummy Co. senior management. The decision is made to go ahead. Hurrah!

Henry and Charlie arrange to see the customer, and get into negotiation about the finer points of the SLA and, of course, the commercial stuff. Once the SLA is agreed, the detailed design work is started and Charlie liaises with Fred and others in the technical management team to help with any issues as they arise. When the design work is complete, Fred creates an SDP (Service Design Package). This marks the end of the design project. Of course the next step is to build, test and implement the solution – but that’s another story that comes under service transition, which I cover in Chapters 7 and 13.

Figure 12-5 provides an overview of the example in this section. It is not complete as I need a much bigger piece of paper than a page of this book to show it all, but hopefully it gives you an idea.

A swimlane flow diagram like the one in Figure 12-5 is a really good way to visualise how the processes will work in your organisation.

figure 12-5

PS: This article is from book <ITIL For Dummies, 2011 Edition>.


Categories: IT Architecture Tags: ,

checking MTU or Jumbo Frame settings with ping

February 14th, 2014 No comments

You may set your Linux box’s MTU to a jumbo-frame size of 9000 bytes or larger, but if the switch your box is connected to does not have jumbo frames enabled, then your Linux box may run into problems when sending and receiving packets.

So how can we tell whether jumbo frames are enabled on the switch or on the Linux box?

Of course you can log on to the switch and check, but we can also verify this from the Linux box that connects to the switch.

On the Linux box, you can see the MTU setting of each interface using ifconfig:

[root@centos-doxer ~]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 08:00:27:3F:C5:08
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:50502 errors:0 dropped:0 overruns:0 frame:0
TX packets:4579 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9835512 (9.3 MiB) TX bytes:1787223 (1.7 MiB)
Base address:0xd010 Memory:f0000000-f0020000

As stated above, the MTU of 9000 here doesn’t mean that jumbo frames actually work between your box and the switch, as you can verify with the following command:

[root@testbox ~]# ping -c 2 -M do -s 1472 testbox2
PING ( 1472(1500) bytes of data. #so here 1500 bytes go through the network
1480 bytes from ( icmp_seq=1 ttl=252 time=0.319 ms
1480 bytes from ( icmp_seq=2 ttl=252 time=0.372 ms

— ping statistics —
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.319/0.345/0.372/0.032 ms
[root@testbox ~]#
[root@testbox ~]#
[root@testbox ~]# ping -c 2 -M do -s 1473 testbox2
PING ( 1473(1501) bytes of data. #so here 1501 bytes cannot go through. From this we can see that the effective path MTU is 1500, although ifconfig says the interface MTU is 9000
From ( icmp_seq=1 Frag needed and DF set (mtu = 1500)
From ( icmp_seq=1 Frag needed and DF set (mtu = 1500)

— ping statistics —
0 packets transmitted, 0 received, +2 errors
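A handy rule of thumb behind the -s values above: the ICMP payload is the MTU minus 28 bytes (a 20-byte IP header plus an 8-byte ICMP header). A quick sketch of the arithmetic:

```shell
# Largest ICMP payload that fits in one unfragmented packet for a given MTU:
# payload = MTU - 20 (IP header) - 8 (ICMP header)
for mtu in 1500 9000; do
    payload=$((mtu - 28))
    echo "MTU $mtu -> ping -M do -s $payload"
done
```

So for a standard 1500-byte MTU the probe is -s 1472, as used above, and a jumbo-frame path should pass -s 8972.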

Also, if the switch is a Cisco one, you can verify whether jumbo frames are enabled on the switch port connecting to the server by sniffing a CDP (Cisco Discovery Protocol) packet. Here’s one example:

-bash-4.1# tcpdump -i eth0 -nn -v -c 1 'ether[20:2] == 0x2000' #ether[20:2] == 0x2000 means capture only packets that have a 2-byte value of hex 2000 starting at byte 20
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
03:44:14.221022 CDPv2, ttl: 180s, checksum: 692 (unverified), length 287
Device-ID (0x01), length: 46 bytes: ''
Address (0x02), length: 13 bytes: IPv4 (1)
Port-ID (0x03), length: 16 bytes: 'Ethernet111/1/12'
Capability (0x04), length: 4 bytes: (0x00000228): L2 Switch, IGMP snooping
Version String (0x05), length: 66 bytes:
Cisco Nexus Operating System (NX-OS) Software, Version 5.2(1)N1(4)
Platform (0x06), length: 11 bytes: 'N5K-C5548UP'
Native VLAN ID (0x0a), length: 2 bytes: 123
AVVID trust bitmap (0x12), length: 1 byte: 0x00
AVVID untrusted ports CoS (0x13), length: 1 byte: 0x00
Duplex (0x0b), length: 1 byte: full
MTU (0x11), length: 4 bytes: 1500 bytes #so here MTU size was set to 1500 bytes
System Name (0x14), length: 18 bytes: 'ucf-c1z3-swi-5k01b'
System Object ID (not decoded) (0x15), length: 14 bytes:
0x0000: 060c 2b06 0104 0109 0c03 0103 883c
Management Addresses (0x16), length: 13 bytes: IPv4 (1)
Physical Location (0x17), length: 13 bytes: 0x00/snmplocation
1 packets captured
1 packets received by filter
0 packets dropped by kernel
110 packets dropped by interface


  1. As for the “-M do” parameter of ping, you may refer to man ping for more info; the manpage also covers DF (don’t fragment) and Path MTU Discovery, which are worth reading up on.
  2. Here are more tips on tcpdump.
  3. Maximum packet size is the MTU plus the data-link header length. Packets are not always transmitted at the maximum packet size, as we can see from the output of iptraf -z eth0.
  4. Here’s more about MTU:

The link layer, which is typically Ethernet, sends information into the network as a series of frames. Even though the layers above may have pieces of information much larger than the frame size, the link layer breaks everything up into frames (whose payload encloses an IP packet carrying TCP, UDP or ICMP) to send them over the network. The maximum size of the data in a frame is known as the maximum transmission unit (MTU). You can use network configuration tools such as ip or ifconfig to set the MTU.
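For instance, on Linux you can read an interface’s current MTU straight from sysfs without any tools at all; the set commands below are shown for reference only (eth0 is an example name, and changing the MTU requires root):

```shell
# Read the current MTU of an interface from sysfs (lo exists on any Linux box):
cat /sys/class/net/lo/mtu

# Setting a jumbo-frame MTU, for reference (requires root):
# ip link set dev eth0 mtu 9000
# ifconfig eth0 mtu 9000
```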

The size of the MTU has a direct impact on the efficiency of the network. Each frame in the link layer has a small header, so using a large MTU increases the ratio of user data to overhead (header). When using a large MTU, however, each frame of data has a higher chance of being corrupted or dropped. For clean physical links, a high MTU usually leads to better performance because it requires less overhead; for noisy links, however, a smaller MTU may actually enhance performance because less data has to be re-sent when a single frame is corrupted.
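To put rough numbers on that trade-off, assume 18 bytes of Ethernet framing (14-byte header plus 4-byte FCS) and 40 bytes of IP and TCP headers per packet; the fraction of each on-wire frame that is user data then works out as:

```shell
awk 'BEGIN {
    mtus[1] = 1500; mtus[2] = 9000
    for (i = 1; i <= 2; i++) {
        mtu = mtus[i]
        # TCP payload = MTU - 40; on-wire frame = MTU + 18
        printf "MTU %d: %.1f%% of each frame is user data\n", mtu, (mtu - 40) / (mtu + 18) * 100
    }
}'
```

This prints roughly 96.2% for a 1500-byte MTU versus 99.4% for 9000 bytes, which is why jumbo frames help on clean links carrying bulk data.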

Here’s one image of layers of network frames:



ITIL – From incidents to problems to changes

February 12th, 2014 No comments

The following figure shows the path from incident to problem to change. Keep this figure in mind as you read an example of this journey.

from incidents to problems to changes


A user is innocently using her desktop system one day, and all of a sudden one of those dialog boxes pops up on the screen informing her she has a message. The message tells her that the spreadsheet software application has encountered error 3479 and can’t save the current spreadsheet to the user’s home folder on the network. It then gives a choice: ‘do you want to click on OK or Cancel?’ Now, this user lives in the ideal world of ITIL where everyone does things properly. So, what does the user do when faced with this error? She rings up the service desk of course.

The service desk analyst logs the call as an incident and goes through the incident management process. The analyst attempts initial diagnosis by searching the KEDB (known error database) to look for a resolution, but doesn’t find one. Next the analyst escalates the incident to second-line support, not to investigate the problem but to search for a way to restore the service to the user. The incident is allocated to an engineer who manages to replicate the incident and advises the service desk analyst to tell the user to click cancel and save the file to a local folder instead of the network home folder. The service desk analyst contacts the user and the user saves the file locally. The user and service desk analyst agree that, because the service is now restored, the incident can be closed and no further action is required. The analyst tells the user to ring if it happens again. Are you happy with the story so far? It’s only happened once – IT systems are like that!

The next day the same error happens to other users and the service desk logs six incidents and links them together. In each case, the user has stored a file locally and gone away happy. However, would you like the IT department to do something more? It’s at this point that a problem record should be created. The service desk has the ability to do this, so a problem record is raised. The problem must now be allocated to an engineer or technical team to investigate and find the cause of the incidents. After some time, you hear a shout of ‘eureka!’ as the engineer identifies the root cause. The engineer creates a known error record documenting the cause and confirming that a suitable workaround is to save files locally. This information is passed to the service desk. This will hopefully make it easier for the service desk to deal with other incidents.

Now there is a decision to be made. The engineer who identified the root cause also identified a permanent solution. In this case, the cause was a change that had been made to the configuration of a number of servers to accommodate another system. This has had the effect of denying some users access to this server. The permanent solution is to make another change to the server. A decision must be made as to whether an RFC should be raised to implement the solution. Not all known errors will result in changes being made. Some will be too costly, some will be negated by new version updates. In this case, I’ll assume that the RFC is raised. It is now the role of the change management process (see Chapter 7) to manage the change through to conclusion.

When the change is implemented and reviewed, the change manager will close the change record. This will set off a game of knock down dominoes: the known error record and problem record will be closed, and if any incident records are open, these will be closed. The problem is resolved and no related incidents will happen again. And they all live happily ever after.

This is just one example, and I’m sure you can think of many more. It’s the principle that’s important: you must have a balanced approach to the sequence of activities.


This article is from book <ITIL For Dummies, 2011 Edition>.

Categories: IT Architecture Tags:

Oracle VM operations – poweron, poweroff, status, stat -r

January 27th, 2014 No comments

Here’s the script:

#!/usr/bin/perl
# Note: the OVM manager must be running before any of these operations.
use strict;
use warnings;
use Net::SSH::Perl;

my $host      = $ARGV[0] || "help";
my $operation = $ARGV[1] || "";
my $user      = 'root';
my $password  = 'password';

if ($host eq "help") {
    print "$0 OVM-name status|poweron|poweroff|stat-r\n";
    exit 0;
}

my $ssh = Net::SSH::Perl->new($host);
$ssh->login($user, $password);

# Fetch the VM list once; every operation parses it line by line.
my ($stdout, $stderr, $exit) = $ssh->cmd("ovm -uadmin -pwelcome1 vm ls|grep -v VM_test");

if ($operation eq "status") {
    print $stdout;
} elsif ($operation eq "poweroff") {
    for (split /\n/, $stdout) {
        next if /Server_Pool|OVM|Powered/;    # skip headers and VMs already powered off
        if (/(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+([a-zA-Z]+)\s+(.*)/) {
            $ssh->cmd("ovm -uadmin -pwelcome1 vm poweroff -n $1 -s $6");
            sleep 12;
        }
    }
} elsif ($operation eq "poweron") {
    for (split /\n/, $stdout) {
        next if /Server_Pool|OVM|Running/;    # skip headers and VMs already running
        if (/(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+([a-zA-Z]+)\s+Off(.*)/) {
            $ssh->cmd("ovm -uadmin -pwelcome1 vm poweron -n $1 -s $6");
            sleep 20;
        }
    }
} elsif ($operation eq "stat-r") {
    for (split /\n/, $stdout) {
        if (/(.*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(Shutting\sDown|Initializing)\s+(.*)/) {
            $ssh->cmd("ovm -uadmin -pwelcome1 vm stat -r -n $1 -s $6");
            sleep 1;
        }
    }
}

You can use the following to make the script run in parallel:

for i in <all OVMs>;do (./ $i status &);done
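An alternative sketch using xargs -P, which caps the number of concurrent copies rather than forking them all at once (ovm_ops.pl and the host names are placeholders for your script and OVM list):

```shell
# Run the status operation against several OVM hosts, at most 4 at a time.
printf '%s\n' ovm-host1 ovm-host2 ovm-host3 |
    xargs -P 4 -I{} ./ovm_ops.pl {} status
```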

Categories: Clouding, IT Architecture, Oracle Cloud, Perl Tags:

avoid putty ssh connection sever or disconnect

January 17th, 2014 2 comments

After some time, an idle SSH session will disconnect itself. If you want to avoid this, you can try running the following command:

while [ 1 ];do echo hi;sleep 60;done &

This will print the message “hi” every 60 seconds on the standard output, keeping the session active.


You can also set some parameters in /etc/ssh/sshd_config, such as ClientAliveInterval and ClientAliveCountMax.
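For example, these server-side settings make sshd send a keepalive probe every 60 seconds and drop the connection only after three unanswered probes (the values are just a reasonable starting point):

```
# /etc/ssh/sshd_config
ClientAliveInterval 60
ClientAliveCountMax 3
```

Restart sshd after editing. On the client side, PuTTY’s “Seconds between keepalives” setting or ServerAliveInterval in ssh_config achieves much the same thing.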

Categories: Linux, SHELL, Unix Tags:

“Include snapshots” made NFS shares from ZFS appliance shrinking

January 17th, 2014 No comments

Today I met one weird issue when checking an NFS share mounted from a ZFS appliance. As the space on that filesystem was getting low, I removed some files – and the NFS filesystem mounted on the client shrank. What confused me was that the filesystem’s total size kept getting smaller! Shouldn’t the free space get larger and the size stay unchanged?

After some debugging, I found that this was caused by the ZFS appliance share’s “Include snapshots” option. When I unchecked “Include snapshots”, the issue was gone!


Categories: Hardware, NAS, Storage Tags:

Service Lifecycle in ITIL and applying the service lifecycle to IT projects

January 15th, 2014 No comments





Here’s a brief description of each activity of a typical project and its relation to the service lifecycle:

  • Business case and project initiation: You use a business case to justify the cost and effort involved in providing the new service or changing an existing service. The business case triggers the project initiation. These activities happen at the service strategy stage.

  • Requirements gathering and analysis: You identify and analyse the detailed requirements of the service or change. These activities happen in the service design stage.

  • Design: You produce a design of the service that meets the requirements. This is usually a paper-based design at this point. These activities take place in the service design stage.

  • Build: The physical bit where you acquire the solution, such as building the hardware, the servers and networks, or programming the software application. These activities happen in the service transition stage.

  • Test: Testing the service is essential to ensure it meets the needs of the business, works in the way you expected, and can be supported. These activities also take place during the service transition stage.

  • Implement or deploy: Launching the new or changed service into the live operational environment. This takes place during the service transition stage.

  • Deliver and support: The service is now in the live or production environment and is being used by the users. The IT organisation must make sure the service is working and fix it quickly when it goes wrong. These activities take place during the service operation stage.

  • Improve: After a service has been operated for some time, it’s often possible to optimise or improve the way it’s delivered. These activities are part of the CSI stage.

PS: This is from book <ITIL For Dummies, 2011 Edition>.

Categories: IT Architecture Tags:

roles in ITIL

January 14th, 2014 No comments

ITIL is best practice guidance and describes capabilities for managing IT services. The four main elements of these capabilities are: processes, functions, roles and the service lifecycle.

Implementing ITIL in your organisation means using the ITIL service management processes. Processes provide a documented way of doing things. When adopting the ITIL processes you must consider who is going to do what. This brings me to functions. Functions are organisational units, like a team or department, and are the source of the people who perform the process activities. ITIL offers some advice about the functions you may have in your organisation.

The combination of processes, functions and roles allows you to make best use of service management in your organisation.

Knowing who does what is essential to the success of ITIL. The ITIL books suggest lots of roles associated with each process, and I give a brief description of them in the relevant chapters. However, you benefit from knowing a few really important roles from the outset.


The service owner

The service owner owns a service. The service owner is usually someone in the IT provider organisation, and the role provides a point of contact for a given service. The service owner doesn’t necessarily know everything about the service, but he does know a man (or woman) who does.

Here are some responsibilities of the service owner role:

  • Participates in internal service review meetings

  • Represents the service across the organisation

  • Represents the service in change advisory board meetings

  • Is responsible for continual improvement of the service and management of change in the service

  • Understands the service and its components

The process owner

A process owner owns a process. This role is accountable for the process. For example, if the incident management process doesn’t achieve its aim of restoring the service to the user, the process owner gets shouted at (hopefully not literally). The process owner is accountable for the process and is responsible for identifying improvements to ensure that the process continues to be effective and efficient. Here are a few responsibilities of the role:

  • Ensuring that the process is performed in accordance with the agreed and documented process

  • Documenting and publicising the process

  • Defining and reviewing the measurement of the process using metrics such as key performance indicators (KPIs)

You must ensure that every service management process you adopt has a defined process owner.

The process manager

A process owner (see the previous section) is accountable for the process, but may not get involved in the day-to-day management of the process. This is a separate role often allocated to a different person: the process manager.

A process manager is responsible for operational management of a process. The process manager’s responsibilities include planning and coordination of all activities required to carry out, monitor and report on the process.

One process may have several process managers, for example it may have regional change managers or IT service continuity managers for each data centre.

You must ensure that every service management process that you adopt has a defined process manager – though this may, of course, be the same person as the process owner.

The process practitioner

The process practitioner is the role that carries out one or many of the process activities. Basically, these people are the ones who do the work. However, it’s important that they have a clear list of responsibilities related to the process that they get involved in.


This is from book <ITIL For Dummies, 2011 Edition>.

Categories: IT Architecture Tags: ,