
resolved – ESXi Failed to lock the file

January 13th, 2014

When I was powering on a VM in ESXi, an error occurred:

An error was received from the ESX host while powering on VM doxer-test.
Cannot open the disk ‘/vmfs/volumes/4726d591-9c3bdf6c/doxer-test/doxer-test_1.vmdk’ or one of the snapshot disks it depends on.
Failed to lock the file

And also:

unable to access file since it is locked

This apparently was caused by some storage issue. I first googled and found that most posts just explained ESXi's locking mechanism; I tried some of their suggestions, but with no luck.

Then I recalled that our datastore was on NFS/ZFS, and NFS is known for stale file-lock issues. So I mounted the NFS share the datastore was using and removed a file named lck-c30d000000000000. After this, the VM booted up successfully! (Alternatively, we can log on to the ESXi host and remove the lock file there.)
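For reference, here's a minimal sketch of that cleanup (the NFS server name, export path, and mount point below are placeholders, not the real ones from this incident; make sure the VM is powered off and that no other host legitimately holds the lock before removing anything):

# mount the NFS export backing the datastore (hypothetical server/path)
mount -t nfs nas01:/export/datastore1 /mnt/ds
# look for lock files under the VM's directory
ls -a /mnt/ds/doxer-test/ | grep lck
# remove the stale lock, unmount, then retry powering on the VM
rm /mnt/ds/doxer-test/lck-c30d000000000000
umount /mnt/ds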

Categories: NAS, Oracle Cloud, Storage

install java jdk on linux

January 7th, 2014

Here are the steps to install Java on Linux:

wget <path to jre-7u25-linux-x64.rpm> -P /tmp
rpm -ivh /tmp/jre-7u25-linux-x64.rpm
# link the Java browser plugin into Firefox's plugin directory
mkdir -p /root/.mozilla/plugins
rm -f /root/.mozilla/plugins/libnpjp2.so
ln -s /usr/java/jre1.7.0_25/lib/amd64/libnpjp2.so /root/.mozilla/plugins/libnpjp2.so
ls -l /root/.mozilla/plugins/libnpjp2.so
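To verify the installation, a quick check (the version string will vary with the build you installed):

# java -version
java version "1.7.0_25"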

Categories: Java, Linux

add another root user and set password

January 7th, 2014

In Linux, do the following to add another root user (a second account with UID 0) and set its password:

mkdir -p /home/root2
useradd -u 0 -o -g root -G root -s /bin/bash -d /home/root2 root2
echo password | passwd --stdin root2
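To verify that the new account really shares UID 0 with root, a quick check (assuming the commands above succeeded):

# grep root2 /etc/passwd
root2:x:0:0::/home/root2:/bin/bash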

Categories: Linux

self defined timeout for telnet on Linux

December 26th, 2013

telnet's default timeout value is relatively high, so you may want to lower it to a value such as 5 seconds. Here's one way we can accomplish this:

#!/bin/bash

# Run a command, killing it if it has not finished after $waitfor seconds.
timeout()
{
waitfor=5
command=$*
$command &
commandpid=$!
# watchdog: kill the command if it is still running after $waitfor seconds
( sleep $waitfor ; kill -9 $commandpid > /dev/null 2>&1 ) &
watchdog=$!
wait $commandpid > /dev/null 2>&1
# the command finished in time, so kill the watchdog subshell
kill $watchdog > /dev/null 2>&1
}

timeout telnet slcc29-scan1.us.oracle.com 1521 >> $output # $output must be defined elsewhere in the script

Alternatively, we can drive telnet from expect and set a timeout there. When telnet is wrapped in expect, expect's timeout value bounds the telnet session:

#!/usr/bin/expect

set timeout 30

spawn telnet <host> <port>
send "<put telnet command here>\r"
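For a self-contained variant, here's a hedged sketch of a small port-check script that treats a 5-second timeout as failure (the script name, argument handling, and match patterns are illustrative, not from the original post):

#!/usr/bin/expect
# usage: ./telnet-check.exp <host> <port>
set timeout 5
spawn telnet [lindex $argv 0] [lindex $argv 1]
expect {
    "Connected" { puts "port is open"; exit 0 }
    "refused"   { puts "connection refused"; exit 1 }
    timeout     { puts "timed out"; exit 2 }
}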

Categories: Programming, SHELL

Add static routes in linux which will survive reboot and network bouncing

December 24th, 2013

We can see that in Linux, the file /etc/sysconfig/static-routes is invoked by /etc/init.d/network:

[root@test-linux ~]# grep static-routes /etc/init.d/network
# Add non interface-specific static-routes.
if [ -f /etc/sysconfig/static-routes ]; then
grep "^any" /etc/sysconfig/static-routes | while read ignore args ; do

So we can add rules in /etc/sysconfig/static-routes to let network routes survive reboot and network bouncing. The format of /etc/sysconfig/static-routes is like:

any net 10.247.17.0 netmask 255.255.255.192 gw 10.247.10.1
any net 10.247.11.128 netmask 255.255.255.192 gw 10.247.10.1

To make a route take effect immediately, you can use route add:

route add -net 192.168.62.0 netmask 255.255.255.0 gw 192.168.1.1

But remember that to change the default gateway, we need to modify /etc/sysconfig/network (the GATEWAY= line).

After the modification, bounce the network with service network restart to make the changes take effect.
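To confirm the new routes are active, you can dump the kernel routing table (the interface name in this sample output is a placeholder; yours may differ):

# netstat -rn | grep 10.247.17
10.247.17.0     10.247.10.1     255.255.255.192 UG        0 0          0 eth0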

remove duplicate images using fdupes and expect in linux

December 13th, 2013

I've got several thousand pictures, but most of them had several exact copies. So at first I had to remove the duplicates by hand.

Later, I remembered that in Linux we have md5sum, which gives the same string for files with exactly the same contents. I then tried to write my own program, and that took me a while.
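For the record, here's a minimal sketch of that md5sum approach; it keeps the first file of each duplicate set, and it assumes filenames contain no spaces or newlines, so treat it as illustrative only:

cd /path/to/pictures
md5sum * | sort | awk 'seen[$1]++ { print $2 }' | xargs -r rm --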

I searched Google and found that in Linux we have fdupes, which can do the job very well. fdupes detects duplicate files based on file size and MD5 value, and if you give it the -d parameter it will prompt you to preserve one copy (or all copies) of each set of duplicates and remove the others. You can read more about fdupes here: http://linux.die.net/man/1/fdupes

As all the pictures were on a Windows machine, I installed Cygwin and then fdupes and expect. Later I wrote a small script to preserve only one copy of each duplicate picture for me (without expect you would have to enter your choice, preserving one copy or all copies, by hand for every set of duplicates, as fdupes itself offers no option to always preserve one copy). Here's my program:

$ cat fdupes.expect
#!/usr/bin/expect
set timeout 1000000
spawn /home/andy/fdupes.sh
expect “preserve files” {
send “1\r”;exp_continue
}

$ cat /home/andy/fdupes.sh
fdupes.exe -d /cygdrive/d/pictures #yup, my pictures are all on this directory on windows, i.e. d:\pictures

After this, you can just run fdupes.expect, and it will preserve only one copy and remove the other duplicates for you.
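One more option worth checking: newer fdupes releases provide a -N (--noprompt) flag which, combined with -d, preserves the first file of each set without prompting, so expect wouldn't be needed at all (verify that your build supports it first):

fdupes -dN /cygdrive/d/pictures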

Categories: Programming, SHELL

Common storage multipath path-management software

December 12th, 2013
Vendor               Path-Management Software         URL
Hewlett-Packard      AutoPath, SecurePath             www.hp.com
Microsoft            MPIO                             www.microsoft.com
Hitachi              Dynamic Link Manager             www.hds.com
EMC                  PowerPath                        www.emc.com
IBM                  RDAC, MultiPath Driver           www.ibm.com
Sun                  MPXIO                            www.sun.com
VERITAS              Dynamic Multipathing (DMP)       www.veritas.com
Categories: HA, Hardware, IT Architecture, SAN, Storage

SAN ports

December 10th, 2013

Basic SAN port modes of operation

The port's mode of operation depends on what's connected to the other side of the port. Here are two general examples:

✓ All hosts (servers) and all storage ports operate as nodes (that is, places where the data either originates or ends up), so their ports are called N_Ports (node ports).

✓ All hub ports operate as loops (that is, places where the data travels in a small Fibre Channel loop), so they're called L_Ports (loop ports).

Switch ports are where it gets tricky. That's because switch ports have multiple personalities: They become particular types of ports depending on what gets plugged into them (check out Table 2-2 to keep all these confusing port types straight). Here are some ways a switch port changes its function to match what's connected to it:

✓ Switch ports usually hang around as G_Ports (generic ports) when nothing is plugged into them. A G_Port doesn't get a mode of operation until something is plugged into it.

✓ If you plug a host into a switch port, it becomes an F_Port (fabric port). The same thing happens if you plug in a storage array that's running the Fibre Channel-Switched (FC-SW) Protocol (more about this protocol in the next section).

✓ If you plug a hub into a switch port, you get an FL_Port (fabric-to-loop port); hub ports by themselves are always L_Ports (loop ports).

✓ When two switch ports are connected together, they become their own small fabric, known as an E_Port (switch-to-switch expansion port) or a T_Port (trunk port).

✓ A host port is always an N_Port (node port) — unless it's attached to a hub, in which case it's an NL_Port (node-to-loop port).

✓ A storage port, like a host port, is always an N_Port — unless it's connected to a hub, in which case it's an NL_Port.

If that seems confusing, it used to be worse. Believe it or not, different switch vendors used to name their ports differently, which confused everyone. Then the Storage Networking Industry Association (SNIA) came to save the day and standardized the names you see in Figure 2-19.

If you want to get a good working handle on what's going on in your SAN, use Table 2-2 to find out what the port names mean after all the plugging-in is done.

Protocols used in a Fibre Channel SAN

Protocols are, in effect, an agreed-upon set of terms that different computer devices use to communicate with one another. A protocol can be thought of as the common language used by different types of networks. You'll encounter three basic protocols in the Fibre Channel world:

✓ FC-AL: Fibre Channel-Arbitrated Loop Protocol is used by two devices communicating within a Fibre Channel loop (created by plugging the devices into a hub). Fibre Channel loops use hubs for the cable connections among all the SAN devices. Newer storage arrays that have internal fiber disks use Fibre Channel loops to connect the disks to the array, which is why they can have so many disks inside: Each loop can handle 126 disks, and you can have many loops in the array. The array uses the FC-AL protocol to talk to the disks.

Each of the possible 126 devices on a Fibre Channel loop takes a turn communicating with another device on the loop. Only one conversation can occur at a time; the protocol determines who gets to talk when. Every device connected to the loop gets a loop address (loop ID) that determines its priority when it uses the loop to talk.
✓ FC-SW: Fibre Channel-Switched Protocol is used by two devices communicating on a Fibre Channel switch. Switch ports are connected over a backplane, which allows any device on the switch to talk to any other device on the switch at the same time. Many conversations can occur simultaneously through the switch. A switched fabric is created by connecting Fibre Channel switches; such a fabric can have thousands of devices connected to it.

Each device in a fabric has an address called a World Wide Name (WWN) that's hard-coded at the factory onto the host bus adapter (HBA) that goes into every server and every storage port. The WWN is like the telephone number of a device in the fabric (or like the MAC address of a network card). When the device is connected to the fabric, it logs in to the fabric port, and its WWN registers in the name server so the switch knows it's connected to that port. The WWN is also sometimes called a WWPN, or World Wide Port Name.

The WWN and a WWPN are the exact same thing: the actual address for a Fibre Channel port. In some cases, large storage arrays can also have what is known as a WWNN, or World Wide Node Name. Some Fibre Channel storage manufacturers use the WWNN for the entire array, and then use an offset of the WWN for each port in the array for the WWPN. I guess this is Fibre Channel storage manufacturers' way of making the World Wide Names they were given by the standards bodies last longer. You can think of the WWNN as the device itself, and the WWPN as the actual port within the device, but in the end, it's all just a WWN.

The name server is like a telephone directory. When one device wants to talk to another in the fabric, it uses the other device's phone number to call it up. The switch protocol acts like the telephone operator. The first device asks the operator what the other device's phone number is. The operator locates the number in the directory (the name server) in the switch, and then routes the call to the port where the other device is located.

There is a trick you can use to determine whether a WWN refers to a server on the fabric or a storage port on the fabric. Most storage ports' WWNs start with the number 5, and most host bus adapters' WWNs start with either a 10 or a 21 as the first hexadecimal digits. Think of it like the area code for the phone number. If you see a number like 50:06:03:81:D6:F3:10:32, it's probably a port on a storage array. A number like 10:00:00:01:a9:42:fc:06 will be a server's HBA WWN.
✓ SCSI: The SCSI protocol is used by a computer application to talk to its disk-storage devices. In a SAN, the SCSI protocol is layered on top of either the FC-AL or FC-SW protocol to enable the application to get to the disk drives within the storage arrays in a Fibre Channel SAN. This makes Fibre Channel backward-compatible with all the existing applications that still use the SCSI protocol to talk to disks inside servers. If the SCSI protocol was not used, all existing applications would have needed to be recompiled to use a different method of talking to disk drives.

SCSI works a bit differently in a SAN from the way it does when it talks to a single disk drive inside a server. SCSI inside a server runs over copper wires, and data is transmitted in parallel across the wires. In a SAN, the SCSI protocol is serialized, so each bit of data can be transmitted as a pulse of light over a fiber-optic cable. If you want to connect older parallel SCSI-based devices in a SAN, you have to use a data router, which acts as a bridge between the serial SCSI used in a SAN and the parallel SCSI used in the device. (See "Data routers," earlier in this chapter, for the gory details.)
Although iSCSI and Infiniband protocols can also be used in storage networks, the iSCSI protocol is used over an IP network and then usually bridged into a Fibre Channel SAN. Infiniband, on the other hand, is used over a dedicated Infiniband network as a server interconnect, and then bridged into a Fibre Channel SAN for storage access. But the field is always changing: Infiniband and iSCSI storage arrays are now becoming available, but they still use either an IP or IB interface rather than FC.

Fabric addressing

The addressing scheme used in SAN fabrics is quite different from that in SAN loops. A fabric can contain thousands of devices rather than the maximum 127 in a loop. Each device in the fabric must have a unique address, just as every phone number in the world is unique. This is done by assigning every device in a SAN fabric a World Wide Name (WWN).

What in the world is a World Wide Name?

Each device on the network has a World Wide Name, a 64-bit hexadecimal number coded into it by its manufacturer. The WWN is often assigned via a standard block of addresses made available for manufacturers to use. Thus every device in a SAN fabric has a built-in address assigned by a central naming authority — in this case, one of the standard-setting organizations that control SAN standards — the Institute of Electrical and Electronics Engineers (IEEE, pronounced eye triple-e). The WWN is sometimes referred to by its IEEE address. A typical WWN in a SAN will look something like this:

20000000C8328FE6

On some devices, such as large storage arrays, the storage array itself is assigned the WWN and the manufacturer then uses the assigned WWN as the basis for virtual WWNs, which add sequential numbers to identify ports. The WWN of the storage array is known as the World Wide Node Name or WWNN. The resulting WWN of the port on the storage array is known as the World Wide Port Name or WWPN. If the base WWN is (say) 20000000C8328F00 and the storage array has four ports, the array manufacturer could use the assigned WWN as the base, and then use offsets to create the WWPN for each port, like this:

20000000C8328F01 for port 1
20000000C8328F02 for port 2
20000000C8328F03 for port 3
20000000C8328F04 for port 4

The manufacturers can use offsets to create World Wide Names as long as the offsets used do not overlap with any other assigned WWNs from the block of addresses assigned to the manufacturer.

When it comes to Fibre Channel addressing, the term WWN always refers to the WWPN of the actual ports, which are like the MAC addresses of an Ethernet network card. The WWPN (now forever referred to as the WWN for short) is always used in the name server in the switch to identify devices on the SAN.

The name server

The name server is a logical service (a specialized program that runs in the
SAN switches) used by the devices connected to the SAN to locate other
devices. The name server in the switched fabric acts like a telephone direc-
tory listing. When a device is plugged into a switch, it logs in to the switch (a
process like logging in to your PC) and registers itself with the name server.
The name server uses its own database to store the WWN information for
every device connected to the fabric, as well as the switch port information
and the associated WWN of each device. When one device wants to talk to
another in the fabric, it looks up that device’s address (its WWN) in the name
server, finds out which port the device is located on, and communication is
then routed between the two devices.
Figure 3-5 shows the name server’s lookup operation in action. The arrows
show how the server on Switch 1 (with address 20000000C8328FE6)
locates the address of the storage device on Switch 2 (at address
50000000B2358D34). After it finds the storage device’s address in the name
server, it knows which switch it’s located on and how to get to the device.
When a network gets big enough to have a few hundred devices connected to a
bunch of switches, the use of a directory listing inside the fabric makes sense.

The switches’ name server information can be used to troubleshoot problems
in a SAN. If your device is connected to a switch but doesn’t get registered in
the name server table, then you know that the problem is somewhere between
the server and the switch; you may have a bad cable. (See Chapter 12 for more
SAN troubleshooting tips.)

[Figure 3-5: fabric name server]

Note: this article is from book <Storage Area Networks For Dummies®>.

Categories: Hardware, SAN, Storage

Understanding the Benefits of a SAN

December 10th, 2013

The typical benefits of using a SAN are a very high return on investment (ROI), a reduction in the total cost of ownership (TCO) of computing capabilities, and a pay-back period (PBP) of months rather than years. Here are some specific ways you can expect a SAN to be beneficial:

✓ Removes the distance limits of SCSI-connected disks: The maximum length of a SCSI bus is around 25 meters. Fibre Channel SANs allow you to connect your disks to your servers over much greater distances.

✓ Greater performance: Current Fibre Channel SANs allow connection to disks at hundreds of megabytes per second; the near future will see speeds in multiple gigabytes to terabytes per second.

✓ Increased disk utilization: SANs enable more than one server to access the same physical disk, which lets you allocate the free space on those disks more effectively.

✓ Higher availability to storage by use of multiple access paths: A SAN allows for multiple physical connections to disks from a single or multiple servers.

✓ Deferred disk procurement: That's business-speak for not having to buy disks as often as you used to before getting a SAN. Because you can use disk space more effectively, no space goes to waste.

✓ Reduced data center rack/floor space: Because you don't need to buy big servers with room for lots of disks, you can buy fewer, smaller servers — an arrangement that takes up less room.

✓ New disaster-recovery capabilities: This is a major benefit. SAN devices can mirror the data on the disks to another location. This thorough backup capability can make your data safe if a disaster occurs.

✓ Online recovery: By using online mirrors of your data in a SAN device, or new continuous data protection solutions, you can instantly recover your data if it becomes lost, damaged, or corrupted.

✓ Better staff utilization: SANs enable fewer people to manage much more data.

✓ Reduction of management costs as a percentage of storage costs: Because you need fewer people, your management costs go down.

✓ Improved overall availability: This is another big one. SAN storage is much more reliable than internal, server-based disk storage. Things break a lot less often.

✓ Reduction of servers: You won't need as many file servers with a SAN. And because SANs are so fast, even your existing servers run faster when connected to the SAN. You get more out of your current servers and don't need to buy new ones as often.

✓ Improved network performance and fewer network upgrades: You can back up all your data over the SAN (which is dedicated to that purpose) rather than over the LAN (which has other duties). Since you use less bandwidth on the LAN, you can get more out of it.

✓ Increased input/output (I/O) performance and bulk data movement: Yup, SANs are fast. They move data much faster than do internal drives or devices attached to the LAN. In high-performance computing environments, for example, IB (Infiniband) storage-network technology can move a single data stream at multiple gigabytes per second.

✓ Reduced/eliminated backup windows: A backup window is the time it takes to back up all your data. When you do your backups over the SAN instead of over the LAN, you can do them at any time, day or night. If you use CDP (Continuous Data Protection) solutions over the SAN, you can pretty much eliminate backup as a separate process (it just happens all the time).

✓ Protected critical data: SAN storage devices use advanced technology to ensure that your critical data remains safe and available.

✓ Nondisruptive scalability: Sounds impressive, doesn't it? It means you can add storage to a storage network at any time without affecting the devices currently using the network.

✓ Easier development and testing of applications: By using SAN-based mirror copies of production data, you can easily use actual production data to test new applications while the original application stays online.

✓ Support for server clusters: Server clustering is a method of making two individual servers look like one and guard each other's back. If one of them has a heart attack, the other one takes over automatically to keep the applications running. Clusters require access to a shared disk drive; a SAN makes this possible.

✓ Storage on demand: Because SAN disks are available to any server in the storage network, free storage space can be allocated on demand to any server that needs it, any time. Storage virtualization can simplify storage provisioning across storage arrays from multiple vendors.

Note:

This is from book <Storage Area Networks For Dummies®>.

Categories: Hardware, SAN, Storage

Network Performance Analysis

December 5th, 2013
  • Collisions and network saturation

Ethernet is similar to an old party-line telephone: everybody listens at once, everybody talks at once, and sometimes two talkers start at the same time. In a well-conditioned network, with only two hosts on it, it's possible to use close to the network's maximum bandwidth. However, NFS clients and servers live in a burst-filled environment, where many machines try to use the network at the same time. When you remove the well-behaved conditions, usable network bandwidth decreases rapidly.

On the Ethernet, a host first checks for a transmission in progress on the network before attempting one of its own. This process is known as carrier sense. When two or more hosts transmit packets at exactly the same time, neither can sense a carrier, and a collision results. Each host recognizes that a collision has occurred, and backs off for a period of time, t, before attempting to transmit again. For each successive retransmission attempt that results in a collision, t is increased exponentially, with a small random variation. The variation in back-off periods ensures that machines generating collisions do not fall into lock step and seize the network.

As machines are added to the network, the probability of a collision increases. Network utilization is measured as a percentage of the ideal bandwidth consumed by the traffic on the cable at the point of measurement. Various levels of utilization are usually compared on a logarithmic scale. The relative decrease in usable bandwidth going from 5% utilization to 10% utilization is about the same as going from 10% all the way to 30% utilization.

Measuring network utilization requires a LAN analyzer or similar device. Instead of measuring the traffic load directly, you can use the average collision rate as seen by all hosts on the network as a good indication of whether the network is overloaded or not. The collision rate, as a percentage of output packets, is one of the best measures of network utilization. The Collis field in the output of netstat -in shows the number of collisions:

% netstat -in 
Name Mtu  Net/Dest     Address      Ipkts   Ierrs  Opkts   Oerrs Collis Queue  
lo0  8232 127.0.0.0    127.0.0.1    7188    0      7188     0     0      0     
hme0 1500 129.144.8.0  129.144.8.3  139478  11     102155   0     3055   0

The collision rate for a host is the number of collisions seen by that host divided by the number of packets it writes, as shown in Figure 17-1.

Figure 17-1. Collision rate calculation

Collisions are counted only when the local host is transmitting; the collision rate experienced by the host is dependent on its network usage. Because network transmissions are random events, it’s possible to see small numbers of collisions even on the most lightly loaded networks. A collision rate upwards of 5% is the first sign of network loading, and it’s an indication that partitioning the network may be advisable.
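As a quick approximation on a live system, you can compute the per-interface collision percentage straight from the netstat -in output (a sketch; the field positions assume the Solaris output format shown above, with Opkts in column 7 and Collis in column 9):

% netstat -in | awk 'NR > 1 && $1 !~ /^lo/ && $7 > 0 { printf "%s: %.1f%% collisions\n", $1, 100 * $9 / $7 }'
hme0: 3.0% collisions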

  • Network partitioning hardware

Network partitioning involves dividing a single backbone into multiple segments, joined by some piece of hardware that forwards packets. There are multiple types of these devices: repeaters, hubs, bridges, switches, routers, and gateways. These terms are sometimes used interchangeably although each device has a specific set of policies regarding packet forwarding, protocol filtering, and transparency on the network:

Repeaters
A repeater joins two segments at the physical layer. It is a purely electrical connection, providing signal amplification and pulse “clean up” functions without regard for the semantics of the signals. Repeaters are primarily used to exceed the single-cable length limitation in networks based on bus topologies, such as 10Base5 and 10Base2. There is a maximum to the number of repeaters that can exist between any two nodes on the same network, keeping the minimum end-to-end transit time for a packet well within the Ethernet specified maximum time-to-live. Because repeaters do not look at the contents of packets (or packet fragments), they pass collisions on one segment through to the other, making them of little use to relieve network congestion.

Hubs
A hub joins multiple hosts by acting as a wiring concentrator in networks based on star topologies, such as 10BaseT. A hub has the same function as a repeater, although in a different kind of network topology. Each computer is connected, typically over copper, to the hub, which is usually located in a wiring closet. The hub is purely a repeater: it regenerates the signal from one set of wires to the others, but does not process or manage the signal in any way. All traffic is forwarded to all machines connected to the hub.

Bridges
Bridges function at the data link layer, and perform selective forwarding of packets based on their destination MAC addresses. Some delay is introduced into the network by the bridge, as it must receive entire packets and decipher their MAC-layer headers. Broadcast packets are always passed through, although some bridge hardware can be configured to forward only ARP broadcasts and to suppress IP broadcasts such as those emanating from ypbind. Intelligent or learning bridges glean the MAC addresses of machines through observation of traffic on each interface. “Dumb” bridges must be loaded with the Ethernet addresses of machines on each network and impose an administrative burden each time the network topology is modified. With either type of bridge, each new segment is likely to be less heavily loaded than the original network, provided that the most popular inter-host virtual circuits do not run through the bridge.

Switches
You can think of a switch as an intelligent hub having the functionality of a bridge. The switch also functions at the data link layer, and performs selective forwarding of packets based on their destination MAC address. The switch forwards packets only to the intended port of the intended recipient. The switch “learns” the location of the various MAC addresses by observing the traffic on each port. When a switch port receives data packets, it forwards those packets only to the appropriate port for the intended recipient. A hub would instead forward the packet to all other ports on the hub, leaving it to the host connected to the port to determine its interest in the packet. Because the switch only forwards the packet to its destination, it helps reduce competition for bandwidth between the hosts connected to each port.

Routers
Repeaters, hubs, bridges, and switches divide the network into multiple distinct physical pieces, but the collection of backbones is still a single logical network. That is, the IP network number of all hosts on all segments will be the same. It is often necessary to divide a network logically into multiple IP networks, either due to physical constraints (i.e., two offices that are separated by several miles) or because a single IP network has run out of host numbers for new machines. Multiple IP networks are joined by routers that forward packets based on their source and destination IP addresses rather than 48-bit Ethernet addresses. One interface of the router is considered “inside” the network, and the router forwards packets to the “outside” interface. A router usually corrals broadcast traffic to the inside network, although some can be configured to forward broadcast packets to the “outside” network. The networks joined by a router need not be of the same type or physical media, and routers are commonly used to join local area networks to point-to-point long-haul internetwork connections. Routers can also help ensure that packets travel the most efficient paths to their destination. If a link between two routers fails, the sending router can determine an alternate route to keep traffic moving. You can install a dedicated router, or install multiple network interfaces in a host and allow it to route packets in addition to its other duties. Appendix A contains a detailed description of how IP packets are forwarded and how routes are defined to Unix systems.

Gateways
At the top-most level in the network protocol stack, a gateway performs forwarding functions at the application level, and frequently must perform protocol conversion to forward the traffic. A gateway need not be on more than one network; however, gateways are most commonly used to join multiple networks with different sets of native protocols, and to enforce tighter control over access to and from each of the networks.

Replacing an Ethernet hub with a Fast Ethernet hub is like increasing the speed limit of a highway. Replacing a hub with a switch is similar to adding new lanes to the highway. Replacing an Ethernet hub with a Fast Ethernet switch is the equivalent of both improvements, although with a higher cost.

 

PS:

1. This article is from the book <Managing NFS and NIS, Second Edition>.

2. Here are the differences between broadcast domains and collision domains:

Collision Domains
- layer 1 of the OSI model
- a hub is a single collision domain, since it forwards every bit it receives from one interface to every other interface
- a bridge is a two-interface device that creates 2 collision domains, since it forwards the traffic it receives from one interface only to the interface where the destination layer 2 device (based on its MAC address) is connected. A bridge is considered an “intelligent hub” since it reads the destination MAC address in order to forward the traffic only to the interface it is connected to
- a switch is in effect a multi-interface bridge: every interface on a switch is its own collision domain. A 24-interface switch creates 24 collision domains (assuming every interface is connected to something; VLANs don't matter here, since VLANs are a layer 2 concept, not layer 1 like collision domains)

Broadcast Domains
- layer 2 of the OSI model
- a switch forms a single broadcast domain (provided that there's only one VLAN), since broadcasts are a layer 2 concept (MAC address related)
- routers don't forward layer 2 broadcasts, hence they separate broadcast domains

Categories: Network, Networking Security

debugging nfs problem with snoop in solaris

December 3rd, 2013

Network analyzers are ultimately the most useful tools available when it comes to debugging NFS problems. The snoop network analyzer bundled with Solaris was introduced in Section 13.5. This section presents an example of how to use snoop to resolve NFS-related problems.

Consider the case where the NFS client rome attempts to access the contents of the filesystems exported by the server zeus through the /net automounter path:

rome% ls -la /net/zeus/export
total 5
dr-xr-xr-x   3 root     root           3 Jul 31 22:51 .
dr-xr-xr-x   2 root     root           2 Jul 31 22:40 ..
drwxr-xr-x   3 root     other        512 Jul 28 16:48 eng
dr-xr-xr-x   1 root     root           1 Jul 31 22:51 home
rome% ls /net/zeus/export/home
/net/zeus/export/home: Permission denied

 

The client is not able to open the contents of the directory /net/zeus/export/home, although the directory gives read and execute permissions to all users:

rome% df -k /net/zeus/export/home
filesystem            kbytes    used   avail capacity  Mounted on
-hosts                     0       0       0     0%    /net/zeus/export/home

 

The df command shows the -hosts automap mounted on the path of interest. This means that the NFS filesystem zeus:/export/home has not yet been mounted. To investigate the problem further, snoop is invoked while the problematic ls command is rerun:

rome# snoop -o /tmp/cap rome zeus
  1   0.00000      rome -> zeus      PORTMAP C GETPORT prog=100003 (NFS) vers=3 
proto=UDP
  2   0.00314      zeus -> rome      PORTMAP R GETPORT port=2049
  3   0.00019      rome -> zeus      NFS C NULL3
  4   0.00110      zeus -> rome      NFS R NULL3 
  5   0.00124      rome -> zeus      PORTMAP C GETPORT prog=100005 (MOUNT) vers=1 
proto=TCP
  6   0.00283      zeus -> rome      PORTMAP R GETPORT port=33168
  7   0.00094      rome -> zeus      TCP D=33168 S=49659 Syn Seq=1331963017 Len=0 
Win=24820 Options=<nop,nop,sackOK,mss 1460>
  8   0.00142      zeus -> rome      TCP D=49659 S=33168 Syn Ack=1331963018 
Seq=4025012052 Len=0 Win=24820 Options=<nop,nop,sackOK,mss 1460>
  9   0.00003      rome -> zeus      TCP D=33168 S=49659     Ack=4025012053 
Seq=1331963018 Len=0 Win=24820
 10   0.00024      rome -> zeus      MOUNT1 C Get export list
 11   0.00073      zeus -> rome      TCP D=49659 S=33168     Ack=1331963062 
Seq=4025012053 Len=0 Win=24776
 12   0.00602      zeus -> rome      MOUNT1 R Get export list 2 entries
 13   0.00003      rome -> zeus      TCP D=33168 S=49659     Ack=4025012173 
Seq=1331963062 Len=0 Win=24820
 14   0.00026      rome -> zeus      TCP D=33168 S=49659 Fin Ack=4025012173 
Seq=1331963062 Len=0 Win=24820
 15   0.00065      zeus -> rome      TCP D=49659 S=33168     Ack=1331963063 
Seq=4025012173 Len=0 Win=24820
 16   0.00079      zeus -> rome      TCP D=49659 S=33168 Fin Ack=1331963063 
Seq=4025012173 Len=0 Win=24820
 17   0.00004      rome -> zeus      TCP D=33168 S=49659     Ack=4025012174 
Seq=1331963063 Len=0 Win=24820
 18   0.00058      rome -> zeus      PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 
proto=UDP
 19   0.00412      zeus -> rome      PORTMAP R GETPORT port=34582
 20   0.00018      rome -> zeus      MOUNT3 C Null
 21   0.00134      zeus -> rome      MOUNT3 R Null 
 22   0.00056      rome -> zeus      MOUNT3 C Mount /export/home
 23   0.23112      zeus -> rome      MOUNT3 R Mount Permission denied

 

Packet 1 shows the client rome requesting the port number of the NFS service (RPC program number 100003, Version 3, over the UDP protocol) from the server's rpcbind (portmapper). Packet 2 shows the server's reply indicating nfsd is running on port 2049. Packet 3 shows the automounter's call to the server's nfsd daemon to verify that it is indeed running. The server's successful reply is shown in packet 4. Packet 5 shows the client's request for the port number for RPC program number 100005, Version 1, over TCP (the RPC MOUNT program). The server replies with packet 6 with port=33168. Packets 7 through 9 are TCP handshaking between our NFS client and the server's mountd. Packet 10 shows the client's call to the server's mountd daemon (which implements the MOUNT program) currently running on port 33168. The client is requesting the list of exported entries. The server replies with packet 12 including the names of the two entries exported. Packets 18 and 19 are similar to packets 5 and 6, except that this time the client is asking for the port number of the MOUNT program version 3 running over UDP. Packets 20 and 21 show the client verifying that version 3 of the MOUNT service is up and running on the server. Finally, the client issues the Mount /export/home request to the server in packet 22, requesting the filehandle of the /export/home path. The server's mountd daemon checks its export list, determines that the host rome is not present in it, and replies to the client with a “Permission Denied” error in packet 23.

The analysis indicates that the “Permission Denied” error returned to the ls command came from the MOUNT request made to the server, not from problems with directory mode bits on the client. Having gathered this information, we study the export list on the server and quickly notice that the filesystem /export/home is exported only to the host verona:

rome$ showmount -e zeus
export list for zeus:
/export/eng  (everyone)
/export/home verona

 

We could have obtained the same information by inspecting the contents of packet 12, which contains the export list requested during the transaction:

rome# snoop -i /tmp/cap -v -p 10,12
...
      Packet 10 arrived at 3:32:47.73
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Record Mark: last fragment, length = 40
RPC:  Transaction id = 965581102
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100005 (MOUNT), version = 1, procedure = 5
RPC:  Credentials: Flavor = 0 (None), len = 0 bytes
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  
MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 5 (Return export list)
MOUNT:
...
       Packet 12 arrived at 3:32:47.74
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Record Mark: last fragment, length = 92
RPC:  Transaction id = 965581102
RPC:  Type = 1 (Reply)
RPC:  This is a reply to frame 10
RPC:  Status = 0 (Accepted)
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  Accept status = 0 (Success)
RPC:  
MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 5 (Return export list)
MOUNT:Directory = /export/eng
MOUNT:Directory = /export/home
MOUNT: Group = verona
MOUNT:

 

For simplicity, only the RPC and NFS Mount portions of the packets are shown. Packet 10 is the request for the export list, packet 12 is the reply. Notice that every RPC packet contains the transaction ID (XID), the message type (call or reply), the status of the call, and the credentials. Notice that the RPC header includes the string “This is a reply to frame 10”. This is not part of the network packet. Snoop keeps track of the XIDs it has processed and attempts to match calls with replies and retransmissions. This feature comes in very handy during debugging. The Mount portion of packet 12 shows the list of directories exported and the group of hosts to which they are exported. In this case, we can see that /export/home was only exported with access rights to the host verona. The problem can be fixed by adding the host rome to the export list on the server.
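On Solaris, that fix would look something like the following (a sketch; the exact share options should match however /export/home is already shared in /etc/dfs/dfstab):

zeus# share -F nfs -o rw=verona:rome /export/home
zeus# share | grep /export/home
-               /export/home   rw=verona:rome   ""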

PS: this article is from book <Managing NFS and NIS, Second Edition>.

explain solaris snoop network analyzer with example

December 2nd, 2013

Here’s the code:

# snoop -i /tmp/capture -v -p 3
ETHER:  ----- Ether Header -----
ETHER:  
ETHER:  Packet 3 arrived at 15:08:43.35
ETHER:  Packet size = 82 bytes
ETHER:  Destination = 0:0:c:7:ac:56, Cisco
ETHER:  Source      = 8:0:20:b9:2b:f6, Sun
ETHER:  Ethertype = 0800 (IP)
ETHER:  
IP:   ----- IP Header -----
IP:   
IP:   Version = 4
IP:   Header length = 20 bytes
IP:   Type of service = 0x00
IP:         xxx. .... = 0 (precedence)
IP:         ...0 .... = normal delay
IP:         .... 0... = normal throughput
IP:         .... .0.. = normal reliability
IP:   Total length = 68 bytes
IP:   Identification = 35462
IP:   Flags = 0x4
IP:         .1.. .... = do not fragment
IP:         ..0. .... = last fragment
IP:   Fragment offset = 0 bytes
IP:   Time to live = 255 seconds/hops
IP:   Protocol = 17 (UDP)
IP:   Header checksum = 4503
IP:   Source address = 131.40.52.223, caramba
IP:   Destination address = 131.40.52.27, mickey
IP:   No options
IP:   
UDP:  ----- UDP Header -----
UDP:  
UDP:  Source port = 55559
UDP:  Destination port = 2049 (Sun RPC)
UDP:  Length = 48 
UDP:  Checksum = 3685 
UDP:  
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Transaction id = 969440111
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100003 (NFS), version = 3, procedure = 0
RPC:  Credentials: Flavor = 0 (None), len = 0 bytes
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 0 (Null procedure)
NFS:

And let’s analyze this:

The Ethernet header displays the source and destination addresses as well as the type of information embedded in the packet. The IP layer displays the IP version number, flags, options, and address of the sender and recipient of the packet. The UDP header displays the source and destination ports, along with the length and checksum of the UDP portion of the packet. Embedded in the UDP frame is the RPC data. Every RPC packet has a transaction ID used by the sender to identify replies to its requests, and by the server to identify duplicate calls. The previous example shows a request from the host caramba to the server mickey. The RPC version = 2 refers to the version of the RPC protocol itself; the program number 100003 and Version 3 apply to the NFS service. NFS procedure 0 is always the NULL procedure, and is most commonly invoked with no authentication information. The NFS NULL procedure does not take any arguments, therefore none are listed in the NFS portion of the packet.

PS:

  1. Here's more about snoop usage on Solaris:

The amount of traffic on a busy network can be overwhelming, containing many packets irrelevant to the problem at hand. The use of filters reduces the amount of noise captured and displayed, allowing you to focus on relevant data. A filter can be applied at the time the data is captured, or at the time the data is displayed. Applying the filter at capture time reduces the amount of data that needs to be stored and processed during display. Applying the filter at display time allows you to further refine the previously captured information. You will find yourself applying different display filters to the same data set as you narrow the problem down, and isolate the network packets of interest.

Snoop uses the same syntax for capture and display filters. For example, the host filter instructs snoop to only capture packets with source or destination address matching the specified host:

# snoop host caramba
Using device /dev/hme (promiscuous mode)
     caramba -> schooner     NFS C GETATTR3 FH=B083
    schooner -> caramba      NFS R GETATTR3 OK
     caramba -> schooner     TCP D=2049 S=1023     Ack=3647506101 Seq=2611574902 Len=0 Win=24820

 

In this example the host filter instructs snoop to capture packets originating at or addressed to the host caramba. You can specify the IP address or the hostname, and snoop will use the name service switch to do the conversion. Snoop assumes that the hostname specified is an IPv4 address. You can specify an IPv6 address by using the inet6 qualifier in front of the host filter:

# snoop inet6 host caramba
Using device /dev/hme (promiscuous mode)
     caramba -> 2100::56:a00:20ff:fea0:3390    ICMPv6 Neighbor advertisement
2100::56:a00:20ff:fea0:3390 -> caramba         ICMPv6 Echo request (ID: 1294 Sequence number: 0)
     caramba -> 2100::56:a00:20ff:fea0:3390    ICMPv6 Echo reply (ID: 1294 Sequence number: 0)

 

You can restrict capture of traffic addressed to the specified host by using the to or dst qualifier in front of the host filter:

# snoop to host caramba
Using device /dev/hme (promiscuous mode)
    schooner -> caramba      RPC R XID=1493500696 Success
    schooner -> caramba      RPC R XID=1493500697 Success
    schooner -> caramba      RPC R XID=1493500698 Success

 

Similarly you can restrict captured traffic to only packets originating from the specified host by using the from or src qualifier:

# snoop from host caramba
Using device /dev/hme (promiscuous mode)
     caramba -> schooner     NFS C GETATTR3 FH=B083
     caramba -> schooner     TCP D=2049 S=1023     Ack=3647527137 Seq=2611841034 Len=0 Win=24820

 

Note that the host keyword is not required when the specified hostname does not conflict with the name of another snoop primitive. The previous snoop from host caramba command could have been invoked without the host keyword and it would have generated the same output:

# snoop from caramba
Using device /dev/hme (promiscuous mode)
     caramba -> schooner     NFS C GETATTR3 FH=B083
     caramba -> schooner     TCP D=2049 S=1023     Ack=3647527137 Seq=2611841034 Len=0 Win=24820

 

For clarity, we use the host keyword throughout this book. Two or more filters can be combined by using the logical operators and and or:

# snoop -o /tmp/capture -c 20 from host caramba and rpc nfs 3
Using device /dev/hme (promiscuous mode)
20 20 packets captured

 

Snoop captures all NFS Version 3 packets originating at the host caramba. Here, snoop is invoked with the -c and -o options to save 20 filtered packets into the /tmp/capture file. We can later apply other filters during display time to further analyze the captured information. For example, you may want to narrow the previous search even further by only listing TCP traffic by using the proto filter:

# snoop -i /tmp/capture proto tcp
Using device /dev/hme (promiscuous mode)
  1   0.00000     caramba -> schooner    NFS C GETATTR3 FH=B083
  2   2.91969     caramba -> schooner    NFS C GETATTR3 FH=0CAE
  9   0.37944     caramba -> rea         NFS C FSINFO3 FH=0156
 10   0.00430     caramba -> rea         NFS C GETATTR3 FH=0156
 11   0.00365     caramba -> rea         NFS C ACCESS3 FH=0156 (lookup)
 14   0.00256     caramba -> rea         NFS C LOOKUP3 FH=F244 libc.so.1
 15   0.00411     caramba -> rea         NFS C ACCESS3 FH=772D (lookup)

 

Snoop reads the previously filtered data from /tmp/capture, and applies the new filter to only display TCP traffic. The resulting output is NFS traffic originating at the host caramba over the TCP protocol. We can apply a UDP filter to the same NFS traffic in the /tmp/capture file and obtain the NFS Version 3 traffic over UDP from host caramba without affecting the information in the /tmp/capture file:

# snoop -i /tmp/capture proto udp
Using device /dev/hme (promiscuous mode)
  1   0.00000      caramba -> rea          NFS C NULL3

 

So far, we’ve presented filters that let you specify the information you are interested in. Use the not operator to specify the criteria of packets that you wish to have excluded during capture. For example, you can use the not operator to capture all network traffic, except that generated by the remote shell:

# snoop not port login
Using device /dev/hme (promiscuous mode)
      rt-086 -> BROADCAST        RIP R (25 destinations)
      rt-086 -> BROADCAST        RIP R (10 destinations)
     caramba -> schooner         NFS C GETATTR3 FH=B083
    schooner -> caramba          NFS R GETATTR3 OK
     caramba -> donald           NFS C GETATTR3 FH=00BD
    jamboree -> donald           NFS R GETATTR3 OK
     caramba -> donald           TCP D=2049 S=657     Ack=3855205229 Seq=2331839250 Len=0 Win=24820
     caramba -> schooner         TCP D=2049 S=1023    Ack=3647569565 Seq=2612134974 Len=0 Win=24820
     narwhal -> 224.2.127.254    UDP D=9875 S=32825 LEN=368

 

On multihomed hosts (systems with more than one network interface device), use the -d option to specify the particular network interface to snoop on:

snoop -d hme2

 

You can snoop on multiple network interfaces concurrently by invoking separate instances of snoop on each device. This is particularly useful when you don’t know what interface the host will use to generate or receive the requests. The -d option can be used in conjunction with any of the other options and filters previously described:

# snoop -o /tmp/capture-hme0 -d hme0 not port login &
# snoop -o /tmp/capture-hme1 -d hme1 not port login &

2. This article is from the book <Managing NFS and NIS, Second Edition>.

rpc remote procedure call mechanism

December 2nd, 2013

The rpcbind daemon (also known as the portmapper)[8] exists to register RPC services and to provide their IP port numbers when given an RPC program number. rpcbind itself is an RPC service, but it resides at a well-known IP port (port 111) so that it may be contacted directly by remote hosts. For example, if host fred needs to mount a filesystem from host barney, it must send an RPC request to the mountd daemon on barney. The mechanics of making the RPC request are as follows:

[8] The rpcbind daemon and the old portmapper provide the same RPC service. The portmapper implements Version 2 of the portmap protocol (RPC program number 100000), whereas the rpcbind daemon implements Versions 3 and 4 of the protocol, in addition to Version 2. This means that the rpcbind daemon already implements the functionality provided by the old portmapper. Due to this overlap in functionality, and to add to the confusion, many people refer to the rpcbind daemon as the portmapper.

  • fred gets the IP address for barney, using the ipnodes NIS map. fred also looks up the RPC program number for mountd in the rpc NIS map. The RPC program number for mountd is 100005.
  • Knowing that the portmapper lives at port 111, fred sends an RPC request to the portmapper on barney, asking for the IP port (on barney) of RPC program 100005. fred also specifies the particular protocol and version number for the RPC service. barney's portmapper responds to the request with port 704, the IP port at which mountd is listening for incoming mount RPC requests over the specified protocol. Note that it is possible for the portmapper to return an error, if the specified program does not exist or if it hasn't been registered on the remote host. barney, for example, might not be an NFS server and would therefore have no reason to run the mountd daemon.
  • fred sends a mount RPC request to barney, using the IP port number returned by the portmapper. This RPC request contains an RPC procedure number, which tells the mountd daemon what to do with the request. The RPC request also contains the parameters for the procedure, in this case, the name of the filesystem fred needs to mount.
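To see this registration in practice, you can query a host's portmapper directly with rpcinfo (a sketch; the port number here just follows the example above, and real output will differ):

fred% rpcinfo -p barney | grep mountd
    100005    1   udp    704  mountd
    100005    3   tcp    704  mountd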

Note: this is from book <Managing NFS and NIS, Second Edition>

Categories: Kernel, Linux, Network

resolved – mount clntudp_create: RPC: Program not registered

December 2nd, 2013

When I did a showmount -e localhost, an error occurred:

[root@centos-doxer ~]# showmount -e localhost
mount clntudp_create: RPC: Program not registered

So I checked which RPC program number showmount was using:

[root@centos-doxer ~]# grep showmount /etc/rpc
mountd 100005 mount showmount

As this indicated, the mountd daemon must be running for showmount -e localhost to work. mountd is part of NFS, so I started up nfs:

[root@centos-doxer ~]# /etc/init.d/nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]

Now that mountd was running, showmount -e localhost worked.
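To double-check that mountd is registered with the local portmapper, rpcinfo can be used (a quick sketch; the port numbers vary from system to system):

[root@centos-doxer ~]# rpcinfo -p localhost | grep mountd
    100005    1   udp    892  mountd
    100005    3   tcp    892  mountd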

 

Categories: Kernel, Linux, Network

Troubleshooting NFS locking problems in solaris

November 29th, 2013

Lock problems will be evident when an NFS client tries to lock a file, and it fails because someone has it locked. For applications that share access to files, the expectation is that locks will be short-lived. Thus, the pattern your users will notice when something is awry is that yesterday an application started up quite quickly, but today it hangs. Usually it is because an NFS/NLM client holds a lock on a file that your application needs to lock, and the holding client has crashed.

11.3.1. Diagnosing NFS lock hangs

On Solaris, you can use tools like pstack and truss to verify that processes are hanging in a lock request:

client1% ps -eaf | grep SuperApp
     mre 23796 10031  0 11:13:22 pts/6    0:00 SuperApp
client1% pstack 23796
23796:  SuperApp
 ff313134 fcntl    (1, 7, ffbef9dc)
 ff30de48 fcntl    (1, 7, ffbef9dc, 0, 0, 0) + 1c8
 ff30e254 lockf    (1, 1, 0, 2, ff332584, ff2a0140) + 98
 0001086c main     (1, ffbefac4, ffbefacc, 20800, 0, 0) + 1c
 00010824 _start   (0, 0, 0, 0, 0, 0) + dc
client1% truss -p 23796
fcntl(1, F_SETLKW, 0xFFBEF9DC)  (sleeping...)

 

This verifies that the application is stuck in a lock request. We can use pfiles to see what is going on with the files of process 23796:

client1% pfiles 23796
23796:  SuperApp
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37
      O_RDWR
   1: S_IFREG mode:0644 dev:208,1823 ino:5516985 uid:466 gid:300 size:0
      O_WRONLY|O_LARGEFILE
      advisory write lock set by process 3242
   2: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37
      O_RDWR

 

That we are told that there is an advisory lock set on file descriptor 1 that is set by another process, process ID 3242, is useful, but unfortunately it doesn’t tell us if 3242 is a local process or a process on another NFS client or NFS server. We also aren’t told if the file mapped to file descriptor 1 is a local file, or an NFS file. We are, however, told that the major and minor device numbers of the filesystem are 208 and 1823 respectively. If you run the mount command without any arguments, this dumps the list of mounted file systems. You should see a display similar to:

/ on /dev/dsk/c0t0d0s0 read/write/setuid/intr/largefiles/onerror=panic/dev=2200000 
on Thu Dec 21 11:13:33 2000
/usr on /dev/dsk/c0t0d0s6 read/write/setuid/intr/largefiles/onerror=panic/
dev=2200006 on Thu Dec 21 11:13:34 2000
/proc on /proc read/write/setuid/dev=31c0000 on Thu Dec 21 11:13:29 2000
/dev/fd on fd read/write/setuid/dev=32c0000 on Thu Dec 21 11:13:34 2000
/etc/mnttab on mnttab read/write/setuid/dev=3380000 on Thu Dec 21 11:13:35 2000
/var on /dev/dsk/c0t0d0s7 read/write/setuid/intr/largefiles/onerror=panic/
dev=2200007 on Thu Dec 21 11:13:40 2000
/home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on 
Thu Dec 28 08:51:30 2000

 

The numbers after dev= are in hexadecimal. Device numbers are constructed by taking the major number, shifting it left several bits, and then adding the minor number. Convert the minor number 1823 to hexadecimal, and look for it in the mount table:

client1% printf "%x\n" 1823
71f
client1% mount | grep 'dev=.*71f'
/home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on 
Thu Dec 28 08:51:30 2000
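As a cross-check, you can also decompose the dev value directly: Solaris's 32-bit dev_t keeps the major number in the high bits and an 18-bit minor number in the low bits (a sketch, run in a POSIX-style shell such as ksh or bash):

client1$ printf "major=%d minor=%d\n" $((0x340071f >> 18)) $((0x340071f & 0x3ffff))
major=208 minor=1823

These match the major and minor numbers (208,1823) that pfiles reported earlier.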

 

We now know four things:

  • This is an NFS file we are blocking on.
  • The NFS server name is spike.
  • The filesystem on the server is /export/home/mre.
  • The inode number of the file is 5516985.

One obvious cause you should first eliminate is whether the NFS server spike has crashed or not. If it hasn’t crashed, then the next step is to examine the server.

11.3.2. Examining lock state on NFS/NLM servers

Solaris and other System V-derived systems have a useful tool called crash for analyzing system state. Crash actually reads the Unix kernel’s memory and formats its data structures in a more human readable form. Continuing with the example from Section 11.3.1, assuming /export/home/mre is a directory on a UFS filesystem, which can be verified by doing:

spike# df -F ufs | grep /export
/export               (/dev/dsk/c0t0d0s7 ):  503804 blocks   436848 files

 

then you can use crash to get more lock state.

The crash command is like a shell, but with internal commands for examining kernel state. The internal command we will be using is lck:

spike# crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> lck
Active and Sleep Locks:
INO         TYP  START END     PROC  PID  FLAGS STATE   PREV     NEXT     LOCK
30000c3ee18  w   0      0       13   136   0021 3       48bf0f8  ae9008   6878d00 
30000dd8710  w   0      MAXEND  17   212   0001 3       8f1a48   8f02d8   8f0e18  
30001cce1c0  w   193    MAXEND  -1   3242  2021 3       6878850  c43a08   2338a38 

Summary From List:
 TOTAL    ACTIVE  SLEEP
   3      3       0
>

 

An important field is PROC. PROC is the “slot” number of the process. If it is -1, that indicates that the lock is being held by a nonlocal (i.e., an NFS client) process, and the PID field thus indicates the process ID, relative to the NFS client. In the sample display, we see one such entry:

30001cce1c0  w   193    MAXEND  -1   3242  2021 3       6878850  c43a08   2338a38

 

Note that the process ID, 3242, matches the one the pfiles command displayed earlier in this example. We can confirm that this lock is for the file in question via crash's uinode command:

> uinode 30001cce1c0
UFS INODE MAX TABLE SIZE = 34020
ADDR         MAJ/MIN   INUMB  RCNT LINK   UID   GID    SIZE    MODE  FLAGS
30001cce1c0  136,  7   5516985   2    1   466   300    403  f---644  mt rf
>

 

The inode numbers match what pfiles displayed earlier on the NFS client. However, inode numbers are only unique within a local filesystem. We can make doubly sure this is the file by comparing the major and minor device numbers from the uinode command, 136 and 7, with those of the filesystem that is mounted on /export:

spike# ls -lL /dev/dsk/c0t0d0s7
brw-------   1 root     sys      136,  7 May  6  2000 /dev/dsk/c0t0d0s7
spike#

11.3.3. Clearing lock state

Continuing with our example from Section 11.3.2, at this point we know that the file is locked by another NFS client. Unfortunately, we don’t know which client it is, as crash won’t give us that information. We do however have a potential list of clients in the server’s /var/statmon/sm directory:

spike# cd /var/statmon/sm
spike# ls
client1       ipv4.10.1.0.25  ipv4.10.1.0.26  gonzo      java

 

The entries prefixed with ipv4 are just symbolic links to other entries. The non-symbolic link entries identify the hosts we want to check for.

The most likely cause of the lock not getting released is that the holding NFS client has crashed. You can take the list of hosts from the /var/statmon/sm directory and check if any are dead, or not responding due to a network partition. Once you determine which are dead, you can use Solaris’s clear_locks command to clear lock state. Let’s suppose you determine that gonzo is dead. Then you would do:

spike# clear_locks gonzo

 

If clearing the lock state of dead clients doesn’t fix the problem, then perhaps a now-live client crashed, but for some reason after it rebooted, its status monitor did not send a notification to the NLM server’s status monitor. You can log onto the live clients and check if they are currently mounting the filesystem from the server (in our example, spike:/export). If they are not, then you should consider using clear_locks to clear any residual lock state those clients might have had.
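
To see which of the hosts recorded in /var/statmon/sm still respond, a quick loop on the server can help; a rough Bourne shell sketch (the trailing 1 is Solaris ping's timeout in seconds, and the ipv4.* symlinks are skipped):

spike# cd /var/statmon/sm
spike# for host in `ls | grep -v '^ipv4'`; do ping $host 1 >/dev/null 2>&1 && echo "$host responds" || echo "$host not responding"; done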

Ultimately, you may be forced to reboot your server. Short of that, there are other things you can try. Since you know the inode number and the filesystem of the file in question, you can determine the file's name:

spike# cd /export
spike# find . -inum 5516985 -print
./home/mre/database

 

You could rename the file database to something else and copy it back to a file named database, then kill and restart the SuperApp application on client1. Of course, such an approach requires intimate knowledge of, or experience with, the application to know whether this is safe.
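
In this example the dance would look something like the sketch below; the advisory lock stays with the old inode, while the copy gets a fresh, unlocked one:

spike# cd /export/home/mre
spike# mv database database.locked
spike# cp database.locked database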

PS:

This article is from book <Managing NFS and NIS, Second Edition>.

Categories: NAS, Storage Tags:

quick configuration of python httpd server

November 28th, 2013 No comments

Let's assume you want to copy files from server A to server B, and you find that scp is not available but wget is. In that case you can run a single Python command on server A and use wget on server B to download the files.

Here’s the steps:

On server A:

cd <directory of files you want to copy>

python -m SimpleHTTPServer #notice the output of this command, for example, “Serving HTTP on 0.0.0.0 port 8000 …”

Now you can open a browser and visit http://<hostname of server A>:8000. You will see the files listed there.

On server B:

wget http://<hostname of server A>:8000/<files to copy>

After you've copied the files, press Ctrl+C to terminate the Python HTTP server on server A. (Or press Ctrl+Z and then run bg %<job id> to keep the server running in the background.)
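
By the way, SimpleHTTPServer listens on port 8000 by default, but you can pass another port as an argument; a quick sketch (9000 is just an example, and the second line is the Python 3 equivalent):

python -m SimpleHTTPServer 9000
python3 -m http.server 9000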

Categories: Programming Tags:

VLAN in windows hyper-v

November 26th, 2013 No comments

Briefly, a virtual LAN (VLAN) can be regarded as a broadcast domain. It operates on OSI network layer 2. The exact protocol definition is known as 802.1Q. Each network packet belonging to a VLAN has an identifier. This is just a number between 0 and 4095, with both 0 and 4095 reserved for other uses. Let's assume a VLAN with an identifier of 10. A NIC configured with the VLAN ID of 10 will pick up network packets with the same ID and will ignore all other IDs. The point of VLANs is that switches and routers enabled for 802.1Q can present VLANs to different switch ports in the network. In other words, where a normal IP subnet is limited to a set of ports on a physical switch, a subnet defined in a VLAN can be present on any switch port, if so configured, of course.
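
Hyper-V aside, you can see the same 802.1Q tagging idea on a Linux box with iproute2; a minimal sketch, assuming an eth0 interface and VLAN ID 10 (interface names and addresses are examples):

ip link add link eth0 name eth0.10 type vlan id 10 #tagged sub-interface for VLAN 10
ip addr add 192.168.10.5/24 dev eth0.10
ip link set eth0.10 up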

Getting back to the VLAN functionality in Hyper-V: both virtual switches and virtual NICs
can detect and use VLAN IDs. Both can accept and reject network packets based on VLAN ID,
which means that the VM does not have to do it itself. The use of VLANs enables Hyper-V to
participate in more advanced network designs. One limitation in the current implementation is
that a virtual switch can have just one VLAN ID, although that should not matter too much in
practice. The default setting is to accept all VLAN IDs.

nfs null map – white out any map entry affecting directory

November 25th, 2013 No comments

The automounter also has a map “white-out” feature, via the -null special map. It is used after a directory to effectively delete any map entry affecting that directory from the automounter’s set of maps. It must precede the map entry being deleted. For example:

/tools -null

This feature is used to override auto_master or direct map entries that may have been inherited from an NIS map. If you need to make per-machine changes to the automounter maps, or if you need local control over a mount point managed by the automounter, white-out the conflicting map entry with the -null map.
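
For example, a client's /etc/auto_master that whites out an NIS-supplied /tools entry might look like the sketch below; the +auto_master line pulls in the NIS map, and the -null entry must come before it:

/tools -null
+auto_master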

PS: this is from book <Managing NFS and NIS, Second Edition>

Categories: NAS, Storage Tags:

nfs direct map vs indirect map

November 25th, 2013 No comments
  • Indirect maps

Here is an indirect automounter map for the /tools directory, called auto_tools:

deskset         -ro      mahimahi:/tools2/deskset 
sting                    mahimahi:/tools2/sting 
news                     thud:/tools3/news 
bugview                  jetstar:/usr/bugview

 

The first field is called the map key and is the final component of the mount point. The map name suffix and the mount point do not have to share the same name, but adopting this convention makes it easy to associate map names and mount points. This four-entry map is functionally equivalent to the /etc/vfstab excerpt:

mahimahi:/tools2/deskset - /tools/deskset  nfs - - ro 
mahimahi:/tools2/sting   - /tools/sting    nfs - -  
thud:/tools3/news       - /tools/news     nfs - -  
jetstar:/usr/bugview    - /tools/bugview  nfs - -
  • Direct maps

Direct maps define point-specific, nonuniform mount points. The best example of the need for a direct map entry is /usr/man. The /usr directory contains numerous other entries, so it cannot be an indirect mount point. Building an indirect map for /usr/man that uses /usr as a mount point will “cover up” /usr/bin and /usr/etc. A direct map allows the automounter to complete mounts on a single directory entry.

The key in a direct map is a full pathname, instead of the last component found in the indirect map. Direct maps also follow the /etc/auto_contents naming scheme. Here is a sample /etc/auto_direct:
/usr/man        wahoo:/usr/share/man
/usr/local/bin  mahimahi:/usr/local/bin.sun4

A major difference in behavior is that the real direct mount points are always visible to ls and other tools that read directory structures. The automounter treats direct mounts as individual directory entries, not as a complete directory, so the automounter gets queried whenever the directory containing the mount point is read. Client performance is affected in a marked fashion if direct mount points are used in several well-traveled directories. When a user reads a directory containing a number of direct mounts, the automounter initiates a flurry of mounting activity in response to the directory read requests. Section 9.5.3 describes a trick that lets you use indirect maps instead of direct maps. By using this trick, you can avoid mount storms caused by multiple direct mount points.

Contents from /etc/auto_master:
# Directory   Map               NFS Mount Options
/tools        /etc/auto_tools   -ro   # this is an indirect map
/-            /etc/auto_direct        # this is a direct map
PS:
This article is mostly from book <Managing NFS and NIS, Second Edition>

Categories: NAS, Storage Tags:

Hyper-V architecture: the hypervisor, the virtual machines, and their relations

November 25th, 2013 No comments
[Figure: Hyper-V architecture]

[Figure: CPU rings]

[Figure: Hybrid virtualization - Microsoft Virtual Server]

PS: This is from book Mastering Windows Server® 2008 R2

Categories: Clouding Tags:

Difference between Computer Configuration settings and User Configuration settings in Active Directory Policy Editor

November 22nd, 2013 No comments
  • Computer Configuration settings are applied to computer accounts at startup and during the background refresh interval.
  • User Configuration settings are applied to user accounts at logon and during the background refresh interval.
Categories: Windows Tags:

resolved – sshd: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost= user=

November 20th, 2013 No comments

Today when I tried to log on one linux server with a normal account, errors were found in /var/log/secure:

Nov 20 07:43:39 test_linux sshd[11200]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.182.120.188 user=testuser
Nov 20 07:43:39 test_linux sshd[11200]: pam_ldap: error trying to bind (Invalid credentials)
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldaps://test.com:7501: Invalid credentials
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldap://test.com: Invalid credentials
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: could not search LDAP server - Server is unavailable
Nov 20 07:43:42 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldaps://test.com:7501: Invalid credentials
Nov 20 07:43:43 test_linux sshd[11200]: nss_ldap: failed to bind to LDAP server ldap://test.com: Invalid credentials
Nov 20 07:43:43 test_linux sshd[11200]: nss_ldap: could not search LDAP server – Server is unavailable
Nov 20 07:43:55 test_linux sshd[11200]: pam_ldap: error trying to bind (Invalid credentials)
Nov 20 07:43:55 test_linux sshd[11200]: Failed password for testuser from 10.182.120.188 port 34243 ssh2
Nov 20 07:43:55 test_linux sshd[11201]: fatal: Access denied for user testuser by PAM account configuration

After some attempts on Linux PAM (sshd, system-auth), I still got nothing. Later, I compared /etc/ldap.conf with the one on another box, and found that the configuration on the problematic host was wrong.

I copied over the correct ldap.conf, tried logging on again, and the issue was resolved.
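
A quick way to spot such drift is to diff the file against a known-good host; a sketch, where goodhost is assumed to be a working box:

scp goodhost:/etc/ldap.conf /tmp/ldap.conf.good
diff /etc/ldap.conf /tmp/ldap.conf.good #look for wrong uri, binddn or bindpw entries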

PS:

You can read more about Linux PAM at http://www.linux-pam.org/Linux-PAM-html/ (I recommend reading the System Administrators' Guide, as that is the part most Linux administrators will need. It also has detailed information on commonly used PAM modules such as pam_tally2.so, pam_unix.so, pam_cracklib, etc.)

Categories: Linux, Security, Systems Tags:

Enabling NIS on client hosts

November 19th, 2013 No comments

Once you have one or more NIS servers running ypserv, you can set up NIS clients that query them. Make sure you do not enable NIS on any clients until you have at least one NIS server up and running. If no servers are available, the host that attempts to run as an NIS client will hang.

To enable NIS on a client host, first set up the nsswitch.conf file:

newclient# cp /etc/nsswitch.nis /etc/nsswitch.conf

 

Set up the domain name:

newclient# domainname bedrock
newclient# domainname > /etc/defaultdomain

 

Run ypinit:

newclient# /usr/sbin/ypinit -c

 

You will be prompted for a list of NIS servers. Enter the servers in order of proximity to the client.

Kill (if necessary) ypbind, and restart it:

newclient# ps -ef | grep ypbind
newclient# /usr/lib/netsvc/yp/ypstop
newclient# /usr/lib/netsvc/yp/ypstart
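
To confirm that the client has actually bound to a server, ypwhich and a quick map read are handy checks:

newclient# ypwhich #prints the NIS server this client is bound to
newclient# ypcat passwd | head #verifies maps are actually being served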

 

Once NIS is running, references to the basic administrative files are handled in two fundamentally different ways, depending on how nsswitch.conf is configured:

  • The NIS database replaces some files. Local copies of replaced files (ethers, hosts, netmasks, netgroups,[3] networks, protocols, rpc, and services) are ignored as soon as the ypbind daemon is started (to enable NIS).

    [3] The netgroups file is a special case. Netgroups are only meaningful when NIS is running, in which case the netgroups map (rather than the file) is consulted. The netgroups file is therefore only used to build the netgroups map; it is never “consulted” in its own right.

  • Some files are augmented, or appended to, by NIS. Files that are appended, or augmented, by NIS are consulted before the NIS maps are queried. The default /etc/nsswitch.conf file for NIS has these appended files: aliases, auto_*, group, passwd, services, and shadow. These files are read first, and if an appropriate entry isn't found in the local file, the corresponding NIS map is consulted. For example, when a user logs in, an NIS client will first look up the user's login name in the local passwd file; if it does not find anything that matches, it will refer to the NIS passwd map.

Although the replaced files aren’t consulted once NIS is running, they shouldn’t be deleted. In particular, the /etc/hosts file is used by an NIS client during the boot process, before it starts NIS, but is ignored as soon as NIS is running. The NIS client needs a “runt” hosts file during the boot process so that it can configure itself and get NIS running. Administrators usually truncate hosts to the absolute minimum: entries for the host itself and the “loopback” address. Diskless nodes need additional entries for the node’s boot server and the server for the diskless node’s /usr filesystem. Trimming the hosts file to these minimal entries is a good idea because, for historical reasons, many systems have extremely long host tables. Other files, like rpc, services, and protocols, could probably be eliminated, but it’s safest to leave the files distributed with your system untouched; these will certainly have enough information to get your system booted safely, particularly if NIS stops running for some reason. However, you should make any local additions to these files on the master server alone. You don’t need to bother keeping the slaves and clients up to date.
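
As a sketch, a minimal "runt" /etc/hosts for a standalone NIS client could be as small as this (addresses and names are examples):

127.0.0.1 localhost loghost
192.168.1.5 newclient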

PS:

This is from book <Managing NFS and NIS, Second Edition>

Categories: IT Architecture, Network, Systems Tags:

resolved – how to show all results in one page when searching your wordpress blog

November 13th, 2013 No comments

Assume that you have your own wordpress blog, and you note down everything you met in daily work.

Now you run into some trouble again at work, and you remember that you've noted down a similar issue before. So you search your wordpress blog with a keyword such as "trouble". WordPress returns 30 pages of results, with 10 articles on each page. Now you scroll and click "next page" a lot, and that really frustrates you. What if all the search results were on one page? Then you could just scroll through them, with no waiting for the next, next, next page to load. (You may worry that the page load time will disappoint other guys searching your blog, but this proves to be little to worry about, as nobody will search your blog except yourself. Believe me, buddy!)

Here goes the way to fulfill this functionality:

  1. Go to wordpress admin page, then click “Appearance” -> “Editor”;
  2. Click archive.php in the right to edit this file(search.php refers to archive.php, so you should edit archive.php);
  3. Search for "have_posts()", and add one line above it. The line to be added is: <?php query_posts($query_string . '&showposts=30'); ?> You may change 30 to any number you want; as you guessed, this is the number of results that will be shown.
  4. Save the change and try searching again. You’ll notice the change.

PS:

  1. Note that every time you upgrade wordpress or your wordpress theme you may need to do above steps again;
  2. The idea is from http://wordpress.org/support/topic/show-all-content-on-search-page
Categories: Programming Tags: ,

resolved – kernel panic not syncing: Fatal exception Pid: comm: not Tainted

November 13th, 2013 No comments

We were installing IDM OAM today, and the Linux server panicked every time we ran the startup script. The server panic info was like this:

Pid: 4286, comm: emdctl Not tainted 2.6.32-300.29.1.el5uek #1
Process emdctl (pid: 4286, threadinfo ffff88075bf20000, task ffff88073d0ac480)
Stack:
ffff88075bf21958 ffffffffa02b1769 ffff88075bf21948 ffff8807cdcce500
<0> ffff88075bf95cc8 ffff88075bf95ee0 ffff88075bf21998 ffffffffa01fd5c6
<0> ffffffffa02b1732 ffff8807bc2543f0 ffff88075bf95cc8 ffff8807bc2543f0
Call Trace:
[<ffffffffa02b1769>] nfs3_xdr_writeargs+0x37/0x7a [nfs]
[<ffffffffa01fd5c6>] rpcauth_wrap_req+0x7f/0x8b [sunrpc]
[<ffffffffa02b1732>] ? nfs3_xdr_writeargs+0x0/0x7a [nfs]
[<ffffffffa01f612a>] call_transmit+0x199/0x21e [sunrpc]
[<ffffffffa01fc8ba>] __rpc_execute+0x85/0x270 [sunrpc]
[<ffffffffa01fcae2>] rpc_execute+0x26/0x2a [sunrpc]
[<ffffffffa01f5546>] rpc_run_task+0x57/0x5f [sunrpc]
[<ffffffffa02abd86>] nfs_write_rpcsetup+0x20b/0x22d [nfs]
[<ffffffffa02ad1e8>] nfs_flush_one+0x97/0xc3 [nfs]
[<ffffffffa02a86b4>] nfs_pageio_doio+0x37/0x60 [nfs]
[<ffffffffa02a87c5>] nfs_pageio_complete+0xe/0x10 [nfs]
[<ffffffffa02ac264>] nfs_writepages+0xa7/0xe4 [nfs]
[<ffffffffa02ad151>] ? nfs_flush_one+0x0/0xc3 [nfs]
[<ffffffffa02acd2e>] nfs_write_mapping+0x63/0x9e [nfs]
[<ffffffff810f02fe>] ? __pmd_alloc+0x5d/0xaf
[<ffffffffa02acd9c>] nfs_wb_all+0x17/0x19 [nfs]
[<ffffffffa029f6f7>] nfs_do_fsync+0x21/0x4a [nfs]
[<ffffffffa029fc9c>] nfs_file_flush+0x67/0x70 [nfs]
[<ffffffff81117025>] filp_close+0x46/0x77
[<ffffffff81059e6b>] put_files_struct+0x7c/0xd0
[<ffffffff81059ef9>] exit_files+0x3a/0x3f
[<ffffffff8105b240>] do_exit+0x248/0x699
[<ffffffff8100e6a1>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8106898a>] ? freezing+0x13/0x15
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8106bd03>] get_signal_to_deliver+0x303/0x328
[<ffffffff8101120a>] do_notify_resume+0x90/0x6d7
[<ffffffff81459f06>] ? kretprobe_table_unlock+0x1c/0x1e
[<ffffffff8145ac6f>] ? kprobe_flush_task+0x71/0x7c
[<ffffffff8103164c>] ? paravirt_end_context_switch+0x17/0x31
[<ffffffff81123e8f>] ? path_put+0x22/0x27
[<ffffffff8101207e>] int_signal+0x12/0x17
Code: 55 48 89 e5 0f 1f 44 00 00 48 8b 06 0f c8 89 07 48 8b 46 08 0f c8 89 47 04 c9 48 8d 47 08 c3 55 48 89 e5 0f 1f 44 00 00 48 0f ce <48> 89 37 c9 48 8d 47 08 c3 55 48 89 e5 53 0f 1f 44 00 00 f6 06
RIP [<ffffffffa02b03c3>] xdr_encode_hyper+0xc/0x15 [nfs]
RSP <ffff88075bf21928>
---[ end trace 04ad5382f19cf8ad ]---
Kernel panic - not syncing: Fatal exception
Pid: 4286, comm: emdctl Tainted: G D 2.6.32-300.29.1.el5uek #1
Call Trace:
[<ffffffff810579a2>] panic+0xa5/0x162
[<ffffffff81450075>] ? threshold_create_device+0x242/0x2cf
[<ffffffff8100ed2f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff814574b0>] ? _spin_unlock_irqrestore+0x16/0x18
[<ffffffff810580f5>] ? release_console_sem+0x194/0x19d
[<ffffffff810583be>] ? console_unblank+0x6a/0x6f
[<ffffffff8105766f>] ? print_oops_end_marker+0x23/0x25
[<ffffffff814583a6>] oops_end+0xb7/0xc7
[<ffffffff8101565d>] die+0x5a/0x63
[<ffffffff81457c7c>] do_trap+0x115/0x124
[<ffffffff81013731>] do_alignment_check+0x99/0xa2
[<ffffffff81012cb5>] alignment_check+0x25/0x30
[<ffffffffa02b03c3>] ? xdr_encode_hyper+0xc/0x15 [nfs]
[<ffffffffa02b06be>] ? xdr_encode_fhandle+0x15/0x17 [nfs]
[<ffffffffa02b1769>] nfs3_xdr_writeargs+0x37/0x7a [nfs]
[<ffffffffa01fd5c6>] rpcauth_wrap_req+0x7f/0x8b [sunrpc]
[<ffffffffa02b1732>] ? nfs3_xdr_writeargs+0x0/0x7a [nfs]
[<ffffffffa01f612a>] call_transmit+0x199/0x21e [sunrpc]
[<ffffffffa01fc8ba>] __rpc_execute+0x85/0x270 [sunrpc]
[<ffffffffa01fcae2>] rpc_execute+0x26/0x2a [sunrpc]
[<ffffffffa01f5546>] rpc_run_task+0x57/0x5f [sunrpc]
[<ffffffffa02abd86>] nfs_write_rpcsetup+0x20b/0x22d [nfs]
[<ffffffffa02ad1e8>] nfs_flush_one+0x97/0xc3 [nfs]
[<ffffffffa02a86b4>] nfs_pageio_doio+0x37/0x60 [nfs]
[<ffffffffa02a87c5>] nfs_pageio_complete+0xe/0x10 [nfs]
[<ffffffffa02ac264>] nfs_writepages+0xa7/0xe4 [nfs]
[<ffffffffa02ad151>] ? nfs_flush_one+0x0/0xc3 [nfs]
[<ffffffffa02acd2e>] nfs_write_mapping+0x63/0x9e [nfs]
[<ffffffff810f02fe>] ? __pmd_alloc+0x5d/0xaf
[<ffffffffa02acd9c>] nfs_wb_all+0x17/0x19 [nfs]
[<ffffffffa029f6f7>] nfs_do_fsync+0x21/0x4a [nfs]
[<ffffffffa029fc9c>] nfs_file_flush+0x67/0x70 [nfs]
[<ffffffff81117025>] filp_close+0x46/0x77
[<ffffffff81059e6b>] put_files_struct+0x7c/0xd0
[<ffffffff81059ef9>] exit_files+0x3a/0x3f
[<ffffffff8105b240>] do_exit+0x248/0x699
[<ffffffff8100e6a1>] ? xen_force_evtchn_callback+0xd/0xf
[<ffffffff8106898a>] ? freezing+0x13/0x15
[<ffffffff8105b731>] sys_exit_group+0x0/0x1b
[<ffffffff8106bd03>] get_signal_to_deliver+0x303/0x328
[<ffffffff8101120a>] do_notify_resume+0x90/0x6d7
[<ffffffff81459f06>] ? kretprobe_table_unlock+0x1c/0x1e
[<ffffffff8145ac6f>] ? kprobe_flush_task+0x71/0x7c
[<ffffffff8103164c>] ? paravirt_end_context_switch+0x17/0x31
[<ffffffff81123e8f>] ? path_put+0x22/0x27
[<ffffffff8101207e>] int_signal+0x12/0x17

We tried a lot (application coredump, kdump, etc.) but still had no solution, until we noticed that there were many NFS-related messages in the kernel panic info (the nfs and sunrpc entries in the traces above).

As our Linux server was not supposed to be using NFS or autofs, we tried upgrading the NFS client (nfs-utils) and disabling autofs:

yum update nfs-utils

chkconfig autofs off
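
To double-check the result, verify that autofs is now off at every runlevel and note which nfs-utils version you ended up with:

chkconfig --list autofs
rpm -q nfs-utils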

After this, the IDM startup succeeded, and no server panic occurred anymore!

Categories: Kernel, Linux Tags: ,

An Introduction to Active Directory Basics

November 12th, 2013 No comments

Before we get started covering Active Directory, we’ll lay the foundation with some basics. These
definitions aren't completely comprehensive but will give you the foundation you need to understand
the topics in this chapter. Although there are a lot of terms to grasp, no term is that complex.
We’ll define them here with a short introduction and often expand on them later.

 

  • Workgroup

A workgroup is a group of users connected in a local area network (LAN) but
with each computer having its own user accounts. A user who can log onto one computer will
need a different user account to log onto a different computer, which can become a problem.
A single user who needs to access several computers will have several different user accounts,
often with different passwords.

Workgroups are often used in organizations with fewer than 10 computers. As more computers
are added, a decentralized workgroup becomes harder to manage and administer, requiring it
to be promoted to a domain.

  • Domain

When an organization becomes too big for a workgroup, a domain is created by
running the domain controller promotion wizard (DCPromo) on a server and promoting the
server to a domain controller. A domain controller is a server that hosts a copy of Active
Directory Domain Services.

  • Active Directory Domain Services

Active Directory Domain Services (AD DS) is used to
provide several services to an organization. At its core, it’s a big database of objects (such as
users, computers, and groups) and is used to centrally organize and manage all the objects
within an organization. A single user would have a single user account in Active Directory
and can use this single account to access multiple computers in the organization. This is often
referred to as single sign-on.
Additional services include the ability to easily search AD DS so that objects can easily be
located, as well as secure authentication using Kerberos.
Copies of Active Directory are kept on domain controllers. It’s very common to have at least two
domain controllers for redundancy purposes in case one goes down. Any changes to Active
Directory are passed to each of the domain controllers using a process called replication.

  • Replication

When any object (such as a user account) is added, deleted, or modified within
Active Directory, the change is sent to all other domain controllers (DCs) in the domain. When
a business is located in a single location, the changes are sent to all other DCs within a minute.
Modifications can be done on any DC. The initial change is sent from the DC where the change
was created to other DCs (designated as replication partners) within 15 seconds. If there are
more than four DCs in the organization, they are automatically organized in a logical circle,
and the change is replicated through the replication circle until all the DCs have the change.

  • Objects

Objects within AD are used to represent real-world items. Common objects are
user objects and computer objects that represent people and their computers. The objects can
be managed and administered using AD DS. For example, to represent a user named Sally,
a user account object is created. Sally can then use this account to log onto the domain and
access domain resources such as files, folders, printers, and email. Although we would often
say that we give Sally permission to access the resources, we actually give Sally’s user object
permission to access the resources. Similarly, a computer account object is created to represent
Sally's computer. All objects have properties that can be configured such as the user's
first name, last name, display name, logon name, and password for a user object.
The types of objects and their properties are predefined. You won’t find a kitchen-sink object
in AD DS, and you won’t find a favorite color property for users—at least not by default. All
objects that can be added to AD DS and the properties used to define these objects are specified
in the schema.

  • Schema

The schema is the definition of all the object types that Active Directory can
contain, and it includes a list of properties that can be used to describe the objects. You
can think of the schema as a set of blueprints for each of the objects. Just as a blueprint for
a house can be used to create a house, a schema definition for a user object can be used to
create a user object.

Only objects that are defined by the schema can be added to Active Directory, and these objects
can be described only by properties defined and identified by the schema. It’s common for
the schema to be modified a few times in the lifetime of an Active Directory enterprise. For
example, to install Exchange Server 2007 (for mail), the schema must be modified to accept the
different objects and properties required by Exchange. Modifying the schema is often referred
to as extending the schema.

  • Organizational units

Organizational units are used to organize objects within Active
Directory. You can think of an OU simply as a container for the objects. By placing the objects
in different containers, they are easier to manage. For example, you can create a Sales OU and
place all the objects representing users and computers in the sales department in the Sales OU.
OUs have two distinct benefits. You can delegate permissions to an OU, and you can link Group
Policy to an OU. As an example, Maria may be responsible for administration for all users and
computers in the sales department. If these objects were placed in the Sales OU, Maria could
be delegated permission to administer the OU, and it would include all the objects in the OU.
Similarly, you can use Group Policy to apply different settings and configurations to all the user
and computer objects in an OU by applying a single Group Policy object to the OU.

  • Group Policy

Group Policy allows you to configure a setting once and have it apply to
many user and/or computer objects. For example, if you want to ensure all the computers in
the sales department have their firewall enabled, you could place the computers in an OU
and call it Sales, configure a Group Policy object (GPO) that enables the firewall, and link the
policy to the Sales OU. It doesn’t matter if there are five computers in the OU or 5,000; a GPO
will apply the setting to all the computers in the OU.
You can link GPOs to OUs, entire domains, or sites. When linked, a GPO applies to all the
objects within the OU, domain, or site. For example, if you want all users in the entire domain
to have firewalls enabled, instead of linking the GPO to the site, you’d link it to the domain. Two
default GPOs are created when a domain is created: the default domain policy and the default
domain controllers policy.

  • Default domain policy

The default domain policy is a preconfigured GPO that is added
when a domain is created and linked at the domain level. Settings within the default domain
policy apply to all user and computer objects within the domain. This policy starts with some
basic security settings such as requirements for passwords but can be modified as desired.

  • Default domain controllers policy

The default domain controller policy is a preconfigured
GPO that is added when a domain is created and linked at the Domain Controllers
OU level. The Domain Controllers OU is created when a domain is created, and all domain
controllers are automatically placed in this OU when they are promoted to a DC. Since the
default domain controller policy is linked to the Domain Controllers OU, it applies to all
domain controllers.

  • Site

A site is a group of well-connected computers and is sometimes referred to as a group
of well-connected subnets. Small to medium-sized businesses often operate out of a single
location, and all the computers in this location are connected via a single LAN. This is a site.
If a remote office is created and connected via a slower connection, it could be configured as
a site. The remote office is well connected within the remote office but not well connected to
the main office. Sites are explored in much more depth in Chapter 21.

  • Forest

A forest is a group of one or more domains that share a common Active Directory.
A single forest will have only one schema (only one definition of objects that can be created)
and only one global catalog.

  • Global catalog

The global catalog (GC) is a listing of all the objects in the entire forest. It is
easily searchable and is often used by different applications to search AD DS for specific objects.
The global catalog is hosted on domain controllers that are designated as GC servers. Since there
is only one GC for a forest and a forest can include multiple domains, it can become quite large.
To limit its size, objects in the GC have only a subset of properties included. For example, a user
account may have 100 properties to describe it, but only about 10 are included in the GC.

  • Tree

A tree is a group of domains with a common namespace. That simply means the two-part
root domain name is common to other domains in the tree. The first domain in the forest
may be called Bigfirm.com. A child domain could be created named sales.bigfirm.com. Notice
the common name (Bigfirm.com). It is possible to create a separate tree within a forest. For
example, another domain could be created named littlefirm.com. It’s not the same namespace,
but since it is in the same forest, it would share a common schema and global catalog.

Note: this is from book Mastering Windows Server® 2008 R2

Categories: Windows Tags:

make ssh on linux not to disconnect after some certain time

November 1st, 2013 No comments

You connect to a Linux box through ssh, and sometimes you find that the session just "hangs" or gets disconnected. It's the ssh configuration on the server that makes this happen.

You can do the following to raise the disconnection timeout high enough to get around this annoying issue:

cp /etc/ssh/sshd_config{,.bak30}
sed -i '/ClientAliveInterval/ s/^/# /' /etc/ssh/sshd_config
sed -i '/ClientAliveCountMax/ s/^/# /' /etc/ssh/sshd_config
echo 'ClientAliveInterval 30' >> /etc/ssh/sshd_config
echo 'TCPKeepAlive yes' >> /etc/ssh/sshd_config
echo 'ClientAliveCountMax 99999' >> /etc/ssh/sshd_config
/etc/init.d/sshd restart
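
If you cannot touch the server, the same effect is available from the client side; a sketch to add to ~/.ssh/config (the numbers are examples):

Host *
    ServerAliveInterval 30
    ServerAliveCountMax 120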

Enjoy!

Categories: Linux Tags:

make sudo asking for no password on linux

November 1st, 2013 No comments

Assume you have a user named 'test' who belongs to the 'admin' group. You want user test to be able to sudo to root without Linux prompting for a password. Here's the way you can do it:

cp /etc/sudoers{,.bak}
sed -i '/%admin/ s/^/# /' /etc/sudoers
echo '%admin ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
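
Since a syntax error in /etc/sudoers can lock you out of sudo entirely, it is worth validating the file after editing:

visudo -c #parses /etc/sudoers and reports any syntax errors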

Enjoy!

Categories: Linux, Security Tags:

disable linux strong password policy

November 1st, 2013 No comments

You may have enabled a strong password policy for Linux, and of course you can disable it again. Here's the way to do it:

cp /etc/pam.d/system-auth{,.bak}
sed -i '/pam_cracklib.so/ s/^/# /' /etc/pam.d/system-auth
sed -i 's/use_authtok//' /etc/pam.d/system-auth
echo "password" | passwd --stdin username

PS:

  1. To enable strong password for linux, you can have a try on this http://goo.gl/uwdbN
  2. You can read more about linux pam here http://www.linux-pam.org/Linux-PAM-html/
Categories: Linux, Security Tags:

make tee capture stderr as well as stdout & prevent ESC output of script

October 30th, 2013 No comments
  • Make tee capture stderr as well as stdout

As the manpage of tee says:

read from standard input and write to standard output and files

So tee only sees what comes through the pipe on standard input; any error messages your script writes to stderr will not be copied and written to the file.

Here’s one workaround for this:

./aaa.sh 2>&1 | tee -a log

Or you can use this more elaborate version, which captures stdout and stderr into separate files:

command > >(tee stdout.log) 2> >(tee stderr.log >&2)

  • Prevent ESC output of script

script literally captures every type of output that was sent to the screen. If you have colored or bold output, this shows up as escape (ESC) characters within the output file. These characters can significantly clutter the output and are not usually useful. If you set the TERM environment variable to dumb (using setenv TERM dumb for csh-based shells and export TERM=dumb for sh-based shells), applications will not output the escape characters. This provides a more readable output.

In addition, the timing information provided by script clutters the output. Although it can be useful to have automatically generated timing information, it may be easier to not use script’s timing, and instead just time the important commands with the time command mentioned in the previous chapter.
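
Putting the two together, a typical clutter-free capture session might look like this sketch:

export TERM=dumb #keep applications from emitting escape sequences
script session.log #start capturing; run your commands, then type exit to stop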

PS:

  1. Here’s the full version http://stackoverflow.com/questions/692000/how-do-i-write-stderr-to-a-file-while-using-tee-with-a-pipe
  2. Some contents of this article is excerpted from <Optimizing Linux® Performance: A Hands-On Guide to Linux® Performance Tools>.
Categories: Linux, SHELL Tags: