Archive for the ‘Hardware’ Category

raid10 and raid01

April 21st, 2015

RAID 0 over RAID 1 (RAID 1+0, RAID 10, stripe of mirrors; the better choice)

(RAID 1) A = Drive A1 + Drive A2 (Mirrored)
(RAID 1) B = Drive B1 + Drive B2 (Mirrored)
RAID 0 = (RAID 1) A + (RAID 1) B (Striped)


RAID 1 over RAID 0 (RAID 0+1, RAID 01, mirror of stripes)

(RAID 0) A = Drive A1 + Drive A2 (Striped)
(RAID 0) B = Drive B1 + Drive B2 (Striped)
RAID 1 = (RAID 0) A + (RAID 0) B (Mirrored)
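To see why the stripe of mirrors is the better layout, here is a small Python sketch that enumerates every possible two-drive failure in the four-drive layouts above. It assumes the simple textbook behaviour in which one failed drive takes its whole RAID 0 stripe offline; real controllers may behave differently.

```python
from itertools import combinations

DRIVES = ["A1", "A2", "B1", "B2"]


def raid10_survives(failed: set) -> bool:
    # RAID 10: A = A1+A2 mirrored, B = B1+B2 mirrored, then A and B are striped.
    # Data is lost only if both copies in the same mirror pair fail.
    mirror_pairs = [{"A1", "A2"}, {"B1", "B2"}]
    return all(not pair <= failed for pair in mirror_pairs)


def raid01_survives(failed: set) -> bool:
    # RAID 01: A = A1+A2 striped, B = B1+B2 striped, then A and B are mirrored.
    # Assumption: any failed drive takes its whole stripe offline, so data
    # survives only while at least one stripe has no failed drive.
    stripes = [{"A1", "A2"}, {"B1", "B2"}]
    return any(not (stripe & failed) for stripe in stripes)


def surviving_double_failures(survives) -> tuple:
    """Return (surviving cases, total cases) over every possible 2-drive failure."""
    cases = [set(pair) for pair in combinations(DRIVES, 2)]
    return sum(survives(case) for case in cases), len(cases)


if __name__ == "__main__":
    print("RAID 10:", surviving_double_failures(raid10_survives))
    print("RAID 01:", surviving_double_failures(raid01_survives))
```

RAID 10 survives 4 of the 6 possible double failures, while RAID 01 survives only 2.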


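For write performance: RAID 0 > RAID 10 > RAID 5

Read performance, by contrast, is roughly similar across these RAID types (while no drive has failed), since each of them can spread reads over all member drives.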
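The write ordering follows from the usual write-penalty rule of thumb for small random writes: each front-end write costs 1 back-end I/O on RAID 0, 2 on RAID 10 (both mirror copies) and 4 on RAID 5 (read data, read parity, write data, write parity). A quick Python calculation, assuming a hypothetical array of four drives at about 150 random IOPS each and no write cache:

```python
# Rule-of-thumb back-end I/Os per front-end random write for each RAID level.
WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4}


def effective_write_iops(drives: int, iops_per_drive: float, level: str) -> float:
    """Front-end random-write IOPS the array can sustain, ignoring controller caches."""
    return drives * iops_per_drive / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: 4 drives at roughly 150 random IOPS each.
    for level in WRITE_PENALTY:
        print(level, effective_write_iops(4, 150, level))
```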

Categories: Hardware, IT Architecture, Storage, Systems

resolved – nfs share chown: changing ownership of ‘blahblah’: Invalid argument

October 28th, 2014

Today I encountered the following error when trying to change ownership of some files:

[root@test webdav]# chown -R apache:apache ./bigfiles/
chown: changing ownership of `./bigfiles/opcmessaging': Invalid argument
chown: changing ownership of `./bigfiles/': Invalid argument

This host runs CentOS 6.2, which mounts NFS shares with NFSv4 by default:

[root@test webdav]# cat /proc/mounts | grep u01
/u01 nfs4 rw,relatime,vers=4,rsize=32768,wsize=32768

However, the NFS server does not support NFSv4 well, so I changed the share's mount options to force NFSv3:

/u01 nfs rsize=32768,wsize=32768,hard,nolock,timeo=14,noacl,intr,mountvers=3,nfsvers=3

After umount/mount, the issue was resolved!
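To double-check which protocol version a share actually ended up with after the remount, you can parse /proc/mounts. A minimal Python sketch; it only looks at the vers=/nfsvers= options and the filesystem type, and the export name in the sample lines is hypothetical:

```python
def nfs_version(mounts_text: str, mount_point: str):
    """Return the NFS version string for mount_point from /proc/mounts text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        fstype, options = fields[2], fields[3].split(",")
        if not fstype.startswith("nfs"):
            return None
        for opt in options:
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers"):
                return value
        # No explicit version option: only an "nfs4" type tells us the version
        return "4" if fstype == "nfs4" else None
    return None


if __name__ == "__main__":
    # Hypothetical /proc/mounts lines; "nas:/export/u01" stands in for the real export
    before = "nas:/export/u01 /u01 nfs4 rw,relatime,vers=4,rsize=32768,wsize=32768 0 0"
    after = "nas:/export/u01 /u01 nfs rw,relatime,vers=3,rsize=32768,wsize=32768,hard 0 0"
    print(nfs_version(before, "/u01"), nfs_version(after, "/u01"))
```

On the host itself you would pass it the contents of /proc/mounts instead of the sample strings.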


If the NAS server is a Sun ZFS appliance, also note the following; otherwise the issue may occur even on CentOS/Red Hat Linux 5.x:



Categories: Hardware, IT Architecture, Linux, NAS, Storage, Systems

Sun ZFS storage stuck due to incorrect LACP configuration

October 24th, 2014

Today we ran into an issue with a Sun ZFS Storage 7320. NFS shares provisioned from the ZFS appliance stopped responding to requests; even a "df -h" would hang for a long time. When we checked from the ZFS storage side, we found the following statistics:



While we were tracking down the source of the traffic, the ZFS appliance went back to normal by itself:



Since we had configured LACP on this ZFS appliance just the day before, we suspected the issue was caused by an incorrect network configuration. Here's the network config:


For "Policy", the setting should match the switch setup so that incoming and outgoing traffic is balanced evenly; otherwise, the load may be spread unevenly. Our switch was set to L3, so L3 should be OK. We would get better load spreading with L3+4 if the switch supports it: with L3, all connections between the same pair of IP addresses use only a single member of the aggregation, while L3+4 also spreads the load by TCP or UDP port.
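To illustrate the difference between L3 and L3+4, here is a toy Python model of hash-based link selection. The hash (SHA-256 here) and the addresses are assumptions for illustration only; switches and the appliance use their own hash functions, but the effect is the same: an L3 hash only sees the IP addresses, so every connection between the same two hosts lands on one member, while L3+4 also mixes in the ports.

```python
import hashlib


def pick_link(n_links: int, *fields) -> int:
    """Map a flow to one aggregation member by hashing the chosen header fields."""
    key = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_links


def links_used(policy: str, n_links: int, src_ip: str, dst_ip: str, ports) -> set:
    """Return the set of members used by many TCP connections between two hosts."""
    used = set()
    for sport in ports:
        if policy == "L3":
            used.add(pick_link(n_links, src_ip, dst_ip))
        elif policy == "L3+4":
            # 2049 is the standard NFS port on the server side
            used.add(pick_link(n_links, src_ip, dst_ip, sport, 2049))
        else:
            raise ValueError(f"unknown policy {policy!r}")
    return used


if __name__ == "__main__":
    # Hypothetical client/appliance addresses, 100 client source ports, 2-port LACP group
    for policy in ("L3", "L3+4"):
        print(policy, links_used(policy, 2, "192.0.2.10", "192.0.2.20", range(1000, 1100)))
```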

For "Mode", the setting should be chosen according to the switch. If the switch is in "passive" mode, the server/storage needs to be in "active" mode, and vice versa (two "passive" ends never start LACP negotiation).

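The "Timer" setting controls how often the LACP status is checked, i.e. how often LACP packets are exchanged between the two ends (short: every second; long: every 30 seconds).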

After checking the switch settings, we found that the switch was in "Active" mode; as the ZFS appliance was also in "Active" mode, we concluded that this was the culprit. So we changed the setting to the following:

After this change we kept the appliance under observation, and ZFS is now operating normally.


You should also check the disk operations: if there are timeout errors on the disks, try replacing them. Sometimes a single disk can hang the SCSI bus. Ideally the system should fail the disk, but here that didn't happen, so the disk had to be failed manually to resolve the issue.

The ZFS Storage Appliance core analysis (previous note) confirms that the disk was the cause of the issue.

It was hanging up communication on the SCSI bus but once it was removed the issue was resolved.

It is uncommon for a single disk to hang up the bus; however, since the disks share the SCSI path (each drive does not have its own dedicated cabling and controller), it is sometimes seen.

You can check the ZFS appliance uptime by running "version show" in the console.

zfs-test:configuration> version show
Appliance Name: zfs-test
Appliance Product: Sun ZFS Storage 7320
Appliance Type: Sun ZFS Storage 7320
Appliance Version: 2013.,1-1.1
First Installed: Sun Jul 22 2012 10:02:24 GMT+0000 (UTC)
Last Updated: Sun Oct 26 2014 22:11:03 GMT+0000 (UTC)
Last Booted: Wed Dec 10 2014 10:03:08 GMT+0000 (UTC)
Appliance Serial Number: d043d335-ae15-4350-ca35-b05ba2749c94
Chassis Serial Number: 1225FMM0GE
Software Part Number: Oracle 000-0000-00
Vendor Product ID: urn:uuid:418bff40-b518-11de-9e65-080020a9ed93
Browser Name: aksh 1.0
Browser Details: aksh
HTTP Server: Apache/2.2.24 (Unix)
SSL Version: OpenSSL 1.0.0k 5 Feb 2013
Appliance Kit: ak/SUNW,maguro_plus@2013.,1-1.1
Operating System: SunOS 5.11 ak/generic@2013.,1-1.1 64-bit
BIOS: American Megatrends Inc. 08080102 05/23/2011
Service Processor:
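The uptime itself has to be worked out from the "Last Booted" line. A small Python sketch that does this from captured "version show" output; it assumes the timestamp layout shown above and a GMT+0000 offset, as in this sample:

```python
from datetime import datetime, timezone


def last_booted(version_output: str) -> datetime:
    """Return the 'Last Booted' time from 'version show' output as a UTC datetime."""
    for line in version_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Last Booted":
            # e.g. "Wed Dec 10 2014 10:03:08 GMT+0000 (UTC)": keep the first five fields
            stamp = " ".join(value.split()[:5])
            parsed = datetime.strptime(stamp, "%a %b %d %Y %H:%M:%S")
            return parsed.replace(tzinfo=timezone.utc)  # assumes the GMT+0000 offset shown
    raise ValueError("no 'Last Booted' line found")


def uptime(version_output: str, now: datetime = None):
    """Time elapsed since the last boot, as a timedelta."""
    now = now or datetime.now(timezone.utc)
    return now - last_booted(version_output)
```

Pass it the text captured from the console, for example over SSH to the appliance.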

Categories: Hardware, NAS, Storage

resolved – fsinfo ERROR: Stale NFS file handle POST

May 15th, 2014

Today when I tried to mount an NFS share from an NFS server, it timed out with "mount.nfs: Connection timed out".

I searched /var/log/messages but found nothing useful, so I ran tcpdump on the NFS client:

[root@dcs-hm1-qa132 ~]# tcpdump -nn -vvv host #server is, client is
23:49:11.598407 IP (tos 0x0, ttl 64, id 26179, offset 0, flags [DF], proto TCP (6), length 96) > 40 null
23:49:11.598741 IP (tos 0x0, ttl 62, id 61186, offset 0, flags [DF], proto TCP (6), length 80) > reply ok 24 null
23:49:11.598812 IP (tos 0x0, ttl 64, id 26180, offset 0, flags [DF], proto TCP (6), length 148) > 92 fsinfo fh Unknown/0100010000000000000000000000000000000000000000000000000000000000
23:49:11.599176 IP (tos 0x0, ttl 62, id 61187, offset 0, flags [DF], proto TCP (6), length 88) > reply ok 32 fsinfo ERROR: Stale NFS file handle POST:
23:49:11.599254 IP (tos 0x0, ttl 64, id 26181, offset 0, flags [DF], proto TCP (6), length 148) > 92 fsinfo fh Unknown/010001000000000000002FFF000002580000012C0007B0C00000000A00000000
23:49:11.599627 IP (tos 0x0, ttl 62, id 61188, offset 0, flags [DF], proto TCP (6), length 88) > reply ok 32 fsinfo ERROR: Stale NFS file handle POST:

The "ERROR: Stale NFS file handle POST" error may be caused by any of the following:

1. The NFS server is no longer available.
2. Something in the network is blocking the traffic.
3. In a cluster, during failover of the NFS resource, the major and minor device numbers on the secondary server that takes over differ from those on the primary.

To resolve the issue, try bouncing the NFS service on the NFS server with /etc/init.d/nfs restart.

Categories: Hardware, NAS, Storage

stuck in PXE-E51: No DHCP or proxyDHCP offers were received, PXE-M0F: Exiting Intel Boot Agent, Network boot canceled by keystroke

March 17th, 2014

If you installed your OS and tried to boot it, but it got stuck with the following messages:


Then one possibility is that the configuration of your host's storage array is wrong. For instance, it should be JBOD but you configured it as RAID6.

Please note that this is only one possible cause of this error; search for the PXE error codes you encountered for more details.


  • Sometimes, DHCP snooping may prevent PXE from working.
  • STP (Spanning Tree Protocol) makes each port wait up to 50 seconds before data is allowed to be sent on the port. This delay in turn can cause problems with some applications/protocols (PXE, Bootworks, etc.). To alleviate the problem, PortFast was implemented on Cisco devices; the terminology may differ between vendors.
  • ARP caching

“Include snapshots” made NFS shares from a ZFS appliance shrink

January 17th, 2014

Today I ran into a weird issue when checking an NFS share mounted from a ZFS appliance. The space on the NFS filesystem was getting low, so I removed some files, but the filesystem kept shrinking instead: what confused me was that its total size went down! Shouldn't the free space grow while the size stays unchanged?

After some debugging, I found that this was caused by the "Include snapshots" option of the ZFS appliance share. When I unchecked "Include snapshots", the issue was gone!
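A likely explanation, sketched under simplifying assumptions: with "Include snapshots" checked, the share's quota also counts space held only by snapshots, and the size that df shows over NFS is roughly the live data plus the space still available. Deleting files whose blocks are still referenced by a snapshot lowers the live data but not the available space, so the reported size shrinks. The model and numbers below are hypothetical:

```python
def df_view(quota_gb: float, live_gb: float, snapshot_only_gb: float, include_snapshots: bool):
    """Return (size, used, avail) roughly as an NFS client's df would report them.

    Simplified model: used is the live data in the share, avail is what is left
    under the quota, and size is used + avail.
    """
    charged = live_gb + (snapshot_only_gb if include_snapshots else 0)
    avail = max(quota_gb - charged, 0)
    return live_gb + avail, live_gb, avail


if __name__ == "__main__":
    # 100G quota, 90G of files; then 20G of files are deleted but a snapshot still holds them.
    print(df_view(100, 90, 0, include_snapshots=True))
    print(df_view(100, 70, 20, include_snapshots=True))
    print(df_view(100, 70, 20, include_snapshots=False))
```

In this model, deleting 20G that a snapshot still holds takes the reported size from 100G to 80G with the option on, while with it off the size stays at 100G and the available space grows to 30G.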


Categories: Hardware, NAS, Storage

resolved – ESXi Failed to lock the file

January 13th, 2014

When I powered on a VM in ESXi, the following error occurred:

An error was received from the ESX host while powering on VM doxer-test.
Cannot open the disk '/vmfs/volumes/4726d591-9c3bdf6c/doxer-test/doxer-test_1.vmdk' or one of the snapshot disks it depends on.
Failed to lock the file

And also:

unable to access file since it is locked

This was apparently caused by a storage issue. I first googled it and found that most posts explained how ESXi works internally; I tried some of their suggestions, but with no luck.

Then I remembered that our datastore was on NFS/ZFS, and NFS is known for file-locking issues. So I mounted the NFS share used by the datastore and removed a file named lck-c30d000000000000. After this, the VM booted up successfully! (Alternatively, you can log on to the ESXi host and remove the lock file there.)
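If you go the NFS route, a small Python sketch like the following lists the lock files under a VM directory before you delete anything. It assumes the datastore share is mounted on the machine running it. The post mentions a file named lck-c30d000000000000; on NFS datastores such files may also appear with a leading dot, so both forms are matched. It only lists the files and deletes nothing.

```python
import os


def find_lock_files(vm_dir: str):
    """List NFS lock files (lck-* or .lck-*) under a VM directory on a mounted datastore."""
    found = []
    for root, _dirs, files in os.walk(vm_dir):
        for name in files:
            if name.startswith("lck-") or name.startswith(".lck-"):
                found.append(os.path.join(root, name))
    return sorted(found)
```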

Common storage multipath path-management software

December 12th, 2013
Vendor             Path-Management Software
Hewlett-Packard    AutoPath, SecurePath
Microsoft          MPIO
Hitachi            Dynamic Link Manager
EMC                PowerPath
IBM                RDAC, MultiPath Driver
VERITAS            Dynamic Multipathing (DMP)

SAN Terminology

September 13th, 2013
SCSI Target
A SCSI Target is a storage system end-point that provides a service of processing SCSI commands and I/O requests from an initiator. A SCSI Target is created by the storage system's administrator, and is identified by unique addressing methods. A SCSI Target, once configured, consists of zero or more logical units.
SCSI Initiator
A SCSI Initiator is an application or production system end-point that is capable of initiating a SCSI session, sending SCSI commands and I/O requests. SCSI Initiators are also identified by unique addressing methods (See SCSI Targets).
Logical Unit
A Logical Unit is a component in a storage system. Each one is uniquely numbered, which gives what is referred to as a Logical Unit Number, or LUN. A storage system, being highly configurable, may contain many LUNs. These LUNs, when associated with one or more SCSI Targets, form a unique SCSI device, a device that can be accessed by one or more SCSI Initiators.
iSCSI
Internet SCSI, a protocol for sharing SCSI based storage over IP networks.
iSER
iSCSI Extension for RDMA, a protocol that maps the iSCSI protocol over a network that provides RDMA services (e.g. InfiniBand). The iSER protocol is transparently selected by the iSCSI subsystem, based on the presence of correctly configured IB hardware. In the CLI and BUI, all iSER-capable components (targets and initiators) are managed as iSCSI components.
FC
Fibre Channel, a protocol for sharing SCSI based storage over a storage area network (SAN), consisting of fiber-optic cables, FC switches and HBAs.
SRP
SCSI RDMA Protocol, a protocol for sharing SCSI based storage over a network that provides RDMA services (e.g. InfiniBand).
IQN
An iSCSI qualified name, the unique identifier of a device in an iSCSI network. iSCSI uses the form for IQNs. For example, the appliance may use the IQN: to identify one of its iSCSI targets. This name shows that this is an iSCSI device built by a company registered in March of 1986. The naming authority is just the DNS name of the company reversed, in this case, "com.sun". Everything following is a unique ID that Sun uses to identify the target.
Target portal
When using the iSCSI protocol, the target portal refers to the unique combination of an IP address and TCP port number by which an initiator can contact a target.
Target portal group
When using the iSCSI protocol, a target portal group is a collection of target portals. Target portal groups are managed transparently; each network interface has a corresponding target portal group with that interface's active addresses. Binding a target to an interface advertises that iSCSI target using the portal group associated with that interface.
CHAP
Challenge-handshake authentication protocol, a security protocol which can authenticate a target to an initiator, an initiator to a target, or both.
RADIUS
A system for using a centralized server to perform CHAP authentication on behalf of storage nodes.
Target group
A set of targets. LUNs are exported over all the targets in one specific target group.
Initiator group
A set of initiators. When an initiator group is associated with a LUN, only initiators from that group may access the LUN.
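The IQN structure described above (date of registration, reversed domain name of the naming authority, then a vendor-specific unique part) can be taken apart with a few lines of Python. This is a sketch assuming the usual iqn.YYYY-MM.reversed.domain[:unique-id] layout; the example name is hypothetical and only mirrors the com.sun / March 1986 description:

```python
import re
from typing import NamedTuple, Optional

IQN_RE = re.compile(r"^iqn\.(\d{4})-(\d{2})\.([^:]+)(?::(.+))?$")


class IQN(NamedTuple):
    year: int
    month: int
    naming_authority: str  # reversed DNS name, e.g. "com.sun"
    unique_id: Optional[str]


def parse_iqn(name: str) -> IQN:
    """Split an iSCSI qualified name into its date, naming authority and unique part."""
    m = IQN_RE.match(name)
    if not m:
        raise ValueError(f"not an IQN: {name!r}")
    year, month, authority, unique = m.groups()
    return IQN(int(year), int(month), authority, unique)


def authority_domain(iqn: IQN) -> str:
    """Turn the reversed naming authority back into a normal DNS name."""
    return ".".join(reversed(iqn.naming_authority.split(".")))


if __name__ == "__main__":
    iqn = parse_iqn("iqn.1986-03.com.sun:02:hypothetical-target")  # hypothetical name
    print(iqn, authority_domain(iqn))
```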
Categories: Hardware, SAN, Storage

make label for swap device using mkswap and blkid

August 6th, 2013

If you want to label a swap partition in Linux, don't use e2label for this purpose: e2label changes the label on an ext2/ext3/ext4 filesystem, and a swap area is none of those.

If you use e2label for this, you will get the following error messages:

[root@node2 ~]# e2label /dev/xvda3 SWAP-VM
e2label: Bad magic number in super-block while trying to open /dev/xvda3
Couldn't find valid filesystem superblock.

Use mkswap instead, which has a -L option:

-L label
Specify a label, to allow swapon by label. (Only for new style swap areas.)

Here's an example:

[root@node2 ~]# mkswap -L SWAP-VM /dev/xvda3
Setting up swapspace version 1, size = 2335973 kB
LABEL=SWAP-VM, no uuid

[root@node2 ~]# blkid
/dev/xvda1: LABEL="/boot" UUID="6c5ad2ad-bdf5-4349-96a4-efc9c3a1213a" TYPE="ext3"
/dev/xvda2: LABEL="/" UUID="76bf0aaa-a58e-44cb-92d5-098357c9c397" TYPE="ext3"
/dev/xvdb1: LABEL="VOL1" TYPE="oracleasm"
/dev/xvdc1: LABEL="VOL2" TYPE="oracleasm"
/dev/xvdd1: LABEL="VOL3" TYPE="oracleasm"
/dev/xvde1: LABEL="VOL4" TYPE="oracleasm"
/dev/xvda3: LABEL="SWAP-VM" TYPE="swap"

[root@node2 ~]# swapon /dev/xvda3

[root@node2 ~]# swapon -s
Filename Type Size Used Priority
/dev/xvda3 partition 2281220 0 -1

So now we can add swap to /etc/fstab using LABEL=SWAP-VM:

LABEL=SWAP-VM           swap                    swap    defaults        0 0
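Before relying on LABEL=SWAP-VM in /etc/fstab, it's worth confirming that the expected device really carries the label. A minimal Python sketch that parses blkid output like the one above (pass it the text captured from running blkid):

```python
import re

PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_blkid(output: str) -> dict:
    """Map each device in blkid output to its KEY="value" attributes."""
    devices = {}
    for line in output.splitlines():
        dev, sep, rest = line.partition(":")
        if not sep:
            continue
        devices[dev.strip()] = dict(PAIR.findall(rest))
    return devices


def device_for_label(output: str, label: str, fstype: str = "swap"):
    """Return the device carrying LABEL=label and TYPE=fstype, or None."""
    for dev, attrs in parse_blkid(output).items():
        if attrs.get("LABEL") == label and attrs.get("TYPE") == fstype:
            return dev
    return None
```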

iostat dm- mapping to physical device

July 30th, 2013

-bash-3.2# iostat -tkxn 2

avg-cpu: %user %nice %system %iowait %steal %idle
0.02 0.00 0.48 0.00 0.21 99.29

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 1949.00 0.00 129648.00 66.52 30.66 15.77 0.51 100.20
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 1139.00 0.00 88752.00 77.92 22.92 20.09 0.83 95.00

Device: rBlk_nor/s wBlk_nor/s rBlk_dir/s wBlk_dir/s rBlk_svr/s wBlk_svr/s rops/s wops/s
nas-host:/export/test/repo 0.00 0.00 0.00 218444.00 0.00 218332.00 3084.50 3084.50

So how can we find out which physical device dm-3 maps to?

-bash-3.2# cat /sys/block/dm-3/dev
253:3 #the major:minor number of dm-3


-bash-3.2# dmsetup ls
dmnfs6 (253, 6)
dmnfs5 (253, 5)
dmnfs4 (253, 4)
dmnfs3 (253, 3)
dmnfs2 (253, 2)
dmnfs1 (253, 1)
dmnfs0 (253, 0)

Then we can find which share the device maps to:

[root@testhost ~]# cat /sys/block/dm-3/size
8916075 #it's 4T

[root@testhost ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 9.7G 3.4G 5.9G 37% /
/dev/sda1 190M 57M 124M 32% /boot
tmpfs 2.9G 0 2.9G 0% /dev/shm
none 2.9G 648K 2.9G 1% /var/lib/xenstored
sharehost:/export/Service-2 4.1T 2.1T 2.0T 52% /media

So we now know that it's NFS that is keeping the I/O busy.
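The lookup above can be scripted. A Python sketch, assuming you feed it the contents of /sys/block/dm-3/dev and the output of dmsetup ls (newer dmsetup versions print (253:3) instead of (253, 3), so both forms are accepted). Note that /sys/block/&lt;dev&gt;/size counts 512-byte sectors, so a device of exactly 4 TiB shows 8589934592 there; that figure is a worked example, not a value from the host above.

```python
import re

SECTOR_BYTES = 512  # /sys/block/<dev>/size is counted in 512-byte sectors


def dm_name(dev_text: str, dmsetup_ls: str):
    """Return the device-mapper name whose (major, minor) matches /sys/block/dm-N/dev."""
    major, minor = (int(x) for x in dev_text.strip().split(":"))
    for line in dmsetup_ls.splitlines():
        m = re.match(r"^(\S+)\s+\((\d+)[,:]\s*(\d+)\)", line.strip())
        if m and (int(m.group(2)), int(m.group(3))) == (major, minor):
            return m.group(1)
    return None


def sectors_to_tib(sectors: int) -> float:
    """Convert a /sys/block/<dev>/size value to TiB."""
    return sectors * SECTOR_BYTES / 2**40
```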


  • device mapper (dm_mod module, dmsetup ls)
  • multiple devices (software RAID, mdraid, /proc/mdstat)
  • dmraid (fake RAID)
  • DM-MPIO (DM-Multipathing, multipath/multipathd, dm_multipath module, combined with SAN)

Categories: Hardware, Storage

resolved – differences between zfs ARC L2ARC ZIL

January 31st, 2013
  • ARC

The ZFS ARC (Adaptive Replacement Cache) is a very fast cache located in the server’s memory.

For example, our ZFS server with 12GB of RAM has 11GB dedicated to ARC, which means our ZFS server will be able to cache 11GB of the most accessed data. Any read requests for data in the cache can be served directly from the ARC memory cache instead of hitting the much slower hard drives. This creates a noticeable performance boost for data that is accessed frequently.

  • L2ARC

As a general rule, you want to install as much RAM into the server as you can to make the ARC as big as possible. At some point, adding more memory is just cost prohibitive. That is where the L2ARC becomes important. The L2ARC is the second level adaptive replacement cache. The L2ARC is often called “cache drives” in the ZFS systems.

L2ARC is a new layer between Disk and the cache (ARC) in main memory for ZFS. It uses dedicated storage devices to hold cached data. The main role of this cache is to boost the performance of random read workloads. The intended L2ARC devices include 10K/15K RPM disks like short-stroked disks, solid state disks (SSD), and other media with substantially faster read latency than disk.
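Why the L2ARC helps random reads can be shown with a back-of-the-envelope calculation of the average read latency when reads fall through ARC, then L2ARC, then disk. The latencies and hit rates below are hypothetical round numbers chosen for illustration (RAM around 0.1 µs, SSD around 0.1 ms, a random hard-drive read around 8 ms), not measurements from our appliance:

```python
def expected_read_latency_ms(arc_hit: float, l2arc_hit: float,
                             arc_ms: float, l2arc_ms: float, disk_ms: float) -> float:
    """Average read latency for reads falling through ARC -> L2ARC -> disk.

    arc_hit is the fraction of all reads served from ARC; l2arc_hit is the
    fraction of ARC misses that the L2ARC can serve.
    """
    arc_miss = 1.0 - arc_hit
    return (arc_hit * arc_ms
            + arc_miss * l2arc_hit * l2arc_ms
            + arc_miss * (1.0 - l2arc_hit) * disk_ms)


if __name__ == "__main__":
    with_l2arc = expected_read_latency_ms(0.9, 0.5, 0.0001, 0.1, 8.0)
    without_l2arc = expected_read_latency_ms(0.9, 0.0, 0.0001, 0.1, 8.0)
    print(f"with L2ARC: {with_l2arc:.3f} ms, without: {without_l2arc:.3f} ms")
```

With those numbers the L2ARC roughly halves the average read latency (about 0.41 ms versus 0.80 ms), because almost all of the average comes from reads that still have to go to disk.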

  • ZIL

ZIL (ZFS Intent Log) exists to improve the performance of synchronous writes. Synchronous writes are much slower than asynchronous writes, but they are safer. Essentially, the intent log of a file system is nothing more than an insurance against power failures, a to-do list if you will, that keeps track of the stuff that needs to be updated on disk, even if the power fails (or something else happens that prevents the system from updating its disks).

To get better performance, put the ZIL on separate disks (SSDs), for example: zpool add pool log c2d0

Here's a real example of ZFS ZIL/L2ARC/ARC on a Sun ZFS 7320 head:

test-zfs# zpool iostat -v exalogic
capacity operations bandwidth
pool alloc free read write read write
------------------------- ----- ----- ----- ----- ----- -----
exalogic 6.78T 17.7T 53 1.56K 991K 25.1M
mirror 772G 1.96T 6 133 111K 2.07M
c0t5000CCA01A5FDCACd0 - - 3 36 57.6K 2.07M #these are the physical disks
c0t5000CCA01A6F5CF4d0 - - 2 35 57.7K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A6F5D00d0 - - 2 36 56.2K 2.07M
c0t5000CCA01A6F64F4d0 - - 2 35 57.3K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A76A7B8d0 - - 2 36 56.3K 2.07M
c0t5000CCA01A746CCCd0 - - 2 36 56.8K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A749A88d0 - - 2 35 56.7K 2.07M
c0t5000CCA01A759E90d0 - - 2 35 56.1K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A767FDCd0 - - 2 35 56.1K 2.07M
c0t5000CCA01A782A40d0 - - 2 35 57.1K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A782D10d0 - - 2 35 57.2K 2.07M
c0t5000CCA01A7465F8d0 - - 2 35 56.3K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A7597FCd0 - - 2 35 57.6K 2.07M
c0t5000CCA01A7828F4d0 - - 2 35 56.2K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A7829ACd0 - - 2 35 57.1K 2.07M
c0t5000CCA01A78278Cd0 - - 2 35 57.4K 2.07M
mirror 772G 1.96T 6 133 111K 2.07M
c0t5000CCA01A736000d0 - - 3 35 57.3K 2.07M
c0t5000CCA01A738000d0 - - 2 35 57.3K 2.07M
c0t5000A72030061B82d0 224M 67.8G 0 98 1 1.62M #ZIL(SSD write cache, ZFS Intent Log)
c0t5000A72030061C70d0 224M 67.8G 0 98 1 1.62M
c0t5000A72030062135d0 223M 67.8G 0 98 1 1.62M
c0t5000A72030062146d0 224M 67.8G 0 98 1 1.62M
cache - - - - - -
c2t2d0 334G 143G 15 6 217K 652K #L2ARC(SSD cache drives)
c2t3d0 332G 145G 15 6 215K 649K
c2t4d0 333G 144G 11 6 169K 651K
c2t5d0 333G 144G 13 6 192K 650K
c2t2d0 - - 0 0 0 0
c2t3d0 - - 0 0 0 0
c2t4d0 - - 0 0 0 0
c2t5d0 - - 0 0 0 0

And as for ARC:

test-zfs:> status memory show
Cache 63.4G bytes #ARC
Unused 17.3G bytes
Mgmt 561M bytes
Other 491M bytes
Kernel 14.3G bytes

sun zfs firmware upgrade howto

January 29th, 2013

This article covers upgrading the firmware of a Sun ZFS 7320 (the procedure may also work for other Sun ZFS heads):

PS: It's better not to use failback; instead, always log on to the standby ZFS node and do a takeover. This is a rule of thumb.

1. Under Configuration -> Cluster, you can see the shared (transferable) resources along with the resources owned by the current node (locked resources, such as the MGMT interface). Under Configuration -> Network, only the configuration of the shared (transferable) network resources and of the network resources owned by the current node (locked resources, such as the MGMT interface) is shown.

2. Additional Oracle ZFS Storage Appliance related software is available for download at the Oracle Technology Network.

3. Takeover can occur at any time; as discussed above, takeover is attempted whenever peer failure is detected. It can also be triggered manually using the cluster configuration CLI or BUI. This is useful for testing purposes as well as to perform rolling software upgrades (upgrades in which one head is upgraded while the other provides service running the older software, then the second head is upgraded once the new software is validated). Finally, takeover will occur when a head boots and detects that its peer is absent. This allows service to resume normally when one head has failed permanently or when both heads have temporarily lost power.

Failback never occurs automatically. When a failed head is repaired and booted, it will rejoin the cluster (resynchronizing its view of all resources, their properties, and their ownership) and proceed to wait for an administrator to perform a failback operation. Until then, the original surviving head will continue to provide all services. This allows for a full investigation of the problem that originally triggered the takeover, validation of a new software revision, or other administrative tasks prior to the head returning to production service. Because failback is disruptive to clients, it should be scheduled according to business-specific needs and processes. There is one exception: Suppose that head A has failed and head B has taken over. When head A rejoins the cluster, it becomes eligible to take over if it detects that head B is absent or has failed. The principle is that it is always better to provide service than not, even if there has not yet been an opportunity to investigate the original problem. So while failback to a previously-failed head will never occur automatically, it may still perform takeover at any time.

In active-active mode, when a takeover happens, all resources, including the ones on the peer node, are transferred. When the failed node comes back to life, you can then issue a failback, which gives back the resources assigned to it.
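The takeover and failback rules quoted above can be summarised as a toy Python model. The class and method names are hypothetical and this is not how the appliance is implemented; it only shows that takeover moves everything to the surviving head, rejoining moves nothing, and an explicit failback returns each resource to its assigned head.

```python
class Cluster:
    """Toy model of the takeover/failback rules (not the appliance's implementation)."""

    def __init__(self, assignments: dict):
        # assignments: resource -> head it is configured ("assigned") to run on
        self.assigned = dict(assignments)
        self.owner = dict(assignments)  # head currently serving each resource
        self.up = {head: True for head in set(assignments.values())}

    def fail(self, head: str) -> None:
        """Peer failure detected: the surviving head takes over every resource."""
        self.up[head] = False
        survivor = next(h for h, alive in self.up.items() if alive)
        for res in self.owner:
            self.owner[res] = survivor

    def rejoin(self, head: str) -> None:
        """A repaired head rejoins the cluster but does NOT take anything back."""
        self.up[head] = True

    def failback(self) -> None:
        """Administrator-initiated: return each resource to the head it is assigned to."""
        for res, head in self.assigned.items():
            if self.up[head]:
                self.owner[res] = head
```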


Categories: Hardware, NAS, SAN, Storage

perl script for monitoring sun zfs memory usage

January 16th, 2013

On zfs's aksh, I can check memory usage with the following:

test-zfs:> status memory show
Cache 719M bytes
Unused 15.0G bytes
Mgmt 210M bytes
Other 332M bytes
Kernel 7.79G bytes

Now I want to collect this memory usage information automatically for use by SNMP. Here are the steps:

cpan> o conf prerequisites_policy follow
cpan> o conf commit

Since the host uses a proxy to reach the internet, set the following in /etc/wgetrc:

http_proxy =
ftp_proxy =
use_proxy = on

Now install the Net::SSH::Perl perl module:

PERL_MM_USE_DEFAULT=1 perl -MCPAN -e 'install Net::SSH::Perl'

And to confirm that Net::SSH::Perl was installed, run the following command:

perl -e 'use Net::SSH::Perl' #no output is good, as it means the package was installed successfully

Here's the Perl script that gets the memory usage of the Sun ZFS head:

[root@test-centos ~]# cat /var/tmp/mrtg/
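use strict;
use warnings;
use Net::SSH::Perl;

my $host     = 'test-zfs';
my $user     = 'root';
my $password = 'password';

# Log on to the ZFS head and run the aksh memory status command
my $ssh = Net::SSH::Perl->new($host);
$ssh->login($user, $password);
my ($stdout, $stderr, $exit) = $ssh->cmd("status memory show");

if ($exit) {
    # Non-zero exit code: report the error
    print "ErrorCode:$exit\n";
    print "ErrorMsg:", (defined $stderr ? $stderr : ''), "\n";
} else {
    my @std_arr = split(/\n/, $stdout);
    shift @std_arr;    # skip the first line of the command output
    # Assumed normalisation: report every value in MB, e.g. "Kernel 7.79G bytes" -> "Kernel 7976.96"
    foreach (@std_arr) {
        if ($_ =~ /^\s*(\S+)\s+([\d.]+)M\s+bytes/) {
            $_ = "$1 $2";
        } elsif ($_ =~ /^\s*(\S+)\s+([\d.]+)G\s+bytes/) {
            $_ = "$1 " . $2 * 1024;
        }
    }
    foreach (@std_arr) {
        print $_ . "\n";
    }
}
exit($exit || 0);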

If you get the following error messages during installation of a perl module:

[root@test-centos ~]# perl -MCPAN -e 'install SOAP::Lite'
CPAN: Storable loaded ok
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
LWP failed with code[500] message[LWP::Protocol::MyFTP: connect: Connection timed out]
Fetching with Net::FTP:

Trying with "/usr/bin/links -source" to get
ELinks: Connection timed out

Then check whether you're going through a proxy to reach the internet (run o conf init at the cpan> prompt to reconfigure CPAN, then set http_proxy, ftp_proxy and use_proxy in /etc/wgetrc).


zfs iops on nfs iscsi disk

January 5th, 2013

On the ZFS Storage 7000 series BUI, you may find the following statistics:

This may seem weird: NFSv3 (3052) + iSCSI (1021) is larger than Disk (1583). Since the IOPS of the NFSv3 and iSCSI protocols ultimately go to disk, why is the IOPS of the two protocols larger than the disk IOPS?

Here's the reason:

Disk operations for NFSv3 and iSCSI are logical operations. These logical operations are then combined/optimized by sun zfs storage and then finally go to physical Disk operations.
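One way logical operations turn into fewer physical ones is that adjacent requests get merged into larger I/Os (and reads served from the ARC cache never reach the disks at all). A simplified Python sketch of the merging step, assuming requests are given as (offset, length) byte ranges; ZFS's real I/O pipeline is considerably more involved:

```python
def coalesce(requests):
    """Merge logical (offset, length) requests that touch contiguous or overlapping ranges."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            start, prev_len = merged[-1]
            # Extend the previous physical I/O to cover this request as well
            merged[-1] = (start, max(prev_len, offset + length - start))
        else:
            merged.append((offset, length))
    return merged


if __name__ == "__main__":
    logical = [(0, 4096), (8192, 4096), (4096, 4096), (65536, 4096)]
    physical = coalesce(logical)
    print(len(logical), "logical ->", len(physical), "physical:", physical)
```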


1. For sequential access to the disks (like VOD), disk throughput rather than IOPS becomes the performance bottleneck. In contrast, IOPS limits disk performance when the disks are accessed randomly.

2. For NAS performance analysis, here are two good articles (in Chinese).

3. You may also wonder how Disk IOPS can be as high as 1583. That's because this number is the sum across all disk controllers of the ZFS storage system. Here are some ballpark numbers for HDD IOPS:
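Ballpark per-disk figures like these usually come from a simple formula: one random I/O costs about an average seek plus half a rotation. A Python sketch; the seek times used below are typical datasheet-style values assumed for illustration:

```python
def disk_iops(rpm: int, avg_seek_ms: float) -> float:
    """Rough random IOPS of one hard drive: 1 / (average seek + half a rotation)."""
    half_rotation_ms = 0.5 * 60_000 / rpm
    return 1000.0 / (avg_seek_ms + half_rotation_ms)


if __name__ == "__main__":
    # Assumed average seek times: 3.5 ms for 15K RPM, 8.5 ms for 7.2K RPM drives
    print(round(disk_iops(15000, 3.5)), round(disk_iops(7200, 8.5)))
```

With these assumptions a 15K RPM drive gives roughly 180 random IOPS and a 7.2K RPM drive roughly 80, so a Disk figure of 1583 implies something like 9 to 20 drives busy in parallel.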


Categories: Hardware, NAS, SAN, Storage

zfs shared lun storage set up for oracle RAC

January 4th, 2013
  • create iSCSI Target Group

Open the ZFS BUI and navigate to "Configuration" -> "SAN" -> "iSCSI Targets". Create a new iSCSI target by clicking the plus sign. Give it an alias, then select the network interface you want to use (it may be a bond or an LACP aggregation; check it under "Configuration" -> "Network" and "Configuration" -> "Cluster"). After creating the iSCSI target, drag it to "iSCSI Target Groups" on the right side to create an iSCSI target group, which you can also give a name. Note down the iSCSI target group's IQN; it is needed for later operations. (Network interfaces: use the NAS interface; you can select multiple interfaces.)

  • create iSCSI Initiator Group

Before moving on to the next step, we first need the iSCSI initiator IQN of each host that should get the LUNs. On each Linux host, run the following command to get its IQN (you can edit this file first, for example to make the IQN end with the host's `hostname`, which makes later LUN operations easier; run /etc/init.d/iscsi restart after modifying initiatorname.iscsi):

[root@test-host ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=<your host's iqn name>

Now go back to the ZFS BUI and navigate to "Configuration" -> "SAN" -> "Initiators". On the left side, click "iSCSI Initiators", then click the plus sign. Enter the IQN you got in the previous step and give it a name (do this for each host that should get the iSCSI LUNs). Then drag the newly created iSCSI initiator(s) from the left side to the right side to form a new iSCSI initiator group (drag two items from the left onto the same item on the right to group them).

  • create shared LUNs for iSCSI Initiator Group

Next, create LUNs for the iSCSI initiator group, so that the LUNs are shared by its hosts (Oracle RAC, for example, needs shared storage). Click the diskette icon on the iSCSI initiator group you just created, select the project to allocate the LUN from, give it a name, and set the volume size. Select the target group you created earlier (you can also create a new one, e.g. RAC, in Shares).

PS: Alternatively, go to "Shares" -> "LUNs" and create the LUN(s) there, using the target group you created and the default initiator group. Note that one LUN needs one iSCSI target, so create more iSCSI targets and add them to the iSCSI target group if you want more LUNs.

  • scan shared LUNs from hosts

Now we're going to operate on linux hosts. On each host you want iSCSI LUN allocated, do the following steps:

iscsiadm -m discovery -t st -p <ip address of your zfs storage>(use cluster's ip if there's zfs cluster) #Discover available targets from a discovery portal
iscsiadm -m node -T <variable, iSCSI Target Group iqn> -p <ip address of your zfs storage> -l #Log into a specific target. Or use output from above command($1 is --portal, $3 is --targetname, -l is --login). Use -u to log out for a specified record.
service iscsi restart

After these steps, your host(s) should see the newly allocated iSCSI LUN(s); run fdisk -l to confirm.
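If the discovery step returns several targets, the login commands can be generated from its output. A Python sketch, assuming the common sendtargets output format of one "IP:PORT,TPGT IQN" line per target; the portal and IQN in the example are hypothetical:

```python
def parse_discovery(output: str):
    """Parse `iscsiadm -m discovery -t st` lines of the form 'IP:PORT,TPGT IQN'."""
    targets = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or "," not in parts[0]:
            continue
        portal, tpgt = parts[0].rsplit(",", 1)
        targets.append({"portal": portal, "tpgt": int(tpgt), "target": parts[1]})
    return targets


def login_commands(output: str):
    """Build one iscsiadm login command (as an argv list) per discovered target."""
    return [["iscsiadm", "-m", "node", "-T", t["target"], "-p", t["portal"], "-l"]
            for t in parse_discovery(output)]


if __name__ == "__main__":
    sample = "192.0.2.50:3260,2 iqn.1986-03.com.sun:02:hypothetical-target"
    for cmd in login_commands(sample):
        print(" ".join(cmd))
```

Each argv list can be passed to subprocess.run(cmd, check=True) on the host once you have reviewed it.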


Here's more about iscsiadm:

iscsiadm -m session -P 1 #Display all sessions currently logged in; -P sets the print level (1 to 3). From this we can tell which target the local disks are mapped to (with -P 3 the output also lists the attached SCSI disks). For example, if /dev/sde is mapped to target, we know the NAS ZFS appliance name; on the ZFS appliance, once we have the target group name, we can check which LUNs use that target group, which gives the mapping between local iSCSI disks and ZFS LUNs.

iscsiadm -m session --rescan #rescan all sessions

iscsiadm -m session -r SID --rescan #rescan a specific session #SID can be got from iscsiadm -m session -P 1

iscsiadm -m node -T targetname -p ipaddress -u #Log out of a specific target

To remove a specific iSCSI LUN from the system, do the following:

cd /sys/class/iscsi_session/session<SID>/device/target<scsi host number>:0:0/<scsi host number>:0:0:<Lun number>
echo 1 > delete

iscsiadm -m node -T targetname -p ipaddress #Display information about a target
iscsiadm -m node -s -T targetname -p ipaddress #Display statistics about a target

iscsiadm -m discovery -o show #View iSCSI database regarding discovery
iscsiadm -m node -o show #View iSCSI database regarding targets to log into
iscsiadm -m session -o show #View iSCSI database regarding sessions logged into
multipath -ll #View if the targets are multipathed (MPIO)

You can find more information about the iSCSI disks in /sys/class/{scsi_device,scsi_disk,scsi_generic,scsi_host} and /sys/block/ once you have the details from iscsiadm -m session -P 3.
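To get the disk-to-target mapping in one step, a small Python sketch can parse the output of iscsiadm -m session -P 3. It assumes the open-iscsi layout in which each "Target:" line is followed by "scsiN Channel .. Id .. Lun: N" and "Attached scsi disk sdX" lines; the exact layout differs a little between versions, and the IQN and portal in the sample are hypothetical.

```python
def disks_by_target(output: str) -> dict:
    """Map each attached disk (e.g. 'sdb') to (target IQN, LUN number)."""
    mapping = {}
    target, lun = None, None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Target:"):
            # Newer versions append e.g. "(non-flash)" after the IQN
            target = line.split()[1]
        elif line.startswith("scsi") and "Lun:" in line:
            lun = int(line.split("Lun:")[1].split()[0])
        elif line.startswith("Attached scsi disk"):
            mapping[line.split()[3]] = (target, lun)
    return mapping


if __name__ == "__main__":
    sample = (
        "Target: iqn.1986-03.com.sun:02:hypothetical-target (non-flash)\n"
        "\tCurrent Portal: 192.0.2.50:3260,2\n"
        "\t\tscsi3 Channel 00 Id 0 Lun: 0\n"
        "\t\t\tAttached scsi disk sdb\t\tState: running\n"
    )
    print(disks_by_target(sample))
```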

Here's the CLI equivalent of this article (aksh on the Oracle ZFS appliance):

shares project rac186187
set mountpoint=/export/rac186187
set quota=1T
set readonly=true
set default_permissions=777

set default_user=root

set default_group=root
set sharenfs="sec=sys,,rw=@,root=@"
#get reservation
#get pool
#get snapdir #snapdir = visible
#get default_group #default_group = root
#get default_user #default_user = root
#get exported #exported = true

configuration net interfaces list
configuration net datalinks list

configuration san iscsi targets create rac186187
set alias=rac186187
set interfaces=aggr93001

configuration san iscsi targets list
configuration san iscsi targets groups create rac186187
set name=rac186187

[cat /etc/iscsi/initiatorname.iscsi] ->,
configuration san iscsi initiators create testhost186
set alias=testhost186

configuration san iscsi initiators create testhost187
set alias=testhost187

configuration san iscsi initiators list
configuration san iscsi initiators groups create rac186187
set name=rac186187

shares select rac186187 #project must set readonly=false
lun rac186187
set volsize=500G
set targetgroup=rac186187
set initiatorgroup=rac186187

And to create a share, do the following:

shares select <project name> #to show current properties, run "show"
filesystem <new share name>
set quota=100G
set quota_snap=false
set reservation=50G
set reservation_snap=false
set root_permissions=777
set root_user=root
set root_group=root
cd /


Suppose a session is already logged in to the target. How can we add LUNs, or change the size of an existing LUN, on the ZFS appliance?

First, to change the size of an existing LUN:

[root@testhost1 ~]# iscsiadm -m session
tcp: [2],2

Log on to the ZFS UI, go to Configuration, SAN, iSCSI, Targets and search for "" (the target name); you'll find the target and the target group it belongs to. Note down the target group name, e.g. RAC01, then go to Shares, LUNs, click on the LUN and change its size as needed.

To add a new LUN to the host:

First, find the iSCSI initiator name on this host (testhost1-p):

[root@testhost1 ~]# cat /etc/iscsi/initiatorname.iscsi

Log on to the ZFS UI, go to Configuration, SAN, iSCSI, Initiators and search for ""; you'll find the initiator and its initiator group. From there, click the diskette icon to add a new LUN. Make sure to select the right target group, i.e. the one you noted in the previous step.
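The two names you search for in the UI (the target IQN and this host's initiator IQN) can be pulled out on the host itself. This is a minimal sketch; the function names session_targets and initiator_name are hypothetical, and the session line format assumed below is the usual open-iscsi one, which the truncated output above does not fully show.

```shell
#!/bin/sh
# Hypothetical helpers; assumes open-iscsi output and file formats.

# Print the target IQN of each session listed by `iscsiadm -m session` (stdin).
# Assumed line format: "tcp: [SID] PORTAL:PORT,TPGT TARGET_IQN [(non-flash)]"
session_targets() {
    awk '$1 == "tcp:" { print $4 }'
}

# Print this host's initiator IQN (defaults to the path used in the text).
initiator_name() {
    sed -n 's/^InitiatorName=//p' "${1:-/etc/iscsi/initiatorname.iscsi}"
}
```

Usage: iscsiadm -m session | session_targets gives the string to search under Targets, and initiator_name gives the one to search under Initiators.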

Categories: Hardware, NAS, Storage Tags:

resolved – error: conflicting types for ‘pci_pcie_cap’ – Infiniband driver OFED installation

December 10th, 2012 No comments

OFED is short for "OpenFabrics Enterprise Distribution". When installing OFED- on a CentOS/RHEL 5.8 Linux system, I ran into the following problem:

In file included from /var/tmp/OFED_topdir/BUILD/ofa_kernel-
/var/tmp/OFED_topdir/BUILD/ofa_kernel- error: conflicting types for 'pci_pcie_cap'
include/linux/pci.h:1015: error: previous definition of 'pci_pcie_cap' was here
make[4]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-] Error 1
make[3]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-] Error 2
make[2]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-] Error 2
make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/ofa_kernel-] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-308.'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.70159 (%build)

After googling and experimenting, I found the definitive resolution for this problem: install another version of OFED, OFED-. Download it from the following official site:


1. Here's the OFED installation how-to: OFED installation README.txt

2. Some InfiniBand commands that may be useful:

service openibd status/start/stop
lspci -k|grep -i infini
ibv_devinfo #Verify the status of ports by using ibv_devinfo: all connected ports should report a "PORT_ACTIVE" state
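To check the PORT_ACTIVE condition without eyeballing the whole ibv_devinfo dump, a small filter can list the ports that are not active. This is a sketch assuming the usual ibv_devinfo layout ("hca_id:", then per-port "port:" and "state:" lines); the function name inactive_ib_ports is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; assumes the usual ibv_devinfo layout.

# Read ibv_devinfo output on stdin; print "HCA PORT STATE" for every port
# whose state is not PORT_ACTIVE (unconnected ports will show up here too).
inactive_ib_ports() {
    awk '
        $1 == "hca_id:" { hca = $2 }
        $1 == "port:"   { port = $2 }
        $1 == "state:" && $2 != "PORT_ACTIVE" { print hca, port, $2 }
    '
}
```

Usage: ibv_devinfo | inactive_ib_ports. Empty output means every port reports PORT_ACTIVE; anything listed should be a port you know is not cabled.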

how to turn on hba flags connected to EMC arrays

October 3rd, 2012 No comments

As per EMC's recommendation, the following flags should be enabled for VMware ESX hosts; otherwise there will be performance issues:


Here's the command that'll do the trick:

sudo symmask -sid <sid> set hba_flags on C,SPC2,SC3 -enable -wwn <port wwn> -dir <dir number> -p <port number>
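When several HBA ports need the same flags, the command above can be generated from a list and reviewed before running. This is a minimal sketch; the function name symmask_hba_flag_cmds and its input format ("<port wwn> <dir number> <port number>" per line) are hypothetical, and it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one symmask command per HBA port.
# stdin lines: "<port wwn> <dir number> <port number>"; blank and # lines skipped.
symmask_hba_flag_cmds() {
    sid=$1
    while read -r wwn dir port; do
        case $wwn in ''|'#'*) continue ;; esac
        printf 'sudo symmask -sid %s set hba_flags on C,SPC2,SC3 -enable -wwn %s -dir %s -p %s\n' \
            "$sid" "$wwn" "$dir" "$port"
    done
}
```

Usage: symmask_hba_flag_cmds <sid> < ports.txt, check the output, then pipe it to sh.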

Categories: Hardware, NAS, SAN, Storage Tags:

Resolved – Errors found during scanning of LUN allocated from IBM XIV array

October 2nd, 2012 No comments

So here's the story:
After the LUN (from the IBM XIV array) was allocated, we ran 'xiv_fc_admin -R' to make the LUN visible to the OS (testhost-db-clstr-vol_37 is the new LUN's volume name):
root@testhost01 # xiv_devlist -o device,vol_name,vol_id
XIV Devices
Device Vol Name Vol Id
/dev/dsk/c2t500173804EE40140d19s2 testhost-db-clstr-vol_37 1974
/dev/dsk/c2t500173804EE40150d19s2 testhost-db-clstr-vol_37 1974
/dev/dsk/c4t500173804EE40142d19s2 testhost-db-clstr-vol_37 1974
/dev/dsk/c4t500173804EE40152d19s2 testhost-db-clstr-vol_37 1974
/dev/vx/dmp/xiv0_16 testhost-db-clstr-vol_17 1922
Non-XIV Devices

Then I ran 'vxdctl enable' to make the DMP device visible to the OS, but got the following error:
root@testhost01 # vxdctl enable
VxVM vxdctl ERROR V-5-1-0 Data Corruption Protection Activated - User Corrective Action Needed
VxVM vxdctl INFO V-5-1-0 To recover, first ensure that the OS device tree is up to date (requires OS specific commands).
VxVM vxdctl INFO V-5-1-0 Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery:
xiv0_18, xiv0_18, xiv0_18, xiv0_18

After this, the new LUN (testhost-db-clstr-vol_37) disappeared from the output of 'xiv_devlist -o device,vol_name,vol_id', and xiv0_18 (the DMP device of the new LUN) became an 'Unreachable device', see below:

root@testhost01 # xiv_devlist -o device,vol_name,vol_id
XIV Devices
Device Vol Name Vol Id
Non-XIV Devices
Unreachable devices: /dev/vx/dmp/xiv0_18
Also, 'vxdisk list' showed:
root@testhost01 # vxdisk list xiv0_18
Device: xiv0_18
devicetag: xiv0_18
type: auto
flags: error private autoconfig
pubpaths: block=/dev/vx/dmp/xiv0_18s2 char=/dev/vx/rdmp/xiv0_18s2
guid: -
udid: IBM%5F2810XIV%5F4EE4%5F07B6
site: -
Multipathing information:
numpaths: 4
c4t500173804EE40142d19s2 state=disabled
c4t500173804EE40152d19s2 state=disabled
c2t500173804EE40150d19s2 state=disabled
c2t500173804EE40140d19s2 state=disabled

I tried to format the new DMP device (xiv0_18), but it failed:
root@testhost01 # format -d /dev/vx/dmp/xiv0_18
Searching for disks...done

c2t500173804EE40140d19: configured with capacity of 48.06GB
c2t500173804EE40150d19: configured with capacity of 48.06GB
c4t500173804EE40142d19: configured with capacity of 48.06GB
c4t500173804EE40152d19: configured with capacity of 48.06GB
Unable to find specified disk '/dev/vx/dmp/xiv0_18'.

'vxdisksetup -i' also failed:
root@testhost01 # vxdisksetup -i /dev/vx/dmp/xiv0_18
prtvtoc: /dev/vx/rdmp/xiv0_18: No such device or address

And 'xiv_fc_admin -R' failed as well:
root@testhost01 # xiv_fc_admin -R
ERROR: Error during command execution: vxdctl enabled
OK, those are all of the symptoms (and the headache). Here's the solution:

1. Run 'xiv_fc_admin -R' to scan for the new LUN. It will print "ERROR: Error during command execution: vxdctl enabled"; ignore it. You can also run devfsadm -c disk, but it isn't actually needed.
2. Now exclude the problematic paths of the DMP device (you can find the paths in the output of vxdisk list xiv0_18):
root@testhost01 # vxdmpadm exclude vxvm path=c4t500173804EE40142d19s2
root@testhost01 # vxdmpadm exclude vxvm path=c4t500173804EE40152d19s2
root@testhost01 # vxdmpadm exclude vxvm path=c2t500173804EE40150d19s2
root@testhost01 # vxdmpadm exclude vxvm path=c2t500173804EE40140d19s2
3. Now run 'vxdctl enable'; this time the following error message will NOT be shown:
VxVM vxdctl ERROR V-5-1-0 Data Corruption Protection Activated - User Corrective Action Needed
VxVM vxdctl INFO V-5-1-0 To recover, first ensure that the OS device tree is up to date (requires OS specific commands).
VxVM vxdctl INFO V-5-1-0 Then, execute 'vxdisk rm' on the following devices before reinitiating device discovery:
xiv0_18, xiv0_18, xiv0_18, xiv0_18
4. Now include the problematic paths:
root@testhost01 # vxdmpadm include vxvm path=c4t500173804EE40142d19s2
root@testhost01 # vxdmpadm include vxvm path=c4t500173804EE40152d19s2
root@testhost01 # vxdmpadm include vxvm path=c2t500173804EE40150d19s2
root@testhost01 # vxdmpadm include vxvm path=c2t500173804EE40140d19s2

5. Run 'vxdctl enable'. You should now see the DMP device in the output of 'xiv_devlist -o device,vol_name,vol_id':
root@testhost01 # xiv_devlist -o device,vol_name,vol_id
XIV Devices
Device Vol Name Vol Id
/dev/vx/dmp/xiv0_18 testhost-db-clstr-vol_37 1974
Non-XIV Devices

6. 'vxdisk list' will now show the DMP device (xiv0_18) as 'auto - - nolabel', so we need to label the DMP device:
root@testhost01 # format -d xiv0_18
Searching for disks...done

c2t500173804EE40140d19: configured with capacity of 48.06GB
c2t500173804EE40150d19: configured with capacity of 48.06GB
c4t500173804EE40142d19: configured with capacity of 48.06GB
c4t500173804EE40152d19: configured with capacity of 48.06GB
Unable to find specified disk 'xiv0_18'.

root@testhost01 # vxdisk list xiv0_18
Device: xiv0_18
devicetag: xiv0_18
type: auto
flags: nolabel private autoconfig
pubpaths: block=/dev/vx/dmp/xiv0_18 char=/dev/vx/rdmp/xiv0_18
guid: -
udid: IBM%5F2810XIV%5F4EE4%5F07B6
site: -
errno: Disk is not usable
Multipathing information:
numpaths: 4
c4t500173804EE40142d19s2 state=enabled
c4t500173804EE40152d19s2 state=enabled
c2t500173804EE40150d19s2 state=enabled
c2t500173804EE40140d19s2 state=enabled

root@testhost01 # vxdisksetup -i /dev/vx/dmp/xiv0_18
prtvtoc: /dev/vx/rdmp/xiv0_18: Unable to read Disk geometry errno = 0x16

Not again! But don't panic this time. Now run format on each subpath of the DMP device (the subpaths are listed in the output of vxdisk list xiv0_18), for example:
root@testhost01 # format c4t500173804EE40142d19s2

c4t500173804EE40142d19s2: configured with capacity of 48.06GB
selecting c4t500173804EE40142d19s2
[disk formatted]
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
format> label
Ready to label disk, continue? yes

format> save
Saving new disk and partition definitions
Enter file name["./format.dat"]:
format> quit

7. After the subpaths are labelled, run 'vxdctl enable' again. The DMP device's state will change from 'auto - - nolabel' to 'auto:none - - online invalid', and vxdisk list no longer shows the DMP device as 'Disk is not usable':
root@testhost01 # vxdisk list xiv0_18
Device: xiv0_18
devicetag: xiv0_18
type: auto
info: format=none
flags: online ready private autoconfig invalid
pubpaths: block=/dev/vx/dmp/xiv0_18s2 char=/dev/vx/rdmp/xiv0_18s2
guid: -
udid: IBM%5F2810XIV%5F4EE4%5F07B6
site: -
Multipathing information:
numpaths: 4
c4t500173804EE40142d19s2 state=enabled
c4t500173804EE40152d19s2 state=enabled
c2t500173804EE40150d19s2 state=enabled
c2t500173804EE40140d19s2 state=enabled

8. To add the new DMP device to a disk group and grow a volume onto it, follow these steps:
/usr/lib/vxvm/bin/vxdisksetup -i xiv0_18
vxdg -g <dg_name> adddisk <disk_name>=<device name>
/usr/sbin/vxassist -g <dg_name> maxgrow <vol name> alloc=<newly-add-luns>
/etc/vx/bin/vxresize -g <dg_name> -bx <vol name> <new size>
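Steps 2 and 4 repeat one vxdmpadm command per subpath, so the commands can be generated from 'vxdisk list' instead of typed by hand. This is a minimal sketch; the function names dmp_subpaths and dmp_path_cmds are hypothetical, it assumes the "<path> state=..." lines shown in the vxdisk list output above, and it only prints commands for you to review.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2 and 4; they only print commands.

# Read `vxdisk list <dmpnode>` output on stdin; print each sub-path name
# (the lines of the form "<path> state=enabled|disabled").
dmp_subpaths() {
    awk '$2 ~ /^state=/ { print $1 }'
}

# Read sub-path names on stdin; print "vxdmpadm exclude|include vxvm path=..."
dmp_path_cmds() {
    action=$1   # "exclude" (step 2) or "include" (step 4)
    while read -r path; do
        printf 'vxdmpadm %s vxvm path=%s\n' "$action" "$path"
    done
}
```

Usage: vxdisk list xiv0_18 | dmp_subpaths | dmp_path_cmds exclude for step 2, and the same pipeline with include for step 4. Save the path list before excluding, since the excluded paths may no longer be listed afterwards.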


Categories: Hardware, SAN, Storage Tags:

lvm snapshot backup

September 20th, 2012 No comments

Here are the updated steps for LVM snapshot backup.

# determine available space.

sudo vgs | awk '/VolGroup/ {print $NF}'

# create snapshots. 1G should be enough, but if you're really short on space you can try even smaller, because a snapshot only has to hold the changes made between now and when you destroy it.

sudo lvcreate -L 1G -s -n rootLVsnap VolGroup00/rootLV

sudo lvcreate -L 1G -s -n varLVsnap VolGroup00/varLV

# create the FS for backups. Allocate as much space as you have. As a last resort, use NFS.

sudo lvcreate -n OSbackup -L 5G VolGroup00

sudo mkfs -t ext3 /dev/VolGroup00/OSbackup

sudo mkdir /OSbackup

sudo mount /dev/VolGroup00/OSbackup /OSbackup

# create a backup of root

# gzip is important. if you are really tight on space, try gzip -9 or even bzip2

sudo dd if=/dev/VolGroup00/rootLVsnap bs=1M | sudo sh -c 'gzip -c > /OSbackup/root.dd.gz'

# now, remove root snapshot and extend backup fs

sudo lvremove VolGroup00/rootLVsnap

sudo lvextend -L +1G VolGroup00/OSbackup

sudo resize2fs /dev/VolGroup00/OSbackup

# backup var

sudo dd if=/dev/VolGroup00/varLVsnap bs=1M | sudo sh -c 'gzip -c > /OSbackup/var.dd.gz'

sudo lvremove VolGroup00/varLVsnap

# backup boot

cd /boot; sudo tar -pczf /OSbackup/boot.tar.gz .

# unmount the fs and destroy mountpoint

sudo umount /OSbackup

sudo rmdir /OSbackup
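Before the umount step, while /OSbackup is still mounted, it is worth checking that the archives can actually be read back. This is a minimal sketch; the function name verify_os_backup is hypothetical and it assumes the three file names used above.

```shell
#!/bin/sh
# Hypothetical helper: check the archives written above before /OSbackup is
# unmounted. Prints OK/BAD per file; returns non-zero if any check fails.
verify_os_backup() {
    dir=${1:-/OSbackup}
    rc=0
    for f in "$dir/root.dd.gz" "$dir/var.dd.gz"; do
        # gzip -t reads the whole stream and checks its CRC without writing output
        if gzip -t "$f" 2>/dev/null; then echo "OK  $f"; else echo "BAD $f"; rc=1; fi
    done
    # listing the tarball forces tar (via gzip) to read it end to end
    if tar -tzf "$dir/boot.tar.gz" >/dev/null 2>&1; then
        echo "OK  $dir/boot.tar.gz"
    else
        echo "BAD $dir/boot.tar.gz"; rc=1
    fi
    return $rc
}
```

Usage: sudo sh -c '. ./verify.sh; verify_os_backup' (file name hypothetical). This only proves the archives are intact, not that the snapshot contents were consistent.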

PS: Here's more info

Categories: Hardware, Storage Tags:

thin provisioning aka virtual provisioning on EMC Symmetrix

July 28th, 2012 No comments

For basic information about thin provisioning, here are some excerpts from the Wikipedia/HDS sites:

Thin provisioning is the act of using virtualization technology to give the appearance of more physical resource than is actually available. It relies on on-demand allocation of blocks of data versus the traditional method of allocating all the blocks up front. This methodology eliminates almost all whitespace which helps avoid the poor utilization rates, often as low as 10%, that occur in the traditional storage allocation method where large pools of storage capacity are allocated to individual servers but remain unused (not written to). This traditional model is often called "fat" or "thick" provisioning.

Thin provisioning simplifies application storage provisioning by allowing administrators to draw from a central virtual pool without immediately adding physical disks. When an application requires more storage capacity, the storage system automatically allocates the necessary physical storage. This just-in-time method of provisioning decouples the provisioning of storage to an application from the physical addition of capacity to the storage system.

The term thin provisioning is applied to disk later in this article, but could refer to an allocation scheme for any resource. For example, real memory in a computer is typically thin provisioned to running tasks with some form of address translation technology doing the virtualization. Each task believes that it has real memory allocated. The sum of the allocated virtual memory assigned to tasks is typically greater than the total of real memory.

The following article shows how to create a thin pool, add and remove components from the pool, and delete a thin pool:

And for more information about thin provisioning on EMC Symmetrix V-Max with Veritas Storage Foundation, the following PDF may help.

EMC Symmetrix V-Max with Veritas Storage Foundation.pdf


1. symcfg -sid 1234 list -datadev #list all TDAT devices (thin data devices, which make up a thin pool; the thin pool provides the actual physical storage to thin devices)
2. symcfg -sid 1234 list -tdev #list all TDEV devices (thin devices)

3. The following article may be useful if you encounter problems when trying to perform storage reclamation (VxVM vxdg ERROR V-5-1-16063 Disk d1 is used by one or more subdisks which are pending to be reclaimed):



Categories: Hardware, SAN, Storage Tags: ,

Resolved – VxVM vxconfigd ERROR V-5-1-0 Segmentation violation – core dumped

July 25th, 2012 2 comments

When I tried to import a Veritas disk group today using vxdg -C import doxerdg, the following error messages were shown:

VxVM vxdg ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible
return code of vxdg import command is 768

VxVM vxconfigd DEBUG V-5-1-0 IMPORT: Trying to import the disk group using configuration database copy from emc5_0490
VxVM vxconfigd ERROR V-5-1-0 Segmentation violation - core dumped

Then I used pstack to print the stack trace of the dumped file:

root # pstack /var/core/core_doxerorg_vxconfigd_0_0_1343173375_140
core 'core_doxerorg_vxconfigd_0_0_1343173375_14056' of 14056: vxconfigd
ff134658 strcmp (fefc04e8, 103fba8, 0, 0, 31313537, 31313737) + 238
001208bc da_find_diskid (103fba8, 0, 0, 0, 0, 0) + 13c
002427dc dm_get_da (58f068, 103f5f8, 0, 0, 68796573, 0) + 14c
0023f304 ssb_check_disks (58f068, 0, f37328, fffffffc, 4, 0) + 3f4
0018f8d8 dg_import_start (58f068, 9c2088, ffbfed3c, 4, 0, 0) + 25d8
00184ec0 dg_reimport (0, ffbfedf4, 0, 0, 0, 0) + 288
00189648 dg_recover_all (50000, 160d, 3ec1bc, 1, 8e67c8, 447ab4) + 2a8
001f2f5c mode_set (2, ffbff870, 0, 0, 0, 0) + b4c
001e0a80 setup_mode (2, 3e90d4, 4d5c3c, 0, 6c650000, 6c650000) + 18
001e09a0 startup (4d0da8, 0, 0, fffffffc, 0, 4d5bcc) + 3e0
001e0178 main (1, ffbffa7c, ffbffa84, 44f000, 0, 0) + 1a98
000936c8 _start (0, 0, 0, 0, 0, 0) + b8

Then I tried to restart vxconfigd, but it failed as well:

root@doxer#/sbin/vxconfigd -k -x syslog

VxVM vxconfigd ERROR V-5-1-0 Segmentation violation - core dumped

After reading the vxconfigd man page, I decided to use -r reset, which resets all Veritas Volume Manager configuration information stored in the kernel as part of startup processing. But before doing this, we need to unmount all VxVM volumes, as stated in the man page:

The reset fails if any volume devices are in use, or if an imported shared disk group exists.

After unmounting all VxVM file systems, I ran the following command:

vxconfigd -k -r reset

After this, importing the DGs succeeded.
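Since the reset fails while any volume device is in use, a quick check that nothing under /dev/vx/dsk is still mounted can save a retry. This is a minimal sketch; the function name mounted_vxvm_volumes is hypothetical, and it assumes a mount table whose first two fields are device and mount point (/etc/mnttab on Solaris, /proc/mounts on Linux). It does not detect volumes used raw, e.g. by a database.

```shell
#!/bin/sh
# Hypothetical helper: list mounted VxVM volumes as "device mountpoint".
# Assumes the mount table's first two fields are device and mount point.
mounted_vxvm_volumes() {
    awk '$1 ~ /^\/dev\/vx\/dsk\// { print $1, $2 }' "${1:-/etc/mnttab}"
}
```

Usage: mounted_vxvm_volumes on Solaris, or mounted_vxvm_volumes /proc/mounts on Linux. Empty output means no VxVM file system is mounted.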

Categories: Hardware, SAN, Storage Tags: ,

resolved – df Input/output error from veritas vxfs

July 10th, 2012 No comments

If you get errors like the following when running df on file systems that use Veritas VxFS:

df: `/BCV/testdg': Input/output error
df: `/BCV/testdg/ora': Input/output error
df: `/BCV/testdg/ora/archivelog01': Input/output error
df: `/BCV/testdg/ora/gg': Input/output error

And vxdg list shows the disk groups in disabled state:

testarc_PRD disabled 1275297639.26.doxer
testdb_PRD disabled 1275297624.24.doxer

Don't panic. To resolve this, do the following:

1) Force-unmount the failed file systems.
2) Deport and re-import the failed disk groups.
3) Fix the plexes that are in the DISABLED FAILED state.
4) Run fsck.vxfs on the failed file systems.
5) Remount the file systems you need.
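Step 2 can be driven from the vxdg list output itself. This is a minimal sketch; the function names disabled_dgs and dg_reimport_cmds are hypothetical, it assumes the NAME/STATE/ID column layout shown above, and it only prints the deport/import commands. Steps 3 to 5 depend on what vxprint shows for each plex and volume, so they are left manual here.

```shell
#!/bin/sh
# Hypothetical helpers for step 2; they only print commands for review.

# Read `vxdg list` output on stdin; print the names of disabled disk groups.
disabled_dgs() {
    awk '$2 == "disabled" { print $1 }'
}

# Read disk group names on stdin; print the deport/import pair for each.
dg_reimport_cmds() {
    while read -r dg; do
        printf 'vxdg deport %s\nvxdg import %s\n' "$dg" "$dg"
    done
}
```

Usage: vxdg list | disabled_dgs | dg_reimport_cmds, after the failed file systems have been force-unmounted (step 1).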

Categories: Hardware, SAN, Storage Tags:


July 5th, 2012 No comments

SIM - Systems Insight Manager (and the SIM agent), port 50000, HTTPS.

PSP - ProLiant Support Pack; its config files are /etc/hp-snmp-agents.conf and /etc/snmp/snmpd.conf. It talks to the SIM server.

Categories: Hardware, Servers Tags:

error system dram available – some DIMMs not probed by solaris

June 12th, 2012 No comments

We got the following error message after we disabled some components on a Sun T2000 server:

SEP 29 19:34:35 ERROR: System DRAM  Available: 004096 MB  Physical: 008192 MB

This means that only 4GB of memory is available although 8GB is physically installed on the server. We can confirm the memory size with prtdiag:

# prtdiag
System Configuration: Sun Microsystems sun4v Sun Fire T200
Memory size: 3968 Megabytes
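If you want to spot this condition from a script (for example across several servers), both figures can be parsed from the messages quoted above. This is a minimal sketch; the function names post_dram_mb and prtdiag_memory_mb are hypothetical, and it assumes exactly the "System DRAM Available: ... Physical: ..." and "Memory size: N Megabytes" forms shown here.

```shell
#!/bin/sh
# Hypothetical helpers; they parse the two messages quoted above.

# Read console/POST text on stdin; print "AVAILABLE_MB PHYSICAL_MB" from
# the "System DRAM Available: ... Physical: ..." line.
post_dram_mb() {
    awk '/System DRAM/ {
        for (i = 1; i < NF; i++) {
            if ($i == "Available:") avail = $(i + 1) + 0
            if ($i == "Physical:")  phys  = $(i + 1) + 0
        }
        print avail, phys
    }'
}

# Read prtdiag output on stdin; print the "Memory size" value
# (assumes the "Memory size: N Megabytes" form).
prtdiag_memory_mb() {
    awk '/^Memory size:/ { print $3 }'
}
```

With the message above, post_dram_mb prints 4096 8192, i.e. half of the installed memory was deconfigured.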


1. Why the server panicked:
We don't have clear data on that. But Solaris 10 on T2000 systems may panic when low on memory, with the following panic string:

hypervisor call 0x21 returned an unexpected error 2

But in our case, that didn't happen either.

2. The error we can see from ALOM:

sc> showcomponent
Disabled Devices
MB/CMP0/CH0/R1/D1 : [Forced DIAG fail (POST)]

DIMMs with CEs (correctable errors) are being unnecessarily flagged by POST as faulty. When POST encounters a single CE, the associated DIMM is declared faulty and half of the system's memory is deconfigured and unavailable to Solaris. Since PSH (Predictive Self-Healing) is the primary means for detecting errors and diagnosing faults on the Niagara platforms, this policy is too aggressive (reference bug 6334560).
3. What action we can take now:

a) Clear the SC log.
b) Enable the component in the SC.
c) Monitor the server.

If the same fault is reported again, we will replace the DIMM.


For more information about DIMM, you can refer to

Categories: Hardware, Servers Tags: ,

H/W under test during POST on SUN T2000 Series

June 12th, 2012 No comments

We got the following error messages during POST on a SUN T2000 Series server:

0:0:0>ERROR: TEST = Queue Block Mem Test
0:0:0>H/W under test = MB/CMP0/CH0/R1/D1/S0 (J0901)
0:0:0>Repair Instructions: Replace items in order listed by 'H/W under
test' above.
0:0:0>MSG = Pin 236 failed on MB/CMP0/CH0/R1/D1/S0 (J0901)
ERROR: The following devices are disabled:
Aborting auto-boot sequence.

To resolve this issue, we can disable the faulty components in ALOM/ILOM, power the system off and on, and then try to boot the machine again. Here are the steps:

If you use ALOM:
disablecomponent component

If you use ILOM:
-> set /SYS/component component_state=disabled
-> stop /SYS
-> start /SYS
Example:
-> set /SYS/MB/CMP0/CH0/R1/D1 component_state=disabled

-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

After disabling the components, you should clear the SC error log and the FMA logs:

Clearing faults from SC:

a) Show the faults on the system controller
sc> showfaults -v

b) For each fault listed, run:
sc> clearfault <uuid>

c) To re-enable the disabled components, run:
sc> clearasrdb

d) Clear ereports
sc> setsc sc_servicemode true
sc> clearereports -y

To clear the FMA faults and error logs from Solaris:
a) Show faults in FMA
# fmadm faulty

b) For each fault listed by 'fmadm faulty', run:
# fmadm repair <uuid>

c) Clear ereports and resource cache
# cd /var/fm/fmd
# rm e* f* c*/eft/* r*/*

d) Reset the fmd serd modules
# fmadm reset cpumem-diagnosis
# fmadm reset cpumem-retire
# fmadm reset eft
# fmadm reset io-retire
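Step b) of the FMA cleanup repeats fmadm repair for every fault, so the UUIDs can be collected from the fmadm faulty output. This is a minimal sketch; the function names fma_fault_uuids and fma_repair_cmds are hypothetical, it assumes GNU grep (for -o and -E; e.g. ggrep on Solaris, or copy the output to a Linux box), assumes the only UUIDs in the output are the event IDs, and only prints the commands.

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU grep (-o, -E).

# Read `fmadm faulty` output on stdin; print each distinct UUID.
fma_fault_uuids() {
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | LC_ALL=C sort -u
}

# Print (do not run) one "fmadm repair <uuid>" per fault.
fma_repair_cmds() {
    fma_fault_uuids | while read -r uuid; do
        printf 'fmadm repair %s\n' "$uuid"
    done
}
```

Usage: fmadm faulty | fma_repair_cmds, review the list, then run it as root before clearing /var/fm/fmd.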

Categories: Hardware, Servers Tags:


May 30th, 2012 1 comment

Here are some differences between SCSI, iSCSI, FCP, FCoE, FCIP, NFS, CIFS, DAS, NAS and SAN (excerpts from the Internet):

Most storage networks use the SCSI protocol for communication between servers and disk drive devices. A mapping layer to other protocols is used to form a network: Fibre Channel Protocol (FCP), the most prominent one, is a mapping of SCSI over Fibre Channel; Fibre Channel over Ethernet (FCoE); iSCSI, mapping of SCSI over TCP/IP.


A storage area network (SAN) is a dedicated network that provides access to consolidated, block level data storage. SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear like locally attached devices to the operating system. Historically, data centers first created "islands" of SCSI disk arrays as direct-attached storage (DAS), each dedicated to an application, and visible as a number of "virtual hard drives" (i.e. LUNs). Operating systems maintain their own file systems on their own dedicated, non-shared LUNs, as though they were local to themselves. If multiple systems were simply to attempt to share a LUN, these would interfere with each other and quickly corrupt the data. Any planned sharing of data on different computers within a LUN requires advanced solutions, such as SAN file systems or clustered computing. Despite such issues, SANs help to increase storage capacity utilization, since multiple servers consolidate their private storage space onto the disk arrays. Sharing storage usually simplifies storage administration and adds flexibility since cables and storage devices do not have to be physically moved to shift storage from one server to another. SANs also tend to enable more effective disaster recovery processes. A SAN could span a distant location containing a secondary storage array. This enables storage replication either implemented by disk array controllers, by server software, or by specialized SAN devices.
Since IP WANs are often the least costly method of long-distance transport, the Fibre Channel over IP (FCIP) and iSCSI protocols have been developed to allow SAN extension over IP networks. The traditional physical SCSI layer could only support a few meters of distance - not nearly enough to ensure business continuance in a disaster.

More about FCIP is here (it still uses the FC protocol)

A competing technology to FCIP is known as iFCP. It uses routing instead of tunneling to enable connectivity of Fibre Channel networks over IP.

IP SAN uses TCP as a transport mechanism for storage over Ethernet, and iSCSI encapsulates SCSI commands into TCP packets, thus enabling the transport of I/O block data over IP networks.

Network-attached storage (NAS), in contrast to SAN, uses file-based protocols such as NFS or SMB/CIFS where it is clear that the storage is remote, and computers request a portion of an abstract file rather than a disk block. The key difference between direct-attached storage (DAS) and NAS is that DAS is simply an extension to an existing server and is not necessarily networked. NAS is designed as an easy and self-contained solution for sharing files over the network.


FCoE works with standard Ethernet cards, cables and switches to handle Fibre Channel traffic at the data link layer, using Ethernet frames to encapsulate, route, and transport FC frames across an Ethernet network from one switch with Fibre Channel ports and attached devices to another, similarly equipped switch.


When an end user or application sends a request, the operating system generates the appropriate SCSI commands and data request, which then go through encapsulation and, if necessary, encryption procedures. A packet header is added before the resulting IP packets are transmitted over an Ethernet connection. When a packet is received, it is decrypted (if it was encrypted before transmission), and disassembled, separating the SCSI commands and request. The SCSI commands are sent on to the SCSI controller, and from there to the SCSI storage device. Because iSCSI is bi-directional, the protocol can also be used to return data in response to the original request.


Fibre Channel is more flexible; devices can be as far as ten kilometers (about six miles) apart if optical fiber is used as the physical medium. Optical fiber is not required for shorter distances, however, because Fibre Channel also works using coaxial cable and ordinary telephone twisted pair.


Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems in 1984,[1] allowing a user on a client computer to access files over a network in a manner similar to how local storage is accessed. CIFS, by contrast, is its Windows-based counterpart used for file sharing.

Categories: Hardware, NAS, SAN, Storage Tags:

check lun0 is the first mapped LUN before on centos linux

May 26th, 2012 No comments

The rescan script from package sg3_utils scans all the SCSI buses on the system, updating the SCSI layer to reflect new devices on the bus. But for this to work, LUN0 must be the first mapped logical unit. Here's an excerpt from the wiki page:

LUN 0: There is one LUN which is required to exist in every target: zero. The logical unit with LUN zero is special in that it must implement a few specific commands, most notably Report LUNs, which is how an initiator can find out all the other LUNs in the target. But LUN zero need not provide any other services, such as a storage volume.

To confirm that LUN0 is the first mapped LUN, run the following check with the EMC Solutions Enabler (SYMCLI) commands (ksh):

syminq -pdevfile | awk '!/^#/ {print $1,$4,$5}' | sort -n | uniq | while read _sym _FA _port; do
  # print each Symmetrix FA port that has NO device mapped at LUN address 000
  if [[ -z "$(symcfg -sid $_sym -fa $_FA -p $_port -addr list | awk '$NF=="000"')" ]]; then
    print Sym $_sym, FA $_FA:$_port
  fi
done

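If the loop prints nothing, every FA port has a device at LUN address 000. For a single port, the symcfg -addr list output should contain a line with LUN 000, like the one below; that proves lun0 is the first mapped LUN, and you can continue with the script to scan for new LUNs: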

Symmetrix ID: 000287890217

Director Device Name Attr Address
---------------------- ----------------------------- ---- --------------
Ident Symbolic Port Sym Physical VBUS TID LUN
------ -------- ---- ---- ----------------------- ---- --- ---

FA-4A 04A 0 0000 c1t600604844A56CA43d0s* VCM 0 00 000


For more information about what a Logical Unit Number (LUN) is, you may refer to:

Categories: Hardware, SAN, Storage Tags:

solaris format disk label Changing a disk label (EFI / SMI)

May 24th, 2012 No comments

I had inserted a drive into a V440 and after running devfsadm, I ran format on the disk. I was presented with the following partition table:

partition> p
Current partition table (original):
Total disk sectors available: 143358320 + 16384 (reserved sectors)

Part Tag Flag First Sector Size Last Sector
0 usr wm 34 68.36GB 143358320
1 unassigned wm 0 0 0
2 unassigned wm 0 0 0
3 unassigned wm 0 0 0
4 unassigned wm 0 0 0
5 unassigned wm 0 0 0
6 unassigned wm 0 0 0
8 reserved wm 143358321 8.00MB 143374704

This disk was used in a zfs pool and, as a result, uses an EFI label. The more familiar label that is used is an SMI label (8 slices; numbered 0-7 with slice 2 being the whole disk). The advantage of the EFI label is that it supports LUNs over 1TB in size and prevents overlapping partitions by providing a whole-disk device called cxtydz rather than using cxtydzs2.

However, I want to use this disk for UFS partitions, which means I need to put an SMI label back on the device. Here’s how it’s done:

# format -e
partition> label
[0] SMI Label
[1] EFI Label
Specify Label type[1]: 0
Warning: This disk has an EFI label. Changing to SMI label will erase all
current partitions.
Continue? y
Auto configuration via format.dat[no]?
Auto configuration via generic SCSI-2[no]?
partition> q
format> q

Running format again will show that the SMI label was placed back onto the disk:

partition> p
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 25 129.19MB (26/0/0) 264576
1 swap wu 26 - 51 129.19MB (26/0/0) 264576
2 backup wu 0 - 14086 68.35GB (14087/0/0) 143349312
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 usr wm 52 - 14086 68.10GB (14035/0/0) 142820160
7 unassigned wm 0 0 (0/0/0) 0
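If you save format's partition printout (as in the two listings above), the label type can be told apart by the header line: the EFI listing reports sectors, the SMI listing reports cylinders. This is a heuristic sketch based only on those two printouts; the function name label_type is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: guess the label type from a saved "partition> p"
# printout on stdin (EFI reports sectors, SMI reports cylinders).
label_type() {
    awk '/^Total disk sectors available/   { print "EFI"; exit }
         /^Total disk cylinders available/ { print "SMI"; exit }'
}
```

It prints nothing if neither header line is present, so treat empty output as "unknown" rather than as either label type.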


  1. Keep in mind that changing disk labels will destroy any data on the disk.
  2. Here's more info about EFI & SMI disk label -
  3. More on UEFI and BIOS -
Categories: Hardware, Storage Tags:

what is fence or fencing device

May 16th, 2012 No comments

To understand what a fencing device is, you first need to know something about the split-brain condition. Read here for more info:

Here's some information about what a fence device is:

Fencing is the disconnection of a node from shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity. A fence device is a hardware device that can be used to cut a node off from shared storage. This can be accomplished in a variety of ways: powering off the node via a remote power switch, disabling a Fibre Channel switch port, or revoking a host's SCSI 3 reservations. A fence agent is a software program that connects to a fence device in order to ask the fence device to cut off access to a node's shared storage (via powering off the node or removing access to the shared storage by other means).

To check whether a LUN has SCSI-3 Persistent Reservation, run the following:

root@doxer# symdev -sid 369 show 2040|grep SCSI
SCSI-3 Persistent Reserve: Enabled
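To run the same check over all LUNs of a cluster, the symdev call can be wrapped in a loop. This is a minimal sketch; the function names pr_status and check_pr are hypothetical, and it assumes the "SCSI-3 Persistent Reserve: <value>" line format shown above.

```shell
#!/bin/sh
# Hypothetical helpers around the symdev check above.

# Read `symdev -sid <sid> show <dev>` output on stdin; print the value of the
# "SCSI-3 Persistent Reserve" field (e.g. Enabled).
pr_status() {
    awk -F': *' '/SCSI-3 Persistent Reserve/ { print $2; exit }'
}

# usage: check_pr <sid> <dev>...   prints "<dev> <status>" for each device
check_pr() {
    sid=$1; shift
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(symdev -sid "$sid" show "$dev" | pr_status)"
    done
}
```

Usage: check_pr 369 2040 2041 2042 (device numbers other than 2040 are placeholders). Any device not reporting Enabled needs attention before it is used for SCSI-3 fencing.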

And here's an article about I/O fencing using SCSI-3 Persistent Reservations in the configuration of SF Oracle RAC: