Archive for the ‘NAS’ Category

create shared iscsi LUNs from local disk on Linux

January 19th, 2016 Comments off

We can use iscsitarget to share local disks as iscsi LUNs for clients. Below are brief steps.

First, install some packages:

yum install kernel-devel iscsi-initiator-utils -y #'kernel-uek-devel' if you are using oracle linux with UEK
cd iscsitarget- #download the package from here
make install

And below are some useful tips about iscsitarget:

The iSCSI target consists of a kernel module (/lib/modules/`uname -r`/extra/iscsi/iscsi_trgt.ko)

The kernel modules will be installed in the module directory of the kernel

The daemon(/usr/sbin/ietd) and the control tool(/usr/sbin/ietadm)

/etc/init.d/iscsi-target status

/etc/iet/{ietd.conf, initiators.allow, targets.allow}

Later, we can modify IET(iSCSI Enterprise Target) config file:

vi /etc/iet/ietd.conf

        Lun 0 Path=/dev/sdb1,Type=fileio,ScsiId=0,ScsiSN=doxerorg
        Lun 1 Path=/dev/sdb2,Type=fileio,ScsiId=1,ScsiSN=doxerorg
        Lun 2 Path=/dev/sdb3,Type=fileio,ScsiId=2,ScsiSN=doxerorg
        Lun 3 Path=/dev/sdb4,Type=fileio,ScsiId=3,ScsiSN=doxerorg
        Lun 4 Path=/dev/sdb5,Type=fileio,ScsiId=4,ScsiSN=doxerorg
        Lun 5 Path=/dev/sdb6,Type=fileio,ScsiId=5,ScsiSN=doxerorg
        Lun 6 Path=/dev/sdb7,Type=fileio,ScsiId=6,ScsiSN=doxerorg
        Lun 7 Path=/dev/sdb8,Type=fileio,ScsiId=7,ScsiSN=doxerorg

chkconfig iscsi-target on
/etc/init.d/iscsi-target start

Assume the server sharing local disks for iscsi LUN is with IP, and we can do below on client hosts to scan for iscsi LUNs:

[root@client01 ~]# iscsiadm -m discovery -t st -p
Starting iscsid:                                           [  OK  ],1

[root@client01 ~]# iscsiadm -m node -T -p -l
Logging in to [iface: default, target:, portal:,3260] (multiple)
Login to [iface: default, target:, portal:,3260] successful.

[root@client01 ~]# iscsiadm -m session --rescan #or iscsiadm -m session -r SID --rescan
Rescanning session [sid: 1, target:, portal:,3260]

[root@client01 ~]# iscsiadm -m session -P 3

You can also scan for LUNs on the server with local disk shared, but you should make sure iscsi-target service boot up between network & iscsi:

mv /etc/rc3.d/S39iscsi-target /etc/rc3.d/S12iscsi-target


  1. iSCSI target port default to 3260. You can check iscsi connection info in /var/lib/iscsi/send_targets/ - <iscsi portal ip, port>, and /var/lib/iscsi/nodes/ - <target iqn>/<iscsi portal ip, port>.
  2. If there are multiple targets to log on, we can use "iscsiadm -m node --loginall=all".  "iscsiadm -m node -T -p -u" to log out.
  3. More info is here (includes windows iscsi operation), and here is about create iSCSI on Oracle ZFS appliance.


Categories: Hardware, NAS, SAN, Storage Tags:

resolved – mountd Cannot export /scratch, possibly unsupported filesystem or fsid= required

November 2nd, 2015 Comments off

Today when I tried to access one autofs exported filesystem on one client host, it reported error:

[root@server01 ~]# cd /net/client01/scratch/
-bash: cd: scratch/: No such file or directory

From server side, we can see it's exported and writable:

[root@client01 ~]# cat /etc/exports
/scratch *(rw,no_root_squash)

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
                      200G  103G   98G  52% /scratch

So I tried mount manually on client side, but still error reported:

[root@server01 ~]# mount client01:/scratch /media
mount: client01:/scratch failed, reason given by server: Permission denied

Here's log on server side:

[root@client01 scratch]# tail -f /var/log/messages
Nov  2 03:41:58 client01 mountd[2195]: Caught signal 15, un-registering and exiting.
Nov  2 03:41:58 client01 kernel: nfsd: last server has exited, flushing export cache
Nov  2 03:41:59 client01 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov  2 03:41:59 client01 kernel: NFSD: starting 90-second grace period
Nov  2 03:42:11 client01 mountd[16046]: authenticated mount request from for /scratch (/scratch)
Nov  2 03:42:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:43:05 client01 mountd[16046]: authenticated mount request from for /scratch (/scratch)
Nov  2 03:43:05 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required
Nov  2 03:44:11 client01 mountd[16046]: authenticated mount request from for /scratch (/scratch)
Nov  2 03:44:11 client01 mountd[16046]: Cannot export /scratch, possibly unsupported filesystem or fsid= required

After some debugging, the reason was that the exported FS was already NFS filesystem on server side, and NFS FS cannot be exported again:

[root@client01 ~]# df -h /scratch
Filesystem            Size  Used Avail Use% Mounted on
                      200G  103G   98G  52% /scratch

To WA this, just do normal mount of the NFS share instead of using autofs export:

[root@server01 ~]# mount nas01:/export/generic/share_scratch /media
Categories: Hardware, IT Architecture, NAS, Storage Tags:

resolved – nfsv4 Warning: rpc.idmapd appears not to be running. All uids will be mapped to the nobody uid

August 31st, 2015 Comments off

Today when we tried to mount a nfs share as NFSv4(mount -t nfs4 testnas:/export/testshare01 /media), the following message prompted:

Warning: rpc.idmapd appears not to be running.
All uids will be mapped to the nobody uid.

And I had a check of file permissions under the mount point, they were owned by nobody as indicated:

[root@testvm~]# ls -l /u01/local
total 8
drwxr-xr-x 2 nobody nobody 2 Dec 18 2014 SMKIT
drwxr-xr-x 4 nobody nobody 4 Dec 19 2014 ServiceManager
drwxr-xr-x 4 nobody nobody 4 Mar 31 08:47 ServiceManager.15.1.5
drwxr-xr-x 4 nobody nobody 4 May 13 06:55 ServiceManager.15.1.6

However, as I checked, rpcidmapd was running:

[root@testvm ~]# /etc/init.d/rpcidmapd status
rpc.idmapd (pid 11263) is running...

After some checking, I found it's caused by low nfs version and some missed nfs4 packages on the OEL5 boxes. You can do below to fix this:

yum -y update nfs-utils nfs-utils-lib nfs-utils-lib-devel sblim-cmpi-nfsv4 nfs4-acl-tools
/etc/init.d/nfs restart
/etc/init.d/rpcidmapd restart

If you are using Oracle SUN ZFS appliance, then please make sure to set on ZFS side anonymous user mapping to "root" and also Custom NFSv4 identity domain to the one in your env(e.g. to avoid NFS clients nobody owner issue.

resoved – nfs share chown: changing ownership of ‘blahblah’: Invalid argument

October 28th, 2014 Comments off

Today I encountered the following error when trying to change ownership of some files:

[root@test webdav]# chown -R apache:apache ./bigfiles/
chown: changing ownership of `./bigfiles/opcmessaging': Invalid argument
chown: changing ownership of `./bigfiles/': Invalid argument

This host is running CentOS 6.2, and in this version of OS, nfs4 is by default used:

[root@test webdav]# cat /proc/mounts |grep u01 /u01 nfs4 rw,relatime,vers=4,rsize=32768,wsize=32768

However, the NFS server does not support NFSv4 well, so I modified the share to use NFSv3 by force: /u01 nfs rsize=32768,wsize=32768,hard,nolock,timeo=14,noacl,intr,mountvers=3,nfsvers=3

After umount/mount, the issue was resolved!


If the NAS server is SUN ZFS appliance, then the following should be noted, or the issue may occur even on CentOS/Redhat linux 5.x:



Categories: Hardware, IT Architecture, Linux, NAS, Storage, Systems Tags:

Sun ZFS storage stuck due to incorrect LACP configuration

October 24th, 2014 Comments off

Today we met issue with Sun ZFS storage 7320. NFS shares provisioned from the ZFS appliance were not responding to requests, even a "df -h" will stuck there for a long long time. And when we checked from ZFS storage side, we found the following statistics:



And during our checking for the traffic source, the ZFS appliance backed to normal by itself:



As we just configured LACP on this ZFS appliance the day before, so we doubted the issue was caused by incorrect network configuration. Here's the network config:


For "Policy", we should match with switch setup to even balance incoming/outgoing data flow.  Otherwise, we might experience uneven load balance. Our switch was set to L3, so L3 should be ok. We'll get better load spreading if the policy is L3+4 if the switch supports it.  With L3, all connections from any one IP will only use a single member of the aggregation.  With L3+4, it will load spread by UDP or TCP port too. More is here.

For "Mode", it should be set according to switch. If the switch is "passive" mode then server/storage needs to be on "active" mode, and vice versa.

For "Timer", it's regarding how often to check LACP status.

After checking switch setting, we found that the switch is in "Active" mode, and as ZFS appliance was also on "Active" mode, so that's the culprit. So we changed the setting to the following:

2-right-configurationAfter this, we had some observation and ZFS is now operating normally.


You should also have a check of disk operations, if there are timeout errors on the disks, then you should try replace them. Sometimes, a single disk may hang the SCSI bus.  Ideally, the system should fail the disk but it didn't happen. You should manually failed the disk to resolve the issue.

The ZFS Storage Appliance core analysis (previous note) confirms that the disk was the cause of the issue.

It was hanging up communication on the SCSI bus but once it was removed the issue was resolved.

It is uncommon for a single disk to hang up the bus, however; since the disks share the SCSI path (each drive does not have its own dedicated cabling and controller) it is sometimes seen.

You can check the ZFS appliance uptime by running "version show" in the console.

zfs-test:configuration> version show
Appliance Name: zfs-test
Appliance Product: Sun ZFS Storage 7320
Appliance Type: Sun ZFS Storage 7320
Appliance Version: 2013.,1-1.1
First Installed: Sun Jul 22 2012 10:02:24 GMT+0000 (UTC)
Last Updated: Sun Oct 26 2014 22:11:03 GMT+0000 (UTC)
Last Booted: Wed Dec 10 2014 10:03:08 GMT+0000 (UTC)
Appliance Serial Number: d043d335-ae15-4350-ca35-b05ba2749c94
Chassis Serial Number: 1225FMM0GE
Software Part Number: Oracle 000-0000-00
Vendor Product ID: urn:uuid:418bff40-b518-11de-9e65-080020a9ed93
Browser Name: aksh 1.0
Browser Details: aksh
HTTP Server: Apache/2.2.24 (Unix)
SSL Version: OpenSSL 1.0.0k 5 Feb 2013
Appliance Kit: ak/SUNW,maguro_plus@2013.,1-1.1
Operating System: SunOS 5.11 ak/generic@2013.,1-1.1 64-bit
BIOS: American Megatrends Inc. 08080102 05/23/2011
Service Processor:

Categories: Hardware, NAS, Storage Tags:

resolved – fsinfo ERROR: Stale NFS file handle POST

May 15th, 2014 Comments off

Today when I tried mount NFS share from one NFS server, it timeout with "mount.nfs: Connection timed out".

I tried to search something in /var/log/messages but no useful info there was found. So I used tcpdump on NFS client:

[root@dcs-hm1-qa132 ~]# tcpdump -nn -vvv host #server is, client is
23:49:11.598407 IP (tos 0x0, ttl 64, id 26179, offset 0, flags [DF], proto TCP (6), length 96) > 40 null
23:49:11.598741 IP (tos 0x0, ttl 62, id 61186, offset 0, flags [DF], proto TCP (6), length 80) > reply ok 24 null
23:49:11.598812 IP (tos 0x0, ttl 64, id 26180, offset 0, flags [DF], proto TCP (6), length 148) > 92 fsinfo fh Unknown/0100010000000000000000000000000000000000000000000000000000000000
23:49:11.599176 IP (tos 0x0, ttl 62, id 61187, offset 0, flags [DF], proto TCP (6), length 88) > reply ok 32 fsinfo ERROR: Stale NFS file handle POST:
23:49:11.599254 IP (tos 0x0, ttl 64, id 26181, offset 0, flags [DF], proto TCP (6), length 148) > 92 fsinfo fh Unknown/010001000000000000002FFF000002580000012C0007B0C00000000A00000000
23:49:11.599627 IP (tos 0x0, ttl 62, id 61188, offset 0, flags [DF], proto TCP (6), length 88) > reply ok 32 fsinfo ERROR: Stale NFS file handle POST:

The reason of "ERROR: Stale NFS file handle POST" may caused by the following reasons:

1.The NFS server is no longer available
2.Something in the network is blocking
3.In a cluster during failover of NFS resource the major & minor numbers on the secondary server taking over is different from that of the primary.

To resolve the issue, you can try bounce NFS service on NFS server using /etc/init.d/nfs restart.

Categories: Hardware, NAS, Storage Tags:

“Include snapshots” made NFS shares from ZFS appliance shrinking

January 17th, 2014 Comments off

Today I met one weird issue when checking one NFS share mounted from ZFS appliance. The NFS filesystem mounted on client was shrinking when I removed files as the space on that filesystem was getting low. But what made me confused was that the filesystem's size would getting lower! Shouldn't the free space getting larger and the size keep unchanged?

After some debugging, I found that this was caused by ZFS appliance shares' "Include snapshots". When I uncheck "Include snapshots", the issue was gone!


Categories: Hardware, NAS, Storage Tags:

resolved – ESXi Failed to lock the file

January 13th, 2014 Comments off

When I was power on one VM in ESXi, one error occurred:

An error was received from the ESX host while powering on VM doxer-test.
Cannot open the disk '/vmfs/volumes/4726d591-9c3bdf6c/doxer-test/doxer-test_1.vmdk' or one of the snapshot disks it depends on.
Failed to lock the file

And also:

unable to access file since it is locked

This apparently was caused by some storage issue. I firstly googled and found most of the posts were telling stories about ESXi working mechanism, and I tried some of them but with no luck.

Then I thought of that our storage datastore was using NFS/ZFS, and NFS has file lock issue as you know. So I mount the nfs share which datastore was using and removed one file named lck-c30d000000000000. After this, the VM booted up successfully! (or we can log on ESXi host, and remove lock file there also)

resolved – differences between zfs ARC L2ARC ZIL

January 31st, 2013 Comments off
  • ARC

zfs ARC(adaptive replacement cache) is a very fast cache located in the server’s memory.

For example, our ZFS server with 12GB of RAM has 11GB dedicated to ARC, which means our ZFS server will be able to cache 11GB of the most accessed data. Any read requests for data in the cache can be served directly from the ARC memory cache instead of hitting the much slower hard drives. This creates a noticeable performance boost for data that is accessed frequently.

  • L2ARC

As a general rule, you want to install as much RAM into the server as you can to make the ARC as big as possible. At some point, adding more memory is just cost prohibitive. That is where the L2ARC becomes important. The L2ARC is the second level adaptive replacement cache. The L2ARC is often called “cache drives” in the ZFS systems.

L2ARC is a new layer between Disk and the cache (ARC) in main memory for ZFS. It uses dedicated storage devices to hold cached data. The main role of this cache is to boost the performance of random read workloads. The intended L2ARC devices include 10K/15K RPM disks like short-stroked disks, solid state disks (SSD), and other media with substantially faster read latency than disk.

  • ZIL

ZIL(ZFS Intent Log) exists for performance improvement on synchronous writes. Synchronous write is very slow than asynchronous write, but it's more stable. Essentially, the intent log of a file system is nothing more than an insurance against power failures, a to-do list if you will, that keeps track of the stuff that needs to be updated on disk, even if the power fails (or something else happens that prevents the system from updating its disks).

To get better performance, use separated disks(SSD) for ZIL, such as zpool add pool log c2d0.

Now I'm giving you an true example about zfs ZIL/L2ARC/ARC on SUN ZFS 7320 head:

test-zfs# zpool iostat -v exalogic
capacity operations bandwidth
pool alloc free read write read write
------------------------- ----- ----- ----- ----- ----- -----
exalogic 6.78T 17.7T 53 1.56K 991K 25.1M
mirror 772G 1.96T 6 133 111K 2.07M
c0t5000CCA01A5FDCACd0 - - 3 36 57.6K 2.07M #these are the physical disks
c0t5000CCA01A6F5CF4d0 - - 2 35 57.7K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A6F5D00d0 - - 2 36 56.2K 2.07M
c0t5000CCA01A6F64F4d0 - - 2 35 57.3K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A76A7B8d0 - - 2 36 56.3K 2.07M
c0t5000CCA01A746CCCd0 - - 2 36 56.8K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A749A88d0 - - 2 35 56.7K 2.07M
c0t5000CCA01A759E90d0 - - 2 35 56.1K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A767FDCd0 - - 2 35 56.1K 2.07M
c0t5000CCA01A782A40d0 - - 2 35 57.1K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A782D10d0 - - 2 35 57.2K 2.07M
c0t5000CCA01A7465F8d0 - - 2 35 56.3K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A7597FCd0 - - 2 35 57.6K 2.07M
c0t5000CCA01A7828F4d0 - - 2 35 56.2K 2.07M
mirror 772G 1.96T 5 133 110K 2.07M
c0t5000CCA01A7829ACd0 - - 2 35 57.1K 2.07M
c0t5000CCA01A78278Cd0 - - 2 35 57.4K 2.07M
mirror 772G 1.96T 6 133 111K 2.07M
c0t5000CCA01A736000d0 - - 3 35 57.3K 2.07M
c0t5000CCA01A738000d0 - - 2 35 57.3K 2.07M
c0t5000A72030061B82d0 224M 67.8G 0 98 1 1.62M #ZIL(SSD write cache, ZFS Intent Log)
c0t5000A72030061C70d0 224M 67.8G 0 98 1 1.62M
c0t5000A72030062135d0 223M 67.8G 0 98 1 1.62M
c0t5000A72030062146d0 224M 67.8G 0 98 1 1.62M
cache - - - - - -
c2t2d0 334G 143G 15 6 217K 652K #L2ARC(SSD cache drives)
c2t3d0 332G 145G 15 6 215K 649K
c2t4d0 333G 144G 11 6 169K 651K
c2t5d0 333G 144G 13 6 192K 650K
c2t2d0 - - 0 0 0 0
c2t3d0 - - 0 0 0 0
c2t4d0 - - 0 0 0 0
c2t5d0 - - 0 0 0 0

And as for ARC:

test-zfs:> status memory show
Cache 63.4G bytes #ARC
Unused 17.3G bytes
Mgmt 561M bytes
Other 491M bytes
Kernel 14.3G bytes

sun zfs firmware upgrade howto

January 29th, 2013 Comments off

This article is going to talk about upgrading firmware for sun zfs 7320(you may find other series of sun zfs heads works too):

PS: Better not use failback, you should always log on the standby ZFS node and do a takeover. This is rule of thumb.PS:

1. On configuration -> cluster, you can see the shared resources(transferable) along with resources owned by current node(locked resource, , such as MGMT interface). And on configuration -> network, only the config of shared network resources(transferable) along with network resources owned by current node(locked resource, such as MGMT interface).

2. Additional Oracle ZFS Storage Appliance related software is available for download at the Oracle Technology Network.

3. Takeover can occur at any time; as discussed above, takeover is attempted whenever peer failure is detected. It can also be triggered manually using the cluster configuration CLI or BUI. This is useful for testing purposes as well as to perform rolling software upgrades (upgrades in which one head is upgraded while the other provides service running the older software, then the second head is upgraded once the new software is validated). Finally, takeover will occur when a head boots and detects that its peer is absent. This allows service to resume normally when one head has failed permanently or when both heads have temporarily lost power.

Failback never occurs automatically. When a failed head is repaired and booted, it will rejoin the cluster (resynchronizing its view of all resources, their properties, and their ownership) and proceed to wait for an administrator to perform a failback operation. Until then, the original surviving head will continue to provide all services. This allows for a full investigation of the problem that originally triggered the takeover, validation of a new software revision, or other administrative tasks prior to the head returning to production service. Because failback is disruptive to clients, it should be scheduled according to business-specific needs and processes. There is one exception: Suppose that head A has failed and head B has taken over. When head A rejoins the cluster, it becomes eligible to take over if it detects that head B is absent or has failed. The principle is that it is always better to provide service than not, even if there has not yet been an opportunity to investigate the original problem. So while failback to a previously-failed head will never occur automatically, it may still perform takeover at any time.

In active-active mode, when take over happens, all resources, include the ones on peer node will be transferred. When the failed node comes back to life, you can then issue the failback which will give back resources assigned to it.


Categories: Hardware, NAS, SAN, Storage Tags:

perl script for monitoring sun zfs memory usage

January 16th, 2013 Comments off

On zfs's aksh, I can check memory usage with the following:

test-zfs:> status memory show
Cache 719M bytes
Unused 15.0G bytes
Mgmt 210M bytes
Other 332M bytes
Kernel 7.79G bytes

So now I want to collect this memory usae information automatically for SNMP's use. Here's the steps:

cpan> o conf prerequisites_policy follow
cpan> o conf commit

Since the host is using proxy to get on the internet, so in /etc/wgetrc:

http_proxy =
ftp_proxy =
use_proxy = on

Now install the Net::SSH::Perl perl module:

PERL_MM_USE_DEFAULT=1 perl -MCPAN -e 'install Net::SSH::Perl'

And to confirm that Net::SSH::Perl was installed, run the following command:

perl -e 'use Net::SSH::Perl' #no output is good, as it means the package was installed successfully

Now here goes the perl script to get the memory usage of sun zfs head:

[root@test-centos ~]# cat /var/tmp/mrtg/
use strict;
use warnings;
use Net::SSH::Perl;
my $host = 'test-zfs';
my $user = 'root';
my $password = 'password';

my $ssh = Net::SSH::Perl->new($host);
my ($stdout,$stderr,$exit) = $ssh->cmd("status memory show");
print "ErrorCode:$exit\n";
print "ErrorMsg:$stderr";
} else {
my @std_arr = split(/\n/, $stdout);
shift @std_arr;
foreach(@std_arr) {
if ($_ =~ /.+\b\s+(.+)M\sbytes/){
elsif($_ =~ /.+\b\s+(.+)G\sbytes/){
foreach(@std_arr) {
print $_."\n";
exit $exit;

If you get the following error messages during installation of a perl module:

[root@test-centos ~]# perl -MCPAN -e 'install SOAP::Lite'
CPAN: Storable loaded ok
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
LWP failed with code[500] message[LWP::Protocol::MyFTP: connect: Connection timed out]
Fetching with Net::FTP:

Trying with "/usr/bin/links -source" to get
ELinks: Connection timed out

Then you may have a check of whether you're using proxy to get on the internet(run cpan > o conf init to re-configure cpan; later you should set /etc/wgetrc: http_proxy, ftp_proxy, use_proxy).


zfs iops on nfs iscsi disk

January 5th, 2013 Comments off

On zfs storage 7000 series BUI, you may found the following statistic:

This may seem quite weird as you can see that, NFSv3(3052) + iSCSI(1021) is larger than Disk(1583). As iops for protocal NFSv3/iSCSI finally goes to Disk, so why iops for the two protocals is larger than Disk iops?

Here's the reason:

Disk operations for NFSv3 and iSCSI are logical operations. These logical operations are then combined/optimized by sun zfs storage and then finally go to physical Disk operations.


1.When doing continuous access to disks(like VOD), disk throughputs will become the bottleneck of performance rather than IOPS. In constract, IOPS limits disk performance when random access is going on disks.

2.For NAS performance analytic, here are two good articles(in Chinese)

3.You may also wonder why Disk iops can be as high as 1583. As this number is the sum of all disk controllers of the zfs storage system. Here's some ballpark numbers for HDD iops:


Categories: Hardware, NAS, SAN, Storage Tags:

zfs shared lun stoage set up for oracle RAC – using iscsiadm

January 4th, 2013 Comments off
  • create iSCSI Target Group

Open zfs BUI, navigate through "Configuration" -> "SAN" -> "iSCSI Targets". Then create new iSCSI Target by clicking plus sign. Give it an alias, and then select the Network interface(may be bond or LACP) you want to use(check it from "Configuration" -> "Network" and "Configuration" -> "Cluster"). After creating this iSCSI target, drag the newly created target to the right side "iSCSI Target Groups" to create one iSCSI Target Group. You can give that iSCSI target group an name too. Note down the iSCSI Target Group's iqn, this is important for later operations.(Network interfaces:use NAS interface. You can select multiple interfaces)

  • create iSCSI Initiator Group

Before going on the next step, we need first get the iSCSI initiator IQN for each hosts we want LUN allocated. On each host, execute the following command to get the iqn for iscsi on linux platform(You can edit this file before read it, for example, make iqn name ended with` hostname` so it's easier for later operations on LUN<do a /etc/init.d/iscsi restart after your modification to initiatorname.iscsi>. You should first ensure package iscsi-initiator-utils is installed):

[root@test-host ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=<your host's iqn name>

Now go back to zfs BUI, navigate through "Configuration" -> "SAN" -> "Initiators". On the left side, click "iSCSI Initiators", then click plus sign on it. Enter IQN you get from previos step and give it an name.(do this for each host you want iSCSI LUN allocated). After this, drag the newly created iSCSI initiator(s) from left side to form new iSCSI Initiator Groups on the right side(drag two items from the left to the same item on the right to form an group).

  • create shared LUNs for iSCSI Initiator Group

After this, we need now create LUNs for iSCSI Initiator Group(so that shared lun can be allocated, for example, oracle RAC need shared storage). Click on diskette sign on the just created iSCSI Initiator Group,select the project you want the LUN allocated from, give it a name, and assign the volume size. Select the right target group you created before(you can also create a new one e.g. RAC in shares).

PS:You can also now go to "shares" -> "Luns" and create Lun(s) using the target group you created and use default Initiator group. Note that one LUN need one iSCSI target. So you should create more iSCSI targets and add them to iSCSI target group if you want more LUNs.

  • scan shared LUNs from hosts

Now we're going to operate on linux hosts. On each host you want iSCSI LUN allocated, do the following steps:

iscsiadm -m discovery -t st -p <ip address of your zfs storage>(use cluster's ip if there's zfs cluster) #Discover available targets from a discovery portal
iscsiadm -m node -T <variable, iSCSI Target Group iqn> -p <ip address of your zfs storage> -l #Log into a specific target. Or use output from above command($1 is --portal, $3 is --targetname, -l is --login). Use -u to log out for a specified record.

If there's error when handling with multipath devices (more info here <udevadm>):

iscsiadm: default: 1 session requested, but 1 already present.
iscsiadm: Could not log into all portals


#iscsiadm -m session --rescan

#iscsiadm -m node --loginall=all

#iscsiadm -m discovery -t sendtargets -p <ZFS IP>

#iscsiadm -m node --loginall=all

iscsiadm -m session --rescan

#iscsiadm -m node --loginall=all

After these steps, you host(s) should now see the newly allocated iSCSI LUN(s), you can run fdisk -l to confirm.


  • To remove discovered targets on one specific portal(or ZFS appliance):

iscsiadm -m discovery -t st -p

ls -l /var/lib/iscsi/nodes/

iscsiadm -m node -T <iscsi target group iqn> -p -u

iscsiadm -m discovery -o show

iscsiadm -m node -o delete -p

ls -l /var/lib/iscsi/nodes/

  • Here's more about iscsiadm:

iscsiadm -m session -P 1 #Display list of all current sessions logged in, -P for level(1 to 3). From here, we can know which target are the local disks mapped to(e.g. /dev/sde is mapped to target, then we can know the NAS ZFS appliance name. On ZFS appliance, after got the target group name, we can check which LUNs use that target group, thus we know the mapping between local iscsi disk and ZFS LUN)

iscsiadm -m session --rescan #rescan all sessions

iscsiadm -m session -r SID --rescan #rescan a specific session #SID can be got from iscsiadm -m session -P 1

iscsiadm -m node -T targetname -p ipaddress -u #Log out of a specific target

If you want to remove specific iSCSI LUN from system, then do the following:

cd /sys/class/iscsi_session/session<SID>/device/target<scsi host number>:0:0/<scsi host number>:0:0:<Lun number>
echo 1 > delete

iscsiadm -m node -T targetname -p ipaddress #Display information about a target
iscsiadm -m node -s -T targetname -p ipaddress #Display statistics about a target

iscsiadm -m discovery -o show #View iSCSI database regarding discovery
iscsiadm -m node -o show #View iSCSI database regarding targets to log into
iscsiadm -m session -o show #View iSCSI database regarding sessions logged into
multipath -ll #View if the targets are multipathed (MPIO)

You can find more information about iscsi disk in /sys/class/{scsi_device, scsi_disk, scsi_generic,scsi_host} and /sys/block/ after get the info from iscsiadm -m session -P 3.

If you want to set snapshot on all shares, then try below:

cd /
shares select project1 select share1 snapshots automatic create
set frequency=day hour=21 minute=49 keep=7
commit -- there's no extra empty lines here
cd /
shares select project1 select share1 snapshots automatic create
set frequency=day hour=21 minute=49 keep=7

Here's the CMD equivalent of this article(aksh of oracle ZFS appliance):

shares project rac186187
set mountpoint=/export/rac186187
set quota=1T
set readonly=true
set default_permissions=777

set default_user=root

set default_group=root
set sharenfs="sec=sys,,rw=@,root=@"
#get reservation
#get pool
#get snapdir #snapdir = visible
#get default_group #default_group = root
#get default_user #default_user = root
#get exported #exported = true

configuration net interfaces list
configuration net datalinks list

configuration san iscsi targets create rac186187
set alias=rac186187
set interfaces=aggr93001

configuration san iscsi targets list
configuration san iscsi targets groups create rac186187
set name=rac186187

[cat /etc/iscsi/initiatorname.iscsi] ->,
configuration san iscsi initiators create testhost186
set alias=testhost186

configuration san iscsi initiators create testhost187
set alias=testhost187

configuration san iscsi initiators list
configuration san iscsi initiators groups create rac186187
set name=rac186187

shares select rac186187 #project must set readonly=false
lun rac186187
set volsize=500G
set targetgroup=rac186187
set initiatorgroup=rac186187

And if you want to create one share, here's the way:

shares select <project name> #to show current properties, run "show"
filesystem <new share name>
set quota=100G
set quota_snap=false
set reservation=50G
set reservation_snap=false
set root_permissions=777
set root_user=root
set root_group=root
cd /

To modify role permissions:

zfs02:configuration roles devroot authorizations> create

zfs02:configuration roles devroot auth (uncommitted)> set scope= <tab>
ad appliance dataset keystore net schema stmf update workflow
alert cluster hardware nas role stat svc user worksheet

zfs02:configuration roles devroot auth (uncommitted)> set scope=alert
scope = alert

zfs02:configuration roles devroot auth (uncommitted)> show
scope = alert
allow_configure = false

zfs02:configuration roles devroot auth (uncommitted)> set allow_configure=true
allow_configure = true (uncommitted)

zfs02:configuration roles devroot auth (uncommitted)> show
scope = alert
allow_configure = true (uncommitted)

zfs02:configuration roles devroot auth (uncommitted)> commit

To search for logs, you can try below:

ssh root@testzfs|tee log.txt

testzfs:> maintenance logs select alert list -a
testzfs:> maintenance logs select audit list -a
testzfs:> maintenance logs select fltlog list -a
testzfs:> maintenance logs select scrk list -a
testzfs:> maintenance logs select system list -a

Then grep <entry> log.txt


Suppose we know that there is one session logged in to target, then how can we add LUNs or change the size of existed LUN on ZFS?

First, to change the size of existing LUN:

[root@testhost1 ~]# iscsiadm -m session
tcp: [2],2

Log on ZFS UI, go to Configuration, SAN, iSCSI, Targets, search for "" which is the target name, then you'll find the target and target group it belongs to. Note down the target group name, e.g. RAC01, then go to Shares, LUNs, click on the LUN and change the size as needed.

To add new LUN to the host:

First, find the iscsi initiator name on this host testhost1-p:

[root@testhost1 ~]# cat /etc/iscsi/initiatorname.iscsi

Log on ZFS UI, go to Configuration, SAN, iSCSI, Initiators, search "", you'll find the initiator and initiator group. From here you can click the diskette icon and add new LUN. Make sure to select the right target group you got from previous step.

Categories: Hardware, NAS, Storage Tags:

how to turn on hba flags connected to EMC arrays

October 3rd, 2012 Comments off

As per EMC recommendation following flags should be enabled for Vmware ESX hosts, if not there will be performance issues:


Here's the commands that'll do the trick:

sudo symmask -sid <sid> set hba_flags on C,SPC2,SC3 -enable -wwn <port wwn> -dir <dir number> -p <port number>

Categories: Hardware, NAS, SAN, Storage Tags:


May 30th, 2012 1 comment

Here goes some differences between SCSI ISCSI FCP FCoE FCIP NFS CIFS DAS NAS SAN(excerpt from Internet):

Most storage networks use the SCSI protocol for communication between servers and disk drive devices. A mapping layer to other protocols is used to form a network: Fibre Channel Protocol (FCP), the most prominent one, is a mapping of SCSI over Fibre Channel; Fibre Channel over Ethernet (FCoE); iSCSI, mapping of SCSI over TCP/IP.


A storage area network (SAN) is a dedicated network that provides access to consolidated, block level data storage. SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear like locally attached devices to the operating system. A storage area network (SAN) is a dedicated network that provides access to consolidated, block level data storage. SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear like locally attached devices to the operating system. Historically, data centers first created "islands" of SCSI disk arrays as direct-attached storage (DAS), each dedicated to an application, and visible as a number of "virtual hard drives" (i.e. LUNs). Operating systems maintain their own file systems on their own dedicated, non-shared LUNs, as though they were local to themselves. If multiple systems were simply to attempt to share a LUN, these would interfere with each other and quickly corrupt the data. Any planned sharing of data on different computers within a LUN requires advanced solutions, such as SAN file systems or clustered computing. Despite such issues, SANs help to increase storage capacity utilization, since multiple servers consolidate their private storage space onto the disk arrays.Sharing storage usually simplifies storage administration and adds flexibility since cables and storage devices do not have to be physically moved to shift storage from one server to another. SANs also tend to enable more effective disaster recovery processes. A SAN could span a distant location containing a secondary storage array. This enables storage replication either implemented by disk array controllers, by server software, or by specialized SAN devices. Since IP WANs are often the least costly method of long-distance transport, the Fibre Channel over IP (FCIP) and iSCSI protocols have been developed to allow SAN extension over IP networks. The traditional physical SCSI layer could only support a few meters of distance - not nearly enough to ensure business continuance in a disaster.

More about FCIP is here (still use FC protocol)

A competing technology to FCIP is known as iFCP. It uses routing instead of tunneling to enable connectivity of Fibre Channel networks over IP.

IP SAN uses TCP as a transport mechanism for storage over Ethernet, and iSCSI encapsulates SCSI commands into TCP packets, thus enabling the transport of I/O block data over IP networks.

Network-attached storage (NAS), in contrast to SAN, uses file-based protocols such as NFS or SMB/CIFS where it is clear that the storage is remote, and computers request a portion of an abstract file rather than a disk block. The key difference between direct-attached storage (DAS) and NAS is that DAS is simply an extension to an existing server and is not necessarily networked. NAS is designed as an easy and self-contained solution for sharing files over the network.


FCoE works with standard Ethernet cards, cables and switches to handle Fibre Channel traffic at the data link layer, using Ethernet frames to encapsulate, route, and transport FC frames across an Ethernet network from one switch with Fibre Channel ports and attached devices to another, similarly equipped switch.


When an end user or application sends a request, the operating system generates the appropriate SCSI commands and data request, which then go through encapsulation and, if necessary, encryption procedures. A packet header is added before the resulting IP packets are transmitted over an Ethernet connection. When a packet is received, it is decrypted (if it was encrypted before transmission), and disassembled, separating the SCSI commands and request. The SCSI commands are sent on to the SCSI controller, and from there to the SCSI storage device. Because iSCSI is bi-directional, the protocol can also be used to return data in response to the original request.


Fibre channel is more flexible; devices can be as far as ten kilometers (about six miles) apart if optical fiber is used as the physical medium. Optical fiber is not required for shorter distances, however, because Fibre Channel also works using coaxial cable and ordinary telephone twisted pair.


Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems in 1984,[1] allowing a user on a client computer to access files over a network in a manner similar to how local storage is accessed. On the contrary, CIFS is its Windows-based counterpart used in file sharing.

Categories: Hardware, NAS, SAN, Storage Tags:

what is fence or fencing device

May 16th, 2012 Comments off

To understand what is fencing device, you need first know something about split-brian condition. read here for info:

Here's is something about what fence device is:

Fencing is the disconnection of a node from shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity. A fence device is a hardware device that can be used to cut a node off from shared storage. This can be accomplished in a variety of ways: powering off the node via a remote power switch, disabling a Fibre Channel switch port, or revoking a host's SCSI 3 reservations. A fence agent is a software program that connects to a fence device in order to ask the fence device to cut off access to a node's shared storage (via powering off the node or removing access to the shared storage by other means).

To check whether a LUN has SCSI-3 Persistent Reservation, run the following:

root@doxer# symdev -sid 369 show 2040|grep SCSI
SCSI-3 Persistent Reserve: Enabled

And here's an article about I/O fencing using SCSI-3 Persistent Reservations in the configuration of SF Oracle RAC: