Archive for February, 2012

symclone manpage

February 24th, 2012

Take the following symclone command for example:

testbox root #symclone -sid 912 -f /opt/bcvbackup/test/cfile/R1_to_Backup_Clone_ALL query
Device File Name      : /opt/bcvbackup/test/cfile/R1_to_Backup_Clone_ALL
Device's Symmetrix ID : 000290101912

          Source Device                  Target Device            State     Copy
--------------------------------- ---------------------------- ------------ ----
               Protected Modified                Modified
Logical  Sym   Tracks    Tracks   Logical  Sym   Tracks   CGDP SRC <=> TGT  (%)
--------------------------------- ---------------------------- ------------ ----
N/A      0102       0         0   N/A      0123       0   X.X. Copied       100
N/A      0106      62         0   N/A      0127       0   X.X. CopyInProg    99
N/A      010A       0         0   N/A      012E       0   X.X. Copied       100
N/A      010E       0         0   N/A      0122       0   X.X. Copied       100
N/A      0111       0         0   N/A      012D       0   X.X. Copied       100
N/A      1454       0         0   N/A      1420       0   X.X. Copied       100
N/A      1453       0         0   N/A      144D       0   X.X. Copied       100
N/A      0288       0         0   N/A      0289       0   X.X. Copied       100

Total        --------  --------                --------
  Track(s)         62         0                       0
  MB(s)           3.9       0.0                     0.0

Legend:

(C): X = The background copy setting is active for this pair.
     . = The background copy setting is not active for this pair.
(G): X = The Target device is associated with this group.
     . = The Target device is not associated with this group.
(D): X = The Clone session is a differential copy session.
     . = The Clone session is not a differential copy session.
(P): X = The pre-copy operation has completed one cycle
     . = The pre-copy operation has not completed one cycle

testbox root #symclone -sid 912 -f /opt/bcvbackup/test/cfile/R1_to_Backup_Clone_ALL verify -copied -i 30 -c 280
All device(s) in the list are in 'Copied' state.
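
As a rough sanity check on the verify command above: with -i 30 and -c 280 it polls every 30 seconds for up to 280 attempts, so it should give up after roughly 140 minutes if the pairs never reach the Copied state. The helper name max_wait_seconds is made up for illustration, and the arithmetic assumes each attempt costs about one interval.

```shell
# max_wait_seconds INTERVAL COUNT: upper bound on how long a polling loop
# with the given -i (interval) and -c (count) values can run.
# Hypothetical helper, not part of symcli.
max_wait_seconds() {
    echo $(( $1 * $2 ))
}

total=$(max_wait_seconds 30 280)
echo "verify gives up after about ${total}s ($(( total / 60 )) min)"
```

If the clone usually finishes much sooner or later than that, adjusting -c is probably the simplest knob.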

testbox root #cat /opt/bcvbackup/test/cfile/R1_to_Backup_Clone_ALL
102 123
106 127
10A 12e
10E 122
111 12d
1454 1420
1453 144d
288 289

verify   Verifies whether one device pair or all
         device pairs in a group are in the Copied
         state.
query    Returns mirror state information about one
         or all device pairs in a group.

-copied  Verifies that the copy session(s) are in
         the Copied state.
-i       Specifies the repeat interval, in seconds,
         to display or to acquire an exclusive lock
         on the Symmetrix host database. The default
         interval is 10 seconds. The minimum
         interval is five seconds. When used with the
         verify action, the number of seconds
         specified indicates the interval of time
         (in seconds) to repeat the verify command
         before the verify action finds and reports
         the pairs fully synchronized.

-c       Specifies the number (count) of times to
         display or to acquire an exclusive lock on
         the Symmetrix host database. If you do not
         specify this option and specify an interval
         (-i), the program will loop continuously to
         display or to start the mirroring operation.
-file    Applies a device file to the command. The
         device file contains device pairs
         (SymDevnames) listing a pair per each line
         (the source device first, a space, and the
         target device last within each line entry).
         Device files can include comment lines that
         begin with the pound sign (#). A Symmetrix
         ID is required for this option. -f is
         synonymous with -file.
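
Since a malformed device file is an easy way to break a symclone run, here is a minimal sketch of a pre-flight check based on the -file description above. The function name check_device_file is hypothetical, and it assumes device names are plain hex strings like the ones in the sample file; it does not talk to the array at all.

```shell
# check_device_file FILE: report lines that are not "source target" pairs.
# Comment lines (starting with #) and blank lines are skipped.
# Exit status is non-zero if any bad line was found.
check_device_file() {
    awk '
        /^[ \t]*(#|$)/ { next }
        NF != 2 || $1 !~ /^[0-9A-Fa-f]+$/ || $2 !~ /^[0-9A-Fa-f]+$/ {
            printf "line %d: bad pair: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example: check_device_file /opt/bcvbackup/test/cfile/R1_to_Backup_Clone_ALL
```

A clean file prints nothing and returns 0, so the check can sit in front of the symclone call in a backup script.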

PS:

For the full symclone help topic, see the EMC Solutions Enabler Symmetrix CLI Command Reference (PDF); a web search for that title will find it.

Categories: Hardware, Storage Tags:

zencart error WARNING An Error occurred, please refresh the page and try again.

February 23rd, 2012

I ran into this error today. The whole site stopped rendering and showed only this single sentence.

My first thought was to edit php.ini, setting display_errors to On and error_reporting = E_ALL & ~E_DEPRECATED. But after bouncing httpd on the host, that didn't help: the sentence "WARNING: An Error occurred, please refresh the page and try again." was still there.

After some searching, I found that the fix is to edit includes/application_top.php under the root directory of the zencart source and add the following line:

define('STRICT_ERROR_REPORTING', true);

Put it directly above the existing check, like this:

define('STRICT_ERROR_REPORTING', true);
if (defined('STRICT_ERROR_REPORTING') && STRICT_ERROR_REPORTING == true) {
  @ini_set('display_errors', TRUE);
  error_reporting(version_compare(PHP_VERSION, 5.3, '>=') ? E_ALL & ~E_DEPRECATED & ~E_NOTICE : version_compare(PHP_VERSION, 5.4, '>=') ? E_ALL & ~E_DEPRECATED & ~E_NOTICE & ~E_STRICT : E_ALL & ~E_NOTICE);
} else {
  error_reporting(0);
  //error_reporting(E_ALL & ~E_NOTICE);
}

That's all of it. After refreshing the sick site once more, a different error message appeared, and it really made me happy (though it was still an error message):

"145 Table './shop/shop_sessions' is marked as crashed and should be repaired

in:
[select value from shop_sessions where sesskey = 'hi82nc2vrbbvf2t7f5isgvesg4' and expiry > '1329661291']"

I was familiar with this one, as I had handled it before for a former colleague, also on zencart. All we need to do is connect to the database and run:

mysql> repair table shop_sessions;

After all this (actually just two steps: first turn on debugging, then repair the corrupted table; the php.ini step can be omitted), the site rendered fine again.

syminq cannot find device on clariion array

February 23rd, 2012

CLARiiON was originally a Data General (DGC) product line, and Data General was acquired by EMC in 1999; that's why symcli works on CLARiiON arrays. symcli is well integrated and works on newer arrays such as DMX and VNX, but on old CLARiiON arrays you may hit some problems when using it. Take the following for example.

I tried to find one disk (LUN) on a CLARiiON array using syminq -pdevfile, but failed:

testbox:root root # syminq -pdevfile|grep EMC_CLARiiON1_15

N/A

Then I checked the output of syminq -pdevfile:

testbox:root root # syminq -pdevfile
# Symm_id pdev dev dir dir_port
000287970985 /dev/rdsk/c2t5006048C4A85AA4Fd0s2 0000 16C 0
000287970985 /dev/rdsk/c3t5006048C4A85AA70d0s2 0000 1D 1

Then I ran syminq without -pdevfile parameter:

testbox:root root # syminq |grep EMC_CLARiiON1_15

Device                                     Product                     Device
------------------------------------------ --------------------------- ---------------------
Name                               Type    Vendor    ID       Rev      Ser Num     Cap (KB)
------------------------------------------ --------------------------- ---------------------
/dev/vx/rdmp/EMC_CLARiiON1_15s2            DGC       RAID 5   0226     60000062    35651584

From this we can see that -pdevfile (which lists physical device names in a format for use as pdevfile entries) doesn't list devices on the CLARiiON array.

Here's more info about the old DGC company and its CLARiiON product: http://en.wikipedia.org/wiki/CLARiiON

Categories: Hardware, Storage Tags:

hostname is different between linux and solaris

February 21st, 2012

1. For linux, -a is an option of the hostname command:
-a, --alias
Display the alias name of the host (if used).
For example:
[root@linux ~]# hostname -a
linux localhost.localdomain localhost
[root@linux ~]# grep linux /etc/hosts
127.0.0.1 linux.doxer.org linux localhost.localdomain localhost

2. For solaris:

On solaris there's no -a option, which means that if you run hostname -a on a solaris box, you're actually setting the hostname to "-a", which in turn will cause many problems, especially with ldap.
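
For scripts that run on both platforms, one way to avoid this trap is to check the OS before calling hostname -a. This is only a sketch: the function names alias_flag_supported and show_alias are made up, and it deliberately whitelists Linux only (uname -s prints "Linux" there and "SunOS" on solaris).

```shell
# alias_flag_supported OS_NAME: succeed only where "hostname -a" is a
# read-only query. Anything that is not Linux is treated as unsafe.
alias_flag_supported() {
    case "$1" in
        Linux) return 0 ;;
        *)     return 1 ;;
    esac
}

# show_alias: print the host aliases, or refuse instead of renaming the box.
show_alias() {
    if alias_flag_supported "$(uname -s)"; then
        hostname -a
    else
        echo "hostname -a is not safe on $(uname -s); not running it" >&2
        return 1
    fi
}
```

On a solaris box show_alias just prints the warning and returns 1, which is a lot cheaper than debugging ldap afterwards.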

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags:

Too many cron jobs and crond processes running

February 17th, 2012

I ran into a problem where a ton of crond processes (cron jobs from a crontab) were running on the OS:

root@localhost# ps auxww|grep cron
vare 543 0.0 0.0 141148 5904 ? S 01:43 0:00 crond
root 4085 0.0 0.0 72944 976 ? Ss 2010 1:13 crond
vare 4522 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 5446 0.0 0.0 141148 5904 ? S 02:43 0:00 crond
vare 9202 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 10245 0.0 0.0 141148 5908 ? S 03:43 0:00 crond
vare 13989 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 15487 0.0 0.0 141148 5908 ? S 04:43 0:00 crond
vare 18796 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 20448 0.0 0.0 141148 5908 ? S 05:43 0:00 crond
root 23168 0.0 0.0 6024 596 pts/0 S+ 06:15 0:00 grep cron
vare 23474 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 27183 0.0 0.0 141148 5904 ? S Feb16 0:00 crond
vare 28358 0.0 0.0 141148 5904 ? S 00:43 0:00 crond
vare 32032 0.0 0.0 141148 5904 ? S Feb16 0:00 crond

.....(and more)

Now let's see which cron jobs are scheduled for user vare:
root@localhost# crontab -u vare -l
# run the VERA Deploy routine
43 * * * * cd /share/scripts > /dev/null 2>&1 ; sleep 5 ; /share/scripts/Application/VARE/Deploy > /dev/null 2>&1

After checking the script /share/scripts/Application/VARE/Deploy, I could see that the job changes directory to an NFS mount point (i.e. cd /share/scripts) and then does some checks (i.e. /share/scripts/Application/VARE/Deploy). There was a problem while it was changing to the NFS mount point, so the script hung there and never exited. Since the job fires every hour, the number of crond processes kept increasing.

The way to solve this specific problem (specific meaning you have to check your own script) is to first kill the hung crond processes, then bounce autofs, and then restart crond.
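
To spot this kind of pile-up early, you can count crond processes per user from the ps output. A minimal sketch, assuming the ps auxww column layout shown above (user in the first field, command in the last); count_crond_by_user is a hypothetical name:

```shell
# count_crond_by_user: read "ps auxww" lines on stdin and print
# "user count" for every user owning processes whose command is crond.
count_crond_by_user() {
    awk '$NF == "crond" { n[$1]++ } END { for (u in n) print u, n[u] }' | sort
}

# Example: ps auxww | count_crond_by_user
```

One crond owned by root is the daemon itself; a growing count for an ordinary user is the sign that jobs are hanging.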

 

httpd installation upgrade tips

February 17th, 2012

1. Different files after configure, make, make install

First, read the following link for the elementary knowledge about configure/make/make install: http://tldp.org/LDP/LG/current/smith.html

Here's the comparison after/before running configure:

[root@test ~]# diff configure.after configure.before
10d9
< Makefile
24,26d22
< config.log
< config.nice
< config.status
40d35
< modules.c

This means that configure generated 5 files: Makefile, config.log, config.nice, config.status and modules.c.

Here's the comparison after/before running make:

[root@test ~]# diff make.after configure.after
23d22
< buildmark.o
32d30
< httpd
43,44d40
< modules.lo
< modules.o

This means that buildmark.o, httpd, modules.lo and modules.o were generated by the make command.
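
If you want to repeat this comparison on your own source tree, the before/after listings can be produced with two tiny helpers. This is a sketch under my own assumptions: snapshot and new_files are made-up names, and the listings only cover the top level of the directory, which is probably how the files above were produced but I can't be sure.

```shell
# snapshot DIR: print a sorted listing of DIR's entries, one per line.
snapshot() {
    ( cd "$1" && ls -1A | LC_ALL=C sort )
}

# new_files BEFORE AFTER: print entries that appear only in the AFTER listing.
new_files() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Typical use (the /tmp file names are placeholders):
#   snapshot . > /tmp/configure.before
#   ./configure
#   snapshot . > /tmp/configure.after
#   new_files /tmp/configure.before /tmp/configure.after
```

Keeping the listing files outside the source directory matters, otherwise they show up in the second snapshot themselves.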

 

2. config.nice (with the --prefix option to put the new build somewhere else)

If you installed httpd by compiling the source package (i.e. download/unpack/configure/make/make install) and you haven't removed the unpacked source (especially the config.nice file inside it), then you have a bit of magic available when you want to upgrade httpd: config.nice!

Read this article, http://answers.oreilly.com/topic/100-how-to-upgrade-apache-using-config-nice/, to easily upgrade httpd with all the options you selected before.

Categories: IT Architecture, Linux, Systems Tags:

add oracle under vcs control howto – using veritas vxvm filesystem

February 14th, 2012

In this example, I'm going to put oracle and the oracle listener under vcs control.

haconf -makerw #change the VCS configuration to read-write mode

hagrp -add SG_myoracle #add service group

hagrp -modify SG_myoracle SystemList  host3 0 host4 1 host2 2 host1 3

hagrp -modify SG_myoracle AutoStartList  host1 host2 host3 host4 #List of systems on which, under specific conditions, the service group will be started with VCS (usually at system boot). For example, if a system is a member of a failover service group's AutoStartList attribute, and if the service group is not already running on another system in the cluster, the group is brought online when the system is started.

hagrp -modify SG_myoracle SourceFile "./main.cf"

hares -add dg_myoracle DiskGroup SG_myoracle #add disk group
hares -modify dg_myoracle Critical 0
hares -modify dg_myoracle DiskGroup myoracle
hares -modify dg_myoracle PanicSystemOnDGLoss 0
hares -modify dg_myoracle StartVolumes 1
hares -modify dg_myoracle StopVolumes 1
hares -modify dg_myoracle MonitorReservation 0
hares -modify dg_myoracle tempUseFence INVALID
hares -modify dg_myoracle DiskGroupType private
hares -modify dg_myoracle Enabled 1
hares -add vip_myoracle IP SG_myoracle #add vip
hares -modify vip_myoracle Critical 0
hares -local vip_myoracle Device
hares -modify vip_myoracle Device bond0 -sys host1
hares -modify vip_myoracle Device bond0 -sys host2
hares -modify vip_myoracle Device bond0 -sys host3
hares -modify vip_myoracle Device bond0 -sys host4
hares -modify vip_myoracle Address "192.168.0.7"
hares -modify vip_myoracle NetMask "255.255.255.0"
hares -modify vip_myoracle Enabled 1
hares -add mnt_myoracle Mount SG_myoracle #add mount point resource
hares -modify mnt_myoracle Critical 0
hares -modify mnt_myoracle MountPoint "/myoracle"
hares -modify mnt_myoracle BlockDevice "/dev/vx/dsk/myoracle/myoracleroot"
hares -modify mnt_myoracle FSType vxfs
hares -modify mnt_myoracle MountOpt largefiles
hares -modify mnt_myoracle FsckOpt "%-y"
hares -modify mnt_myoracle SnapUmount 0
hares -modify mnt_myoracle CkptUmount 1
hares -modify mnt_myoracle SecondLevelMonitor 0
hares -modify mnt_myoracle SecondLevelTimeout 30
hares -modify mnt_myoracle VxFSMountLock 0
hares -modify mnt_myoracle Enabled 1
hares -add mnt_myoracle-ora Mount SG_myoracle #add another mount point resource
hares -modify mnt_myoracle-ora Critical 0
hares -modify mnt_myoracle-ora MountPoint "/myoracle/ora"
hares -modify mnt_myoracle-ora BlockDevice "/dev/vx/dsk/myoracle/myoracle-ora"
hares -modify mnt_myoracle-ora FSType vxfs
hares -modify mnt_myoracle-ora MountOpt largefiles
hares -modify mnt_myoracle-ora FsckOpt "%-y"
hares -modify mnt_myoracle-ora SnapUmount 0
hares -modify mnt_myoracle-ora CkptUmount 1
hares -modify mnt_myoracle-ora SecondLevelMonitor 0
hares -modify mnt_myoracle-ora SecondLevelTimeout 30
hares -modify mnt_myoracle-ora VxFSMountLock 0
hares -modify mnt_myoracle-ora Enabled 1
hares -add lsnr_myoracle Netlsnr SG_myoracle #add listener resource
hares -modify lsnr_myoracle Critical 0
hares -modify lsnr_myoracle Owner oracle
hares -modify lsnr_myoracle Home "/ora/product/11.2.0.2a"
hares -modify lsnr_myoracle TnsAdmin "/myoracle/ora/admin/etc"
hares -modify lsnr_myoracle Listener LISTENER_myoracle
hares -modify lsnr_myoracle MonScript "./bin/Netlsnr/LsnrTest.pl"
hares -modify lsnr_myoracle AgentDebug 0
hares -modify lsnr_myoracle Enabled 1

hares -add myoracle Oracle SG_myoracle #add oracle resource
hares -modify myoracle Critical 0
hares -modify myoracle Sid myoracle
hares -modify myoracle Owner oracle
hares -modify myoracle Home "/ora/product/11.2.0.2a"
hares -modify myoracle Pfile "/myoracle/ora/admin/myoracle/pfile/initmyoracle.ora"
hares -modify myoracle StartUpOpt STARTUP
hares -modify myoracle ShutDownOpt IMMEDIATE
hares -modify myoracle AutoEndBkup 1
hares -modify myoracle MonScript "./bin/Oracle/SqlTest.pl"
hares -modify myoracle AgentDebug 0
hares -modify myoracle Enabled 1

hares -add proxy_mnic_myoracle Proxy SG_myoracle #add proxy resource
hares -modify proxy_mnic_myoracle Critical 0
hares -modify proxy_mnic_myoracle TargetResName mnic
hares -modify proxy_mnic_myoracle Enabled 1

#Now set up the dependencies

hares -link mnt_myoracle dg_myoracle

hares -link mnt_myoracle-ora mnt_myoracle

hares -link myoracle mnt_myoracle-ora

hares -link vip_myoracle proxy_mnic_myoracle

hares -link lsnr_myoracle vip_myoracle

hares -link lsnr_myoracle mnt_myoracle-ora

haconf -dump -makero #Write the configuration to disk and remove the designation stale. -makero changes the VCS mode to read-only.
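
The two Mount resources above differ only in resource name, mount point and block device, so if you have many of them it may be easier to generate the commands than to type them. Here is a minimal sketch: gen_mount_cmds is a made-up helper that only prints the hares commands (it does not run them), and the attribute values are simply copied from the example above, so adjust them for your own cluster.

```shell
# gen_mount_cmds RESOURCE MOUNTPOINT BLOCKDEVICE GROUP
# Print (not execute) the hares commands for one vxfs Mount resource.
gen_mount_cmds() {
    res=$1 mnt=$2 dev=$3 group=$4
    cat <<EOF
hares -add $res Mount $group
hares -modify $res Critical 0
hares -modify $res MountPoint "$mnt"
hares -modify $res BlockDevice "$dev"
hares -modify $res FSType vxfs
hares -modify $res MountOpt largefiles
hares -modify $res FsckOpt "%-y"
hares -modify $res SnapUmount 0
hares -modify $res CkptUmount 1
hares -modify $res SecondLevelMonitor 0
hares -modify $res SecondLevelTimeout 30
hares -modify $res VxFSMountLock 0
hares -modify $res Enabled 1
EOF
}

# Example: review the output first, then pipe it to sh on a cluster node.
# gen_mount_cmds mnt_myoracle-ora /myoracle/ora /dev/vx/dsk/myoracle/myoracle-ora SG_myoracle
```

Printing instead of executing keeps the helper harmless and lets you diff the result against what hacf would produce.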

NB:

If your cluster already has other service groups configured, then hacf -cftocmd is your friend: it converts an existing main.cf into the equivalent list of commands, which you can adapt.

Categories: Clouding, HA & HPC, IT Architecture Tags:

luxadm forcelip/display on solaris 10

February 3rd, 2012

Now let's talk about luxadm forcelip/display on solaris. Pay attention to the commented lines. This article is a little long and all about cXtXdXsX, so be patient. :D
testhost:root root # vxprint -ht|grep dm #check for the disks on OS's view:
dm emc333263A c1t5006048452A70F7Cd231s2 auto 65536 212055808 -
dm emc3330DA8 c1t5006048452A70F7Cd232s2 auto 65536 17609728 -
dm emc3332646 c1t5006048452A70F7Cd230s2 auto 65536 70640128 -

testhost:root root # luxadm probe #this will probe for SAN disks and their multiple paths
No Network Array enclosures found in /dev/es
Found Fibre Channel device(s):
Node WWN:5006048452a70f7c Device Type:Disk device
Logical Path:/dev/rdsk/c1t5006048452A70F7Cd230s2 #the OS disk's wwn
Node WWN:5006048452a70f7c Device Type:Disk device
Logical Path:/dev/rdsk/c1t5006048452A70F7Cd231s2
Node WWN:5006048452a70f7c Device Type:Disk device
Logical Path:/dev/rdsk/c1t5006048452A70F7Cd232s2
Node WWN:5006048452a70f43 Device Type:Disk device
Logical Path:/dev/rdsk/c3t5006048452A70F43d230s2
Node WWN:5006048452a70f43 Device Type:Disk device
Logical Path:/dev/rdsk/c3t5006048452A70F43d231s2
Node WWN:5006048452a70f43 Device Type:Disk device
Logical Path:/dev/rdsk/c3t5006048452A70F43d232s2

From the output of luxadm probe, we know that there are two controllers, c1 and c3. We can confirm this with cfgadm:

testhost:root root # cfgadm -la|grep fabric
c1 fc-fabric connected configured unknown
c3 fc-fabric connected configured unknown
testhost:root root # fcinfo hba-port -l
HBA Port WWN: 210000e08b18da4f #this is the wwn for hba
OS Device Name: /dev/cfg/c1 #device name for the hba
Manufacturer: QLogic Corp.
Model: 375-3102-xx
Firmware Version: 03.03.28
FCode/BIOS Version: fcode: 1.13;
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 200000e08b18da4f
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 0
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 0
Invalid CRC Count: 0
HBA Port WWN: 210000e08b18024f
OS Device Name: /dev/cfg/c3
Manufacturer: QLogic Corp.
Model: 375-3102-xx
Firmware Version: 03.03.28
FCode/BIOS Version: fcode: 1.13;
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 200000e08b18024f
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 1
Loss of Signal Count: 1
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 0
Invalid CRC Count: 0

To display information on remote targets (including the storage manufacturer, the storage product type, WWPNs, and all of the SCSI targets that have been presented to the host):
testhost:root root # fcinfo remote-port -slp 210000e08b18024f #which luns are seen by hba 210000e08b18024f?
Remote Port WWN: 5006048452a70f43
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 5006048452a70f43
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 1
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 255
Invalid CRC Count: 0
LUN: 230
Vendor: EMC
Product: SYMMETRIX
OS Device Name: /dev/rdsk/c3t5006048452A70F43d230s2
LUN: 231
Vendor: EMC
Product: SYMMETRIX
OS Device Name: /dev/rdsk/c3t5006048452A70F43d231s2
LUN: 232
Vendor: EMC
Product: SYMMETRIX
OS Device Name: /dev/rdsk/c3t5006048452A70F43d232s2

To display WWN data for a target device or host bus adapter on the specified fibre channel port:
testhost:root root # luxadm -e port
/devices/pci@1e,600000/SUNW,qlc@3/fp@0,0:devctl CONNECTED
/devices/pci@1d,700000/SUNW,qlc@1/fp@0,0:devctl CONNECTED
testhost:root root # luxadm -e dump_map /devices/pci@1e,600000/SUNW,qlc@3/fp@0,0:devctl
Pos Port_ID Hard_Addr Port WWN Node WWN Type
0 10300 0 5006048452a70f7c 5006048452a70f7c 0x0 (Disk device)
1 15500 0 210000e08b18da4f 200000e08b18da4f 0x1f (Unknown Type,Host Bus Adapter)
Here's the controller info as seen by DMP:
testhost:root root # vxdmpadm getctlr all
LNAME PNAME VENDOR CTLR-ID
========================================================================================================
c1 /pci@1e,600000/SUNW,qlc@3/fp@0,0 QLogic Corp. 21:00:00:e0:8b:18:da:4f
c3 /pci@1d,700000/SUNW,qlc@1/fp@0,0 QLogic Corp. 21:00:00:e0:8b:18:02:4f
c0 /pci@1c,600000/scsi@2 - -
Here's the multipath info for a specific disk(c1t5006048452A70F7Cd231s2):

testhost:root root # vxdisk list c1t5006048452A70F7Cd231s2

Device: c1t5006048452A70F7Cd231s2
devicetag: c1t5006048452A70F7Cd231
type: auto
hostid: testhost
disk: name=emc333263A id=1277720253.8.testhost
group: name=tpdbrdbd01root-dg id=1277720279.10.testhost
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig autoimport imported
pubpaths: block=/dev/vx/dmp/c1t5006048452A70F7Cd231s2 char=/dev/vx/rdmp/c1t5006048452A70F7Cd231s2
guid: {5da11fa8-1dd2-11b2-ab51-0003ba89d76a}
udid: EMC%5FSYMMETRIX%5F000290102333%5F33!G+000
site: -
version: 3.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=2 offset=65792 len=212055808 disk_offset=0
private: slice=2 offset=256 len=65536 disk_offset=0
update: time=1277829173 seqno=0.11
ssb: actual_seqno=0.0
headers: 0 240
configs: count=1 len=48144
logs: count=1 len=7296
Defined regions:
config priv 000048-000239[000192]: copy=01 offset=000000 enabled
config priv 000256-048207[047952]: copy=01 offset=000192 enabled
log priv 048208-055503[007296]: copy=01 offset=000000 enabled
lockrgn priv 055504-055647[000144]: part=00 offset=000000
Multipathing information:
numpaths: 2
c1t5006048452A70F7Cd231s2 state=enabled
c3t5006048452A70F43d231s2 state=enabled

To read more info:
1. Add and Configure LUNs in Solaris
2. man page for luxadm
3. man page for fcinfo
4. Related commands:
/usr/sbin/cfgadm -la |grep fabric #solaris, check Fibre Channel controller status
fcinfo hba-port -l #check hba information, like Qlogic, Emulex
/usr/sbin/lpfc/lputil #Emulex HBAs are not seen in cfgadm -al output. Emulex uses the "lpfc" driver. You can manipulate them via /usr/sbin/lpfc/lputil
luxadm -e port #check whether hba cards are connected, this will show the physical path
luxadm -e forcelip c2 #forcelip of one entire controller
cfgadm -c configure c2::5006048452a72687 #configure lun
cfgadm -c configure c2 #configure the whole controller, it does not affect previously configured LUNs
devfsadm -c disk #scan disks in solaris
symcfg disco #update sym db on this host.
luxadm probe #check FC disks allocated to this host
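
Since the whole article revolves around cXtXdXsX names, here is a small sketch that splits one into its parts. parse_ctd is a made-up name, and the pattern assumes the target field is an upper-case hex WWN and the LUN is decimal, as in the device names shown above (d231 matches "LUN: 231" in the fcinfo output).

```shell
# parse_ctd NAME: split a cXtWWNdLUNsSLICE device name into its fields.
# Accepts a bare name or a full /dev/rdsk/... path; prints nothing if the
# name does not match the expected pattern.
parse_ctd() {
    name=${1##*/}    # drop any leading directory such as /dev/rdsk/
    printf '%s\n' "$name" | sed -n \
        's/^\(c[0-9][0-9]*\)t\([0-9A-F][0-9A-F]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$/controller=\1 target=\2 lun=\3 slice=\4/p'
}

parse_ctd /dev/rdsk/c3t5006048452A70F43d230s2
```

The controller field maps to the HBA (c1 or c3 here), and the target field is the array port WWN that luxadm probe reports as the Node WWN.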

ntp offset – use ntpdate to manually sync local time with ntp server

February 2nd, 2012

Here's the outline for resolution:

1.check whether ntpd is running on the problematic host;

2.stop ntpd;

3.use ntpdate to manually sync with ntp server

4.start up ntpd.

Here are the detailed commands:

root@testhost# service ntpd stop
Shutting down ntpd: [ OK ]
root@testhost# ps -ef|grep ntp
root 9805 9542 0 01:53 pts/0 00:00:00 grep ntp
root@testhost# cat /etc/ntp.conf
tinker panic 0
server timehost1 prefer
server timehost2
server timehost3
driftfile /var/lib/ntp/drift

# Prohibit general access to this service.
restrict default ignore

# Permit the cluster node listed to synchronise with this time service.
# Do not permit those systems to modify the configuration of this service.
# Allow this host to be used as a timesource

# Permit all loopback interface access.
restrict 127.0.0.1
root@testhost# ntpdate -u timehost1
2 Feb 01:55:20 ntpdate[9824]: step time server 172.20.220.27 offset 59.998407 sec
root@testhost# service ntpd start
Starting ntpd: [ OK ]
root@testhost# ps -ef|grep ntp
ntp 9999 1 0 01:55 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 10017 9542 0 01:55 pts/0 00:00:00 grep ntp

PS:

  • If your host is a virtual machine running on the Xen hypervisor, you may find that ntpdate or ntpd fails to synchronize time with the time server. That's because the VM syncs with the hypervisor by default. So if you want to sync your VM's time with the time server, there are two methods:

1. Synchronize time on the Xen hypervisor, and the VM will then sync with it automatically;

2. If you just want to sync the VM's time without changing the Xen hypervisor's time setting, then on the VM, do the following:

echo 1 > /proc/sys/xen/independent_wallclock

After this step, you can now sync time for the VM without impacting others.

  • You can use ntpq -p to check the status of remote time servers.

[root@test-host ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*LOCAL(0) .LOCL. 5 l 66 64 377 0.000 0.000 0.001
adcq7-rtr-9.us. .INIT. 16 u - 512 0 0.000 0.000 0.000
adcq7-rtr-10.us .INIT. 16 u - 512 0 0.000 0.000 0.000

In the "remote" column there is a blank space before adcq7-rtr-9.us and adcq7-rtr-10.us. This means that the two sources are discarded: they failed the sanity checks and have never been synced to.

For more details about the output of ntpq -p, see the ntpq documentation.
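
To make that first character easier to read at a glance, here is a minimal sketch that labels each peer by its tally code. ntpq_status is a made-up name; it only distinguishes "*" (the selected peer), "+" (a candidate) and a blank (rejected), lumps every other code into "other", and assumes the usual two header lines at the top of ntpq -p output.

```shell
# ntpq_status: read "ntpq -p" output on stdin and print "peer status"
# for each peer line, based on the tally code in the first column.
ntpq_status() {
    awk 'NR > 2 && NF > 0 {
        tally = substr($0, 1, 1)
        name = $1
        if (tally != " ") name = substr($1, 2)   # strip the tally character
        if (tally == "*")      status = "selected"
        else if (tally == "+") status = "candidate"
        else if (tally == " ") status = "rejected"
        else                   status = "other"
        print name, status
    }'
}

# Example: ntpq -p | ntpq_status
```

For the output above this would report LOCAL(0) as selected and both adcq7 routers as rejected.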

  • For ntpd firewall issues: if you want to use ntpd, then you need to fix your network/firewall/NAT so that ntpd has full unrestricted access to UDP port 123 in both directions. Also, you may set up a cron job to bounce ntpd every hour so that ntp is forced to sync with the time server:
echo "$((RANDOM % 60)) */1 * * * root /sbin/service ntpd restart" >> /etc/crontab #random minute 0-59, needs bash for RANDOM; you can of course manually add the job via crontab -e -u root
  • Many people have difficulties with using RESTRICT. They want to set themselves up to be as secure as possible, so they create an extremely limited default RESTRICT line in their /etc/ntp.conf file, and then they find that they can't talk to anyone. If you're having problems with your server, in order to do proper debugging, you should turn off all RESTRICT lines in your /etc/ntp.conf file, and otherwise simplify the configuration as much as possible, so that you can make sure that the basic functions are working correctly. Once you get the basics working, try turning the various features back on, one by one. The ntp documentation on access control has tips on using the restrict keyword.
  • To get started with ntp, it helps to understand server, peer and stratum first (a pool is just a list of servers). The full ntp documentation covers all of these.

using timex to check whether performance degradation caused by OS or VxVM

February 1st, 2012

To find out whether a disk-access slowdown comes from the operating system or from Volume Manager, compare how long each of them takes to access the disks: time format for the OS and vxdisk list for VxVM. The two should take about the same time, since both commands force a read of disk header information. If one of them is markedly greater, it indicates a problem in that area.

#echo | timex /usr/sbin/format #echo avoids the prompt for user input. Use time instead of timex on linux
real          13.03
user           0.10
sys            1.49
#timex vxdisk -o alldgs list
real           2.65
user           0.00
sys            0.00
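
To turn "markedly greater" into something you can script, here is a rough sketch that compares the two real times. compare_times is a made-up name, and the factor of 2 used as the default threshold is purely my assumption, not a value from any Veritas or Solaris documentation.

```shell
# compare_times OS_SECONDS VXVM_SECONDS [FACTOR]
# Report which side is slower by more than FACTOR (default 2), or "similar".
compare_times() {
    awk -v os="$1" -v vx="$2" -v t="${3:-2}" 'BEGIN {
        if (os <= 0 || vx <= 0) { print "unknown"; exit }
        if (os > vx * t)      printf "os-slower ratio=%.2f\n", os / vx
        else if (vx > os * t) printf "vxvm-slower ratio=%.2f\n", vx / os
        else                  print "similar"
    }'
}

compare_times 13.03 2.65
```

With the numbers above, format took almost five times as long as vxdisk, so the OS side is the place to look first.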

Categories: IT Architecture, Kernel, Linux, Systems, Unix Tags: