Archive

Posts Tagged ‘vcs’

veritas vcs 5.1 on solaris 5.10 changes of restarting procedure

July 26th, 2012

For VCS 5.1 on Solaris 10, starting and stopping VCS is no longer controlled by the /etc/rc*.d/S* scripts.
It is under SMF control instead. In addition, the files under /etc/default (gab, llt, vcs, vxfen, etc.) contain lines that need to be set to 1 if VCS is set up manually.
For example:

VCS_START=1
VCS_STOP=1

More interestingly, on a one-node VCS cluster the SMF service for VCS is not system/vcs:default; it is system/vcs-onenode:default.
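
A minimal sketch for checking those flags, assuming the defaults files use plain NAME=value lines as in the example above. The helper names flag_is_set and check_defaults are made up for illustration and are not part of VCS.

```shell
#!/bin/sh
# flag_is_set FILE NAME: succeed if FILE contains the exact line NAME=1.
# Hypothetical helper; assumes plain NAME=value lines with no spaces.
flag_is_set() {
    grep -q "^$2=1\$" "$1" 2>/dev/null
}

# check_defaults FILE NAME...: report each flag found (or not) in FILE.
check_defaults() {
    file=$1
    shift
    for name in "$@"; do
        if flag_is_set "$file" "$name"; then
            echo "$name=1 in $file"
        else
            echo "$name is not set to 1 in $file"
        fi
    done
}

# Example on a cluster node (not run here):
#   check_defaults /etc/default/vcs VCS_START VCS_STOP
```

On the SMF side, svcs -a | grep vcs should show which VCS service instance exists on the node and what state it is in.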

Categories: Clouding, HA, HA & HPC, IT Architecture

vcs commands hang consistently

June 8th, 2012

Today we encountered an issue where Veritas VCS commands hung consistently. Commands such as haconf -dump -makero just sat there for so long that we had to terminate them from the console. Using truss (on Solaris) or strace (on Linux) to trace system calls and signals, we saw the following output:

test# truss haconf -dump -makero

execve("/opt/VRTSvcs/bin/haconf", 0xFFBEF21C, 0xFFBEF22C) argc = 3
resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

open("//.vcspwd", O_RDONLY) Err#2 ENOENT
getuid() = 0 [0]
getuid() = 0 [0]
so_socket(1, 2, 0, "", 1) = 4
fcntl(4, F_GETFD, 0x00000004) = 0
fcntl(4, F_SETFD, 0x00000001) = 0
connect(4, 0xFFBE7E1E, 110, 1) = 0
fstat64(4, 0xFFBE7AF8) = 0
getsockopt(4, 65535, 8192, 0xFFBE7BF8, 0xFFBE7BF4, 0) = 0
setsockopt(4, 65535, 8192, 0xFFBE7BF8, 4, 0) = 0
fcntl(4, F_SETFL, 0x00000084) = 0
brk(0x000F6F28) = 0
brk(0x000F8F28) = 0
poll(0xFFBE8A60, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\t15\0\0\0".., 57, 0) = 57
poll(0xFFBE8AA0, 1, -1) = 1
poll(0xFFBE68B8, 0, 0) = 0
recv(4, " G\0\0\0 $\0\0\r02\0\0\0".., 8192, 0) = 55
poll(0xFFBE8B10, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\f 1\0\0\0".., 58, 0) = 58
poll(0xFFBE8B50, 1, -1) = 1
poll(0xFFBE6968, 0, 0) = 0
recv(4, " G\0\0\0 $\0\0\r02\0\0\0".., 8192, 0) = 49
getpid() = 10386 [10385]
poll(0xFFBE99B8, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\f A\0\0\0".., 130, 0) = 130
poll(0xFFBE99F8, 1, -1) = 1
poll(0xFFBE7810, 0, 0) = 0
recv(4, " G\0\0\0 $\0\0\r02\0\0\0".., 8192, 0) = 62
fstat64(4, 0xFFBE9BB0) = 0
getsockopt(4, 65535, 8192, 0xFFBE9CB0, 0xFFBE9CAC, 0) = 0
setsockopt(4, 65535, 8192, 0xFFBE9CB0, 4, 0) = 0
fcntl(4, F_SETFL, 0x00000084) = 0
getuid() = 0 [0]
door_info(3, 0xFFBE78C8) = 0
door_call(3, 0xFFBE78B0) = 0
open("//.vcspwd", O_RDONLY) Err#2 ENOENT
poll(0xFFBEE370, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\t13\0\0\0".., 42, 0) = 42
poll(0xFFBEE3B0, 1, -1) (sleeping...)
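
The last line of the trace shows the command blocked in poll() with a timeout of -1 (wait indefinitely) after sending a request on socket 4, so it is waiting for a reply that never arrives. As a small sketch, the choice of tracer can be wrapped in a helper; tracer_for is a hypothetical name, and the mapping only covers the two platforms mentioned above.

```shell
#!/bin/sh
# tracer_for OS: print the system-call tracer for a given `uname -s` value
# (SunOS -> truss, Linux -> strace). Hypothetical helper for illustration.
tracer_for() {
    case "$1" in
        SunOS) echo truss ;;
        Linux) echo strace ;;
        *) echo "no tracer known for $1" >&2; return 1 ;;
    esac
}

# Example (not run here): trace the hanging command with whichever tool applies.
#   "$(tracer_for "$(uname -s)")" haconf -dump -makero
```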

After some digging on the internet, we found the following solution to this weird problem:

1. Stop VCS on all nodes in the cluster by manually killing both had & hashadow processes on each node.
# ps -ef | grep had
root 27656 1 0 10:24:02 ? 0:00 /opt/VRTSvcs/bin/hashadow
root 27533 1 0 10:22:01 ? 0:02 /opt/VRTSvcs/bin/had -restart

# kill 27656 27533
GAB: Port h closed

2. Unconfigure GAB and LLT.
# gabconfig -U
GAB: Port a closed
GAB unavailable

# lltconfig -U
lltconfig: this will attempt to stop and reset LLT. Confirm (y/n)? y

3. Unload GAB & llt modules.
# modinfo | grep gab
100 60ea8000 38e9b 136 1 gab (GAB device)

# modunload -i 100
GAB unavailable

# modinfo | grep llt
84 60c6a000 fd74 137 1 llt (Low Latency Transport device)
# modunload -i 84
LLT Protocol unavailable

4. Restart llt.
# /etc/rc2.d/S70llt start
Starting LLT
LLT Protocol available

5. Restart gab.
# /etc/gabtab
GAB available
GAB: Port a registration waiting for seed port membership

6. Restart VCS:
# hastart -force
# VCS: starting on: <node_name>
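
In step 3 the module IDs (100 and 84) were read off the modinfo output by eye. A minimal sketch of extracting them instead, assuming the column layout shown in that output (ID first, module name sixth). module_id is a hypothetical helper; on Solaris, nawk or /usr/xpg4/bin/awk may be needed because the legacy awk does not accept -v.

```shell
#!/bin/sh
# module_id NAME: read modinfo-style output on stdin and print the ID
# (first column) of the module whose name (sixth column) equals NAME.
# Hypothetical helper; assumes the column layout shown in step 3.
module_id() {
    awk -v name="$1" '$6 == name { print $1 }'
}

# Example on a live system (not run here):
#   id=$(modinfo | module_id gab) && [ -n "$id" ] && modunload -i "$id"
```

Matching on the exact module name avoids picking up unrelated modules whose description merely contains "gab" or "llt", which a plain grep could do.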

Categories: Clouding, HA, HA & HPC, IT Architecture

impact of restart vxconfigd on solaris and linux – VxVM Configuration Daemon

May 30th, 2012

Stopping and restarting the VxVM configuration daemon, vxconfigd, may cause your VxVA, VMSA and/or VEA session to exit. It may also cause a momentary stoppage of any VxVM configuration actions. This should not harm any data; however, it may cause some configuration operations (e.g. moving subdisks, plex resynchronization) to abort unexpectedly. Any VxVM configuration changes should be completed before restarting the daemon.

If you are using EMC PowerPath devices with Veritas Volume Manager, you must run the EMC command(s) 'powervxvm setup' (or 'safevxvm setup') and/or 'powervxvm online' (or 'safevxvm online') if the restart terminates abnormally. Also, if VCS service groups are running on the host, restarting vxconfigd may cause a failover, so freeze the service groups before doing this. For details, see: http://www.doxer.org/differences-between-freezing-vcs-system-and-freezing-service-group/
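
The freeze, restart, unfreeze order can be sketched as a dry-run helper that only prints the commands so they can be reviewed before being run by hand. This assumes a temporary (non-persistent) hagrp -freeze is enough and that vxconfigd -k is used for the restart; restart_plan and the group names app_sg and db_sg are hypothetical.

```shell
#!/bin/sh
# restart_plan GROUP...: print a freeze / restart / unfreeze command sequence
# without executing anything. Hypothetical helper for illustration only.
restart_plan() {
    for g in "$@"; do
        echo "hagrp -freeze $g"
    done
    echo "vxconfigd -k"
    for g in "$@"; do
        echo "hagrp -unfreeze $g"
    done
}

# Example with two hypothetical service group names:
restart_plan app_sg db_sg
```

Printing rather than executing is deliberate: which groups to freeze, and whether the freeze should be persistent, depends on the cluster.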

Categories: Clouding, HA, HA & HPC, IT Architecture

vcs service group and resource attributes dictionary page

May 22nd, 2012

Here are all the Veritas VCS service group and resource attributes with their explanations, as a crib sheet/cheat sheet (this is actually the content of /etc/VRTSvcs/conf/attributes/cluster_attrs.xml):

Administrators Contains list of users with Administrator privileges.
 AllowNativeCliUsers If user does not have root privileges, and if this attribute is set to 0 (false), user is prompted for a password when issuing ha-xxx commands. If this attribute is set to 1(true), the user is not prompted; instead, VCS validates OS user's login against VCS' list of user IDs and assigns appropriate privileges. Default = 0(false).
 ClusterLocation Specifies the location of the cluster.
 ClusterName Arbitrary string containing the name of cluster.
 ClusterOwner This attribute is used for VCS notification; specifically, VCS sends notifications to persons designated in this attribute when something goes wrong with the cluster.
 CompareRSM Indicates if VCS engine is to verify that Replicated State Machine is consistent. This can be set by using the hadebug command.
 CounterInterval Intervals counted by the attribute GlobalCounter indicating approximately how often a broadcast will happen that will cause the GlobalCounter attribute to increase. The default value of the GlobalCounter increment can be modified by changing CounterInterval. If you increase this attribute to exceed five seconds, consider increasing the default value of the ShutdownTimeout attribute.
 DumpingMembership Indicates that the engine is writing to disk.
 EngineClass Indicates the scheduling class for the VCS engine (had).
 EnginePriority Indicates the priority in which had runs. This attribute has no effect in Windows environments.
 GlobalCounter This counter increases incrementally by one for each counter interval. It increases when the broadcast is received. VCS uses the GlobalCounter attribute to measure the time it takes to shut down a system. By default, the GlobalCounter attribute is updated every five seconds. This default value, combined with the 60-second default value of ShutdownTimeout, means if system goes down within twelve increments of GlobalCounter, it is treated as a fault. The default value of GlobalCounter increment can be modified by changing the CounterInterval attribute.
 GroupLimit Maximum number of service groups.
 HacliUserLevel This attribute has three, case-sensitive values:
 LockMemory Controls the locking of VCS engine pages in memory. This attribute has three values: ALL: Locks all current and future pages. CURRENT: Locks current pages.
 LogSize Size of the log file. Minimum value 64 KB; maximum value 128 MB.
 MajorVersion Major version of system's join protocol.
 MinorVersion Minor version of system's join protocol.
 Notifier Indicates the status of the notifier in the cluster; specifically:
 Operators Contains list of users with Operator privileges.
 ProcessClass Indicates the scheduling class for had processes (for example, triggers).
 ProcessPriority Indicates the priority of had processes (for example, triggers). This attribute has no effect in Windows environments.
 PrintMsg If set to 1 (true), enables logging TagM messages in engine log.
 ReadOnly Indicates the mode of cluster configuration.
 ResourceLimit Maximum number of resources.
 SourceFile File from which the configuration was read.
 TypeLimit Maximum number of resource types.
 UserNames List of VCS user names.
 VCSMode Denotes the mode for which VCS is licensed, including VCS, VCS_QUICKSTART, and VCS_OPS.
 LinkMonitoring Enables link monitoring.
 NotifyList Stores notification list consisting of recipient's email addresses, separated by spaces.
 MaxFactor For internal use only.
 LoadSampling For internal use only.
 Factor For internal use only.
 Stewards Specifies the IP address/hostname of systems running the steward process.
 ClusterAddress Specifies the cluster's virtual IP address (used by a remote cluster when connecting to the local cluster).
 ClusterUUID Indicates the unique cluster identification assigned to the cluster by the Availability Manager.
 ClusState Indicates the current state of the cluster
 AutoStartTimeout If the local cluster cannot communicate with one or more remote clusters, this attribute specifies the number of seconds the VCS engine waits before initiating the AutoStart process for an AutoStart global service group.
 PanicOnNoMem For internal use only.
 UseFence Indicates whether the cluster uses SCSI III I/O fencing.
 VCSFeatures Indicates which VCS features are enabled.
 ClusterTime The number of seconds since January 1, 1970. This is defined by the lowest node in running state.
 WACPort The TCP port on which the WAC (Wide Area Connector) process on the local cluster listens for connection from remote clusters. The attribute can take a value from 0 to 65535.
 SecureClus Indicates whether the cluster is secured. VCS runs in the Secure mode using VxSS; all public network communications use SSL. VCS users belong to the platform user base and VCS does not store user passwords. This value cannot be changed while the cluster is running.
 AvailableCapacity Available Capacity = Capacity - Current System Load
 Capacity Value expressing total load capacity of system. This value is relative to other systems in the cluster and does not reflect any real value associated with the system. Default=100
 ConfigBlockCount Number of 512-byte blocks in configuration when the system joined the cluster.
 ConfigCheckSum Sixteen-bit checksum of configuration identifying when the system joined the cluster.
 ConfigDiskState State of configuration on the disk when the system joined the cluster.
 ConfigFile Directory containing the configuration files.
 ConfigModDate Last modification date of configuration when the system joined the cluster.
 CurrentLimits System-maintained calculation of current value of Limits. CurrentLimits = Limits - (additive value of all service group Prerequisites).
 CPUUsage Indicates the CPUUsage of the system in the form of CPU percentage utilization. This attribute's value is valid if the Enabled value in CPUUsageMonitoring attribute equals 1. This value is updated when there is a change of 5 percent since the last indicated value.
 CPUUsageMonitoring Monitors the system's CPU usage using various factors. The default value for this attribute is CPUUsageMonitoring = {Enabled = 0, NotifyThreshold = 0, NotifyTimeLimit = 0, ActionThreshold = 0, ActionTimeLimit = 0, Action = NONE}
 DiskHbStatus Indicates status of communication disks on the system.
 DynamicLoad System-maintained value of current dynamic load. The value is set external to VCS with the hasys -load command.
 Frozen Indicates if service groups can be brought online or taken offline on the system. Groups cannot be brought online or taken offline if the attribute value is 1 (true).
 GUIIPAddr Determines the local IP address that VCS uses to accept connections. Incoming connections over other IP addresses are dropped. If GUIIPAddr is not set, the default behavior is to accept external connections over all configured local IP addresses.
 Limits An unordered set of name=value pairs denoting specific resources available on a system. Names are arbitrary and are set by the administrator for any value; names are not obtained from the system. The format for Limits is: Limits() = { Name=Value, Name2=Value2 }.
 Location Denotes the location of the system.
 LinkHbStatus Indicates status of private network links on any system.
 LoadTimeCounter System-maintained internal counter of how many seconds the system load has been above LoadWarningLevel. This value resets to zero anytime system load drops below the value in LoadWarningLevel.
 LoadTimeThreshold Indicates the length of time a system must remain at or above LoadWarningLevel before the loadwarning trigger is fired. Default = 600 seconds.
 LoadWarningLevel Defines, as a percentage of system Capacity, the level at which load has reached a critical limit. Default = 80 percent.
 MajorVersion Major version of system's join protocol.
 MinorVersion Minor version of system's join protocol.
 NodeId System ID specified in "/etc/llttab".
 OnGrpCnt Number of groups that are online, or about to go online.
 ShutdownTimeout Determines whether to treat system reboot as a fault for service groups running on the system. On many systems, when a reboot occurs the processes are killed first, then the system goes down. When the VCS engine is killed, service groups that include the failed system in their SystemList attributes are autodisabled. However, if the system goes down within the number of seconds designated in ShutdownTimeout, service groups previously online on the failed system are treated as faulted and failed over. If you do not want to treat the system reboot as a fault, set the value for this attribute to 0. Default = 120 seconds
 SourceFile File from which the configuration was read.
 SysInfo Provides platform-specific information, including the name, version, and release of the operating system, the name of the system on which it is running, and the hardware type.
 SystemOwner This attribute is used for VCS email notification and logging. VCS sends email notification to the person designated in this attribute when an event occurs related to the system.
 SysState System state, such as running, faulted, exited.
 TFrozen Indicates if a group can be brought online or taken offline on the system.
 TRSE Indicates in seconds the time to Regular State Exit. Time is calculated as the duration between the events of VCS losing port h membership and of VCS losing port a membership of GAB.
 UpDownState This attribute has four values: DOWN, UP BUT NOT IN CLUSTER MEMBERSHIP, UP AND IN JEOPARDY, UP.
 UserInt Stores a system's integer value.
 UserStr Stores a system's string value.
 DiskHbDown Indicates if communication disks are down on any system. Enabled by the LinkMonitoring attribute.
 LinkHbDown Indicates if private network links are down on any system. Enabled by the LinkMonitoring attribute.
 Load Normalized value of system load used to compare systems in load balancing. Value is determined by dividing the raw values of LoadRaw by the values of Factor.
 LoadRaw List of load-calculation criteria and their associated raw values over the last five seconds.
 SysName The name of the system. The name must begin with a letter and must only contain letters, numbers, dashes (-), and underscores (_).
 SystemLocation Denotes the location of the system.
 LLTNodeId Displays the node ID (as defined in llthosts.txt) for the node.
 ConfigInfoCnt For internal use only.
 AgentsStopped The attribute is set to 1 for a system when all agents running on that system are stopped.
 NoAutoDisable When set to 0, this attribute autodisables service groups when the VCS engine is taken down. Groups remain autodisabled until the engine is brought up (regular membership). Setting this attribute to 1 bypasses the autodisable feature.
 LicenseKey LicenseKey
 VCSFeatures Indicates which VCS features are enabled.
 LicenseType Indicates the license type of the base VCS key used by the system. Possible values are:
 VCSMode Denotes the mode for which VCS is licensed, including VCS, Traffic Director, and VCS_OPS.
 CPUBinding Binds the HAD process to the specified CPU.
 EngineRestarted Indicates whether the VCS engine (HAD) was restarted by the hashadow process on a node in the cluster. The value 1 indicates that the engine was restarted; 0 indicates it was not restarted.
 ConnectorState Indicates the state of the wide-area connector (WAC). If 0, WAC is not running. If 1, WAC is running and communicating with the VCS engine.
 ActiveCount Number of resources in a service group that are active (online or waiting to go online). When the number of resources drops to zero, the service group is considered offline.
 Administrators List of VCS users with privileges to administer the group.
 AutoDisabled Indicates that VCS does not know the status of a service group (or specified system for parallel service groups). This is due to: 1) Group not probed (on specified system for parallel groups) in the SystemList attribute. 2) VCS engine is not running on a node designated in the SystemList attribute, but the node is visible.
 AutoFailOver Indicates whether VCS initiates an automatic failover if the service group faults.
 AutoRestart Restarts a service group after a faulted persistent resource becomes online. This attribute applies to persistent resources only. Default value is 1(true).
 AutoStart Designates whether a service group is automatically started when VCS is started. Default value is 1(true).
 AutoStartIfPartial Indicates whether to initiate bringing a service group online if the group is probed and discovered to be in a PARTIAL state when VCS is started. Default =1.
 AutoStartList List of systems on which, under specific conditions, the service group will be started with VCS (usually at system boot). For example, if a system is a member of a failover service group's AutoStartList attribute, and if the group is not already running on another system in the cluster, the group is brought online when the system is started.
 AutoStartPolicy Sets the policy VCS uses to determine on which system to bring a service group online if multiple systems are available.
 CurrentCount Number of systems on which the service group is active.
 Enabled Indicates if a group can be failed over or brought online. If any of the local values are disabled, the group is disabled. Default value is 1(true).
 Evacuate Indicates if VCS initiates an automatic failover when user issues hastop -local -evacuate. Default value is 1(true).
 Evacuating Indicates the node ID from which the service group is being evacuated.
 Failover Indicates service group is in the process of failing over.
 FailOverPolicy Sets the policy VCS uses to determine which system a group fails over to if multiple systems exist. The values are Priority (default), Load, RoundRobin.
 FromQ Indicates the system name from which the service is failing over. This attribute is specified when service group failover is a direct consequence of the group event, such as a resource fault within the group or a group switch.
 Frozen Disables all actions, including autostart, online and offline, and failover, except for monitor actions performed by agents. Default value is 0(false).
 GroupOwner This attribute is used for VCS email notification and logging. VCS sends email notification to the person designated in this attribute when an event occurs related to the service group.
 IntentOnline Indicates whether to keep service groups online or offline. It is set to 1 by VCS if an attempt has been made, successful or not, to online the service group. For failover groups, this attribute is set to 0 by VCS when the group is taken offline. For parallel groups, it is set to 0 for the system when the group is taken offline or when the group faults and can fail over to another system.
 LastSuccess For internal use only.
 Load Integer value expressing total system load this group will put on a system.
 ManualOps Indicates if manual operations are allowed on the service group.
 MigrateQ Indicates the system from which the service group is migrating. This attribute is specified when group failover is an indirect consequence, such as system shutdown, another group faulted and is linked to this group, etc.
 NumRetries Indicates the number of times attempts are made to bring a service group online. This attribute is used only if the attribute OnlineRetryLimit is set for the service group.
 OnlineRetryInterval Indicates the interval, in seconds, during which a service group that has successfully restarted on the same system and faults again should be failed over, even if the attribute OnlineRetryLimit is non-zero. This prevents a group from continuously faulting and restarting on the same system. Default=0
 OnlineRetryLimit If non-zero, specifies the number of times the VCS engine tries to restart a faulted service group on the same system on which the group faulted, before it gives up and tries to fail over the group to another system. Default = 0.
 Operators List of VCS users with privileges to operate the group. A Group Operator can only perform online/offline, and temporary freeze/unfreeze operations pertaining to a specific group.
 Parallel This indicates if service group is failover (0), parallel (1) or hybrid (2).
 PathCount Number of resources in path not yet taken offline. When this number drops to zero, the engine may take the entire service group offline if critical fault has occurred.
 PreOnline Indicates that the VCS engine should not online a service group in response to a manual group online, group autostart, or group failover. The engine should instead call a user-defined script that checks for external conditions before bringing the group online. Default value is 0(false).
 PreOnlining Indicates that VCS engine invoked the preonline script; however, the script has not yet returned with group online.
 Prerequisites An unordered set of name=value pairs denoting specific resources required by a service group. If prerequisites are not met, the group cannot go online. The format for Prerequisites is: Prerequisites() = { Name=Value, name2=value2 }. Names used in setting Prerequisites are arbitrary and not obtained from the system. Coordinate name=value pairs listed in Prerequisites with the same name=value pairs in Limits().
 Priority Enables users to designate and prioritize the service group. VCS does not interpret the value; rather, this attribute enables the user to configure the priority of a service group and the sequence of actions required in response to a particular event. Default=0
 PrintTree Indicates whether or not the resource dependency tree is written to the configuration file.
 Probed Indicates whether all enabled resources in the group have been detected by their respective agents.
 ProbesPending The number of resources that remain to be detected by the agent on each system.
 Responding Indicates VCS engine is responding to a failover event and is in the process of bringing the service group online or failing over the node.
 SourceFile File from which the configuration was read.
 State Group state on each system. Group states are OFFLINE, ONLINE, FAULTED, PARTIAL, STARTING, STOPPING, OFFLINE | FAULTED, OFFLINE | STARTED, PARTIAL | FAULTED, PARTIAL | STARTING, PARTIAL | STOPPING, ONLINE | STOPPING.
 SystemList List of systems on which the service group is configured to run and their priorities. Lower numbers indicate a preference for the system as a failover target.
 SystemZones Indicates the virtual sublists within the SystemList attribute that grant priority in failing over. Values are string/integer pairs. The string key is the name of a system in the SystemList attribute, and the integer is the number of the zone. Systems with the same zone number are members of the same zone. If a service group faults on one system in a zone, it is granted priority to fail over to another system within the same zone, despite the policy granted by the FailOverPolicy attribute.
 Tag Identifies special-purpose service groups created for specific VCS products.
 TargetCount Indicates the number of target systems on which the service group should be brought online.
 TFrozen Indicates if service groups can be brought online on the system. Groups cannot be brought online if the attribute value is 1(true). Default value is 0 (false).
 ToQ Indicates the node name to which the service is failing over. This attribute is specified when service group failover is a direct consequence of the group event, such as a resource fault within the group or a group switch.
 TriggerResStateChange Determines whether or not to invoke the resstatechange trigger if resource state changes.
 UserIntGlobal Use this attribute for any purpose. It is not used by VCS.
 UserStrGlobal Use this attribute for any purpose. It is not used by VCS.
 TypeDependencies Creates a dependency between resource types specified in the service group list, and all instances of the respective resource type.
 UserIntLocal Use this attribute for any purpose. It is not used by VCS.
 UserStrLocal Use this attribute for any purpose. It is not used by VCS.
 Dependencies Creates a dependency between resource types specified in the service group list, and all instances of the respective resource type.
 PostOffline Setting this attribute to 1 executes the PostOffline event trigger on the system where the group went offline from a partial or fully online state.
 PostOnline Setting this attribute to 1 executes the PostOnline event trigger on the system where the group went online from a partial or fully offline state.
 ExtMonApp For internal use only.
 ExtMonArgs For internal use only.
 PreOffline For internal use only.
 PreOfflining For internal use only.
 Restart For internal use only.
 TriggerEvent For internal use only.
 ClusterList Specifies the list of clusters on which the service group is configured to run.
 Authority Indicates whether or not the local cluster is allowed to bring the service group online. If set to 0, it is not; if set to 1, it is.
 ClusterFailOverPolicy Determines how a global service group behaves when a cluster faults.
 ManageFaults Specifies if VCS manages resource failures within the service group by calling clean entry point for
 FaultPropagation Specifies if VCS should propagate the fault up to parent resources and take the entire service group
 PreonlineTimeout Defines the maximum amount of time the preonline script takes to run the command hagrp -online -nopre for the group. Note that HAD uses this timeout during evacuation only.
 DeferAutoStart Indicates whether HAD defers the auto-start of a local group in case the global cluster is not fully connected.
 VCSi3Info Enables VCS service groups to be mapped to VERITAS i3 applications. This attribute is managed solely by the i3 product and should not be set or modified by the user.
 AgentClass Indicates the scheduling class for the VCS agent process.
 AgentFailedOn A list of systems on which the agent for the resource type has failed.
 AgentPriority Indicates the priority in which the agent process runs. This attribute has no effect in Windows environments. Default = 0.
 AgentReplyTimeout The number of seconds the engine waits to receive a heartbeat from the agent before restarting the agent. Default = 130 seconds.
 AgentStartTimeout The number of seconds after starting the agent that the engine waits for the initial agent "handshake" before restarting the agent. Default = 60 seconds.
 ArgList An ordered list of attributes whose values are passed to the open, close, online, offline, monitor, and clean entry points.
 AttrChangedTimeout Maximum time (in seconds) within which the attr_changed entry point must complete or be terminated. Default = 60 seconds.
 CleanTimeout Maximum time (in seconds) within which the clean entry point must complete or else be terminated. Default = 60 seconds.
 CloseTimeout Maximum time (in seconds) within which the close entry point must complete or else be terminated. Default = 60 seconds.
 ConfInterval When a resource has remained online for the specified time (in seconds), previous faults and restart attempts are ignored by the agent.
 FaultOnMonitorTimeouts When a monitor times out as many times as the value specified, the corresponding resource is brought down by calling the clean entry point. The resource is then marked FAULTED, or it is restarted, depending on the value set in the Restart Limit attribute. When FaultOnMonitorTimeouts is set to 0, monitor failures are not considered indicative of a resource fault. A low value may lead to spurious resource faults, especially on heavily loaded systems.
 LogFileSize Specifies the size (in bytes) of the agent log file. Minimum value is 65536 bytes. Maximum value is 134217728 bytes (128MB). Default = 33554432 (32MB)
 MonitorInterval Duration (in seconds) between two consecutive monitor calls for an ONLINE or transitioning resource. A lower value could impact performance if many resources of the same type exist. A higher value could delay detection of a faulted resource.
 MonitorTimeout Maximum time (in seconds) within which the monitor entry point must complete or else be terminated. Default = 60 seconds
 OfflineMonitorInterval Duration (in seconds) between two consecutive monitor calls for an OFFLINE resource. If set to 0, OFFLINE resources are not monitored.
 NumThreads Number of threads used within the agent process for managing resources. This number does not include the three threads used for other internal purposes. Increasing to a significantly large value can degrade system performance. Decreasing to 1 prevents multiple threads. Default = 10.
 OfflineTimeout Maximum time (in seconds) within which the offline entry point must complete or else be terminated. Default = 300 seconds
 OnlineRetryLimit Number of times to retry online, if the attempt to online a resource is unsuccessful. This parameter is meaningful only if clean is implemented. Default = 0.
 OnlineTimeout Maximum time (in seconds) within which the online entry point must complete or else be terminated. Default = 300 seconds
 OnlineWaitLimit Number of monitor intervals to wait after completing the online procedure, and before the resource becomes online. Default = 2.
 OpenTimeout Maximum time (in seconds) within which the open entry point must complete or else be terminated. Default = 60 seconds.
 Operations Indicates valid operations of resources of the resource type. Values are OnOnly (can online only), OnOff (can online and offline), None (cannot online or offline).
 RestartLimit Number of times to retry bringing a resource online when it is taken offline unexpectedly and before VCS declares it FAULTED. Default = 0
 ScriptClass Indicates the scheduling class of the script processes (for example, online) created by the agent. This attribute has no effect in Windows environments.
 ScriptPriority Indicates the priority of the script processes created by the agent. This attribute has no effect in Windows environments. Default = 0.
 SourceFile File from which the configuration was read.
 ToleranceLimit Number of times the monitor entry point should return OFFLINE before declaring the resource FAULTED. A large value could delay detection of a genuinely faulted resource. Default = 0
 MonitorIfOffline Indicates whether resources are monitored when offline (value 1), or not (value 0).
 Type File system type, such as vxfs, ufs, etc.
 RestartLimits The number of times the agent should try to restart the resources.
 FireDrill Specifies whether or not fire drill is enabled for resource type. If set to 1, fire drill is enabled. If set to 0, it is disabled.
 LogDbg Indicates the debug severities enabled for the resource type or agent framework. Debug severities used by the agent entry points are in the range of DBG_1 to DBG_21. The debug messages from the agent framework are logged with the severities DBG_AGINFO, DBG_AGDEBUG and DBG_AGTRACE, representing the least to most verbose.
 MonitorStatsParam Designates the values governing the monitor interval. Valid keys include:
 InfoInterval Determines when info entry point is invoked by the agent framework. If set to 0, the entry point is not invoked. Set this attribute to a non-zero value to invoke the entry point periodically.
 InfoTimeout Timeout value for info entry point. If entry point does not complete by the designated time, the agent framework cancels the entry point's thread.
 ActionTimeout Timeout value for the action entry point. Default is 40 seconds.
 SupportedActions Valid action tokens for this resource type. Default is an
 LogLevel LogLevel
 LogTags LogTags
 ArgListValues List of arguments passed to the resource's agent on each system. This attribute is resource- and system-specific, meaning that the list of values passed to the agent depends on which system and which resource they are for.
 AutoStart Indicates that the resource is brought online when the service group is brought online. Default value is 1(true).
 ConfidenceLevel Indicates the level of confidence in an online resource. Values range from 0 - 100. Note that some VCS agents may not take advantage of this attribute and may always set it to 0. Set the level to 100 if the attribute is not used.
 Critical Indicates that the service group is faulted when the resource, or any resource it depends on, faults. Default value is 1(true).
 Enabled Indicates agents monitor the resource. If a resource is created dynamically while VCS is running, you must enable the resource before VCS monitors it. When Enabled is set to 0(false), it implies a disabled resource. VCS will not bring a disabled resource, nor its children online, even if the children are enabled. If you specify the resource in main.cf prior to starting VCS, the default value for this attribute is 0(false).
 Flags Additional information relating to the state of a resource. Possible values are : RESTARTING, STATUS UNKNOWN, MONITOR TIMEDOUT, UNABLE TO OFFLINE and ADMIN WAIT.
 Group String name of the service group to which the resource belongs.
 LastOnline Indicates the system name on which the resource was last online. This attribute is automatically set by the VCS engine (had).
 MonitorOnly Indicates if the resource can be brought online or taken offline. If set to 0(false), the resource can be brought online or taken offline. If set to 1(true), the resource can be monitored only. Default value is 0(false).
 IState Indicates internal state of a resource. In addition to the State attribute, this attribute shows to which state the resource is transitioning. Possible values are : NOT WAITING, WAITING TO GO ONLINE, WAITING FOR CHILDREN ONLINE, WAITING TO GO OFFLINE, WAITING TO GO OFFLINE (propagate), WAITING TO GO ONLINE (reverse), WAITING TO GO OFFLINE (reverse/propagate).
 Path The number of parent resources in the path up to the top of the resource graph. This attribute is used when an online resource faults.
 Probed Indicates whether the resource has been detected by the agent.
 ResourceOwner This attribute is used for VCS email notification and logging. VCS sends email notification to the person designated in this attribute when an event occurs related to the resource. VCS also logs the owner name when an event occurs. If ResourceOwner is not specified in main.cf, the default value is "unknown."
 Signaled Indicates whether a resource has been traversed. Used when bringing a service group online or taking it offline.
 Start Indicates whether a resource was started (the process of bringing it online was initiated) on a system.
 State Resource state on each system. Possible values are : ONLINE, OFFLINE, FAULTED, ONLINE | STATE UNKNOWN, ONLINE | MONITOR TIMEDOUT, ONLINE | UNABLE TO OFFLINE, OFFLINE | STATE UNKNOWN, FAULTED | RESTARTING. A faulted resource is physically offline, though unintentionally.
 AgentDebug A flag that defines whether the agent logs additional debug messages. The value 1(true) indicates that the agent will log additional debug messages. The value 0(false) indicates that it will not. Default value is 0(false).
 TriggerEvent For internal use only.
 ResourceInfo This attribute has three predefined keys: State (values are Valid, Invalid, or Stale); Msg (output of the info entry point captured on stdout by the agent framework); TS (timestamp indicating when the ResourceInfo attribute was updated by the agent framework). Defaults: State = Valid, Msg = "", TS = "".
 ComputeStats Indicates to the agent framework whether or not to calculate monitor time statistics for the resource. By default this is set to FALSE.
 MonitorTimeStats The valid keys for this attribute are: Average, TS. Average is the average time taken by the monitor entry point over the last "Frequency" number of monitor cycles. TS is the timestamp of when the engine last updated the Average for the resource. Default values are:
 Name For internal use only.
 Enabled Indicates if SNMP traps are enabled.
 IPAddr IP address of the host where the SNMP Manager resides.
 Port Port of SNMP server.
 SourceFile File from which the configuration was read.
 TrapList List of traps and their descriptions.
 Clusterlist List of clusters whose health is determined by this heartbeat.
 AgentState State of the heartbeat agent.
 State This is the state of the heartbeat. This state is used to determine the health of the remote cluster.
 AYAInterval This is the 'Are You Alive Interval'. This is the interval after which the local cluster heartbeats the remote cluster.
 InitTimeout Timeout value for the 'init' entry point.
 StartTimeout Timeout value for the 'start' entry point.
 CleanTimeout This is the timeout value for the 'clean' entry point.
 StopTimeout This is the timeout value for the Stop entry point.
 AYATimeout This is the timeout value for the aya entry point.
 AYARetryLimit Number of times to call the aya entry point before giving up.
 Arguments Extra generic information that can be passed to the heartbeat agent.
 LogDbg This is used for log messages.
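
Most of the type-level and resource-level attributes above can be changed from the command line with hatype -modify and hares -modify. The following is only a minimal sketch: the type name Mount, the resource name mnt_data and the value 600 are hypothetical, and /opt/VRTS/bin is assumed to be where the VCS binaries live. The helper prints the commands by default and runs them only when RUN=1 is set, so the sequence can be reviewed before it touches a cluster.

```shell
#!/bin/sh
# Sketch: raise a type-level timeout and mark one resource as non-critical.
# "Mount", "mnt_data" and 600 are hypothetical values used for illustration.
HA_BIN=${HA_BIN:-/opt/VRTS/bin}

run() {
    # Print the command by default; execute it only when RUN=1.
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

tune_attrs() {
    type_name=$1   # resource type, e.g. Mount
    res_name=$2    # resource name, e.g. mnt_data
    timeout=$3     # new OnlineTimeout in seconds
    run "$HA_BIN/haconf" -makerw
    run "$HA_BIN/hatype" -modify "$type_name" OnlineTimeout "$timeout"
    run "$HA_BIN/hares" -modify "$res_name" Critical 0
    run "$HA_BIN/haconf" -dump -makero
}

# Example (dry run): tune_attrs Mount mnt_data 600
```

The configuration is opened read-write first and dumped read-only at the end, the same pattern used by the freeze procedure in the next post.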

PS:
1. You can download cluster_attrs.xml here for more information on VCS service group and resource attributes, such as whether an attribute is editable/important/mustconfigure, its displayname, etc.

vcs-cluster_attrs.zip

2. Some VCS attributes are not listed here because they are dedicated to applications such as Oracle. The attribute definitions for those agents can be imported from the agent's own configuration file, as detailed for example in this article: http://sfdoccentral.symantec.com/sf/5.0/solaris64/html/vcs_agents_oracle/ch_vha_oracle_configagent9.html
Categories: Clouding, HA, HA & HPC, IT Architecture Tags:

differences between freezing vcs system and freezing service group

May 16th, 2012 No comments

In Veritas VCS, freezing a system prevents service groups from coming online on that system when they fail over from another node in the cluster. It does not, however, stop faults from failing a service group that is already online on that system.
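
For reference, a system is frozen with hasys -freeze and released with hasys -unfreeze. The helper below is only a sketch that builds the command line so it can be checked before being run; node1 is a hypothetical system name and /opt/VRTS/bin is assumed as the binary directory. A persistent freeze changes the configuration, so it needs the configuration to be writable (haconf -makerw) first.

```shell
#!/bin/sh
# Sketch: build the hasys command line for freezing or unfreezing a node.
# "node1" in the example below is a hypothetical system name.
HA_BIN=${HA_BIN:-/opt/VRTS/bin}

system_freeze_cmd() {
    action=$1       # freeze | unfreeze
    system=$2       # system (node) name
    persistent=$3   # "yes" keeps the frozen state across a VCS restart
    case $action in
        freeze|unfreeze) ;;
        *) echo "usage: system_freeze_cmd freeze|unfreeze system [yes]" >&2
           return 1 ;;
    esac
    if [ "$persistent" = "yes" ]; then
        echo "$HA_BIN/hasys -$action -persistent $system"
    else
        echo "$HA_BIN/hasys -$action $system"
    fi
}

# Example: system_freeze_cmd freeze node1 yes
```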

To prevent Veritas from intervening on faults caused by expected changes (even if the symptoms are unexpected), we would usually freeze the service group instead. This prevents any online, clean or restart operation from kicking in when a fault is detected.

After your modification of VCS, you need to check that resources are not autodisabled and make sure that the config is made read-only (ro) again.
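
One way to do these checks is sketched below. It is based on assumptions: the filter expects the output of hagrp -display with -attribute AutoDisabled to have four whitespace-separated columns (group, attribute, system, value) with a header line starting with #, and grpA is a hypothetical group name. Verify the layout on your own cluster before relying on it.

```shell
#!/bin/sh
# Sketch: post-change checks after editing the VCS configuration.
HA_BIN=${HA_BIN:-/opt/VRTS/bin}

# Reads "hagrp -display <group> -attribute AutoDisabled" output on stdin and
# prints "group system" for every row whose value is 1.
# Assumed columns: group, attribute, system, value.
autodisabled_rows() {
    awk '$1 !~ /^#/ && $2 == "AutoDisabled" && $4 == "1" { print $1, $3 }'
}

# Example (grpA is a hypothetical group name):
#   $HA_BIN/hagrp -display grpA -attribute AutoDisabled | autodisabled_rows
#   $HA_BIN/haclus -value ReadOnly   # should report 1 once the config is read-only again
```

An empty result from the filter means no system reports the group as autodisabled.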

Here are the steps to freeze service group(s) in VCS ($i stands for the name of each service group to freeze):
/opt/VRTS/bin/haconf -makerw
mkdir /var/tmp/veritas_config_backup_`date +%F`
cp -R /etc/VRTSvcs /var/tmp/veritas_config_backup_`date +%F`
/opt/VRTS/bin/hagrp -freeze $i -persistent
/opt/VRTS/bin/haconf -dump -makero
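
The steps above can be wrapped in a small function that loops over the service groups. This is only a sketch under a few assumptions: the group names are passed as arguments (grpA and grpB in the example are hypothetical), the backup is taken before the configuration is opened read-write, and mkdir -p is used so a second run on the same day does not fail. Commands are printed by default and executed only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: back up the VCS config, then persistently freeze each given group.
HA_BIN=${HA_BIN:-/opt/VRTS/bin}

run() {
    # Print the command by default; execute it only when RUN=1.
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

freeze_groups() {
    [ $# -gt 0 ] || { echo "usage: freeze_groups group..." >&2; return 1; }
    backup=/var/tmp/veritas_config_backup_`date +%F`
    run mkdir -p "$backup"
    run cp -R /etc/VRTSvcs "$backup"
    run "$HA_BIN/haconf" -makerw
    for i in "$@"; do
        run "$HA_BIN/hagrp" -freeze "$i" -persistent
    done
    # Write the configuration to disk and make it read-only again.
    run "$HA_BIN/haconf" -dump -makero
}

# Example (dry run): freeze_groups grpA grpB
```

To undo the change later, the same pattern applies with hagrp -unfreeze $i -persistent between haconf -makerw and haconf -dump -makero.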

Categories: Clouding, HA, HA & HPC, IT Architecture Tags: ,