Archive for June, 2012

vmware vsphere esx cloud computing terminology

June 25th, 2012

Here are some terms related to VMware vSphere/ESX:

Relationships Between the Component Layers of VMware vSphere

What is a datastore?

A datastore is a logical container that holds virtual machine files and other files necessary for virtual machine operations. Datastores can exist on different types of physical storage, including local storage, iSCSI, Fibre Channel SAN, or NFS. A datastore can be VMFS-based or NFS-based.

You can create a new datastore by formatting LUNs or by mounting NFS volumes to an existing host. In addition, you can add a host with existing datastores to the inventory.

What is a datacenter?

A datacenter is the primary container of inventory objects such as hosts and virtual machines. From the datacenter, you can add and organize inventory objects. Typically, you add hosts, folders, and clusters to a datacenter.

vCenter Server can contain multiple datacenters. Large companies might use multiple datacenters to represent organizational units in their enterprise.

Inventory objects can interact within datacenters, but interaction across datacenters is limited. For example, you can move a virtual machine with vMotion technology across hosts within a datacenter but not to a host in another datacenter.

 What is a Folder?

A folder is a container used to group objects and organize them into hierarchies. Folders provide a natural structure upon which to apply permissions.

The folder structure you see in the inventory varies depending on the inventory view.

 What is a host?

A host is a computer that uses virtualization software, such as ESX or ESXi, to run virtual machines. Hosts provide the CPU and memory resources that virtual machines use and give virtual machines access to storage and network connectivity.

What is a host profile?

A host profile captures the configuration of a specific host and allows you to duplicate the configuration to other hosts or clusters or to validate that a host's configuration meets datacenter needs. Host profiles help reduce manual steps in cluster host configuration.

You can attach and apply host profiles to hosts or clusters in the Host Profiles view or in the Hosts and Clusters view. When you perform host profile operations in the Hosts and Clusters view, you can right-click individual hosts or clusters in the inventory for some operations or use the Profile Compliance tab for cluster-level host profile operations when a cluster is selected.

What is a Template?

A template is a master image of a virtual machine that can be used to create new virtual machines. This image typically includes an operating system, applications, and configuration settings for the virtual machine.

Use templates to create virtual machines by deploying the template as a virtual machine. When complete, the new virtual machine is added to the folder that was selected when the template was deployed. You can use a template to create identical new virtual machines.

What is a Virtual Machine?

A virtual machine is a software computer that, like a physical computer, runs an operating system and applications. An operating system installed on a virtual machine is called a guest operating system.

Because every virtual machine is an isolated computing environment, you can use virtual machines as desktop or workstation environments, as testing environments, or to consolidate server applications.

In vCenter Server, virtual machines run on hosts or clusters. The same host can run many virtual machines.

What is a Resource Pool?

Resource pools can be used to hierarchically partition available CPU and memory resources of a standalone host or a cluster.

Creating multiple resource pools allows you to think more about aggregate computing capacity and less about individual hosts. In addition, you do not need to set resources on each virtual machine. Instead, you can control the aggregate allocation of resources to the set of virtual machines by changing settings on their enclosing resource pool.

What is a Cluster?

A cluster is a group of hosts. When you add a host to a cluster, the host's resources become part of the cluster's resources. The cluster manages the resources of all hosts within it.

Clusters enable the VMware High Availability (HA) and VMware Distributed Resource Scheduler (DRS) solutions.

What is the Hosts & Clusters view?

This view displays the set of computing resources that run on a particular host, cluster, or resource pool. Using the Hosts & Clusters view, you can manage and organize your inventory of computing resources.

What is the Virtual Machines & Templates View?

This view displays all virtual machines and templates in the inventory, arranged by datacenter. Through this view you can organize virtual machines into folder hierarchies.

What is the Datastores view?

This view displays all datastores in the inventory, arranged by datacenter. Through this view, you can organize datastores into folder hierarchies, manage existing datastores, and add and remove datastores to your inventory.

What is the Networks view?

This view displays the set of networking objects available on vCenter. Using the Networking view, you can create and manage networking with vNetwork Distributed Switches and view networking with Standard Switches configuration.

vSphere provides two types of network architecture. Networking with vNetwork Distributed Switches manages virtual machine and host networking at the datacenter level, while networking with Standard Switches manages virtual machine and host networking at the host level.

What is a Standard Switch network?

A network with Standard Switches is a network of virtual machines running on a single physical machine that are connected logically to each other so that they can send data to and receive data from each other. A network and its associated vSwitches provide the interface between virtual machine NICs and physical network adapters.

What is the Virtual Machine Port Group/VMkernel Port/Service Console port?

There are three types of network connections:

  1. Service console port – access to ESX Server management network
  2. VMkernel port – access to VMotion, iSCSI and/or NFS/NAS networks
  3. Virtual machine port group – access to VM networks

More than one connection type can exist on a single virtual switch, or each connection type can exist on its own virtual switch. For more information, you can refer to the following pdf file:

 What is the Host Profiles view?

The Host Profiles view is the management area of the vSphere Client for host profiles. This view allows administrators to create, edit, or delete host profiles.

You can attach and apply host profiles to hosts or clusters in this view or in the Hosts and Clusters view. When you perform host profile operations in the Hosts and Clusters view, you can right-click individual hosts or clusters in the inventory for some operations or use the Profile Compliance tab for cluster-level host profile operations when a cluster is selected.


More info here

ORA-00600 internal error caused by /tmp swap full

June 22nd, 2012

Today we encountered a problem where Oracle stopped functioning. After some checking, we found that the error was caused by /tmp running out of space. This was also confirmed by the OS logs:

Jun 20 17:43:59 tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded

Oracle uses /tmp to compile PL/SQL code, so if there is no space left it is unable to compile or execute that code, which causes functions, procedures, packages, and triggers to time out. The same behaviour is described in Oracle note ID 1389623.1.

So in order to prevent further occurrences of this error, we should increase /tmp on the system to at least 4 GB.

There is an Oracle parameter to change the default location of these temporary files (_ncomp_shared_objects_dir), but it is not a dynamic parameter. There is also a way to resize a tmpfs filesystem online, but it is somewhat risky. So the safest approach is to first bring down the Oracle DB on this host, then modify /etc/vfstab, and then reboot the whole system. This protects our data against the risk of corruption or loss, at the cost of some outage time.
So finally, here are the steps:
Amend the line in /etc/vfstab from:

swap - /tmp tmpfs - yes size=512m

to:

swap - /tmp tmpfs - yes size=4096m

Reboot the machine and bring up the Oracle DB.
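
The /etc/vfstab edit can also be prepared with a small script and reviewed before it is put in place. The following is only a sketch: the function name set_tmp_size is made up for illustration, it assumes the /tmp entry already carries a size= option as in the line above, and it assumes a POSIX awk (on Solaris that would be nawk rather than the old /usr/bin/awk).

```shell
# Hypothetical helper: print a vfstab with the size= option of the /tmp
# tmpfs entry replaced. $1 = path to the vfstab file, $2 = new size (e.g. 4096m).
# Only lines whose mount point (3rd field) is /tmp and whose filesystem
# type (4th field) is tmpfs are changed; all other lines are printed as-is.
set_tmp_size() {
    awk -v size="$2" '
        $3 == "/tmp" && $4 == "tmpfs" {
            sub(/size=[0-9]+[kmgKMG]?/, "size=" size)
        }
        { print }
    ' "$1"
}
```

For example, `set_tmp_size /etc/vfstab 4096m > /var/tmp/vfstab.new` writes the edited copy somewhere other than the full /tmp, so it can be compared with the original using diff before being copied over /etc/vfstab.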

Resolved – yum return error No module named sqlite

June 18th, 2012

Today I hit an error when I tried running yum commands (such as 'yum list') on CentOS Linux. The error message looked like this:

[root@doxer_#1]# yum list
There was a problem importing one of the Python modules
required to run yum. The error leading to this problem was:

   No module named sqlite

Please install a package which provides this module, or
verify that the module is installed correctly.

It's possible that the above module doesn't match the
current version of Python, which is:
2.4.3 (#1, Sep 21 2011, 19:55:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)]

If you cannot solve this problem yourself, please go to
the yum faq at:

To resolve this issue, we need to install sqlite and python-sqlite manually (this applies ONLY to OS version 5; the packages cannot be installed with yum, because yum itself is not working):

[root@doxer_#1]# wget ''
[root@doxer_#1]# wget ''
[root@doxer_#1]# rpm -Uvh python-sqlite-1.1.7-1.2.1.x86_64.rpm
[root@doxer_#1]# ldconfig
[root@doxer_#1]# rpm -Uvh sqlite-3.3.6-5.x86_64.rpm

After the installation of these two packages, yum commands worked ok.

  • You may need to run "unset PYTHONPATH" if that variable is set; otherwise you may hit an error like "NameError: name 'xxxx' is not defined".
  • In some cases, you also need sqlite-devel at the same version as sqlite. You may then have to add "import sqlite3" to /usr/bin/yum (this may not be needed), uninstall python (rpm -e python-<xxx> --nodeps), and install the same version of python again from the rpm package.
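
To confirm the fix without running yum itself, you can ask the Python interpreter directly whether the module imports. This is a small sketch, and the function name can_import is made up for illustration:

```shell
# Hypothetical helper: succeed (exit status 0) if interpreter $1 can
# import module $2, and fail otherwise. All interpreter output is discarded.
can_import() {
    "$1" -c "import $2" >/dev/null 2>&1
}
```

For example, `can_import /usr/bin/python sqlite && echo "sqlite module OK"`. The interpreter path here is an assumption; use the one named on the first line of /usr/bin/yum.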

McAfee Solidcore agent and McAfee agent management

June 14th, 2012

The File Integrity Monitoring (FIM) agent on Solaris and Red Hat servers is made up of two components: a Solidcore agent and a McAfee agent.

  • The Solidcore agent is the element that performs the file monitoring. It runs as a kernel module and needs a kernel restart (reboot) to disable it.
  • The McAfee agent is responsible for communication back to a central McAfee ePolicy Orchestrator (ePO) server. It runs as a service (cma) that can be stopped with minimal impact on the running server. The software can also easily be removed or reinstalled without any impact.

With both the Solidcore agent and the McAfee agent running, the centralised ePO server controls the 'policy' of files to be monitored on the host. This can be overridden in the OS if need be, using the sadmin commands below.

Status check
To query the status of Solidcore on a server, run as root:
# sadmin status

To query the policy a server is running with, the local config needs to be 'unlocked'. To do this, 'recover' the config, query the policy, and then lock the config down again:
# sadmin recover   # password required, available from the ePO administrator
# sadmin mon list
# sadmin lockdown
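
Because the local config should not be left unlocked, the three commands above can be wrapped so that the lockdown always runs. This is only a sketch: the function name show_fim_policy and the SADMIN override variable are made up for illustration, and only the sadmin subcommands shown above are used.

```shell
# SADMIN can be overridden (e.g. SADMIN=echo for a dry run); by default
# the real sadmin binary from the PATH is used.
SADMIN=${SADMIN:-sadmin}

# Hypothetical wrapper: unlock the local config, list the monitoring
# policy, and lock the config again even if listing the policy fails.
show_fim_policy() {
    "$SADMIN" recover || return 1   # prompts for the ePO-supplied password
    "$SADMIN" mon list
    rc=$?                           # remember whether the listing worked
    "$SADMIN" lockdown
    return "$rc"
}
```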

The McAfee agent has an associated service 'cma' which can be stopped and restarted while the server is running.
- Stopping the service
service cma stop

- Starting the service
service cma start

The Solidcore agent has an associated service 'scsrvc'
- Stopping the service
service scsrvc stop

- Starting the service
service scsrvc start

The Solidcore agent runs as part of the UNIX kernel, so stopping the 'scsrvc' service doesn't fully disable the Solidcore software.
To fully disable it:
- Open the local configuration for editing
sadmin recover   # password needed from the ePO administrator

- Set the agent to be disabled at next reboot
sadmin disable

- Close the local configuration for edits
sadmin lockdown

- Reboot the server

When the server comes back up, the agent will be disabled. This can be confirmed by running:
sadmin status

error system dram available – some DIMMs not probed by solaris

June 12th, 2012

We encountered the following error message after we disabled some components on a Sun T2000 server:

SEP 29 19:34:35 ERROR: System DRAM  Available: 004096 MB  Physical: 008192 MB

This means that only 4 GB of memory is available although 8 GB is physically installed on the server. We can confirm the memory size from the following:

# prtdiag
System Configuration: Sun Microsystems sun4v Sun Fire T200
Memory size: 3968 Megabytes
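
The amount of installed but unavailable memory can be read straight off the console message. A minimal sketch, assuming the message has exactly the "Available: ... MB  Physical: ... MB" layout shown above; the function name missing_dram_mb is made up for illustration:

```shell
# Hypothetical helper: read "ERROR: System DRAM ..." lines on stdin and
# print, for each one, how many MB are installed but not available.
# Leading zeros are stripped by sed so the shell does not treat the
# numbers as octal.
missing_dram_mb() {
    sed -n 's/.*Available: *0*\([0-9][0-9]*\) MB *Physical: *0*\([0-9][0-9]*\) MB.*/\1 \2/p' |
    while read -r avail phys; do
        echo $((phys - avail))
    done
}
```

Fed the message above, this reports the 4096 MB that POST has deconfigured.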


1. Why the server panicked:
We don't have clear data on that. Solaris 10 on T2000 systems may panic when low on memory, in which case the system panics with the following panic string:

hypervisor call 0x21 returned an unexpected error 2

But that did not happen in our case either.

2. The error we can see from ALOM:

sc> showcomponent
Disabled Devices
MB/CMP0/CH0/R1/D1 : [Forced DIAG fail (POST)]

DIMMs with correctable errors (CEs) are being unnecessarily flagged by POST as faulty. When POST encounters a single CE, the associated DIMM is declared faulty, and half of the system's memory is deconfigured and unavailable to Solaris. Since PSH (Predictive Self-Healing) is the primary means of detecting errors and diagnosing faults on the Niagara platforms, this policy is too aggressive (reference bug 6334560).

3. What action we can take now:

a) Clear the SC log.
b) Enable the component in the SC.
c) Monitor the server.

If the same fault is reported again, we will replace the DIMM.


For more information about DIMM, you can refer to


H/W under test during POST on SUN T2000 Series

June 12th, 2012

We got the following error messages during POST on a SUN T2000 Series server:

0:0:0>ERROR: TEST = Queue Block Mem Test
0:0:0>H/W under test = MB/CMP0/CH0/R1/D1/S0 (J0901)
0:0:0>Repair Instructions: Replace items in order listed by 'H/W under
test' above.
0:0:0>MSG = Pin 236 failed on MB/CMP0/CH0/R1/D1/S0 (J0901)
ERROR: The following devices are disabled:
Aborting auto-boot sequence.

To resolve this issue, we can disable the component in ALOM/ILOM, power the system off and on, and then try to boot the machine again. Here are the steps:

If you use ALOM :
disablecomponent component

If you use ILOM :
-> set /SYS/component component_state=disabled
-> stop /SYS
-> start /SYS
Example :
-> set /SYS/MB/CMP0/CH0/R1/D1 component_state=disabled

-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

After you have disabled the components, you should clear the SC error log and the FMA logs:

Clearing faults from SC:

a) Show the faults on the system controller
sc> showfaults -v

b) For each fault listed run
sc> clearfault <uuid>

c) To re-enable the disabled components, run
sc> clearasrdb

d) Clear ereports
sc> setsc sc_servicemode true
sc> clearereports -y

To clear the FMA faults and error logs from Solaris:
a) Show faults in FMA
# fmadm faulty

b) For each fault listed in the 'fmadm faulty' output, run
# fmadm repair <uuid>

c) Clear ereports and resource cache
# cd /var/fm/fmd
# rm e* f* c*/eft/* r*/*

d) Reset the fmd serd modules
# fmadm reset cpumem-diagnosis
# fmadm reset cpumem-retire
# fmadm reset eft
# fmadm reset io-retire
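
When 'fmadm faulty' lists many faults, picking out the UUIDs by hand gets tedious. The following sketch pulls every UUID-shaped token out of whatever text it is given; the function name extract_uuids is made up for illustration, and it assumes the UUIDs appear in the usual lowercase 8-4-4-4-12 hexadecimal form as whitespace-separated words.

```shell
# Hypothetical helper: print every UUID-shaped word found on stdin,
# one per line. awk splits each line into words; grep keeps only the
# words that consist entirely of a lowercase 8-4-4-4-12 hex UUID.
extract_uuids() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' |
    grep '^[0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}$'
}
```

One way to use it for step b) would be `fmadm faulty | extract_uuids | sort -u | while read -r uuid; do fmadm repair "$uuid"; done`, after checking the list by eye.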


vcs commands hang consistently

June 8th, 2012

Today we encountered an issue where Veritas VCS commands hung consistently. Commands like haconf -dump -makero just got stuck for so long that we had to terminate them from the console. When using truss (on Solaris) or strace (on Linux) to trace system calls and signals, we found the following output:

test# truss haconf -dump -makero

execve("/opt/VRTSvcs/bin/haconf", 0xFFBEF21C, 0xFFBEF22C) argc = 3
resolvepath("/usr/lib/", "/usr/lib/", 1023) = 16
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

open("//.vcspwd", O_RDONLY) Err#2 ENOENT
getuid() = 0 [0]
getuid() = 0 [0]
so_socket(1, 2, 0, "", 1) = 4
fcntl(4, F_GETFD, 0x00000004) = 0
fcntl(4, F_SETFD, 0x00000001) = 0
connect(4, 0xFFBE7E1E, 110, 1) = 0
fstat64(4, 0xFFBE7AF8) = 0
getsockopt(4, 65535, 8192, 0xFFBE7BF8, 0xFFBE7BF4, 0) = 0
setsockopt(4, 65535, 8192, 0xFFBE7BF8, 4, 0) = 0
fcntl(4, F_SETFL, 0x00000084) = 0
brk(0x000F6F28) = 0
brk(0x000F8F28) = 0
poll(0xFFBE8A60, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\t15\0\0\0".., 57, 0) = 57
poll(0xFFBE8AA0, 1, -1) = 1
poll(0xFFBE68B8, 0, 0) = 0
recv(4, " G\0\0\0 $\0\0\r02\0\0\0".., 8192, 0) = 55
poll(0xFFBE8B10, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\f 1\0\0\0".., 58, 0) = 58
poll(0xFFBE8B50, 1, -1) = 1
poll(0xFFBE6968, 0, 0) = 0
recv(4, " G\0\0\0 $\0\0\r02\0\0\0".., 8192, 0) = 49
getpid() = 10386 [10385]
poll(0xFFBE99B8, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\f A\0\0\0".., 130, 0) = 130
poll(0xFFBE99F8, 1, -1) = 1
poll(0xFFBE7810, 0, 0) = 0
recv(4, " G\0\0\0 $\0\0\r02\0\0\0".., 8192, 0) = 62
fstat64(4, 0xFFBE9BB0) = 0
getsockopt(4, 65535, 8192, 0xFFBE9CB0, 0xFFBE9CAC, 0) = 0
setsockopt(4, 65535, 8192, 0xFFBE9CB0, 4, 0) = 0
fcntl(4, F_SETFL, 0x00000084) = 0
getuid() = 0 [0]
door_info(3, 0xFFBE78C8) = 0
door_call(3, 0xFFBE78B0) = 0
open("//.vcspwd", O_RDONLY) Err#2 ENOENT
poll(0xFFBEE370, 1, 0) = 1
send(4, " G\0\0\0 $\0\0\t13\0\0\0".., 42, 0) = 42
poll(0xFFBEE3B0, 1, -1) (sleeping...)

After some digging on the internet, we found the following solution to this weird problem:

1. Stop VCS on all nodes in the cluster by manually killing both had & hashadow processes on each node.
# ps -ef | grep had
root 27656 1 0 10:24:02 ? 0:00 /opt/VRTSvcs/bin/hashadow
root 27533 1 0 10:22:01 ? 0:02 /opt/VRTSvcs/bin/had -restart

# kill 27656 27533
GAB: Port h closed

2. Unconfigure GAB & LLT.
# gabconfig -U
GAB: Port a closed
GAB unavailable

# lltconfig -U
lltconfig: this will attempt to stop and reset LLT. Confirm (y/n)? y

3. Unload GAB & llt modules.
# modinfo | grep gab
100 60ea8000 38e9b 136 1 gab (GAB device)

# modunload -i 100
GAB unavailable

# modinfo | grep llt
84 60c6a000 fd74 137 1 llt (Low Latency Transport device)
# modunload -i 84
LLT Protocol unavailable

4. Restart llt.
# /etc/rc2.d/S70llt start
Starting LLT
LLT Protocol available

5. Restart gab.
# /etc/gabtab
GAB available
GAB: Port a registration waiting for seed port membership

6. Restart VCS :
# hastart -force
# VCS: starting on: <node_name>
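
For step 1, note that 'ps -ef | grep had' can also match the grep process itself or any other command line containing "had". A slightly safer way to collect the PIDs is sketched below; the function name vcs_daemon_pids is made up for illustration, and the binary path is the one shown in the ps output above.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PIDs
# (2nd column) of the had and hashadow daemons. Matching on the full
# binary path skips the grep process and unrelated commands.
vcs_daemon_pids() {
    awk '/\/opt\/VRTSvcs\/bin\/(had|hashadow)( |$)/ { print $2 }'
}
```

Running `ps -ef | vcs_daemon_pids` prints the PIDs so they can be checked before being passed to kill.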


using oracle materialized view with one hour refresh interval to reduce high concurrency

June 8th, 2012

If your Oracle DB is under very high concurrency and you find that the top SQL statements are queries against views, there is a quick way to relieve this: use an Oracle materialized view. You may consider setting the refresh interval to one hour, which means the stored results are refreshed every hour. After the setting goes live, performance should return to normal.
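
As a rough sketch of what such a definition looks like, the helper below prints DDL for a materialized view with a complete refresh every hour (SYSDATE + 1/24). The function name mview_ddl and the object names in the example are made up for illustration; whether a complete refresh is appropriate depends on your data, so treat this as a starting point only.

```shell
# Hypothetical helper: print DDL for a materialized view that is built
# immediately and then completely refreshed once an hour.
# $1 = materialized view name, $2 = defining query (without a trailing ';').
mview_ddl() {
    cat <<EOF
CREATE MATERIALIZED VIEW $1
  BUILD IMMEDIATE
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT SYSDATE + 1/24
AS
$2;
EOF
}
```

For example, `mview_ddl sales_summary_mv 'SELECT * FROM sales_summary_v'` prints a statement that can be reviewed and then run in SQL*Plus; application queries would then read from sales_summary_mv instead of the original view.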

For more information about oracle materialized view, you can visit

[Image: Oracle high concurrency]

useful sed single line examples when clearing embedded trojans or embedded links

June 7th, 2012

When somebody maliciously embeds links or trojans in your site, the first thing you will want to do is remove them. sed is a line-based stream editor, so it is a natural tool for the cleanup.

Usually, the embedded code is several lines of HTML like the following:

<div class="trojans">
<a href="">malicious site's name</a>
</div>

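To remove this HTML, you can use the following sed command, which deletes every line from the opening <div class="trojans"> through the next closing </div> and prints the result:

sed '/<div class="trojans">/,/<\/div>/d' injected.htm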

But usually the injected files are spread across several directories, or even your website's whole directory tree. You can combine find and sed to clean up these annoying trojans:

find /var/www/html/ -type f \( -name '*.htm' -o -name '*.html' -o -name '*.php' \) -exec sed -i.bak '/<div class="trojans">/,/<\/div>/d' {} \;

Please note that -i.bak makes sed keep a backup copy of each file (with a .bak suffix) before editing it in place. You should also back up your data before cleaning trojans!
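
Before deleting anything, it can help to see which files actually contain the injected block. The following is a small sketch of such a dry run; the function name list_injected_files is made up for illustration, and the marker string is the one from the example above.

```shell
# Hypothetical dry-run helper: list the .htm/.html/.php files under
# directory $1 that contain the injected opening tag. Nothing is modified.
list_injected_files() {
    find "$1" -type f \( -name '*.htm' -o -name '*.html' -o -name '*.php' \) \
        -exec grep -l '<div class="trojans">' {} +
}
```

Running `list_injected_files /var/www/html/` first gives you the list of files the find and sed command is about to change.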


For more info about sed examples/tutorials, you may refer to the following two resources: