Archive for the ‘Clouding’ Category

Document databases and Graph databases

March 27th, 2014
  • Document databases

Document databases are not document management systems. More often than not, developers starting out with NoSQL confuse document databases with document and content management systems. The word document in document databases connotes loosely structured sets of key/value pairs in documents, typically JSON (JavaScript Object Notation), and not documents or spreadsheets (though these could be stored too).

Document databases treat a document as a whole and avoid splitting it into its constituent name/value pairs. At a collection level, this allows a diverse set of documents to be put together in a single collection. Document databases allow documents to be indexed not only on their primary identifier but also on their properties. A few different open-source document databases are available today, but the most prominent among them are MongoDB and CouchDB.
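To make the idea concrete, here is a minimal in-memory sketch of a collection that stores whole documents and indexes them on a property as well as on the primary identifier. The class and method names (DocumentCollection, insert, find_by) are hypothetical and are not the API of MongoDB or CouchDB.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy collection: whole documents plus per-property indexes."""

    def __init__(self, indexed_fields):
        self.docs = {}  # primary identifier -> whole document
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc  # the document is stored as one unit
        for field, index in self.indexes.items():
            if field in doc:  # documents need not share a schema
                index[doc[field]].add(doc_id)

    def find_by(self, field, value):
        ids = self.indexes[field].get(value, set())
        return [self.docs[i] for i in sorted(ids)]


people = DocumentCollection(indexed_fields=["zip_code"])
people.insert(1, {"first_name": "John", "last_name": "Doe", "zip_code": "10001"})
people.insert(2, {"first_name": "Jane", "zip_code": "94303"})
people.insert(3, {"title": "Meeting notes", "tags": ["nosql"]})  # a different shape

print(people.find_by("zip_code", "94303"))
```

The third document has no zip_code at all, which is fine: the collection accepts it and simply leaves it out of that index.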


MongoDB

  • Official Online Resources —
  • History — Created at 10gen.
  • Technologies and Language — Implemented in C++.
  • Access Methods — A JavaScript command-line interface. Drivers exist for a number of languages including C, C#, C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, and Scala.
  • Query Language — SQL-like query language.
  • Open-Source License — GNU Affero GPL.
  • Who Uses It — Foursquare, Shutterfly, Intuit, GitHub, and more.


CouchDB

  • Official Online Resources — and www.couchbase. Most of the authors are part of Couchbase, Inc.
  • History — Work started in 2005 and it was incubated into Apache in 2008.
  • Technologies and Language — Implemented in Erlang with some C and a JavaScript execution environment.
  • Access Methods — Upholds REST above every other mechanism. Use standard web tools and clients to access the database, the same way as you access web resources.
  • Open-Source License — Apache License version 2.
  • Who Uses It — Apple, BBC, Canonical, CERN, and more.

A lot of details on document databases are covered starting in the next chapter.

  • Graph Databases

So far I have listed most of the mainstream open-source NoSQL products. A few other products like Graph databases and XML data stores could also qualify as NoSQL databases. This book does not cover Graph and XML databases. However, I list the two Graph databases that may be of interest and something you may want to explore beyond this book: Neo4j and FlockDB:

Neo4j is an ACID-compliant graph database. It facilitates rapid traversal of graphs.


  • Official Online Resources —
  • History — Created at Neo Technology in 2003. (Yes, this database has been around since before the term NoSQL was popularly known.)
  • Technologies and Language — Implemented in Java.
  • Access Methods — A command-line access to the store is provided. REST interface also available. Client libraries for Java, Python, Ruby, Clojure, Scala, and PHP exist.
  • Query Language — Supports SPARQL protocol and RDF Query Language.
  • Open-Source License — AGPL.
  • Who Uses It —


FlockDB

  • Official Online Resources —
  • History — Created at Twitter and open sourced in 2010. Designed to store the adjacency lists for followers on Twitter.
  • Technologies and Language — Implemented in Scala.
  • Access Methods — A Thrift and Ruby client.
  • Open-Source License — Apache License version 2.
  • Who Uses It — Twitter.
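As a rough illustration of what storing adjacency lists for followers and traversing a graph mean, the sketch below keeps one set of edges per user and walks them breadth-first. The function names and user names are made up for the example; this is not the FlockDB or Neo4j API.

```python
from collections import defaultdict, deque

following = defaultdict(set)  # user -> users they follow
followers = defaultdict(set)  # user -> users who follow them


def follow(src, dst):
    """Record one edge in both adjacency lists."""
    following[src].add(dst)
    followers[dst].add(src)


def reachable_within(start, max_hops):
    """Breadth-first traversal over 'following' edges, up to max_hops away."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in following.get(user, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    seen.discard(start)
    return seen


follow("alice", "bob")
follow("bob", "carol")
follow("carol", "dave")

print(sorted(followers["bob"]))              # who follows bob
print(sorted(reachable_within("alice", 2)))  # users within two hops of alice
```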

A number of NoSQL products have been covered so far. Hopefully, it has warmed you up to learn more about these products and to get ready to understand how you can leverage and use them effectively in your stack.


This article is from book <Professional NoSQL>.


Categories: Clouding, Databases, IT Architecture Tags:

sorted ordered column-oriented stores VS key/value stores in NoSQL

March 27th, 2014
  • sorted ordered column-oriented stores

Google’s Bigtable espouses a model where data is stored in a column-oriented way. This contrasts with the row-oriented format in RDBMS. The column-oriented storage allows data to be stored efficiently. It avoids consuming space when storing nulls by simply not storing a column when a value doesn’t exist for that column.

Each unit of data can be thought of as a set of key/value pairs, where the unit itself is identified with the help of a primary identifier, often referred to as the primary key. Bigtable and its clones tend to call this primary key the row-key. Also, as the title of this subsection suggests, units are stored in an ordered-sorted manner. The units of data are sorted and ordered on the basis of the row-key. To explain sorted ordered column-oriented stores, an example serves better than a lot of text, so let me present an example to you. Consider a simple table of values that keeps information about a set of people. Such a table could have columns like first_name, last_name, occupation, zip_code, and gender. A person’s information in this table could be as follows:

first_name: John
last_name: Doe
zip_code: 10001
gender: male

Another set of data in the same table could be as follows:

first_name: Jane
zip_code: 94303

The row-key of the first data point could be 1 and the second could be 2. Then data would be stored in a sorted ordered column-oriented store in a way that the data point with row-key 1 will be stored before a data point with row-key 2 and also that the two data points will be adjacent to each other.

Next, only the valid key/value pairs would be stored for each data point. So, a possible column-family for the example could be name with columns first_name and last_name being its members. Another column-family could be location with zip_code as its member. A third column-family could be profile. The gender column could be a member of the profile column-family. In column-oriented stores similar to Bigtable, data is stored on a column-family basis. Column-families are typically defined at configuration or startup time. Columns themselves need no a-priori definition or declaration. Also, columns are capable of storing any data type as long as the data can be persisted to an array of bytes.

So the underlying logical storage for this simple example consists of three storage buckets: name, location, and profile. Within each bucket, only key/value pairs with valid values are stored. Therefore, the name column-family bucket stores the following values:

For row-key: 1

first_name: John
last_name: Doe

For row-key: 2

first_name: Jane

The location column-family stores the following:

For row-key: 1

zip_code: 10001

For row-key: 2

zip_code: 94303

The profile column-family has values only for the data point with row-key 1 so it stores only the following:

For row-key: 1

gender: male

In real storage terms, the column-families are not physically isolated for a given row. All data pertaining to a row-key is stored together. The column-family acts as a key for the columns it contains and the row-key acts as the key for the whole data set.
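The logical layout described above can be sketched with nested dictionaries: the row-key is the outer key, the column-family is the next level, and only columns that actually have values are present. This is an illustrative model only, and the helper names are hypothetical.

```python
# row-key -> column-family -> column -> value; missing values are simply absent
rows = {
    1: {
        "name": {"first_name": "John", "last_name": "Doe"},
        "location": {"zip_code": "10001"},
        "profile": {"gender": "male"},
    },
    2: {
        "name": {"first_name": "Jane"},
        "location": {"zip_code": "94303"},
    },
}


def get_cell(row_key, family, column):
    """Return a single value, or None when the column was never stored."""
    return rows.get(row_key, {}).get(family, {}).get(column)


def scan_family(family):
    """Yield (row_key, columns) for one column-family in row-key order."""
    for row_key in sorted(rows):
        columns = rows[row_key].get(family)
        if columns:
            yield row_key, columns


print(get_cell(1, "name", "last_name"))
print(get_cell(2, "name", "last_name"))  # nothing stored, so None
print(list(scan_family("profile")))
```

Note that row 2 takes no space for last_name or gender; there is no null placeholder to store.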

Data in Bigtable and its clones is stored in a contiguous sequenced manner. As data grows to fill up one node, it is split across multiple nodes. The data is sorted and ordered not only on each node but also across nodes, providing one large continuously sequenced set. The data is persisted in a fault-tolerant manner where three copies of each data set are maintained. Most Bigtable clones leverage a distributed filesystem to persist data to disk. Distributed filesystems allow data to be stored among a cluster of machines.

The sorted ordered structure makes data seek by row-key extremely efficient. Data access is less random and ad-hoc and lookup is as simple as finding the node in the sequence that holds the data. Data is inserted at the end of the list. Updates are in-place but often imply adding a newer version of data to the specific cell rather than in-place overwrites. This means a few versions of each cell are maintained at all times. The versioning property is usually configurable.
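A small sketch of these two properties, sorted row-keys and versioned cells, might look like the following. It uses binary search over a sorted key list and keeps a configurable number of versions per row. The class name and the three-version default are assumptions for illustration, not the behavior of any particular Bigtable clone.

```python
import bisect


class SortedVersionedStore:
    def __init__(self, max_versions=3):
        self.keys = []   # row-keys, always kept sorted
        self.cells = {}  # row-key -> list of (timestamp, value), newest first
        self.max_versions = max_versions

    def put(self, row_key, value, timestamp):
        if row_key not in self.cells:
            bisect.insort(self.keys, row_key)  # keep the key list ordered
            self.cells[row_key] = []
        versions = self.cells[row_key]
        versions.append((timestamp, value))    # an update adds a new version
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]       # drop versions beyond the limit

    def get(self, row_key):
        versions = self.cells.get(row_key)
        return versions[0][1] if versions else None

    def scan(self, start, stop):
        """Return (row_key, latest value) for start <= row_key < stop."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return [(k, self.get(k)) for k in self.keys[lo:hi]]


store = SortedVersionedStore(max_versions=2)
store.put("row-002", "b", timestamp=1)
store.put("row-001", "a", timestamp=1)
store.put("row-001", "a2", timestamp=2)
store.put("row-001", "a3", timestamp=3)
store.put("row-003", "c", timestamp=1)

print(store.scan("row-001", "row-003"))
print(store.cells["row-001"])  # only the two newest versions remain
```

Because the keys are sorted, a range scan is two binary searches and a slice rather than a pass over every row.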

A bullet-point enumeration of some of the Bigtable open-source clones’ properties is listed next.

  • HBase

Official Online Resources —
History — Created at Powerset (now part of Microsoft) in 2007. Donated to the Apache foundation before Powerset was acquired by Microsoft.
Technologies and Language — Implemented in Java.
Access Methods — A JRuby shell allows command-line access to the store. Thrift, Avro, REST, and protobuf clients exist. A few language bindings are also available. A Java API is available with the distribution.
Query Language — No native querying language. Hive provides a SQL-like interface for HBase.
Open-Source License — Apache License version 2.
Who Uses It — Facebook, StumbleUpon, Hulu, Ning, Mahalo, Yahoo!, and others.

  • Hypertable

Official Online Resources —
History — Created at Zvents in 2007. Now an independent open-source project.
Technologies and Language — Implemented in C++; uses the Google RE2 regular expression library. RE2 provides a fast and efficient implementation. Hypertable promises a performance boost over HBase, potentially serving to reduce time and cost when dealing with large amounts of data.
Access Methods — A command-line shell is available. In addition, a Thrift interface is supported. Language bindings have been created based on the Thrift interface. A creative developer has even created a JDBC-compliant interface for Hypertable.
Query Language — HQL (Hypertable Query Language) is a SQL-like abstraction for querying Hypertable data. Hypertable also has an adapter for Hive.
Open-Source License — GNU GPL version 2.
Who Uses It — Zvents, Baidu (China’s biggest search engine), Rediff (India’s biggest portal).

  • Cloudata

Official Online Resources —
History — Created by a Korean developer named YK Kwon. Not much is publicly known about its origins.
Technologies and Language — Implemented in Java.
Access Methods — A command-line access is available. Thrift, REST, and Java API are available.
Query Language — CQL (Cloudata Query Language) defines a SQL-like query language.
Open-Source License — Apache License version 2.
Who Uses It — Not known.

Sorted ordered column-family stores form a very popular NoSQL option. However, NoSQL consists of a lot more variants of key/value stores and document databases. Next, I introduce the key/value stores.

  • key/value stores

A HashMap or an associative array is the simplest data structure that can hold a set of key/value pairs. Such data structures are extremely popular because they provide very efficient, O(1) average running time for accessing data. The key of a key/value pair is a unique value in the set and can be easily looked up to access the data.

Key/value stores are of varied types: some keep the data in memory and some provide the capability to persist the data to disk. Key/value stores can also be distributed, with the data held in a cluster of nodes.

A simple, yet powerful, key/value store is Oracle’s Berkeley DB. Berkeley DB is a pure storage engine where both key and value are an array of bytes. The core storage engine of Berkeley DB doesn’t attach meaning to the key or the value. It takes byte array pairs in and returns the same back to the calling client. Berkeley DB allows data to be cached in memory and flushed to disk as it grows. There is also a notion of indexing the keys for faster lookup and access. Berkeley DB has existed since the mid-1990s. It was created to replace AT&T’s NDBM as a part of migrating from BSD 4.3 to 4.4. In 1996, Sleepycat Software was formed to maintain and provide support for Berkeley DB.

Another type of key/value store in common use is a cache. A cache provides an in-memory snapshot of the most-used data in an application. The purpose of cache is to reduce disk I/O. Cache systems could be rudimentary map structures or robust systems with a cache expiration policy. Caching is a popular strategy employed at all levels of a computer software stack to boost performance. Operating systems, databases, middleware components, and applications use caching.
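The following sketch shows a cache as a bounded key/value map placed in front of a slower read. The eviction policy used here, least recently used (LRU), is just one possible choice and is an assumption of the example, as are the names LRUCache and slow_read.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded in-memory cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, loader):
        if key in self.items:
            self.items.move_to_end(key)     # mark as most recently used
            return self.items[key]
        value = loader(key)                 # cache miss: go to the slow source
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the oldest entry
        return value


loads = []  # records every trip to the slow source


def slow_read(key):
    """Stand-in for a disk or database read."""
    loads.append(key)
    return key.upper()


cache = LRUCache(capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key, slow_read)

print(loads)  # keys that actually had to be loaded
```

The second read of "a" never reaches slow_read, while "b" has to be loaded twice because it was evicted to make room for "c".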

Robust open-source distributed cache systems like EHCache are widely used in Java applications. EHCache could be considered a NoSQL solution. Another caching system popularly used in web applications is Memcached, which is an open-source, high-performance object caching system. Brad Fitzpatrick created Memcached for LiveJournal in 2003. Apart from being a caching system, Memcached also helps effective memory management by creating a large virtual pool and distributing memory among nodes as required. This prevents fragmented zones where one node could have excess but unused memory and another node could be starved for memory.

As the NoSQL movement has gathered momentum, a number of key/value pair data stores have emerged. Some of these newer stores build on the Memcached API, some use Berkeley DB as the underlying storage, and a few others provide alternative solutions built from scratch.

Many of these key/value stores have APIs that provide simple get and set operations on values. A few, like Redis, provide richer abstractions and powerful APIs. Redis could be considered a data structure server because it provides data structures like strings (character sequences), lists, and sets, apart from maps. Also, Redis provides a very rich set of operations to access data from these different types of data structures.

This book covers key/value stores in a lot of detail. For now, I list a few important ones along with their important attributes. Again, the presentation resorts to a bullet-point-style enumeration of a few important characteristics.

  • Membase (Proposed to be merged into Couchbase, gaining features from CouchDB after the creation of Couchbase, Inc.)

Official Online Resources —
History — Project started in 2009 by NorthScale, Inc. (later renamed as Membase). Zynga and NHN have been contributors since the beginning. Membase builds on Memcached and supports Memcached’s text and binary protocol. Membase adds a lot of additional features on top of Memcached. It adds disk persistence, data replication, live cluster reconfiguration, and data rebalancing. A number of core Membase creators are also Memcached contributors.
Technologies and Language — Implemented in Erlang, C, and C++.
Access Methods — Memcached-compliant API with some extensions. Can be a drop-in replacement for Memcached.
Open-Source License — Apache License version 2.
Who Uses It — Zynga, NHN, and others.

  • Kyoto Cabinet

Official Online Resources —
History — Kyoto Cabinet is a successor of Tokyo Cabinet. The database is a simple data file containing records; each is a pair of a key and a value. Every key and value is a variable-length sequence of bytes.
Technologies and Language — Implemented in C++.
Access Methods — Provides APIs for C, C++, Java, C#, Python, Ruby, Perl, Erlang, OCaml, and Lua. The protocol simplicity means there are many, many clients.
Open-Source License — GNU GPL and GNU LGPL.
Who Uses It — Mixi, Inc. sponsored much of its original work before the author left Mixi to join Google. Blog posts and mailing lists suggest that there are many users but no public list is available.

  • Redis

Official Online Resources —
History — Project started in 2009 by Salvatore Sanfilippo. Salvatore created it for his startup LLOOGG. Though Redis is still an independent project, its primary author is employed by VMware, which sponsors its development.
Technologies and Language — Implemented in C.
Access Methods — Rich set of methods and operations. Can access via Redis command-line interface and a set of well-maintained client libraries for languages like Java, Python, Ruby, C, C++, Lua, Haskell, AS3, and more.
Open-Source License — BSD.
Who Uses It — Craigslist.

The three key/value stores listed here are nimble, fast implementations that provide storage for real-time data, temporary frequently used data, or even full-scale persistence.

The key/value stores listed so far provide a strong consistency model for the data they store. However, a few other key/value stores emphasize availability over consistency in distributed deployments. Many of these are inspired by Amazon’s Dynamo, which is also a key/value store. Amazon’s Dynamo promises exceptional availability and scalability, and forms the backbone of Amazon’s distributed, fault-tolerant, and highly available system. Apache Cassandra, Basho Riak, and Voldemort are open-source implementations of the ideas proposed by Amazon Dynamo.

Amazon Dynamo brings a lot of key high-availability ideas to the forefront. The most important of the ideas is that of eventual consistency. Eventual consistency implies that there could be small intervals of inconsistency between replicated nodes as data gets updated among peer-to-peer nodes. Eventual consistency does not mean inconsistency. It just implies a weaker form of consistency than the typical ACID type consistency found in RDBMS.
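A toy simulation can show what a small interval of inconsistency looks like. In the sketch below a write is applied to one replica immediately and delivered to the others later; until delivery, reads from different replicas disagree. The classes, the single logical clock, and the last-write-wins rule are simplifying assumptions made for this example; real Dynamo-style stores use more elaborate mechanisms.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:  # last write wins
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # undelivered updates: (replica index, key, version, value)
        self.clock = 0     # single logical clock, a simplification

    def write(self, key, value, coordinator=0):
        self.clock += 1
        self.replicas[coordinator].apply(key, self.clock, value)
        for i in range(len(self.replicas)):
            if i != coordinator:
                self.pending.append((i, key, self.clock, value))

    def replicate(self):
        """Deliver all pending updates, after which the replicas converge."""
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()


cluster = Cluster(size=3)
cluster.write("cart", "book")
before = [r.read("cart") for r in cluster.replicas]
cluster.replicate()
after = [r.read("cart") for r in cluster.replicas]

print(before)  # replicas disagree during the replication window
print(after)   # all replicas agree once updates are delivered
```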

For now I will list the Amazon Dynamo clones and introduce you to a few important characteristics of these data stores.

  • Cassandra

Official Online Resources —
History — Developed at Facebook, open sourced in 2008, and later donated to the Apache foundation.
Technologies and Language — Implemented in Java.
Access Methods — Command-line access to the store. A Thrift interface and an internal Java API exist. Clients for multiple languages including Java, Python, Grails, PHP, .NET, and Ruby are available. Hadoop integration is also supported.
Query Language — A query language specification is in the making.
Open-Source License — Apache License version 2.
Who Uses It — Facebook, Digg, Reddit, Twitter, and others.

  • Voldemort

Official Online Resources —
History — Created by the data and analytics team at LinkedIn in 2008.
Technologies and Language — Implemented in Java. Provides for pluggable storage using either Berkeley DB or MySQL.
Access Methods — Integrates with Thrift, Avro, and protobuf interfaces. Can be used in conjunction with Hadoop.
Open-Source License — Apache License version 2.
Who Uses It — LinkedIn.

  • Riak

Official Online Resources —
History — Created at Basho, a company formed in 2008.
Technologies and Language — Implemented in Erlang. Also, uses a bit of C and JavaScript.
Access Methods — Interfaces for JSON (over HTTP) and protobuf clients exist. Libraries for Erlang, Java, Ruby, Python, PHP, and JavaScript exist.
Open-Source License — Apache License version 2.
Who Uses It — Comcast and Mochi Media.

All three (Cassandra, Riak, and Voldemort) provide open-source Amazon Dynamo capabilities. Cassandra and Riak demonstrate a dual nature as far as their behavior and properties go. Cassandra has properties of both Google Bigtable and Amazon Dynamo. Riak acts both as a key/value store and a document database.


This article is from book <Professional NoSQL>.

Categories: Clouding, Databases, IT Architecture Tags:

map/reduce framework definition and introduction

March 27th, 2014

MapReduce is a parallel programming model that allows distributed processing on large data sets on a cluster of computers. The MapReduce framework is patented (US Patent 7,650,331) by Google, but the ideas are freely shared and adopted in a number of open-source implementations.

MapReduce derives its ideas and inspiration from concepts in the world of functional programming. Map and reduce are commonly used functions in the world of functional programming. In functional programming, a map function applies an operation or a function to each element in a list. For example, a multiply-by-two function on a list [1, 2, 3, 4] would generate another list as follows: [2, 4, 6, 8]. When such functions are applied, the original list is not altered. Functional programming believes in keeping data immutable and avoids sharing data among multiple processes or threads. This means the map function that was just illustrated, trivial as it may be, could be run via two or more threads on the list and these threads would not step on each other, because the list itself is not altered.

Like the map function, functional programming has a concept of a reduce function. Actually, a reduce function in functional programming is more commonly known as a fold function. A reduce or a fold function is also sometimes called an accumulate, compress, or inject function. A reduce or fold function applies a function on all elements of a data structure, such as a list, and produces a single result or output. So applying a reduce function such as summation to the list generated by the map function, that is, [2, 4, 6, 8], would generate an output equal to 20.
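The multiply-by-two and summation examples above can be written directly with Python's built-in map and functools.reduce. This is only a sketch of the functional idea, not of a distributed MapReduce job.

```python
from functools import reduce

original = [1, 2, 3, 4]

# map: apply a function to every element, producing a new list
doubled = list(map(lambda x: x * 2, original))

# reduce (fold): combine all elements into a single result
total = reduce(lambda acc, x: acc + x, doubled, 0)

print(original)  # the input list is left untouched
print(doubled)
print(total)
```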

So map and reduce functions could be used in conjunction to process lists of data, where a function is first applied to each member of a list and then an aggregate function is applied to the transformed and generated list.

This same simple idea of map and reduce has been extended to work on large data sets. The idea is slightly modified to work on collections of tuples or key/value pairs. The map function applies a function on every key/value pair in the collection and generates a new collection. Then the reduce function works on the new generated collection and applies an aggregate function to compute a final output. This is better understood through an example, so let me present a trivial one to explain the flow. Say you have a collection of key/value pairs as follows:

[{ "94303": "Tom"}, {"94303": "Jane"}, {"94301": "Arun"}, {"94302": "Chen"}]

This is a collection of key/value pairs where the key is the zip code and the value is the name of a person who resides within that zip code. A simple map function on this collection could get the names of all those who reside in a particular zip code. The output of such a map function is as follows:

[{"94303": ["Tom", "Jane"]}, {"94301": ["Arun"]}, {"94302": ["Chen"]}]

Now a reduce function could work on this output to simply count the number of people who belong to a particular zip code. The final output then would be as follows:

[{"94303": 2}, {"94301": 1}, {"94302": 1}]
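The same flow can be traced in a few lines of single-process Python. The function names map_phase and reduce_phase are chosen for this illustration; a real MapReduce framework would distribute both steps across a cluster.

```python
from collections import defaultdict

records = [{"94303": "Tom"}, {"94303": "Jane"}, {"94301": "Arun"}, {"94302": "Chen"}]


def map_phase(records):
    """Collect the names of everyone who lives in each zip code."""
    grouped = defaultdict(list)
    for record in records:
        for zip_code, name in record.items():
            grouped[zip_code].append(name)
    return dict(grouped)


def reduce_phase(grouped):
    """Count the people in each zip code."""
    return {zip_code: len(names) for zip_code, names in grouped.items()}


grouped = map_phase(records)
counts = reduce_phase(grouped)

print(grouped)
print(counts)
```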

This example is extremely simple and a MapReduce mechanism seems too complex for such a manipulation, but I hope you get the core idea behind the concepts and the flow.


This article is from book <Professional NoSQL>.

AWS – relationship between Elastic Load Balancing, CloudWatch, and Auto Scale

March 20th, 2014


The monitoring, auto scaling, and elastic load balancing features of the Amazon EC2 services give you easy on-demand access to capabilities that once required a complicated system architecture and a large hardware investment.
Any real-world web application must have the ability to scale. This can take the form of vertical scaling, where larger and higher capacity servers are rolled in to replace the existing ones, or horizontal scaling, where additional servers are placed side-by-side (architecturally speaking) with the existing resources. Vertical scaling is sometimes called a scale-up model, and horizontal scaling is sometimes called a scale-out model.

Vertical Scaling

At first, vertical scaling appears to be the easiest way to add capacity. You start out with a server of modest means and use it until it no longer meets your needs. You purchase a bigger one, move your code and data over to it, and abandon the old one. Performance is good until the newer, larger system reaches its capacity. You purchase again, repeating the process until your hardware supplier informs you that you’re running on the largest hardware that they have, and that you’ve no more room to grow. At this point you’ve effectively painted yourself into a corner.
Vertical scaling can be expensive. Each time you upgrade to a bigger system you also make a correspondingly larger investment. If you’re actually buying hardware, your first step-ups cost you thousands of dollars; your later ones cost you tens or even hundreds of thousands of dollars. At some point you may have to invest in a similarly expensive backup system, which will remain idle unless the unthinkable happens and you need to use it to continue operations.

Horizontal Scaling

Horizontal scaling is slightly more complex, but far more flexible and scalable in the long term. Instead of upgrading to a bigger server, you obtain another one (presumably of the same size, although there’s no requirement for this to be the case) and arrange to share the storage and processing load across two servers. When two servers no longer meet your needs, you add a third, a fourth, and so on. This scale-out model allows you to add resources incrementally and economically. As your fleet of servers grows, you can actually increase the reliability of your system by eliminating dependencies on any particular server.
Of course, sharing the storage and processing load across a fleet of servers is sometimes easier said than done. Loosely coupled systems tied together with SQS message queues like those we saw and built in the previous chapter can usually scale easily. Systems with a reliance on a traditional relational database or another centralized storage can be more difficult.

Monitoring, Scaling, and Load Balancing

We’ll need several services in order to build a horizontally scaled system that automatically scales to handle load.
First, we need to know how hard each server is working. We have to establish how much data is moving in and out across the network, how many disk reads and writes are taking place, and how much of the time the CPU (Central Processing Unit) is busy. This functionality is provided by Amazon CloudWatch. After CloudWatch has been enabled for an EC2 instance or an elastic load balancer, it captures and stores this information so that it can be used to control scaling decisions.
Second, we require a way to observe the system performance, using it to make decisions to add more EC2 instances (because the system is too busy) or to remove some running instances (because there’s too little work for them to do). This functionality is provided by the EC2 auto scaling feature. The auto scaling feature uses a rule-driven system to encode the logic needed to add and remove EC2 instances.
Third, we need a method for routing traffic to each of the running instances. This is handled by the EC2 elastic load balancing feature. Working in conjunction with auto scaling, elastic load balancing distributes traffic to EC2 instances located in one or more Availability Zones within an EC2 region. It also uses configurable health checks to detect failing instances and to route traffic away from them.
Figure 7-1 depicts how these features relate to each other.
Incoming HTTP load is balanced across a collection of EC2 instances. CloudWatch captures and stores system performance data from the instances. This data is used by auto scaling to regulate the number of EC2 instances in the collection.
As you’ll soon see, you can use each of these features on its own or you can use them together. This modular model gives you a lot of flexibility and also allows you to learn about the features in an incremental fashion.
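The relationship between the three features can be sketched in plain Python: a metric (average CPU) feeds a scaling rule, and traffic is routed only to instances that pass a health check. Everything here is hypothetical, including the thresholds, limits, and instance names; it does not call any AWS API.

```python
from itertools import cycle


def desired_instance_count(current, avg_cpu, scale_out_at=70.0, scale_in_at=30.0,
                           minimum=1, maximum=10):
    """Rule-driven decision: add an instance when busy, remove one when idle."""
    if avg_cpu > scale_out_at:
        return min(current + 1, maximum)
    if avg_cpu < scale_in_at:
        return max(current - 1, minimum)
    return current


def healthy_round_robin(instances, is_healthy):
    """Rotate through the instances that pass the health check."""
    healthy = [i for i in instances if is_healthy(i)]
    return cycle(healthy)


# Monitoring data drives the scaling decision.
print(desired_instance_count(current=2, avg_cpu=85.0))  # busy: scale out
print(desired_instance_count(current=2, avg_cpu=10.0))  # idle: scale in

# The load balancer skips the instance that fails its health check.
router = healthy_round_robin(["i-a", "i-b", "i-c"], is_healthy=lambda i: i != "i-b")
print([next(router) for _ in range(4)])
```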


This article is from book <Host Your Web Site In The Cloud: Amazon Web Services Made Easy>.


Categories: Clouding Tags:

Oracle VM operations – poweron, poweroff, status, stat -r

January 27th, 2014

Here’s the script:

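#!/usr/bin/perl
# 1. OVM must be running, and the status operation must be run before
#    poweroff, poweron, or stat-r (they read the VM list that status saves).
use strict;
use warnings;
use Net::SSH::Perl;

my $host      = $ARGV[0] // 'help';
my $operation = $ARGV[1] // '';
my $user      = 'root';
my $password  = 'password';

if ($host eq 'help') {
    print "$0 OVM-name status|poweron|poweroff|stat-r\n";
    exit;
}

my $ssh = Net::SSH::Perl->new($host);
$ssh->login($user, $password);

# Assumption: the lines that opened the VM list file are missing from the
# original post, so this file name is a placeholder.
my $list = "/tmp/$host.vmlist";

if ($operation eq 'status') {
    my ($stdout, $stderr, $exit) = $ssh->cmd('ovm -uadmin -pwelcome1 vm ls|grep -v VM_test');
    open(my $host_fd, '>', $list) or die "cannot write $list: $!";
    print {$host_fd} $stdout // '';
    close $host_fd;
    exit;
}

# Each VM line: name, three numeric columns, status, server pool.
open(my $host_fd, '<', $list) or die "run '$0 $host status' first: $!";
while (<$host_fd>) {
    chomp;
    if ($operation eq 'poweroff') {
        next if /Server_Pool|OVM|Powered/;    # skip headers and VMs already off
        if (/(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+(.*)/) {
            $ssh->cmd("ovm -uadmin -pwelcome1 vm poweroff -n $1 -s $6");
            sleep 12;
        }
    } elsif ($operation eq 'poweron') {
        next if /Server_Pool|OVM|Running/;    # skip headers and running VMs
        if (/(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+([a-zA-Z]{1,})\s+Off(.*)/) {
            #print "ovm -uadmin -pwelcome1 vm poweron -n $1 -s $6";
            $ssh->cmd("ovm -uadmin -pwelcome1 vm poweron -n $1 -s $6");
            sleep 20;
        }
    } elsif ($operation eq 'stat-r') {
        # Reset VMs stuck in a transitional state.
        if (/(.*?)\s+([0-9]{1,})\s+([0-9]{1,})\s+([0-9]{1,})\s+(Shutting\sDown|Initializing)\s+(.*)/) {
            #print "ovm -uadmin -pwelcome1 vm stat -r -n $1 -s $6";
            $ssh->cmd("ovm -uadmin -pwelcome1 vm stat -r -n $1 -s $6");
            sleep 1;
        }
    }
}
close $host_fd;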

You can use the following to make the script run in parallel:

for i in <all OVMs>;do (./ $i status &);done

Categories: Clouding, IT Architecture, Oracle Cloud, Perl Tags:

resolved – ESXi Failed to lock the file

January 13th, 2014

When I powered on a VM in ESXi, the following error occurred:

An error was received from the ESX host while powering on VM doxer-test.
Cannot open the disk ‘/vmfs/volumes/4726d591-9c3bdf6c/doxer-test/doxer-test_1.vmdk’ or one of the snapshot disks it depends on.
Failed to lock the file

And also:

unable to access file since it is locked

This was apparently caused by a storage issue. I first searched online and found that most posts explained how ESXi file locking works; I tried some of their suggestions, but with no luck.

Then I remembered that our datastore was on NFS/ZFS, and NFS is known to have file-locking issues. So I mounted the NFS share that the datastore was using and removed a file named lck-c30d000000000000. After this, the VM booted up successfully! (Alternatively, you can log on to the ESXi host and remove the lock file there.)

Categories: NAS, Oracle Cloud, Storage Tags:

Common storage multi path Path-Management Software

December 12th, 2013
Vendor            Path-Management Software
Hewlett-Packard   AutoPath, SecurePath
Microsoft         MPIO
Hitachi           Dynamic Link Manager
EMC               PowerPath
IBM               RDAC, MultiPath Driver
VERITAS           Dynamic Multipathing (DMP)
Categories: HA, Hardware, IT Architecture, SAN, Storage Tags:

VLAN in windows hyper-v

November 26th, 2013

Briefly, a virtual LAN (VLAN) can be regarded as a broadcast domain. It operates at OSI
layer 2 (the data link layer). The exact protocol definition is known as 802.1Q. Each network
packet belonging to a VLAN has an identifier. This is just a number between 0 and 4095, with
both 0 and 4095 reserved for other uses. Let’s assume a VLAN with an identifier of 10. A NIC
configured with the VLAN ID of 10 will pick up network packets with the same ID and will
ignore all other IDs. The point of VLANs is that switches and routers enabled for 802.1Q can
present VLANs to different switch ports in the network. In other words, where a normal IP
subnet is limited to a set of ports on a physical switch, a subnet defined in a VLAN can be
present on any switch port, if so configured, of course.
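To show where the identifier lives, the sketch below reads the 802.1Q tag from a raw Ethernet frame: after the two 6-byte MAC addresses comes the tag protocol identifier 0x8100, followed by 16 bits of tag control information whose low 12 bits are the VLAN ID. The helper names and the sample frames are made up for the example, and the MAC addresses are zero-filled placeholders.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier that marks an 802.1Q-tagged frame


def vlan_id(frame):
    """Return the VLAN ID of a tagged Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])  # bytes after the two MACs
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # low 12 bits of the tag control information


def accepts(frame, nic_vlan_id):
    """A NIC configured for one VLAN ID ignores frames carrying any other ID."""
    return vlan_id(frame) == nic_vlan_id


def make_frame(vid):
    """Build a sample tagged frame with placeholder MAC addresses."""
    macs = bytes(6) + bytes(6)
    tag = struct.pack("!HH", TPID_8021Q, vid & 0x0FFF)
    ethertype = struct.pack("!H", 0x0800)  # IPv4
    return macs + tag + ethertype + b"payload"


print(vlan_id(make_frame(10)))
print(accepts(make_frame(10), nic_vlan_id=10))
print(accepts(make_frame(20), nic_vlan_id=10))
```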

Getting back to the VLAN functionality in Hyper-V: both virtual switches and virtual NICs
can detect and use VLAN IDs. Both can accept and reject network packets based on VLAN ID,
which means that the VM does not have to do it itself. The use of VLAN enables Hyper-V to
participate in more advanced network designs. One limitation in the current implementation is
that a virtual switch can have just one VLAN ID, although that should not matter too much in
practice. The default setting is to accept all VLAN IDs.

Hyper-V architecture: the hypervisor, the virtual machines, and their relations

November 25th, 2013
[Figure: Hyper-V architecture]

[Figure: CPU rings]

[Figure: hybrid virtualization (Microsoft Virtual Server)]

PS: This is from book Mastering Windows Server® 2008 R2

Categories: Clouding Tags:

Configuring Active/Passive Clustering for Apache Tomcat in Oracle RAC

October 1st, 2013

Note: this is from book <Pro Oracle Database 11g RAC on Linux>

A slightly more complex example involves making Apache Tomcat or another web-accessible
application highly available. The difference in this setup compared to the database setup described in
the previous chapter lies in the fact that you need to use a floating virtual IP address. Floating in this
context means that the virtual IP address moves jointly with the application. Oracle calls its
implementation of a floating VIP an application VIP. Application VIPs were introduced in Oracle
Clusterware 10.2. Previous versions only had a node VIP.
The idea behind application VIPs is that, in the case of a node failure, both VIP and the application
migrate to the other node. The example that follows makes Apache Tomcat highly available, which is
accomplished by installing the binaries for version 6.0.26 in /u01/tomcat on two nodes in the cluster. The
rest of this section outlines the steps you must take to make Apache Tomcat highly available.
Oracle Grid Infrastructure does not provide an application VIP by default, so you have to create one.
A new utility, called appvipcfg, can be used to set up an application VIP, as in the following example:

[root@london1 ~]# appvipcfg
Production Copyright 2007, 2008, Oracle.All rights reserved

Usage: appvipcfg create -network=<network_number> -ip=<ip_address> -vipname=<vipname>
delete -vipname=<vipname>
[root@london1 ~]# appvipcfg create -network=1 \
> -ip -vipname httpd-vip -user=root
Production Copyright 2007, 2008, Oracle.All rights reserved
2010-06-18 16:07:12: Creating Resource Type
2010-06-18 16:07:12: Executing cmd: /u01/app/crs/bin/crsctl add type app.appvip.type -basetype
cluster_resource -file /u01/app/crs/crs/template/appvip.type
2010-06-18 16:07:13: Create the Resource
2010-06-18 16:07:13: Executing cmd: /u01/app/crs/bin/crsctl add resource httpd-vip -type
app.appvip.type -attr USR_ORA_VIP=,START_DEPENDENCIES=hard(


The preceding output shows that the new resource has been created, and it is owned by root
exclusively. You could use crsctl setperm to change the ACL, but this is not required for this process.
Bear in mind that no account other than root can start the resource at this time. You can verify the result
of this operation by querying the resource just created. Note how the httpd-vip does not have an ora. prefix:

[root@london1 ~]# crsctl status resource httpd-vip

Checking the resource profile reveals that it matches the output of the appvipcfg command; the
output has been shortened for readability, and it focuses only on the most important keys (the other
keys were removed for the sake of clarity):

[root@london1 ~]# crsctl stat res httpd-vip -p

The dependencies on the network ensure that, if the network is not started, it will be started as part
of the VIP start. The resource is controlled by the CRSD orarootagent because changes to the network
configuration require root privileges in Linux. The status of the resource revealed it was stopped; you
can use the following command to start it:

[root@london1 ~]# crsctl start res httpd-vip
CRS-2672: Attempting to start 'httpd-vip' on 'london2'
CRS-2676: Start of 'httpd-vip' on 'london2' succeeded
[root@london1 ~]#

In this case, Grid Infrastructure decided to start the resource on server london2.

[root@london1 ~]# crsctl status resource httpd-vip
STATE=ONLINE on london2

You can verify this by querying the network setup, which has changed. The following output is again
shortened for readability:

[root@london2 source]# ifconfig

eth0:3 Link encap:Ethernet HWaddr 00:16:36:2B:F2:F6
inet addr: Bcast: Mask:

Next, you need an action script that controls the Tomcat resource. Again, the requirement is to
implement start, stop, clean, and check functions in the action script. The Oracle documentation lists
C, C++, and shell scripts as candidate languages for an action script. We think that the action script can
be any executable, as long as it returns 0 or 1, as required by Grid Infrastructure. A sample action script
that checks for the Tomcat webserver could be written in plain bash, as in the following example:


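#!/bin/bash
# Branch bodies were lost in this copy; start/stop/clean below are a
# reconstruction using Tomcat's standard startup.sh and shutdown.sh.
export CATALINA_HOME=/u01/tomcat
export ORA_CRS_HOME=/u01/app/crs
export JAVA_HOME=$ORA_CRS_HOME/jdk
export CHECKURL=""   # URL missing in this copy; point it at a small file served by Tomcat

case $1 in
'start')
    $CATALINA_HOME/bin/startup.sh
    RET=$? ;;
'stop'|'clean')
    $CATALINA_HOME/bin/shutdown.sh
    RET=$? ;;
'check')
    # download a simple, small image from the tomcat server
    /usr/bin/wget -q --delete-after $CHECKURL
    RET=$? ;;
*)
    RET=0 ;;
esac
# A 0 indicates success, return 1 for an error.
if [ $RET -eq 0 ]; then
    exit 0
else
    exit 1
fi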

In our installation, we created a $GRID_HOME/hadaemon/ directory on all nodes in the cluster to save
the Tomcat action script.
The next step is to ensure that the file is executable and then to run a test to
see whether the file works as expected. Once you are confident that the script is working, you can add
the Tomcat resource.
The easiest way to configure the new resource is by creating a text file with the required attributes,
as in this example:

[root@london1 hadaemon]# cat tomcat.profile
HOSTING_MEMBERS=london1 london2

The following command registers the resource tomcat in Grid Infrastructure:

[root@london1 ~]# crsctl add resource tomcat -type cluster_resource -file tomcat.profile

Again, the profile registered matches what has been defined in the tomcat.profile file, plus the
default values:

[root@london1 hadaemon]# crsctl status resource tomcat -p
HOSTING_MEMBERS=london1 london2


This example includes a hard dependency on the httpd-vip resource, which is started now. If you
try to start the Tomcat resource, you will get the following error:

[root@london1 hadaemon]# crsctl start resource tomcat
CRS-2672: Attempting to start 'tomcat' on 'london1'
CRS-2674: Start of 'tomcat' on 'london1' failed
CRS-2527: Unable to start 'tomcat' because it has a 'hard' dependency
on 'httpd-vip'
CRS-2525: All instances of the resource 'httpd-vip' are already running;
relocate is not allowed because the force option was not specified
CRS-4000: Command Start failed, or completed with errors.

To get around this problem, you need to begin by shutting down httpd-vip and then try again:

[root@london1 hadaemon]# crsctl stop res httpd-vip
CRS-2673: Attempting to stop 'httpd-vip' on 'london1'
CRS-2677: Stop of 'httpd-vip' on 'london1' succeeded
[root@london1 hadaemon]# crsctl start res tomcat
CRS-2672: Attempting to start 'httpd-vip' on 'london1'
CRS-2676: Start of 'httpd-vip' on 'london1' succeeded
CRS-2672: Attempting to start 'tomcat' on 'london1'
CRS-2676: Start of 'tomcat' on 'london1' succeeded

The Tomcat servlet and JSP container is now highly available. However, please bear in mind that the
session state of an application will not fail over to the passive node in the case of a node failure. The
preceding example could be further enhanced by using a shared cluster logical ACFS volume to store the
web applications used by Tomcat, as well as the Tomcat binaries themselves.

Categories: HA, Oracle DB Tags:

hadoop installation on centos linux – pseudodistributed mode

September 18th, 2013 No comments

First, install JDK and set JAVA_HOME:

yum install jdk-1.6.0_30-fcs
export JAVA_HOME=/usr/java/jdk1.6.0_30

Now install hadoop rpm:

rpm -Uvh hadoop-1.2.1-1.x86_64.rpm

Run hadoop version to verify that hadoop was successfully installed:

[root@node3 hadoop]# hadoop version
Hadoop 1.2.1
Subversion -r 1503152
Compiled by mattf on Mon Jul 22 15:27:42 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a
This command was run using /usr/share/hadoop/hadoop-core-1.2.1.jar

After this, let’s configure hadoop to run in pseudodistributed mode:

[root@node3 hadoop]# cat /etc/hadoop/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name></name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>

[root@node3 hadoop]# cat /etc/hadoop/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

[root@node3 hadoop]# cat /etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
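
All three files share the same shape: a configuration element holding property elements with a name and a value. If you prefer to generate them rather than edit by hand, a minimal Python sketch could look like the following. The render_site_xml helper is hypothetical (it is not part of hadoop), and it omits the stylesheet line and the comment for brevity.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties: dict) -> str:
    """Render a hadoop *-site.xml body from a {name: value} mapping."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return '<?xml version="1.0"?>\n' + ET.tostring(configuration, encoding="unicode")


# Values taken from the hdfs-site.xml and mapred-site.xml shown above.
print(render_site_xml({"dfs.replication": 1}))
print(render_site_xml({"mapred.job.tracker": "localhost:8021"}))
```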

We need to configure password-less ssh to localhost if we’re running in pseudodistributed mode. This mainly means you can “ssh localhost” without a password (run ssh-keygen -t rsa and append the public key to authorized_keys). After the ssh configuration above, let’s format the HDFS filesystem:

hadoop namenode -format

Now we can start daemons:

PS: I found that some hadoop related scripts do not have execute permission initially, so you may run the following script to fix this:

for i in `find /usr/sbin/ -type f ! -perm -u+x`;do chmod u+x $i;done

That’s all for hadoop installation on linux. You can now visit http://<ip of node>:50030/jobtracker.jsp and http://<ip of node>:50070/dfshealth.jsp to see status of hadoop jobtracker/namenode respectively.


Hadoop: The Definitive Guide, 3rd Edition is a good book about hadoop.

Categories: Clouding Tags:

chef installation on centos linux

July 19th, 2013 No comments

We need to install the chef server, a chef workstation, and chef nodes to get chef working. Here are the steps:

###Install chef server
1.Go to
2.Click the Chef Server tab.
3.Select the operating system, version, and architecture.
4.Select the version of Chef Server 11.x to download, and then click the link that appears to download the package.
5.Install the downloaded package using the correct method for the operating system on which Chef Server 11.x will be installed.
6.Configure Chef Server 11.x by running the following command:
$ sudo chef-server-ctl reconfigure

Add sudo even if you’re root, or you may encounter errors like the following:
/opt/chef-server/embedded/service/chef-pedant/lib/pedant/config.rb:34:in `from_argv': Configuration file '/var/opt/chef-server/chef-pedant/etc/pedant_config.rb' not found! (RuntimeError)
from /opt/chef-server/embedded/service/chef-pedant/lib/pedant.rb:44:in `setup'
from ./bin/chef-pedant:26:in `<main>'
Error reading file /etc/chef-server/chef-webui.pem

This command will set up all of the required components, including Erchef, RabbitMQ, PostgreSQL, and all of the cookbooks that are used by chef-solo to maintain Chef Server 11.x.

7.Verify the hostname for the Chef Server by running the hostname command. The hostname for the Chef Server must be a FQDN(

8.Verify the installation of Chef Server 11.x by running the following command:

$ sudo chef-server-ctl test
This will run the chef-pedant test suite against the installed Chef Server 11.x and will report back that everything is working and installed correctly.

###Install chef workstation
1.Go to:, select the operating system, version, and architecture appropriate for your environment, and identify the URL that will be used to download the package or download the package directly.
2.Run the commands identified above: curl -L | sudo bash (you can also wget the script and run it)
3.After installation, you can run chef-client -v to see the version of chef
4.Install git(yum install git for centos/redhat if you’re using EPEL repo)
5.Clone the Chef repository: cd ~; git clone git:// (If you met some connection errors, you can change git:// to http://, i.e. git clone
6.Create the .chef directory: mkdir -p ~/chef-repo/.chef
7.Copy admin.pem, chef-validator.pem from chef server(in /etc/chef-server/ on chef server) to workstation(/etc/chef, if not exists, create the directory)
8.Now generate knife.rb using the command knife configure --initial

Overwrite /root/.chef/knife.rb? (Y/N) y
Please enter the chef server URL: []
Please enter a name for the new user: [root] user1
Please enter the existing admin name: [admin]
Please enter the location of the existing admin’s private key: [/etc/chef/admin.pem]
Please enter the validation clientname: [chef-validator]
Please enter the location of the validation key: [/etc/chef/validation.pem] /etc/chef/chef-validator.pem
Please enter the path to a chef repository (or leave blank):
Creating initial API user…
Please enter a password for the new user:
Created user[user1]
Configuration file written to /root/.chef/knife.rb

[root@workstation]# cp /root/.chef/knife.rb /root/chef-repo/.chef/
[root@workstation]# cp /etc/chef/admin.pem /root/chef-repo/.chef/
[root@workstation]# cp /etc/chef/chef-validator.pem /root/chef-repo/.chef/

9.Add ruby to PATH: echo 'export PATH="/opt/chef/embedded/bin:$PATH"' >> ~/.bash_profile && source ~/.bash_profile
10.Verify the chef workstation install
cd ~/chef-repo
knife client list
knife node show <node name>
knife user list

###Install chef client on nodes
On chef workstation:
knife bootstrap <node ip or FQDN> -x username -P password
knife client show <node ip or FQDN just added>

Here are some definitions that may help you through the mouthful of terminology in chef(from

  • Declare policy using resources
  • Collect resources into recipes
  • Package recipes and supporting code into cookbooks
  • Apply cookbooks to nodes using roles
  • Run Chef to configure nodes according to their assigned roles

###First try of chef populating

I want to install php on a client named chef-client01, which is a redhat box. Here are the steps to do this:

The following steps are executed on chef workstation:

First, we need to create an environment and assign the node to it:

cd /root/chef-repo
vi environments/prod.rb

name "prod"
description "The production environment"

git add environments
git commit -m 'Add production environments.'
knife environment from file environments/prod.rb

knife node edit <node name> #change environment from _default to prod
Now let’s install the php cookbook and its dependencies:

knife cookbook site install yum
knife cookbook site install runit
knife cookbook site install ohai
knife cookbook site install php
knife cookbook upload --all

Now let’s create a role and make recipes available to environment “prod” which contains our node:

vi roles/db_master.rb

name "db_master"
description "Master database server"

all_env = [


"_default" => all_env,
"prod" => all_env,
"dev" => all_env,

git add roles
git commit -m 'Add LAMP roles.'
knife role from file roles/db_master.rb

Finally, we should make the node install php:

knife ssh "name:<node name>" "chef-client" -x root -P <your password on chef node>

Or you can run chef-client from chef node.

1.Here’s all aspects of chef
2.Here’s more detailed installation guide of installing chef
3.You may need to set an http proxy for some of the download or knife bootstrap steps. You may try export http_proxy=http://<your_proxy:port> for wget, or knife bootstrap <other options> --bootstrap-proxy http://<your_proxy:port> for knife bootstrap

4.For chef solo install and configure, you can refer to the following article


Categories: Clouding Tags:

Best practices for NFS security and performance

July 18th, 2013 No comments

The recommendations here apply to general purpose Network File System (NFS) servers based on Linux or other Unix derivatives.  They do not apply to high-end filers from vendors such as NetApp, EMC, or others, who provide their own proprietary interfaces and settings.


The following example shows how to restrict access on a Linux-based NFS server with settings in /etc/exports. (How to set this configuration varies from manufacturer to manufacturer of network appliances.)
Only the network has full access to the export; all other networks have only read access:
/test-export1  (rw,no_root_squash,sync,no_subtree_check) (ro,root_squash,sync,no_subtree_check)

Align partitions with blocks on physical or virtual disks


NFS mount options rsize and wsize not less than 1 MB

The NFS mount options rsize and wsize set the size of data transfers for reading and writing.
For best performance, do not set these sizes to less than 1 MB (1048576 bytes).


Ethernet jumbo frame size on NICs and switches

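For best performance, jumbo frames should be enabled on the NFS server’s network interface cards (NICs) and on the
network switches or routers they are connected to. A jumbo frame carries up to 9,000 bytes of payload, which greatly reduces packet fragmentation.

You can ping the NFS server from the cluster with a large packet that must not be fragmented to determine if jumbo frames
have been enabled. With Linux ping, -M do forbids fragmentation, and 8972 bytes of data plus 28 bytes of IP and ICMP headers fill a 9,000-byte frame. Without -M do, a large ping can succeed through fragmentation even when jumbo frames are off. A successful response from the specified IP address means that jumbo frames are enabled.
For example:

ping -M do -s 8972 IPaddressOrNameOfNfsServer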
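
For reference, the largest ping payload that fits into a single frame follows from the MTU minus the header sizes. The sketch below assumes IPv4 without options (20-byte header) and an 8-byte ICMP header; the function name is made up for the example.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(max_ping_payload(9000))  # jumbo frame MTU
print(max_ping_payload(1500))  # standard Ethernet MTU
```

A ping payload larger than this value only gets through a path by being fragmented, so it says nothing about whether jumbo frames are enabled unless fragmentation is forbidden.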

Identifying NFS retransmissions (this is from the book Managing NFS and NIS, Second Edition)

Inspection of the load average and disk activity on the servers may indicate that the servers are heavily loaded and imposing the tightest constraint. The NFS client-side statistics provide the most concrete evidence that one or more slow servers are to blame:

% nfsstat -rc
Client rpc:
Connection oriented:
calls       badcalls    badxids     timeouts    newcreds    badverfs
1753584     1412        18          64          0           0
timers      cantconn    nomem       interrupts
0           1317        0           18
Connectionless:
calls       badcalls    retrans     badxids     timeouts    newcreds
12443       41          334         80          166         0
badverfs    timers      nomem       cantsend
0           4321        0           206


The -rc option is given to nfsstat to look at the RPC statistics only, for client-side NFS operations. The call type demographics contained in the NFS-specific statistics are not of value in this analysis. The test for a slow server is having badxid and timeout of the same magnitude. In the previous example, badxid is nearly a third the value of timeout for connection-oriented RPC, and nearly half the value of timeout for connectionless RPC. Connection-oriented transports use a higher timeout than connectionless transports, therefore the number of timeouts will generally be less for connection-oriented transports.

The high badxid count implies that requests are reaching the various NFS servers, but the servers are too loaded to send replies before the local host’s RPC calls time out and are retransmitted. badxid is incremented each time a duplicate reply is received for a retransmitted request (an RPC request retains its XID through all retransmission cycles). In this case, the server is replying to all requests, including the retransmitted ones. The client is simply not patient enough to wait for replies from the slow server. If there is more than one NFS server, the client may be outpacing all of them or just one particularly sluggish node.

If the server has a duplicate request cache, retransmitted requests that match a non-idempotent NFS call currently in progress are ignored. Only those requests in progress are recognized and filtered, so it is still possible for a sufficiently loaded server to generate duplicate replies that show up in the badxid counts of its clients. Without a duplicate request cache, badxid and timeout may be nearly equal, while the cache will reduce the number of duplicate replies. With or without a duplicate request cache, if the badxid and timeout statistics reported by nfsstat (on the client) are of the same magnitude, then server performance is an issue deserving further investigation.

A mixture of network and server-related problems can make interpretation of the nfsstat figures difficult. A client served by four hosts may find that two of the hosts are particularly slow while a third is located across a network router that is digesting streams of large write packets. One slow server can be masked by other, faster servers: a retransmission rate of 10% (calculated as timeout/calls) would indicate short periods of server sluggishness or network congestion if the retransmissions were evenly distributed among all servers. However, if all timeouts occurred while talking to just one server, the retransmission rate for that server could be 50% or higher.
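
The two ratios used in this analysis are easy to compute from the nfsstat counters. Here is a small Python sketch using the numbers from the sample output above; the function names are made up for the example, and the retransmission rate uses the timeout/calls definition given in the text.

```python
def badxid_to_timeout_ratio(badxids: int, timeouts: int) -> float:
    """Ratio of duplicate replies to timeouts; same magnitude hints at a slow server."""
    return badxids / timeouts if timeouts else 0.0


def retransmission_rate(timeouts: int, calls: int) -> float:
    """Retransmission rate, calculated as timeout/calls."""
    return timeouts / calls if calls else 0.0


# Counters copied from the nfsstat -rc sample.
connection_oriented = {"calls": 1753584, "badxids": 18, "timeouts": 64}
connectionless = {"calls": 12443, "badxids": 80, "timeouts": 166}

for label, c in (("connection oriented", connection_oriented),
                 ("connectionless", connectionless)):
    ratio = badxid_to_timeout_ratio(c["badxids"], c["timeouts"])
    rate = retransmission_rate(c["timeouts"], c["calls"])
    print(f"{label}: badxid/timeout={ratio:.2f} retransmission rate={rate:.2%}")
```

Keep in mind that these are totals across all servers the client talks to, so a single slow server can hide behind a low overall rate.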

A simple method for finding the distribution of retransmitted requests is to perform the same set of disk operations on each server, measuring the incremental number of RPC timeouts that occur when loading each server in turn. This experiment may point to a server that is noticeably slower than its peers, if a large percentage of the RPC timeouts are attributed to that host. Alternatively, you may shift your focus away from server performance if timeouts are fairly evenly distributed or if no timeouts occur during the server loading experiment. Fluctuations in server performance may vary by the time of day, so that more timeouts occur during periods of peak server usage in the morning and after lunch, for example.

Server response time may be clamped at some minimum value due to fixed-cost delays of sending packets through routers, or due to static configurations that cannot be changed for political or historical reasons. If server response cannot be improved, then the clients of that server must adjust their mount parameters to avoid further loading it with retransmitted requests. The relative patience of the client is determined by the timeout, retransmission count, and hard-mount variables.



You can read more about NFS here

Categories: Clouding, Network, Security Tags:

make linux image template for use on OVM EC2 Esxi

July 18th, 2013 No comments

No default gateway:

[root@centos images]# cat /etc/sysconfig/network


Comment out udev for NICs:

[root@centos images]# cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.

# PCI device 0x15ad:0x07b0 (vmxnet3)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:f5:c1:86", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

# PCI device 0x15ad:0x07b0 (vmxnet3)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:f5:c1:90", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"


[root@centos images]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

Categories: Clouding Tags:

install vm with virt-install libvirt

July 8th, 2013 No comments
  • Check whether your CPU supports full virtualization (aka HVM)

egrep '(vmx|svm)' --color=always /proc/cpuinfo

  • Install required packages

yum install kvm libvirt libvirt-python python-virtinst bridge-utils virt-viewer virt-manager

  • Now add bridge and attach Interface to bridge

brctl addbr virbr0 #you'll need the bridge set up before using virt-install
vi ifcfg-eth1 #keep only DEVICE,HWADDR,ONBOOT,TYPE="Ethernet",BRIDGE="virbr0",BOOTPROTO=none
brctl addif virbr0 eth1 #eth1 will disconnect
vi ifcfg-virbr0

  • Prepare storage

dd if=/dev/zero of=oel5.8_20G.img bs=1 count=0 seek=20G

  • Starting to install vm

/etc/init.d/libvirtd start

virt-install --name=oel5.8-15G --arch=x86_64 --vcpus=2 --ram=1024 --os-type=linux --os-variant=rhel5 --virt-type=kvm --connect=qemu:///system --network bridge:virbr0 --cdrom=iso/OLinux5.8_64.iso --disk path=oel5.8_15g.img,size=15 --accelerate --vnc --keymap=us

  • Connect to vm console after installation of vm



1.You can read more docs about virtualization here

2.If you’re using OVM, this command may help you:

virt-install -n <name of VM> -r 1024 --vcpus=2 -f <path to System.img> -b <bridge name> --vif-type=netfront -p -l <http://path-to-mount-dir-of-OS-ISO> --nographics --os-type=linux --os-variant=rhel6

3.You can use oraclevm-template --config --force to configure the VM’s network/gateway/hostname automatically. Here’s the package:  or (version may change)

Categories: IT Architecture, Oracle Cloud Tags:

create vm image from qemu-kvm

July 1st, 2013 No comments


Categories: Clouding Tags:

o2cb for OCFS2

July 1st, 2013 No comments
o2cb – Default cluster stack for the OCFS2 file system, it includes
  • a node manager (o2nm) to keep track of the nodes in the cluster,
  • a heartbeat agent (o2hb) to detect live nodes
  • a network agent (o2net) for intra-cluster node communication
  • a distributed lock manager (o2dlm) to keep track of lock resources
  • All these components are in-kernel.
  • It also includes an in-memory file system, dlmfs, to allow userspace to access the in-kernel dlm
  • /etc/ocfs2/cluster.conf, /etc/sysconfig/o2cb, /sys/kernel/config/cluster
Categories: Clouding, Oracle Cloud Tags: ,

xen tips

July 1st, 2013 No comments
cat /proc/cpuinfo |egrep -i 'vmx|svm|hvm' #whether supported by CPU
xm info|grep hvm #xen, whether OS support HVM
xm block-attach guestdomain file://path/to/dsl-2.0RC2.iso  /dev/hdc r #then mount it on guest. or w for read/write
brctl show #bridge
service xendomains start #boot up all VMs under /etc/xen/auto
xm sched-credit -d <domain> -c <cap>/-w <weight>#priority
Categories: Clouding, Oracle Cloud, tips Tags:

vsphere esxi tips

July 1st, 2013 No comments
vicfg- commands (esxcfg- is deprecated) and other vCLI commands, including ESXCLI: run them from a server with the vCLI package installed, from the vMA virtual machine, or through vCenter Server (--vihost parameter).
esxcli (better to use vCLI or PowerCLI instead): run it directly from the ESXi shell (console), from a server with the vCLI package installed, from the vMA virtual machine, from the vSphere PowerCLI prompt by using Get-EsxCli, or through vCenter Server (--vihost parameter).
localcli: localcli commands are equivalent to ESXCLI commands, but bypass hostd. The localcli commands are only for situations when hostd is unavailable and cannot be restarted. After you run a localcli command, you must restart hostd. Run ESXCLI commands after the restart. If you use a localcli command in other situations, an inconsistent system state and potential failure can result.
PowerCLI cmdlets (Windows PowerShell).
Some examples:
vicfg-hostops <conn_options> --operation shutdown --force
vicfg-hostops <conn_options> --operation shutdown --cluster <my_cluster>
vmware-cmd --config esxhome.cfg -l
vmware-cmd --config esxhome.cfg '/vmfs/volumes/505f5efb-38f8b83f-e1ce-1c6f65d2477b/OracleLinux/OracleLinux.vmx' getuptime
esxcli [options] {namespace}+ {cmd} [cmd options]
esxcli --config esxhome.cfg network ip interface list
esxcli --config esxhome.cfg fcoe adapter list
esxcli --config esxhome.cfg storage nfs add -H <hostname> -s <sharepoint> -v <volumename>
esxcli --config esxhome.cfg --formatter=csv network ip interface list
esxcli --config esxhome.cfg system shutdown poweroff --reason <reason> #must be in maintenance mode
esxcli --config esxhome.cfg system shutdown reboot --reason <reason>
esxcli <conn_options> system maintenanceMode set --enable true
Categories: Clouding, tips, VMware Cloud Tags: ,

enable vm virtualization support in esxi

June 24th, 2013 No comments

If you want to enable your newly created VM’s virtualization support, you can follow these steps:

  1. In VM settings -> Options -> CPU/MMU Virtualization, select either the third or fourth option.
  2. Go to esxi console, locate your VM’s vmx configuration file(under /vmfs/volumes/Datastore/Nimbula_Node05 in my case), and add a line:

vhv.enable = "TRUE"

After these steps, your VM should now support nested virtualization. You can run egrep '(vmx|svm)' --color=always /proc/cpuinfo to confirm whether virtualization is now enabled.

Categories: Clouding, VMware Cloud Tags: ,

cpu usage in xen vm – using xentop

June 7th, 2013 No comments

To check how much CPU a VM is consuming, we can use xentop:

[test@test ~]# xentop -b -i 2 -d 1
11572_test_0106_us_oracle_com --b--- 8412 0.0 34603008 34.4 34603008 34.4 6 2 196796 1111779 2 90 37651 3174172 0
16026_test_0093_us_oracle_com --b--- 4255 0.0 1048576 1.0 1048576 1.0 2 2 2092803 2914101 3 851 49446 1918010 0
16051_test_0094_us_oracle_com -----r 3636909 0.0 56623104 56.3 56623104 56.3 24 2 1553871 970055 2 417 101921 10195220 0
Domain-0 -----r 36197 0.0 2621440 2.6 no limit n/a 24 0 0 0 0 0 0 0 0
11572_test_0106_us_oracle_com --b--- 8412 0.1 34603008 34.4 34603008 34.4 6 2 196796 1111780 2 90 37651 3174172 0
16026_test_0093_us_oracle_com --b--- 4255 0.1 1048576 1.0 1048576 1.0 2 2 2092803 2914102 3 851 49446 1918015 0
16051_test_0094_us_oracle_com -----r 3636933 2396.8 56623104 56.3 56623104 56.3 24 2 1553895 970090 2 417 101921 10195220 0
Domain-0 -----r 36197 2.7 2621440 2.6 no limit n/a 24 0 0 0 0 0 0 0 0

So we can see that the VM '16051_test_0094_us_oracle_com' has 24 vcpus and its CPU(%) has reached 2396.8. Dividing 2396.8 by 24 gives about 99.9%, so almost all of its vcpus are fully used and this VM is quite busy. Note that the first iteration shows 0.0 for every domain, which is why we ask for two iterations and read the second one.
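
The same calculation can be scripted. The Python sketch below parses one line of xentop batch output and divides CPU(%) by the number of vcpus. The column positions (name, state, CPU(sec), CPU(%), MEM(k), MEM(%), MAXMEM(k), MAXMEM(%), VCPUS) are an assumption inferred from the sample output above, and the function names are made up for the example.

```python
def parse_xentop_line(line: str) -> dict:
    """Extract name, CPU(%) and VCPUS from one xentop batch-mode line."""
    # "no limit" in the MAXMEM(k) column contains a space; join it first
    # so that a plain whitespace split keeps the columns aligned.
    fields = line.replace("no limit", "no-limit").split()
    return {
        "name": fields[0],
        "cpu_pct": float(fields[3]),
        "vcpus": int(fields[8]),
    }


def per_vcpu_usage(domain: dict) -> float:
    """Average utilisation of each vcpu, in percent."""
    return domain["cpu_pct"] / domain["vcpus"]


busy = parse_xentop_line(
    "16051_test_0094_us_oracle_com -----r 3636933 2396.8 56623104 56.3 "
    "56623104 56.3 24 2 1553895 970090 2 417 101921 10195220 0"
)
print(f"{busy['name']}: {per_vcpu_usage(busy):.1f}% per vcpu")
```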

Categories: Clouding, Oracle Cloud Tags:

howto about xen vm live migration from standalone Oracle Virtual Server(OVS) to Oracle VM Manager

May 24th, 2013 No comments
    1. The standalone OVS (source server) and the OVS managed by Oracle VM Manager (destination server) must be the same type of machine; we can use the command dmidecode |grep 'Product Name' to confirm.
    2. Make sure the xend relocation server has been configured and is running; run the following commands to confirm:

grep xend-relocation /etc/xen/xend-config.sxp |grep -v '#'
(xend-relocation-server yes)
(xend-relocation-ssl-server yes)
(xend-relocation-port 8002)
(xend-relocation-server-ssl-key-file /etc/ovs-agent/cert/key.pem)
(xend-relocation-server-ssl-cert-file /etc/ovs-agent/cert/certificate.pem)
(xend-relocation-address '')

lsof -i :8002
xend 8372 root 5u IPv4 17979 TCP *:teradataordbms (LISTEN)

3.Make sure ports are open between source and destination servers; run telnet <server_name> 8002 to confirm

4.Make sure source & destination servers are in the same subnet
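
If you have many servers to check, the relocation settings from step 2 can be verified with a short script instead of reading the grep output by eye. This is a rough Python sketch: the helper names are made up, and it only understands the simple one-line "(key value)" form shown above, not nested s-expressions.

```python
import re


def parse_xend_config(text: str) -> dict:
    """Collect one-line '(key value)' settings, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"^\((\S+)\s+(.*)\)$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip("'\"")
    return settings


def relocation_ready(settings: dict, port: str = "8002") -> bool:
    """True if the relocation server is enabled on the expected port."""
    return (settings.get("xend-relocation-server") == "yes"
            and settings.get("xend-relocation-port") == port)


sample = """
#(xend-relocation-server no)
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
"""
print(relocation_ready(parse_xend_config(sample)))
```

On a real server you would read /etc/xen/xend-config.sxp instead of the sample string. The script only checks the configuration; whether xend is actually listening still needs lsof or telnet as described above.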


To live migrate a xen VM, the source & destination servers must both mount the same NFS share. In Oracle VM, we can fulfill this by creating another storage repo for the current server pool.
The steps for creating the storage repo:
First, make sure the NFS share is writable by the OVSes managed by Oracle VM Manager;
Second, run “/opt/ovs-agent-2.3/utils/ -n <NFS share>” on master OVS;
Third, run “/opt/ovs-agent-2.3/utils/ -i” on master OVS to make the storage repo seen by all OVSes managed by Oracle VM Manager;


For live migration, the mount directory of the NFS share must be the same on the source & destination OVS. As the mount directory is created automatically by Oracle VM when creating the storage repo, we must create a symbolic link on the destination OVS.
Assuming we have a xen VM configuration on the source OVS like the following:

disk = ['file:/repo_standalone/testvm/System.img,xvda,w']

Then we’ll link storage repo dir to /repo_standalone:

cd /
ln -s /var/ovs/mount/<uuid> /repo_standalone


Now on the source OVS, let’s do the migration to the destination OVS, which must have enough free memory:

time xm migrate -l <vm> <destination OVS>


After the VM has been live migrated to the destination OVS, we’ll need to import the migrated VM into Oracle VM Manager. We’ll create another soft link under running_pool so that Oracle VM Manager can see the image:

cd /var/ovs/mount/<uuid>
ln -s /var/ovs/mount/<uuid>/<vm> .

After this, open GUI of Oracle VM Manager and then import & approve the system image.

You don’t need to change the VM configuration file (vm.cfg) manually; after the image is imported into Oracle VM Manager, the configuration file will be changed automatically by Oracle VM.

vmware vsphere esxi vicfg esxcli localcli PowerCLI

May 21st, 2013 No comments
  • vicfg- commands (the esxcfg- prefix is deprecated) and other vCLI commands, including ESXCLI: run them from a server with the vCLI package installed, from the vMA virtual machine, or through vCenter Server (--vihost parameter).
  • esxcli: better used through vCLI or PowerCLI. It can be run directly from the ESXi shell (console), from a server with the vCLI package installed, from the vMA virtual machine, from the vSphere PowerCLI prompt by using Get-EsxCli, or through vCenter Server (--vihost parameter).
  • localcli: localcli commands are equivalent to ESXCLI commands, but bypass hostd. They are only for situations when hostd is unavailable and cannot be restarted. After you run a localcli command, you must restart hostd, and run ESXCLI commands only after the restart. Using a localcli command in other situations can result in an inconsistent system state and potential failure.
  • PowerCLI cmdlets: run from Windows PowerShell.
Some examples:
vicfg-hostops <conn_options> --operation shutdown --force
vicfg-hostops <conn_options> --operation shutdown --cluster <my_cluster>
vmware-cmd --config esxhome.cfg -l
vmware-cmd --config esxhome.cfg '/vmfs/volumes/505f5efb-38f8b83f-e1ce-1c6f65d2477b/OracleLinux/OracleLinux.vmx' getuptime
esxcli [options] {namespace}+ {cmd} [cmd options]
esxcli --config esxhome.cfg network ip interface list
esxcli --config esxhome.cfg fcoe adapter list
esxcli --config esxhome.cfg storage nfs add -H <hostname> -s <sharepoint> -v <volumename>
esxcli --config esxhome.cfg --formatter=csv network ip interface list
esxcli --config esxhome.cfg system shutdown poweroff --reason <reason> #host must be in maintenance mode
esxcli --config esxhome.cfg system shutdown reboot --reason <reason>
esxcli <conn_options> system maintenanceMode set --enable true
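Since poweroff requires maintenance mode, the order of the two calls matters. A minimal sketch that only prints the commands unless DRY_RUN=0 is set; the run wrapper and the shutdown_host name are made up for illustration, and esxhome.cfg is the same vCLI config file used in the examples above:

```shell
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enter maintenance mode first, then power off the host.
# $1 = vCLI config file, $2 = reason for the shutdown
shutdown_host() {
    run esxcli --config "$1" system maintenanceMode set --enable true &&
    run esxcli --config "$1" system shutdown poweroff --reason "$2"
}

shutdown_host esxhome.cfg "planned maintenance"
```

With the default dry-run setting nothing is sent to the host; the two command lines are just echoed in the order they would run.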
Categories: VMware Cloud Tags:

oracle ocfs2 cluster filesystem best practise

May 21st, 2013 No comments
  • To check current settings of o2cb, check files under /sys/kernel/config/cluster/ocfs2/
  • To set new value for o2cb:

service o2cb unload
service o2cb configure

heartbeat dead threshold 151 #Iterations before a node is considered dead
network idle timeout 120000 #Time in ms before a network connection is considered dead
network keepalive delay 5000 #Max time in ms before a keepalive packet is sent
network reconnect delay 5000 #Min time in ms between connection attempts

service o2cb load

service o2cb status #shows the new configuration if the OVS is in a server pool; otherwise it shows offline
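The heartbeat dead threshold is counted in heartbeat iterations, not seconds. As I understand the OCFS2 documentation, one iteration takes 2 seconds and the relation is threshold = (timeout in seconds / 2) + 1. A small sketch of the conversion (the function names are made up):

```shell
# Convert an O2CB heartbeat dead threshold (iterations) to seconds.
hb_threshold_to_seconds() {
    echo $(( ($1 - 1) * 2 ))
}

# Convert a desired timeout in seconds to a heartbeat dead threshold.
hb_seconds_to_threshold() {
    echo $(( $1 / 2 + 1 ))
}

hb_threshold_to_seconds 151   # prints 300
hb_seconds_to_threshold 300   # prints 151
```

So the value 151 used above corresponds to a 300-second heartbeat timeout.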


o2cb is the default cluster stack for the OCFS2 file system. It includes:
  • a node manager (o2nm) to keep track of the nodes in the cluster,
  • a heartbeat agent (o2hb) to detect live nodes
  • a network agent (o2net) for intra-cluster node communication
  • a distributed lock manager (o2dlm) to keep track of lock resources
  • All these components are in-kernel.
  • It also includes an in-memory file system, dlmfs, to allow userspace to access the in-kernel dlm
  • main conf files: /etc/ocfs2/cluster.conf, /etc/sysconfig/o2cb
Categories: Clouding, HA, HA & HPC, Oracle Cloud Tags:

SaaS, PaaS, IaaS cloud differences in three illustrations

May 21st, 2013 No comments







Categories: Clouding Tags:

resolved – change xen vm root password

May 21st, 2013 No comments

You can change the root password of a Xen virtual machine in the following way:

losetup -f #to check the next usable loop device
#vgs;pvs #if LVM is implemented in Virtual Machine
losetup <output of losetup -f> System.img #associate the loop device with the regular file System.img; reads/writes to /dev/loop<x> are redirected to System.img
fdisk -l /dev/loop0
  • If there are multiple partitions:
kpartx -av /dev/loop0
#vgchange -a y <VGroup> #may need to run vgscan first
#mount /dev/mapper/<vg name>-<lv name> /mnt
mount -t ext3 /dev/mapper/<partition name of /etc> /mnt
  • If there’s only one root partition:

#vgchange -a y <VGroup>

mount /dev/loop0 /mnt

After mounting, you can now change the root password:

vi /mnt/etc/rc.local #add the line: echo password | passwd --stdin root
umount /mnt
#vgchange -a n <VGroup>
kpartx -d /dev/loop0
losetup  -d /dev/loop0
After all these steps, boot up the VM using xm create vm.cfg, and you’ll find that the root password has been changed. Then, inside the VM, remove the password reset so it does not run on every boot:
vi /etc/rc.local #comment out “echo password | passwd --stdin root”
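The two rc.local edits can also be scripted instead of done in vi. A minimal sketch, assuming a RHEL-style guest where passwd supports --stdin and a password without shell metacharacters; the function names are made up:

```shell
# Append the password-reset line to rc.local under a mounted VM root.
# $1 = mount point (e.g. /mnt), $2 = new password
add_reset_line() {
    printf 'echo %s | passwd --stdin root\n' "$2" >> "$1/etc/rc.local"
}

# Comment out the password-reset line again (run inside the VM after boot).
# $1 = path to rc.local; the file is rewritten in place to keep its mode.
comment_reset_line() {
    sed 's/^\(echo .* | passwd --stdin root\)$/#\1/' "$1" > "$1.tmp" &&
    cat "$1.tmp" > "$1" &&
    rm -f "$1.tmp"
}

# Usage:
#   add_reset_line /mnt password          # on dom0, while the image is mounted
#   comment_reset_line /etc/rc.local      # inside the VM, after first boot
```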
Categories: Clouding, Oracle Cloud Tags:

difference between paused and suspended

May 2nd, 2013 No comments

Note: this is from book <Oracle VM Implementation and Administration Guide>

  • Paused

This state preserves the machine’s current settings and application
states without releasing system resources, allowing the machine to resume
this state with a short load period. In this state, the virtual machine
consumes memory and disk resources but very little CPU resources. When
in the paused state, you can unpause the virtual machine.

  • Suspended

In this state, the machine’s current settings and application
states are preserved by saving them to respective files and essentially
turning off the virtual machine, releasing system resources and allowing
the machine to resume the same settings, applications, and processes upon
leaving the state. In this state, the virtual machine only consumes disk
resources. From the suspended state, you can resume the virtual machine.

  • Summary

Both the paused and suspended states offer the ability to stop the virtual machine
in its exact current operating state and to return the machine to that state upon
resuming. The difference between the two states is the manner in which the
machine’s settings are preserved and how the resources that the virtual machine
uses are affected.
Putting a virtual machine into the paused state simply stops the execution of
further commands momentarily, much like a Windows desktop enters Sleep mode.
In the paused state, the machine’s applications and settings are left in the state that
they were in when the paused state was entered—simply stopped. The settings and
application states are not saved to files that are then used upon resuming; they are
simply stopped as they are. This allows for a fairly short load period upon resuming,
but the virtual machine also continues to utilize (some of) the machine’s resources.
If the desire is to simply halt execution of the virtual machine for a short period
of time and to restart it quickly, then you should choose to pause. This option
provides a fast restart but will hold system resources. If the VM Server were to fail,
the paused system will be lost and require recovery if applicable.
Suspending a virtual machine essentially turns the machine off while preserving
its current settings. Application states, data, and other settings are copied to their
respective files and any resources used by the virtual machine are released. Upon
resuming a suspended virtual machine, after an initial load period, during which the
machine retrieves its settings and application states from these saved files and resumes
use of the server resources, all applications resume the same state they were in.
Suspending a system is a good option if you want to stop the system at a
particular point in time and keep it in that state for an extended period of time.
However, if you want to not use the system for a while, simply shutting it down
might be a better option.
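The book text does not name any commands. On a plain Xen host with the xm toolstack, pausing corresponds to xm pause and xm unpause; the closest match to suspending is, as an assumption on my part, xm save and xm restore, which write the domain state to a checkpoint file and free its memory. Oracle VM Manager may do this differently. A sketch that only prints the commands unless DRY_RUN=0 is set (the wrapper names are made up):

```shell
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Paused: the domain stays in memory but gets no CPU time.
pause_vm()   { run xm pause "$1"; }      # $1 = domain name
unpause_vm() { run xm unpause "$1"; }    # $1 = domain name

# Suspended: state is written to a checkpoint file and memory is released.
suspend_vm() { run xm save "$1" "$2"; }  # $1 = domain name, $2 = checkpoint file
resume_vm()  { run xm restore "$1"; }    # $1 = checkpoint file
```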

Categories: Clouding Tags:

xen netfront netback

May 2nd, 2013 No comments

Note: this is from

The Xen hypervisor itself doesn’t have any networking support (in fact, it has no device drivers at all except the console). All the networking infrastructure is implemented in dom0.

Dom0 and domU use the split network driver to communicate. Dom0 creates the vif using netback and connects it to the virtual interface in domU, which is initialized by netfront.

Dom0 also has vif0.0, vif0.1, etc. and veth0, veth1, etc. These are completely different things and have nothing to do with netback/netfront. They are created by the net loopback driver, mainly so that dom0 can communicate with the bridged network. The net loopback driver is obsolete in Xen 3.2+ (dom0 uses the bridge directly).

The net loopback driver is usually compiled into the kernel. To prevent it from creating the looped vifs, pass “netloop.nloopbacks=0” on the kernel command line.

You can attach virtual network interface to Dom0, just as for domU:

# xm network-attach 0

By default, the newly attached vif will be named vif0.0, so the attach will fail because the name conflicts with the loopback vifs. To make it work, either prevent the net loopback driver from creating these vifs, or specify a different vif name when attaching:

# xm network-attach 0 vifname=<uniq-vif-name>

Even after netback creates the vif, the netfront driver in Dom0 still cannot initialize it. See the code in drivers/xen/netfront/netfront.c:netif_init:

if (is_initial_xendomain())
    return 0;


You can find more details on (about Oracle VM, but applicable to XEN)

Categories: Clouding Tags:

xen domu to dom0 interaction – PV channel and QEMU-DM

May 2nd, 2013 No comments

Note: This is from book <Oracle VM Implementation and Administration Guide>.

Because of the interaction between domU and dom0, several communication channels are created between the two.

  • In a PV environment, a communication channel is created between dom0 and each domU, and a shared memory channel is created for each domU that is used for the backend drivers.
  • In an HVM environment, the Qemu-DM handles the interception of system calls that are made. Each domU has a Qemu-DM daemon, which allows for the use of network and I/Os from the virtual machine.
Categories: Clouding, Oracle Cloud Tags:

Resolved – ORA-00001: unique constraint (OVS.OVS_OS_RESOURCE_PK) violated

May 2nd, 2013 No comments

If you meet the error “ORA-00001: unique constraint (OVS.OVS_OS_RESOURCE_PK) violated” while starting up OVM oc4j, you can try the following workaround:

  • Log on to the OVM VM and connect to Oracle XE

[oracle@test-host ~]$ ps -ef|grep LISTENER
oracle 1726 1 0 Apr28 ? 00:00:00 /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin/tnslsnr LISTENER -inherit
oracle 24737 24712 0 03:53 pts/0 00:00:00 grep LISTENER
[oracle@test-host ~]$
[oracle@test-host ~]$ export ORACLE_SID=XE

[oracle@test-host ~]$ export ORACLE_HOME='/usr/lib/oracle/xe/app/oracle/product/10.2.0/server'

[oracle@test-host ~]$ $ORACLE_HOME/bin/sqlplus / as sysdba

SQL*Plus: Release – Production on Thu May 2 03:54:22 2013

Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Express Edition Release – Production


  • Now back up the OVS.OVS_OS_RESOURCE table

SQL> create table OVS.OVS_OS_RESOURCE_backup as select * from OVS.OVS_OS_RESOURCE;

  • Now we can truncate the table OVS.OVS_OS_RESOURCE

SQL> truncate table OVS.OVS_OS_RESOURCE;
SQL> commit;

  • Now you can redo the oc4j restart; the error message should be gone
  • After oc4j is up, you should now recover OVS.OVS_OS_RESOURCE to its original state

SQL> select OWNER,TABLE_NAME from dba_all_tables t where t.TABLE_NAME like 'OVS_OS_RESOURCE%';

—————————— ——————————


Categories: Clouding, Oracle Cloud Tags: