Archive for the ‘Kernel’ Category

/proc filesystem – day 1

May 24th, 2013

Assumption:

[root@centos-doxer proc]# env|grep SSH_TTY
SSH_TTY=/dev/pts/3

[root@centos-doxer proc]# ps -ef|grep 'root@pts/3'
root 3157 2185 0 08:11 ? 00:00:00 sshd: root@pts/3

[root@centos-doxer proc]# cd /proc/3157

Now we’re going to see /proc/[pid]/:

/proc/[pid]/cmdline
This holds the complete command line for the process, unless the
process is a zombie. In the latter case, there is nothing in
this file: that is, a read on this file will return 0 characters.
The command-line arguments appear in this file as a set
of null-separated strings, with a further null byte ('\0') after
the last string.
[root@centos-doxer 3157]# cat cmdline
sshd: root@pts/3
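Since cat hides the null separators, here is a minimal Python sketch of splitting such a file into its arguments. The sample bytes are made up for illustration (they are not read from process 3157), and parse_cmdline is just a name I picked:

```python
def parse_cmdline(raw):
    """Split the raw bytes of /proc/[pid]/cmdline into a list of arguments."""
    if not raw:
        return []  # zombie process: the file is empty
    # Every argument is followed by a null byte, so drop the final
    # terminator before splitting to avoid a trailing empty string.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\0")]


# Hypothetical file contents, as they would come from open(path, "rb").read()
sample = b"/usr/sbin/sshd\0-D\0"
print(parse_cmdline(sample))
```

On a real system you would pass in the bytes read from /proc/[pid]/cmdline opened in binary mode.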
/proc/[pid]/cwd
This is a symbolic link to the current working directory of the
process. To find out the current working directory of process
20, for instance, you can do this:

$ cd /proc/20/cwd; /bin/pwd
[root@centos-doxer cwd]# cd /proc/3157/cwd
[root@centos-doxer cwd]# pwd -P
/
[root@centos-doxer cwd]# pwdx 3157
3157: /
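/proc/[pid]/exe
Under Linux 2.2 and later, this file is a symbolic link containing
the actual pathname of the executed command. Under Linux 2.0 and
earlier, /proc/[pid]/exe is a pointer to the binary which was
executed, and appears as a symbolic link.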
[root@centos-doxer 3157]# ls -l exe
lrwxrwxrwx 1 root root 0 May 24 08:11 exe -> /usr/sbin/sshd
[root@centos-doxer 3157]# readlink exe
/usr/sbin/sshd
/proc/[pid]/fd
This is a subdirectory containing one entry for each file which
the process has open, named by its file descriptor, and which is
a symbolic link to the actual file. Thus, 0 is standard input,
1 standard output, 2 standard error, etc.
/proc/self/fd/N is approximately the same as /dev/fd/N in some
Unix and Unix-like systems. Most Linux MAKEDEV scripts symbolically
link /dev/fd to /proc/self/fd, in fact.

Most systems provide symbolic links /dev/stdin, /dev/stdout, and
/dev/stderr, which respectively link to the files 0, 1, and 2 in
/proc/self/fd. Thus a program that only reads from and writes to
files named on its command line (here a hypothetical foobar, with
-i naming the input file and -o the output file) can still be made
to use standard input and standard output:

$ foobar -i /dev/stdin -o /dev/stdout …
[root@centos-doxer 3157]# ls -l fd/
total 0
lrwx------ 1 root root 64 May 24 08:39 0 -> /dev/null
lrwx------ 1 root root 64 May 24 08:39 1 -> /dev/null
lrwx------ 1 root root 64 May 24 08:39 2 -> /dev/null
lrwx------ 1 root root 64 May 24 08:39 3 -> socket:[10835]
lrwx------ 1 root root 64 May 24 08:39 4 -> socket:[10866]
lr-x------ 1 root root 64 May 24 08:39 5 -> pipe:[10870]
l-wx------ 1 root root 64 May 24 08:39 6 -> pipe:[10870]
lrwx------ 1 root root 64 May 24 08:39 7 -> /dev/ptmx
lrwx------ 1 root root 64 May 24 08:11 8 -> /dev/ptmx
lrwx------ 1 root root 64 May 24 08:39 9 -> /dev/ptmx

[root@centos-doxer 3157]# ls -l /dev/stdin
lrwxrwxrwx 1 root root 15 May 24 08:07 /dev/stdin -> /proc/self/fd/0
[root@centos-doxer 3157]# ls -l /proc/self/fd/0
lrwx------ 1 root root 64 May 24 08:44 /proc/self/fd/0 -> /dev/pts/3
[root@centos-doxer 3157]#
[root@centos-doxer 3157]# ls -l /dev/stdout
lrwxrwxrwx 1 root root 15 May 24 08:07 /dev/stdout -> /proc/self/fd/1
[root@centos-doxer 3157]# ls -l /proc/self/fd/1
lrwx------ 1 root root 64 May 24 08:44 /proc/self/fd/1 -> /dev/pts/3
[root@centos-doxer 3157]#
[root@centos-doxer 3157]# ls -l /dev/stderr
lrwxrwxrwx 1 root root 15 May 24 08:07 /dev/stderr -> /proc/self/fd/2
[root@centos-doxer 3157]# ls -l /proc/self/fd/2
lrwx------ 1 root root 64 May 24 08:44 /proc/self/fd/2 -> /dev/pts/3
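
The same listing can be produced from a script. This is only a sketch, assuming a Linux system with /proc mounted; list_open_fds and its proc_root parameter are names I made up for illustration:

```python
import os


def list_open_fds(pid="self", proc_root="/proc"):
    """Return {fd_number: link_target}, similar to `ls -l /proc/[pid]/fd`."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink(),
            # e.g. the descriptor listdir() itself used for the directory.
            continue
    return result


if os.path.isdir("/proc/self/fd"):  # only meaningful where /proc exists
    for fd, target in sorted(list_open_fds().items()):
        print(fd, "->", target)
```

Reading another user's /proc/[pid]/fd needs the same privileges as the ls -l above.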

/proc/[pid]/fdinfo/ (since kernel 2.6.22)
This is a subdirectory containing one entry for each file which
the process has open, named by its file descriptor. The contents
of each file can be read to obtain information about the
corresponding file descriptor, for example:

$ cat /proc/12015/fdinfo/4
[root@centos-doxer 3157]# ls -l fdinfo/
total 0
-r-------- 1 root root 0 May 24 08:45 0
-r-------- 1 root root 0 May 24 08:45 1
-r-------- 1 root root 0 May 24 08:45 2
-r-------- 1 root root 0 May 24 08:45 3
-r-------- 1 root root 0 May 24 08:45 4
-r-------- 1 root root 0 May 24 08:45 5
-r-------- 1 root root 0 May 24 08:45 6
-r-------- 1 root root 0 May 24 08:45 7
-r-------- 1 root root 0 May 24 08:45 8
-r-------- 1 root root 0 May 24 08:45 9
/proc/[pid]/maps
A file containing the currently mapped memory regions and their
access permissions.

The format is:

address perms offset dev inode pathname
08048000-08056000 r-xp 00000000 03:0c 64593 /usr/sbin/gpm
08056000-08058000 rw-p 0000d000 03:0c 64593 /usr/sbin/gpm
08058000-0805b000 rwxp 00000000 00:00 0
40000000-40013000 r-xp 00000000 03:0c 4165 /lib/ld-2.2.4.so
40013000-40015000 rw-p 00012000 03:0c 4165 /lib/ld-2.2.4.so
4001f000-40135000 r-xp 00000000 03:0c 45494 /lib/libc-2.2.4.so
40135000-4013e000 rw-p 00115000 03:0c 45494 /lib/libc-2.2.4.so
4013e000-40142000 rw-p 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0

where “address” is the address space in the process that it
occupies, “perms” is a set of permissions:

r = read
w = write
x = execute
s = shared
p = private (copy on write)

“offset” is the offset into the file/whatever, “dev” is the
device (major:minor), and “inode” is the inode on that device.
0 indicates that no inode is associated with the memory region,
as the case would be with BSS (uninitialized data).
[root@centos-doxer 3157]# cat maps
2b98ef380000-2b98ef3e1000 r-xp 00000000 fd:00 5348968 /usr/sbin/sshd
2b98ef5e1000-2b98ef5e4000 rw-p 00061000 fd:00 5348968 /usr/sbin/sshd
2b98ef5e4000-2b98ef5ed000 rw-p 2b98ef5e4000 00:00 0
……
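
To make the field layout concrete, here is a small Python sketch that parses one maps line and works out the size of the mapping. The two sample lines are copied from the output above; Mapping and parse_maps_line are names I made up:

```python
from collections import namedtuple

Mapping = namedtuple("Mapping", "start end perms offset dev inode pathname")


def parse_maps_line(line):
    """Parse one 'address perms offset dev inode pathname' line."""
    # The first five fields never contain spaces; the pathname is optional
    # and may itself contain spaces, so split at most five times.
    parts = line.split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    pathname = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


text_seg = parse_maps_line(
    "2b98ef380000-2b98ef3e1000 r-xp 00000000 fd:00 5348968 /usr/sbin/sshd")
anon_seg = parse_maps_line(
    "2b98ef5e4000-2b98ef5ed000 rw-p 2b98ef5e4000 00:00 0")

for m in (text_seg, anon_seg):
    print((m.end - m.start) // 1024, "kB", m.perms, m.pathname or "[ anon ]")
```

The sizes it computes (388 kB and 36 kB) line up with the Kbytes column that pmap -d prints for the same two mappings.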

[root@centos-doxer 3157]# pmap -d 3157
3157: sshd: root@pts/3
Address Kbytes Mode Offset Device Mapping
00002b98ef380000 388 r-x-- 0000000000000000 0fd:00000 sshd
00002b98ef5e1000 12 rw--- 0000000000061000 0fd:00000 sshd
00002b98ef5e4000 36 rw--- 00002b98ef5e4000 000:00000 [ anon ]
00002b98ef5ed000 112 r-x-- 0000000000000000 0fd:00000 ld-2.5.so
……
mapped: 98380K writeable/private: 1180K shared: 2560K
[root@centos-doxer 3157]# ps aux|egrep 'USER|3157'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3157 0.0 0.3 90192 3460 ? Ss 08:11 0:01 sshd: root@pts/3
root 3521 0.0 0.0 6056 552 pts/3 R+ 09:08 0:00 egrep USER|3157
[root@centos-doxer 3157]# cat stat | awk '{print $23 / 1024}'
90192

[root@centos-doxer 3157]# top -p 3157 -n 1
top - 08:58:15 up 51 min, 7 users, load average: 0.00, 0.00, 0.00
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.4%sy, 0.1%ni, 92.2%id, 6.3%wa, 0.1%hi, 0.6%si, 0.0%st
Mem: 1026080k total, 728268k used, 297812k free, 22092k buffers
Swap: 2064376k total, 0k used, 2064376k free, 332616k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3157 root 15 0 90192 3460 2684 S 0.0 0.3 0:01.41 sshd

VIRT = Virtual Image (kb)
RES = Resident size (kb)
%MEM = Memory usage (RES)
SHR = Shared Mem size (kb)

/proc/[pid]/status
Provides much of the information in /proc/[pid]/stat and
/proc/[pid]/statm in a format that’s easier for humans to parse.
Here’s an example:

$ cat /proc/$$/status
Name: bash
State: S (sleeping)
Tgid: 3515
Pid: 3515
PPid: 3452
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 100 100 100 100
FDSize: 256
Groups: 16 33 100
VmPeak: 9136 kB
VmSize: 7896 kB
VmLck: 0 kB
VmHWM: 7572 kB
VmRSS: 6316 kB
VmData: 5224 kB
VmStk: 88 kB
VmExe: 572 kB
VmLib: 1708 kB
VmPTE: 20 kB
Threads: 1
SigQ: 0/3067
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000010000
SigIgn: 0000000000384004
SigCgt: 000000004b813efb
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: 00000001

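The fields are as follows:

* Name: Command run by this process.

* State: Current state of the process. One of “R (running)”,
“S (sleeping)”, “D (disk sleep)”, “T (stopped)”, “T (tracing
stop)”, “Z (zombie)”, or “X (dead)”.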

* Tgid: Thread group ID (i.e., Process ID).

* Pid: Thread ID (see gettid(2)).

* TracerPid: PID of process tracing this process (0 if not being
traced).

* Uid, Gid: Real, effective, saved set, and file system UIDs
(GIDs).

* FDSize: Number of file descriptor slots currently allocated.

* Groups: Supplementary group list.

* VmPeak: Peak virtual memory size.

* VmSize: Virtual memory size.

* VmLck: Locked memory size.

* VmHWM: Peak resident set size (“high water mark”).

* VmRSS: Resident set size.

* VmData, VmStk, VmExe: Size of data, stack, and text segments.

* VmLib: Shared library code size.

* VmPTE: Page table entries size (since Linux 2.6.10).

* Threads: Number of threads in process containing this thread.

* SigPnd, ShdPnd: Number of signals pending for thread and for
process as a whole (see pthreads(7) and signal(7)).

* SigBlk, SigIgn, SigCgt: Masks indicating signals being
blocked, ignored, and caught (see signal(7)).

* CapInh, CapPrm, CapEff: Masks of capabilities enabled in
inheritable, permitted, and effective sets (see
capabilities(7)).

* CapBnd: Capability Bounding set (since kernel 2.6.26, see
capabilities(7)).

* Cpus_allowed: Mask of CPUs on which this process may run
(since Linux 2.6.24, see cpuset(7)).

* Cpus_allowed_list: Same as previous, but in “list format”
(since Linux 2.6.26, see cpuset(7)).

* Mems_allowed: Mask of memory nodes allowed to this process
(since Linux 2.6.24, see cpuset(7)).

/proc/[pid]/task (since Linux 2.6.0-test6)
This is a directory that contains one subdirectory for each
thread in the process. The name of each subdirectory is the
numerical thread ID ([tid]) of the thread (see gettid(2)).
Within each of these subdirectories, there is a set of files
with the same names and contents as under the /proc/[pid]
directories. For attributes that are shared by all threads, the
contents for each of the files under the task/[tid] subdirectories
will be the same as in the corresponding file in the parent
/proc/[pid] directory (e.g., in a multithreaded process, all of
the task/[tid]/cwd files will have the same value as the
/proc/[pid]/cwd file in the parent directory, since all of the
threads in a process share a working directory). For attributes
that are distinct for each thread, the corresponding files under
task/[tid] may have different values (e.g., various fields in
each of the task/[tid]/status files may be different for each
thread).

In a multithreaded process, the contents of the /proc/[pid]/task
directory are not available if the main thread has already
terminated (typically by calling pthread_exit(3)).

[root@centos-doxer 3157]# cat status
Name: sshd
State: S (sleeping)
SleepAVG: 98%
Tgid: 3157
Pid: 3157
PPid: 2185
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups:
VmPeak: 90224 kB
VmSize: 90192 kB
VmLck: 0 kB
VmHWM: 3460 kB
VmRSS: 3460 kB
VmData: 728 kB
VmStk: 88 kB
VmExe: 388 kB
VmLib: 6228 kB
VmPTE: 188 kB
StaBrk: 2b98f6299000 kB
Brk: 2b98f62d4000 kB
StaStk: 7fff6bde15f0 kB
Threads: 1
SigQ: 1/8191
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180014006
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed: 00000000,00000001
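
The signal masks are easier to read once decoded into signal numbers. Below is a rough Python sketch that parses a few lines of the status output above (the sample is a subset I picked) and lists which signals are set in a mask; the function names are my own:

```python
def parse_status(text):
    """Turn the 'Key: value' lines of /proc/[pid]/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def signals_in_mask(hex_mask):
    """List the signal numbers set in a SigBlk/SigIgn/SigCgt style mask."""
    mask = int(hex_mask, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


sample = """Name: sshd
State: S (sleeping)
VmRSS: 3460 kB
SigIgn: 0000000000001000
SigCgt: 0000000180014006"""

status = parse_status(sample)
print(status["VmRSS"])                    # 3460 kB
print(signals_in_mask(status["SigIgn"]))  # [13]
print(signals_in_mask(status["SigCgt"]))  # [2, 3, 15, 17, 32, 33]
```

On Linux x86, 13 is SIGPIPE and 2, 3, 15, 17 are SIGINT, SIGQUIT, SIGTERM and SIGCHLD; kill -l prints the full table for your platform.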
/proc/[pid]/smaps (since Linux 2.6.14)
This file shows memory consumption for each of the process’s
mappings. For each mapping there is a series of lines such
as the following:

08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash
Size: 464 kB

This file is only present if the CONFIG_MMU kernel configuration
option is enabled.
[root@centos-doxer 3157]# cat smaps
2b98ef380000-2b98ef3e1000 r-xp 00000000 fd:00 5348968 /usr/sbin/sshd
Size: 388 kB
Rss: 332 kB
Shared_Clean: 332 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Swap: 0 kB
Pss: 78 kB
2b98ef5e1000-2b98ef5e4000 rw-p 00061000 fd:00 5348968 /usr/sbin/sshd
Size: 12 kB
Rss: 12 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 12 kB
Swap: 0 kB
Pss: 12 kB
……
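
Adding up the per-mapping counters gives process-wide totals. A minimal Python sketch, using only the two mappings shown above as sample input (so the totals are for those two mappings, not for the whole sshd process); smaps_totals is a name I made up:

```python
import re
from collections import defaultdict


def smaps_totals(text):
    """Sum every 'Name: N kB' counter across all mappings in smaps text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+) kB$", line.strip())
        if m:  # mapping header lines do not match and are skipped
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)


sample = """2b98ef380000-2b98ef3e1000 r-xp 00000000 fd:00 5348968 /usr/sbin/sshd
Size: 388 kB
Rss: 332 kB
Shared_Clean: 332 kB
Private_Dirty: 0 kB
Pss: 78 kB
2b98ef5e1000-2b98ef5e4000 rw-p 00061000 fd:00 5348968 /usr/sbin/sshd
Size: 12 kB
Rss: 12 kB
Shared_Clean: 0 kB
Private_Dirty: 12 kB
Pss: 12 kB"""

totals = smaps_totals(sample)
print(totals["Rss"], totals["Pss"])  # 344 90
```

Pss is usually the fairer number when several processes share the same pages, because shared pages are divided among the processes that map them.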
/proc/[pid]/oom_adj (since Linux 2.6.11)
This file can be used to adjust the score used to select which
process should be killed in an out-of-memory (OOM) situation.
The kernel uses this value for a bit-shift operation of the
process’s oom_score value: valid values are in the range -16 to
+15, plus the special value -17, which disables OOM-killing
altogether for this process. A positive score increases the
likelihood of this process being killed by the OOM-killer; a
negative score decreases the likelihood. The default value for
this file is 0; a new process inherits its parent’s oom_adj
setting. A process must be privileged (CAP_SYS_RESOURCE) to update
this file.

/proc/[pid]/oom_score (since Linux 2.6.11)
This file displays the current score that the kernel gives to
this process for the purpose of selecting a process for the OOM-
killer. A higher score means that the process is more likely to
be selected by the OOM-killer. The basis for this score is the
amount of memory used by the process, with increases (+) or
decreases (-) for factors including:

* whether the process creates a lot of children using fork(2)
(+);

* whether the process has been running a long time, or has used
a lot of CPU time (-);

* whether the process has a low nice value (i.e., > 0) (+);

* whether the process is privileged (-); and

* whether the process is making direct hardware access (-).

The oom_score also reflects the bit-shift adjustment specified
by the oom_adj setting for the process.
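
To see what "bit-shift adjustment" means in practice, here is a simplified Python model of how I understand the 2.6-era kernel applies oom_adj to an already computed base score. It is an illustration only, not the kernel code, and the base score of 1000 is an arbitrary number:

```python
OOM_DISABLE = -17  # special oom_adj value: never pick this process


def apply_oom_adj(points, oom_adj):
    """Simplified model: shift the base score left or right by oom_adj."""
    if oom_adj == OOM_DISABLE:
        return 0
    if oom_adj > 0:
        # Positive values double the score once per step (a zero score is
        # bumped to 1 first so that the shift has an effect).
        return max(points, 1) << oom_adj
    # Negative values halve the score once per step; 0 leaves it unchanged.
    return points >> -oom_adj


for adj in (0, 3, -3, OOM_DISABLE):
    print(adj, apply_oom_adj(1000, adj))
```

So each step of oom_adj roughly doubles or halves the score, which is why even small values make a big difference.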
/proc/[pid]/stat
Status information about the process. This is used by ps(1).
It is defined in /usr/src/linux/fs/proc/array.c.

The fields, in order, with their proper scanf(3) format
specifiers, are:

pid %d The process ID.

comm %s The filename of the executable, in parentheses.
This is visible whether or not the executable is
swapped out.

state %c One character from the string “RSDZTW” where R is
running, S is sleeping in an interruptible wait, D
is waiting in uninterruptible disk sleep, Z is zombie,
T is traced or stopped (on a signal), and W is
paging.

ppid %d The PID of the parent.

pgrp %d The process group ID of the process.

session %d The session ID of the process.

tty_nr %d The controlling terminal of the process. (The minor
device number is contained in the combination of
bits 31 to 20 and 7 to 0; the major device number is
in bits 15 to 8.)

tpgid %d The ID of the foreground process group of the
controlling terminal of the process.

flags %u (%lu before Linux 2.6.22)
The kernel flags word of the process. For bit
meanings, see the PF_* defines in <linux/sched.h>.
Details depend on the kernel version.

minflt %lu The number of minor faults the process has made
which have not required loading a memory page from
disk.

cminflt %lu The number of minor faults that the process’s
waited-for children have made.

majflt %lu The number of major faults the process has made
which have required loading a memory page from disk.

cmajflt %lu The number of major faults that the process’s
waited-for children have made.

cutime %ld Amount of time that this process’s waited-for children
have been scheduled in user mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)). (See
also times(2).) This includes guest time,
cguest_time (time spent running a virtual CPU, see
below).

cstime %ld Amount of time that this process’s waited-for children
have been scheduled in kernel mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).

priority %ld
(Explanation for Linux 2.6) For processes running a
real-time scheduling policy (policy below; see
sched_setscheduler(2)), this is the negated scheduling
priority, minus one; that is, a number in the
range -2 to -100, corresponding to real-time priorities
1 to 99. For processes running under a non-real-time
scheduling policy, this is the raw nice
value (setpriority(2)) as represented in the kernel.
The kernel stores nice values as numbers in the
range 0 (high) to 39 (low), corresponding to the
user-visible nice range of -20 to 19.

Before Linux 2.6, this was a scaled value based on
the scheduler weighting given to this process.

nice %ld The nice value (see setpriority(2)), a value in the
range 19 (low priority) to -20 (high priority).

num_threads %ld
Number of threads in this process (since Linux 2.6).
Before kernel 2.6, this field was hard coded to 0 as
a placeholder for an earlier removed field.

itrealvalue %ld
The time in jiffies before the next SIGALRM is sent
to the process due to an interval timer. Since kernel
2.6.17, this field is no longer maintained, and
is hard coded as 0.

starttime %llu (was %lu before Linux 2.6)
The time in jiffies the process started after system
boot.

vsize %lu Virtual memory size in bytes.

rss %ld Resident Set Size: number of pages the process has
in real memory. This is just the pages which count
towards text, data, or stack space. This does not
include pages which have not been demand-loaded in,
or which are swapped out.
kstkesp %lu The current value of ESP (stack pointer), as found
in the kernel stack page for the process.

kstkeip %lu The current EIP (instruction pointer).

signal %lu The bitmap of pending signals, displayed as a decimal
number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.

blocked %lu The bitmap of blocked signals, displayed as a decimal
number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.

sigignore %lu
The bitmap of ignored signals, displayed as a decimal
number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.

sigcatch %lu
The bitmap of caught signals, displayed as a decimal
number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.

wchan %lu This is the “channel” in which the process is waiting.
It is the address of a system call, and can be
looked up in a namelist if you need a textual name.
(If you have an up-to-date /etc/psdatabase, then try
ps -l to see the WCHAN field in action.)

nswap %lu Number of pages swapped (not maintained).

cnswap %lu Cumulative nswap for child processes (not
maintained).

exit_signal %d (since Linux 2.1.22)
Signal to be sent to parent when we die.

processor %d (since Linux 2.2.8)
CPU number last executed on.

rt_priority %u (since Linux 2.5.19; was %lu before Linux 2.6.22)
Real-time scheduling priority, a number in the range
1 to 99 for processes scheduled under a real-time
policy, or 0, for non-real-time processes (see
sched_setscheduler(2)).

policy %u (since Linux 2.5.19; was %lu before Linux 2.6.22)
Scheduling policy (see sched_setscheduler(2)).
Decode using the SCHED_* constants in linux/sched.h.
[root@centos-doxer 3157]# cat stat
3157 (sshd) S 2185 3157 3157 0 -1 4202752 1159 409 0 0 41 124 0 0 15 0 1 0 29275 92356608 870 18446744073709551615 47935848448000 47935848844780 140735003104752 18446744073709551615 47935894114547 0 0 4096 81926 0 0 0 17 0 0 0 22
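
Counting fields by eye in that line is error-prone, so here is a Python sketch that splits it safely and pulls out a few fields by position. The sample is the stat line shown above; the tty_nr value 34819 in the second part is a hypothetical value I computed for /dev/pts/3 (major 136, minor 3), since this sshd process has no controlling terminal:

```python
def parse_stat(line):
    """Split a /proc/[pid]/stat line into a list of string fields."""
    # comm is wrapped in parentheses and may contain spaces or ')' itself,
    # so split around the first '(' and the last ')'.
    left, _, rest = line.partition("(")
    comm, _, tail = rest.rpartition(")")
    return [left.strip(), comm] + tail.split()


def decode_tty_nr(tty_nr):
    """Return (major, minor) using the bit layout given for tty_nr."""
    major = (tty_nr >> 8) & 0xFF                     # bits 15 to 8
    minor = (tty_nr & 0xFF) | ((tty_nr >> 20) << 8)  # bits 7 to 0 and 31 to 20
    return major, minor


sample = ("3157 (sshd) S 2185 3157 3157 0 -1 4202752 1159 409 0 0 41 124 0 0 "
          "15 0 1 0 29275 92356608 870 18446744073709551615")  # truncated copy

fields = parse_stat(sample)
print("pid", fields[0], "comm", fields[1], "state", fields[2])
print("vsize kB", int(fields[22]) // 1024)  # same as awk '{print $23 / 1024}'
print("rss pages", fields[23])
print(decode_tty_nr(34819))                 # (136, 3)
```

List indexes are zero-based, so awk's $23 (vsize) is fields[22] here.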

/proc/[pid]/statm
Provides information about memory usage, measured in pages. The
columns are:

size total program size
(same as VmSize in /proc/[pid]/status)
resident resident set size
(same as VmRSS in /proc/[pid]/status)
share shared pages (from shared mappings)
text text (code)
lib library (unused in Linux 2.6)
data data + stack
dt dirty pages (unused in Linux 2.6)
[root@centos-doxer 3157]# cat statm
22548 870 674 97 0 204 0
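
Because statm counts pages, the numbers only make sense after multiplying by the page size. A small Python sketch, assuming 4096-byte pages (on a real system os.sysconf("SC_PAGE_SIZE") reports the actual value); statm_kb is a name I made up:

```python
STATM_COLUMNS = ("size", "resident", "share", "text", "lib", "data", "dt")


def statm_kb(line, page_size=4096):
    """Convert the page counts of /proc/[pid]/statm into kilobytes."""
    pages = [int(x) for x in line.split()]
    return {name: n * page_size // 1024 for name, n in zip(STATM_COLUMNS, pages)}


usage = statm_kb("22548 870 674 97 0 204 0")
print(usage["size"])      # 90192, matches VmSize in status
print(usage["text"])      # 388, matches VmExe
print(usage["data"])      # 816, matches VmData + VmStk (728 + 88)
print(usage["resident"])  # 3480
```

The resident value is slightly above the VmRSS of 3460 kB shown earlier simply because the two files were read at different moments.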

Categories: Kernel, Linux, Systems

cpu hyperthreading vs dual core

May 14th, 2013

Note: This is from http://www.richweb.com/cpu_info

A hyperthreaded processor has the same number of function units as an older, non-hyperthreaded processor. It just has two execution contexts, so it can maybe achieve better function unit utilization by letting more than one program execute concurrently. On the other hand, if you’re running two programs which compete for the same function units, there is no advantage at all to having both running “concurrently.” When one is running, the other is necessarily waiting on the same function units.

A dual core processor literally has two times as many function units as a single-core processor, and can really run two programs concurrently, with no competition for function units.

A dual core processor is typically built so that both cores share the same level 2 cache (some parts, such as the Pentium D in Example 3a, give each core its own). A dual processor (separate physical cpus) system differs in that each cpu will have its own level 2 cache. This may sound like an advantage, and in some situations it can be, but in many cases testing shows that the shared cache can be faster when the cpus are working on the same or very similar tasks.

Hyperthreading was absent from Intel's Core 2 generation, which is why the source article treats it as older technology, but it returned with the Nehalem microarchitecture (2008) and is common in current cpus. Hyperthreading can provide a marginal (about 10%) gain for some server workloads like mysql; it complements additional cores rather than replacing them.

A dual core cpu running at 3.0GHz should be faster than a dual cpu (separate core) system running at 3.0GHz when the cores can share the cache, since the shared cache runs at higher speeds than the bus between separate cpus.

The examples below detail how we determine what kind of cpu(s) are present.

The kernel data Linux exposes in /proc/cpuinfo will show each logical cpu with a unique processor number. A logical cpu can be a hyperthreading sibling, a shared core in a dual or quad core, or a separate physical cpu. We must look at the siblings, cpu cores and core id to tell the difference.

If the number of cores = the number of siblings for a given physical processor, then hyperthreading is OFF.

/bin/cat /proc/cpuinfo | /bin/egrep 'processor|model name|cache size|core|sibling|physical'
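
The siblings-versus-cores rule is easy to automate. Here is a rough Python sketch that applies it per physical id; the sample text is a trimmed hyperthreaded Pentium 4 entry (one socket, two siblings, one core), and both function names are my own:

```python
def parse_cpuinfo(text):
    """Return one dict per logical cpu from /proc/cpuinfo-style text."""
    cpus, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "processor" and current:
            cpus.append(current)  # a new "processor" line starts a new entry
            current = {}
        current[key] = value
    if current:
        cpus.append(current)
    return cpus


def hyperthreading_by_socket(cpus):
    """Map physical id -> True when siblings exceeds cpu cores."""
    result = {}
    for cpu in cpus:
        if "siblings" in cpu and "cpu cores" in cpu:
            socket = cpu.get("physical id", "0")
            result[socket] = int(cpu["siblings"]) > int(cpu["cpu cores"])
    return result


sample = """processor : 0
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
processor : 1
physical id : 0
siblings : 2
core id : 0
cpu cores : 1"""

cpus = parse_cpuinfo(sample)
print(len(cpus), "logical cpus")       # 2 logical cpus
print(hyperthreading_by_socket(cpus))  # {'0': True}
```

Entries without siblings and cpu cores fields (like the Duron in Example 1) are simply left out of the result.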

 

Example 1: Single processor, 1 core, no Hyperthreading

processor	: 0
model name	: AMD Duron(tm) processor
cache size	: 64 KB

 

Example 2: Single processor, 1 core, Hyperthreading is enabled.

Notice how we have 2 siblings, but only 1 core. The physical cpu id is the same for both: 0.

processor	: 0
model name	: Intel(R) Pentium(R) 4 CPU 2.80GHz
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
processor	: 1
model name	: Intel(R) Pentium(R) 4 CPU 2.80GHz
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1

 

Example 3. Single socket Quad Core

Notice how each processor has its own core id. The number of siblings matches the number of cores so there are no Hyperthreading siblings. Also notice the large L2 cache: 6 MB. On this Xeon E5410 each 6 MB L2 cache is shared by a pair of cores, so the package carries 12 MB in total.

processor	: 0
model name	: Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
processor	: 1
model name	: Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
processor	: 2
model name	: Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
processor	: 3
model name	: Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4

 

Example 3a. Single socket Dual Core

Again, each processor has its own core so this is a dual core system.

 

processor	: 0
model name	: Intel(R) Pentium(R) D CPU 3.00GHz
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
processor	: 1
model name	: Intel(R) Pentium(R) D CPU 3.00GHz
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2

 

Example 4. Dual Single core CPU, Hyperthreading ENABLED

This example shows that processors 0 and 2 share the same physical cpu and 1 and 3 share the same physical cpu. The number of siblings is twice the number of cores, which is another clue that this is a system with hyperthreading enabled.

 

processor	: 0
model name	: Intel(R) Xeon(TM) CPU 3.60GHz
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
processor	: 1
model name	: Intel(R) Xeon(TM) CPU 3.60GHz
cache size	: 1024 KB
physical id	: 3
siblings	: 2
core id		: 0
cpu cores	: 1
processor	: 2
model name	: Intel(R) Xeon(TM) CPU 3.60GHz
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
processor	: 3
model name	: Intel(R) Xeon(TM) CPU 3.60GHz
cache size	: 1024 KB
physical id	: 3
siblings	: 2
core id		: 0
cpu cores	: 1

 

Example 5. Dual CPU Dual Core No hyperthreading

Of the 5 examples this should be the most capable system processor-wise. There are a total of 4 cores; 2 cores in each of 2 separate socketed physical cpus. Each core shares the 4MB cache with its sibling core. The higher clock rate (3.0GHz vs 2.33GHz) should offer slightly better performance than example 3.

 

processor	: 0
model name	: Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
processor	: 1
model name	: Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
processor	: 2
model name	: Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
cache size	: 4096 KB
physical id	: 3
siblings	: 2
core id		: 0
cpu cores	: 2
processor	: 3
model name	: Intel(R) Xeon(R) CPU            5160  @ 3.00GHz
cache size	: 4096 KB
physical id	: 3
siblings	: 2
core id		: 1
cpu cores	: 2

PS:
For an explanation of the flags in linux /proc/cpuinfo, you can refer to the following:
http://blog.incase.de/index.php/cpu-feature-flags-and-their-meanings/
Categories: Kernel, Linux

resolved – differences between zfs ARC L2ARC ZIL

January 31st, 2013
  • ARC

The ZFS ARC (adaptive replacement cache) is a very fast cache located in the server’s memory.

For example, our ZFS server with 12GB of RAM has 11GB dedicated to ARC, which means our ZFS server will be able to cache 11GB of the most accessed data. Any read requests for data in the cache can be served directly from the ARC memory cache instead of hitting the much slower hard drives. This creates a noticeable performance boost for data that is accessed frequently.

  • L2ARC

As a general rule, you want to install as much RAM into the server as you can to make the ARC as big as possible. At some point, adding more memory is just cost prohibitive. That is where the L2ARC becomes important. The L2ARC is the second level adaptive replacement cache. The L2ARC is often called “cache drives” in the ZFS systems.

L2ARC is a new layer between Disk and the cache (ARC) in main memory for ZFS. It uses dedicated storage devices to hold cached data. The main role of this cache is to boost the performance of random read workloads. The intended L2ARC devices include 10K/15K RPM disks like short-stroked disks, solid state disks (SSD), and other media with substantially faster read latency than disk.

  • ZIL

The ZIL (ZFS Intent Log) exists to improve the performance of synchronous writes. A synchronous write is much slower than an asynchronous write, but it is safer. Essentially, the intent log of a file system is nothing more than an insurance against power failures, a to-do list if you will, that keeps track of the stuff that needs to be updated on disk, even if the power fails (or something else happens that prevents the system from updating its disks).

To get better performance, use separate disks (SSD) for the ZIL, for example: zpool add pool log c2d0.

Here is a real example of ZIL/L2ARC/ARC on a SUN ZFS 7320 head:

test-zfs# zpool iostat -v exalogic
                             capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
exalogic                   6.78T  17.7T     53  1.56K   991K  25.1M
  mirror                    772G  1.96T      6    133   111K  2.07M
    c0t5000CCA01A5FDCACd0      -      -      3     36  57.6K  2.07M  #these are the physical disks
    c0t5000CCA01A6F5CF4d0      -      -      2     35  57.7K  2.07M
  mirror                    772G  1.96T      5    133   110K  2.07M
    c0t5000CCA01A6F5D00d0      -      -      2     36  56.2K  2.07M
    c0t5000CCA01A6F64F4d0      -      -      2     35  57.3K  2.07M
  mirror                    772G  1.96T      5    133   110K  2.07M
    c0t5000CCA01A76A7B8d0      -      -      2     36  56.3K  2.07M
    c0t5000CCA01A746CCCd0      -      -      2     36  56.8K  2.07M
  mirror                    772G  1.96T      5    133   110K  2.07M
    c0t5000CCA01A749A88d0      -      -      2     35  56.7K  2.07M
    c0t5000CCA01A759E90d0      -      -      2     35  56.1K  2.07M
  mirror                    772G  1.96T      5    133   110K  2.07M
    c0t5000CCA01A767FDCd0      -      -      2     35  56.1K  2.07M
    c0t5000CCA01A782A40d0      -      -      2     35  57.1K  2.07M
  mirror                    772G  1.96T      5    133   110K  2.07M
    c0t5000CCA01A782D10d0      -      -      2     35  57.2K  2.07M
    c0t5000CCA01A7465F8d0      -      -      2     35  56.3K  2.07M
  mirror                    772G  1.96T      5    133   110K  2.07M
    c0t5000CCA01A7597FCd0      -      -      2     35  57.6K  2.07M
    c0t5000CCA01A7828F4d0      -      -      2     35  56.2K  2.07M
  mirror                    772G  1.96T      5    133   110K  2.07M
    c0t5000CCA01A7829ACd0      -      -      2     35  57.1K  2.07M
    c0t5000CCA01A78278Cd0      -      -      2     35  57.4K  2.07M
  mirror                    772G  1.96T      6    133   111K  2.07M
    c0t5000CCA01A736000d0      -      -      3     35  57.3K  2.07M
    c0t5000CCA01A738000d0      -      -      2     35  57.3K  2.07M
  c0t5000A72030061B82d0     224M  67.8G      0     98      1  1.62M  #ZIL(SSD write cache, ZFS Intent Log)
  c0t5000A72030061C70d0     224M  67.8G      0     98      1  1.62M
  c0t5000A72030062135d0     223M  67.8G      0     98      1  1.62M
  c0t5000A72030062146d0     224M  67.8G      0     98      1  1.62M
cache                          -      -      -      -      -      -
  c2t2d0                    334G   143G     15      6   217K   652K  #L2ARC(SSD cache drives)
  c2t3d0                    332G   145G     15      6   215K   649K
  c2t4d0                    333G   144G     11      6   169K   651K
  c2t5d0                    333G   144G     13      6   192K   650K
  c2t2d0                       -      -      0      0      0      0
  c2t3d0                       -      -      0      0      0      0
  c2t4d0                       -      -      0      0      0      0
  c2t5d0                       -      -      0      0      0      0

And as for ARC:

test-zfs:> status memory show
Memory:
Cache 63.4G bytes #ARC
Unused 17.3G bytes
Mgmt 561M bytes
Other 491M bytes
Kernel 14.3G bytes

Categories: Kernel, NAS, SAN, Storage

resolved – bnx2i dev eth0 does not support iscsi

September 19th, 2012

A weird incident occurred on a Linux box. It stopped responding to ping and ssh, although ifconfig and the /proc/net/bonding/bond0 file both reported that everything was running ok. After some googling, I found that the issue might be related to the NIC driver. I tried bringing the NICs down and up one by one, but got this error:

Bringing up loopback interface bond0: bnx2i: dev eth0 does not support iscsi

bnx2i: iSCSI not supported, dev=eth0

bonding: no command found in slaves file for bond bond0. Use +ifname or -ifname

At last, I tried restarting the whole network, i.e. /etc/init.d/network restart. That did the trick: networking came back and I could ping and ssh to the box without problems.

resolved – semget failed with status 28 failed oracle database starting up

August 2nd, 2012

Today we hit a semaphore problem that left us unable to start Oracle instances. Here’s the error message:

ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates

It turns out the maximum number of semaphore arrays had almost been reached (127 of 128 in use):
#check limits of all IPC
root@doxer# ipcs -al

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 1024000
max ops per semop call = 100
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 16
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

#check summary of semaphores
root@doxer# ipcs -su

------ Semaphore Status --------
used arrays = 127
allocated semaphores = 16890

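To resolve this, we need to increase the maximum number of semaphore arrays (SEMMNI), which is the fourth value of kernel.sem, for example with sysctl -w kernel.sem="250 1024000 100 256" (add the same setting to /etc/sysctl.conf to keep it across reboots):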

root@doxer# cat /proc/sys/kernel/sem
250 1024000 100 128
                ^--- needs to be increased
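
For monitoring, the four numbers can be given names and compared against current usage. A minimal Python sketch: the order SEMMSL, SEMMNS, SEMOPM, SEMMNI matches the ipcs limits above, the used count of 127 is taken from the ipcs -su output, and the function names are made up:

```python
def parse_kernel_sem(text):
    """Name the four values of /proc/sys/kernel/sem."""
    semmsl, semmns, semopm, semmni = (int(x) for x in text.split())
    return {
        "SEMMSL": semmsl,  # max semaphores per array
        "SEMMNS": semmns,  # max semaphores system wide
        "SEMOPM": semopm,  # max ops per semop call
        "SEMMNI": semmni,  # max number of arrays
    }


def arrays_headroom(limits, used_arrays):
    """How many more semaphore arrays can still be created."""
    return limits["SEMMNI"] - used_arrays


limits = parse_kernel_sem("250 1024000 100 128")
print(limits["SEMMNI"])              # 128
print(arrays_headroom(limits, 127))  # 1
```

With only one array left, any instance that needs two or more semaphore sets at startup will get the ENOSPC (status 28) error shown above.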

PS:

Here’s an example with toilets that describes differences between mutex and semaphore LOL http://koti.mbnet.fi/niclasw/MutexSemaphore.html

Categories: Kernel, Oracle DB

Resolved – bash /usr/bin/find Arg list too long

July 3rd, 2012

Have you ever met an error like the following?

root@doxer# find /PRD/*/connectors/A01/QP*/*/logFiles/* -prune -name "*.log" -mtime +7 -type f |wc -l

bash: /usr/bin/find: Arg list too long

0

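The cause is a kernel limit on the total size of the arguments (together with the environment) that can be passed to a new process. The shell expands the wildcards into far too many pathnames before find (or ls, or any other utility) even starts. ARG_MAX defines the maximum length of arguments for a new process. You can get its value with this command: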

root@doxer# getconf ARG_MAX
1048320

To quickly work around this, you can run find from inside the directory (replace * with subdir_NAME):

cd /PRD/subdir_NAME/connectors/A01/QP*/*/logFiles/;find . -prune -name "*.log" -mtime +7 -type f |wc -l

11382
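
Another way around the limit is to avoid wildcard expansion on the command line altogether and walk the tree from a script. Here is a rough Python sketch; count_old_logs is a made-up helper, and its age test is only approximately the same as find's -mtime +7, which rounds ages to whole days:

```python
import fnmatch
import os
import time


def count_old_logs(root, pattern="*.log", older_than_days=7):
    """Count files under root matching pattern and older than N days."""
    cutoff = time.time() - older_than_days * 86400
    count = 0
    # os.walk reads directories one at a time, so no huge argument list
    # is ever built and ARG_MAX never comes into play.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            if os.path.getmtime(os.path.join(dirpath, name)) < cutoff:
                count += 1
    return count


# Example call (the path is the one from the post and may not exist for you):
# print(count_old_logs("/PRD/subdir_NAME/connectors/A01"))
```

Unlike the find command above, this descends into every subdirectory of root, so point it at the narrowest directory that contains the logFiles folders you care about.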

PS:

  1. you can get all configuration values with getconf -a.
  2. For more solutions about the error “bash: /usr/bin/find: Arg list too long”, you can refer to http://www.in-ulm.de/~mascheck/various/argmax/
Categories: Kernel, Linux