- Each Exadata Database Machine has completely redundant hardware, including redundant InfiniBand networking, redundant Power Distribution Units (PDU), redundant power supplies, and redundant database and storage servers.
- Oracle RAC protects against database server failure.
- ASM provides data mirroring to protect against disk or storage server failures.
- Oracle RMAN provides extremely fast and efficient backups to disk or tape.
- Oracle’s Flashback technology allows backing out user errors at the database, table or even row level.
- Using Oracle Data Guard, a second Exadata Database Machine can be configured to maintain a real-time copy of the database at a remote site to provide full protection against site failures and disasters.
Today we hit a semaphore problem that prevented our Oracle instances from starting. Here’s the error message:
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates
It turned out that the maximum number of semaphore arrays had been reached (127 of 128 arrays in use):
#check limits of all IPC
root@doxer# ipcs -al
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 1024000
max ops per semop call = 100
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 16
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536
#check summary of semaphores
root@doxer# ipcs -su
------ Semaphore Status --------
used arrays = 127
allocated semaphores = 16890
To resolve this, we need to increase the maximum number of semaphore arrays, SEMMNI, which is the fourth field of /proc/sys/kernel/sem:
root@doxer# cat /proc/sys/kernel/sem
250 1024000 100 128
                ^-- SEMMNI (128) needs to be increased
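As a minimal sketch of the check above, assuming a Linux host (where /proc/sys/kernel/sem lists SEMMSL, SEMMNS, SEMOPM and SEMMNI in that order), the Python below parses that line, computes how many semaphore arrays are still free, and builds a replacement kernel.sem line. The helper names and the new value of 256 are illustrative assumptions, not a sizing recommendation; a line like the generated one could be placed in /etc/sysctl.conf and loaded with sysctl -p.

```python
# Minimal sketch (assumes Linux): parse /proc/sys/kernel/sem and check
# how many semaphore arrays (SEMMNI) are still free.
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # max semaphores per array
    semmns: int  # max semaphores system wide
    semopm: int  # max ops per semop call
    semmni: int  # max number of arrays


def parse_sem(line: str) -> SemLimits:
    """Parse a kernel.sem line such as '250 1024000 100 128'."""
    fields = [int(x) for x in line.split()]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return SemLimits(*fields)


def free_arrays(limits: SemLimits, used_arrays: int) -> int:
    """Arrays left before SEMMNI is hit (semget() then fails with ENOSPC, errno 28)."""
    return limits.semmni - used_arrays


def sysctl_line(limits: SemLimits, new_semmni: int) -> str:
    """Build a kernel.sem line that changes only SEMMNI (hypothetical helper)."""
    return f"kernel.sem = {limits.semmsl} {limits.semmns} {limits.semopm} {new_semmni}"


if __name__ == "__main__":
    limits = parse_sem("250 1024000 100 128")   # values from the post
    print(free_arrays(limits, used_arrays=127))  # "used arrays" from ipcs -su
    print(sysctl_line(limits, new_semmni=256))   # 256 is only an example value
    # On a live host, read the real file instead:
    # with open("/proc/sys/kernel/sem") as f: limits = parse_sem(f.read())
```

With the numbers from the post this prints 1 and then kernel.sem = 250 1024000 100 256.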
Here’s a page that explains the difference between a mutex and a semaphore with a toilet analogy, LOL: http://koti.mbnet.fi/niclasw/MutexSemaphore.html
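In that analogy, a mutex is the single key to one toilet, while a semaphore counts the free keys to several identical toilets. A minimal Python sketch of the difference, using threading.Lock as the mutex and threading.BoundedSemaphore(4) as a four-key semaphore; the function name, the number of visitors and the sleep time are arbitrary choices. Python's Lock does not enforce ownership the way a strict mutex does, so this only illustrates the counting difference.

```python
# Sketch of the toilet analogy using Python's threading primitives.
import threading
import time


def max_occupancy(guard, people: int = 10) -> int:
    """Run `people` threads that each 'use a toilet' guarded by `guard`
    and return the highest number observed inside at the same time."""
    inside = 0
    peak = 0
    counter_lock = threading.Lock()  # protects `inside` and `peak`

    def visit() -> None:
        nonlocal inside, peak
        with guard:  # take the one key (mutex) or one of N keys (semaphore)
            with counter_lock:
                inside += 1
                peak = max(peak, inside)
            time.sleep(0.01)  # occupy the toilet briefly
            with counter_lock:
                inside -= 1

    threads = [threading.Thread(target=visit) for _ in range(people)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    # Mutex: one key, one toilet, so at most one person inside.
    print("mutex peak:", max_occupancy(threading.Lock()))
    # Semaphore: four identical keys, so at most four people inside.
    print("semaphore peak:", max_occupancy(threading.BoundedSemaphore(4)))
```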
In short: to back up Oracle with OS-level tools (BCV, for example), the database must first be put into hot backup mode, whereas RMAN backups can be taken while the database is online, with no need for hot backup mode.
Oracle GoldenGate, on the other hand, is used to replicate databases between heterogeneous systems, for example from Oracle to MySQL or SQL Server.
Here are some excerpts from Oracle’s documentation on Oracle GoldenGate:
Robust Modular Architecture
The Oracle GoldenGate software architecture is comprised of three primary components: Capture, Trail Files, and Delivery. This modular approach allows each component to perform its tasks independently of the others, accelerating data replication and ensuring data integrity.
Figure 1: Oracle GoldenGate leverages a component-based architecture to optimize real-time information access and availability.
- Capture
Oracle GoldenGate’s Capture module resides on the source database and looks for new transactional activity. The Capture module reads the result of insert, update, and delete operations by directly accessing the database transaction (redo) logs, and then immediately captures new and changed data for distribution.
The Capture module only moves committed transactions, filtering out intermediate activities and rolled-back operations, which not only reduces infrastructure load but also eliminates potential data inconsistencies. Further optimization is achieved through transaction grouping and optional compression features.
Oracle GoldenGate 11g can also capture messages from JMS messaging systems to deliver to heterogeneous databases in real time for scalable and reliable data distribution.
- Trail Files
Oracle GoldenGate’s Trail Files contain the database operations for the changed data in a transportable, platform-independent data format. Trail Files are a critical component within Oracle GoldenGate’s optimized queuing mechanism. They reside on the source and/or target server but exist outside of the database to ensure heterogeneity, improved reliability, and minimal data loss. This architecture minimizes impact to the source system because no additional tables or queries to the database are required to support the data capture process. The Capture module reads once, and then immediately moves the captured data to the external Trail File for delivery to the target(s).
In the event of an outage at the source and/or target, the Trail Files contain the most-recent data up to the point of the outage, and the data is applied once the systems are online again.
- Delivery
Oracle GoldenGate’s Delivery module takes the changed data from the latest Trail File and applies it to the target database using native SQL for the appropriate relational database management system. Delivery can be made to any open database connectivity–compliant database. The Delivery module applies each transaction in the same order as it was committed and within the same transactional context as at the source, enabling consistency and referential integrity at the target. To enhance IT flexibility, captured data can also be delivered to a Java Message Service destination or as a flat file using Oracle GoldenGate Application Adapters.
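To make the Capture, Trail File and Delivery roles concrete, here is a toy Python model, not GoldenGate’s implementation: capture() walks a made-up redo log, buffers each transaction’s operations, and writes one JSON line to the "trail" only when that transaction commits (rolled-back work is discarded), and deliver() replays the trail in commit order against a dictionary standing in for the target database. The record layout and function names are hypothetical.

```python
# Toy model of GoldenGate's Capture -> Trail File -> Delivery flow.
import json
from typing import Dict, Iterable, List, Tuple

# Hypothetical redo record: (transaction id, action, key, value), where action
# is "insert", "update", "delete", "commit" or "rollback".
LogRecord = Tuple[int, str, str, object]


def capture(redo_log: Iterable[LogRecord]) -> List[str]:
    """Emit one platform-neutral (JSON) trail entry per committed transaction,
    in commit order; operations of rolled-back transactions are dropped."""
    pending: Dict[int, List[dict]] = {}
    trail: List[str] = []
    for txid, action, key, value in redo_log:
        if action == "commit":
            ops = pending.pop(txid, [])
            trail.append(json.dumps({"txid": txid, "ops": ops}))
        elif action == "rollback":
            pending.pop(txid, None)  # intermediate work never reaches the trail
        else:
            pending.setdefault(txid, []).append(
                {"op": action, "key": key, "value": value})
    return trail


def deliver(trail: Iterable[str], target: Dict[str, object]) -> None:
    """Apply trail entries to `target` in trail (commit) order, replaying each
    transaction's operations in the order they were captured."""
    for entry in trail:
        txn = json.loads(entry)
        for op in txn["ops"]:
            if op["op"] == "delete":
                target.pop(op["key"], None)
            else:  # insert or update
                target[op["key"]] = op["value"]


if __name__ == "__main__":
    redo = [
        (1, "insert", "a", 1),
        (2, "insert", "b", 2),
        (2, "rollback", "", None),
        (1, "update", "a", 10),
        (1, "commit", "", None),
    ]
    target: Dict[str, object] = {}
    deliver(capture(redo), target)
    print(target)
```

Here transaction 2 rolls back, so only transaction 1 reaches the trail and the target ends up as {'a': 10}.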
For the full document, refer to the following PDF: http://www.oracle.com/us/products/middleware/data-integration/goldengate11g-ds-168062.pdf?ssSourceSiteId=otnen
Today we hit a problem where Oracle stopped working properly. After some checking, we found it was caused by /tmp running out of space, which the OS logs also confirmed:
Jun 20 17:43:59 tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded
Oracle uses /tmp to compile PL/SQL code, so when /tmp has no space left, Oracle cannot compile or execute that code, which causes functions, procedures, packages and triggers to time out. The same problem is described in Oracle note ID 1389623.1.
To prevent this error from recurring, we should increase /tmp on this system to at least 4 GB.
There is a hidden Oracle parameter, _ncomp_shared_objects_dir, that changes the default location of these temporary files, but it is not a dynamic parameter. There is also a way to resize a tmpfs filesystem online, but it is somewhat risky. So the safest approach is to first shut down the Oracle DB on this host, then modify /etc/vfstab, and then reboot the whole system. This protects our data against the risk of corruption or loss, at the cost of some outage time.
So, finally, here are the steps:
Shut down the Oracle DB on this host.
Amend the /tmp line in /etc/vfstab from:
swap - /tmp tmpfs - yes size=512m
to:
swap - /tmp tmpfs - yes size=4096m
Reboot the machine and bring the Oracle DB back up.
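Before rebooting, it can be worth double-checking the edited line. A minimal Python sketch, assuming the Solaris vfstab layout (seven whitespace-separated columns, with mount options last) and that the tmpfs size= option takes a plain byte count or a k or m suffix; other suffixes are rejected rather than guessed, and the function name is hypothetical.

```python
# Sketch: report the size= limit of the /tmp tmpfs entry in a Solaris-style
# /etc/vfstab, in megabytes. Assumes size= takes bytes or a k/m suffix.
from typing import Optional

_MB_PER_UNIT = {"k": 1 / 1024, "m": 1.0}  # suffix -> megabytes


def tmp_size_mb(vfstab_text: str) -> Optional[float]:
    """Return the /tmp tmpfs size cap in MB, or None if there is no cap or entry."""
    for raw in vfstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Columns: device to mount, device to fsck, mount point, FS type,
        # fsck pass, mount at boot, mount options.
        if len(fields) < 7 or fields[2] != "/tmp" or fields[3] != "tmpfs":
            continue
        for opt in fields[6].split(","):
            if not opt.startswith("size="):
                continue
            value = opt[len("size="):]
            if not value:
                raise ValueError("empty size= option")
            suffix = value[-1].lower()
            if suffix.isdigit():
                return int(value) / (1024 * 1024)  # no suffix: bytes
            if suffix not in _MB_PER_UNIT:
                raise ValueError(f"unsupported size suffix in {opt!r}")
            return float(value[:-1]) * _MB_PER_UNIT[suffix]
        return None  # /tmp is tmpfs but has no size= cap
    return None  # no /tmp tmpfs entry found


if __name__ == "__main__":
    size = tmp_size_mb("swap - /tmp tmpfs - yes size=4096m")
    print(size, size is not None and size >= 4096)  # target: at least 4 GB
```

For the new line this prints 4096.0 True; for the old line (size=512m) the size would be 512.0.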
If your Oracle DB is under very high concurrency and the top SQL statements turn out to be queries against views, there’s a quick way to resolve this: replace those views with Oracle materialized views. You may consider setting the refresh interval to one hour, which means the materialized view is refreshed every hour, at the cost of its data being up to an hour old. Once the change goes live, performance should return to normal.
For more information about materialized views, see http://en.wikipedia.org/wiki/Materialized_view
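The idea behind this fix is that many sessions read a stored result instead of each re-running the expensive view query, in exchange for data that can be up to one refresh interval old. As a toy Python illustration of that trade-off only: Oracle refreshes a materialized view on its own schedule, whereas this hypothetical class refreshes lazily, on the first read after the interval has passed.

```python
# Toy illustration of the materialized-view trade-off: readers share a stored
# result that is recomputed at most once per refresh interval.
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RefreshedSnapshot(Generic[T]):
    def __init__(self, query: Callable[[], T], interval_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query = query            # the expensive "view" query
        self._interval = interval_s    # one hour by default, as in the post
        self._clock = clock            # injectable clock, handy for testing
        self._lock = threading.Lock()  # many sessions may read concurrently
        self._value: Optional[T] = None
        self._refreshed_at: Optional[float] = None

    def read(self) -> T:
        with self._lock:
            now = self._clock()
            stale = (self._refreshed_at is None
                     or now - self._refreshed_at >= self._interval)
            if stale:
                self._value = self._query()  # refresh: run the query once
                self._refreshed_at = now
            return self._value  # type: ignore[return-value]


if __name__ == "__main__":
    runs = 0

    def expensive_view_query() -> int:
        global runs
        runs += 1
        return runs

    snap = RefreshedSnapshot(expensive_view_query)
    for _ in range(1000):  # 1000 reads within the same hour
        snap.read()
    print(runs)  # the underlying query ran only once
```

Holding the lock while the query runs keeps the sketch simple; readers simply wait during a refresh.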
Here’s an image showing high Oracle concurrency: