Jagjeet's Oracle Blog!

June 30, 2010

Recover Corrupt OCR with no backup

Filed under: RAC — Tags: , , — Jagjeet Singh @ 7:43 am

While working with OCR backup and recovery scenarios, one question came to my mind what if my OCR gets corrupted and there is no backup available to recover. This is rare but still it happens as Oracle takes backups after every 4 hours. I wanted to explore if there is any other way we can get business back and running without installing whole clusterware which could be a lengthy process.

If you are on 10g R2 and later version then this can be done without re-installing Clusterware ( if you have backup of root.sh or it’s not overwritten by any
subsequent patch ) I simulated this on my test machines (OracleVM).

Current Voting Disk

[root@rac1 ~]# crsctl query css votedisk
 0. 0 /OCFS/VOT

located 1 votedisk(s).

 

Current OCR Files location

[root@rac1 ~]# ocrcheck

Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 4344
Available space (kbytes) : 257800
ID : 601339441

Device/File Name : /OCFS/OCR
Device/File integrity check succeeded

Device/File Name : /OCFS/OCR2
Device/File integrity check succeeded

Cluster registry integrity check succeeded

 

Output of CRS_STAT

[root@rac1 ~]# crs_stat -t

Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE ONLINE rac1
ora….C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd   application ONLINE ONLINE rac1
ora.rac1.ons   application ONLINE ONLINE rac1
ora.rac1.vip   application ONLINE ONLINE rac1
ora….SM2.asm application ONLINE ONLINE rac2
ora….C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd   application ONLINE ONLINE rac2
ora.rac2.ons   application ONLINE ONLINE rac2
ora.rac2.vip   application ONLINE ONLINE rac2
ora.test.AP.cs application ONLINE ONLINE rac1
ora….st1.srv application ONLINE ONLINE rac1
ora.test.db    application ONLINE ONLINE rac2
ora….t1.inst application ONLINE ONLINE rac1
ora….t2.inst application ONLINE ONLINE rac2

I stopped clusterware on both nodes and removed OCR & Voting Disks.

[root@rac1 ~]# ls -lrt /OCFS/*

-rw-r–r– 1 root root 399507456 Jun 29 14:05 /OCFS/OCR2
-rw-r—– 1 root oinstall 10485760 Jun 29 14:05 /OCFS/OCR
-rw-r–r– 1 oracle oinstall 10240000 Jun 29 14:05 /OCFS/VOT

[root@rac1 ~]# rm -fr /OCFS/*

Tried again to start Cluster

[root@rac1 ~]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

 

Clusterware could not startup.

[root@rac1 ~]# crsctl check crs

Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM

Thrown error in /tmp/crsct.* file about OCR

[root@rac1 ~]# cat /tmp/crsc*

OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]

 

Here, I lost all my OCR & Voting disk.

Below procedure can be used for recovery.

1) Execute rootdelete.sh script from All Nodes.
2) Execute rootdeinstall.sh from Primary Node.
3) Run root.sh from Primary node.
4) Run root.sh from all remaining nodes.
5) Execute remaining configurations (ONS,netca,register required resources)

1) Executing rootdelete.sh on all nodes, this script can be found under $ORA_CRS_HOME/install/

[root@rac1 ~]# /u01/app/oracle/product/crs/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down…
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in ‘/etc/oracle/scls_scr’

[root@rac2 ~]# /u01/app/oracle/product/crs/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down…
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in ‘/etc/oracle/scls_scr’

OCR initialization error can be safely ignored.

2) Execute rootdeinstall.sh on Primary Node, this script can also be found under $ORA_CRS_HOME/install

[root@rac1 ~]# /u01/app/oracle/product/crs/install/rootdeinstall.sh
Removing contents from OCR mirror device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.031627 seconds, 332 MB/s
Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.029947 seconds, 350 MB/s

3) Run root.sh on Primary node, this will create VOT & OCR files.

[root@rac1 ~]# $ORA_CRS_HOME/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
“/OCFS/VOT” does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
1

I had to touch this file to proceed

[root@rac1 ~]# touch /OCFS/VOT
[root@rac1 ~]# $ORA_CRS_HOME/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
Now formatting voting device: /OCFS/VOT
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
CSS is inactive on these nodes.
rac2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.

4) Run root.sh from all remaining nodes.

[root@rac2 crs]# ./root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
rac2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps

Creating VIP application resource on (2) nodes…
Creating GSD application resource on (2) nodes…
Creating ONS application resource on (2) nodes…
Starting VIP application resource on (2) nodes…
Starting GSD application resource on (2) nodes…
Starting ONS application resource on (2) nodes…

Done.

Clusterware is up and running

[root@rac2 crs]# crs_stat -t

Name Type Target State Host
————————————————————
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2

5) Remaining Configuration

       a) Configuring Server side ONS

[root@rac1 crs]# $ORA_CRS_HOME/bin/racgons add_config rac1:6200 rac2:6200

         b) Listener Configuration usint netca

you might want to remove listener.ora from both nodes as entries may exist already. Take backup or orignial listener.ora  and use netca to configure &
register with OCR. Till 10g, we can not register listener using srvctl

Renaming orginal listener.ora

[oracle@rac1 ~]$ mv $ORACLE_HOME/network/admin/listener.ora $ORACLE_HOME/network/admin/listener.ora.orig
[oracle@rac1 ~]$ ssh rac2 mv $ORACLE_HOME/network/admin/lstener.ora $ORACLE_HOME/network/admin/listener.ora.orig

          c)Adding ASM, Instance, Database

[oracle@rac1 ~]$ srvctl add asm -i +ASM1 -n rac1 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@rac1 ~]$ srvctl add asm -i +ASM2 -n rac2 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@rac1 ~]$ srvctl add database -d test -o /u01/app/oracle/product/10.2.0/db_1
[oracle@rac1 ~]$ srvctl add instance -d test -i test1 -n rac1
[oracle@rac1 ~]$ srvctl add instance -d test -i test2 -n rac2

I restarted both nodes, got everything back. Yes, Services can be re-created.

[oracle@rac1 ~]$ crs_stat -t

Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE ONLINE rac1
ora….C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd   application ONLINE ONLINE rac1
ora.rac1.ons   application ONLINE ONLINE rac1
ora.rac1.vip   application ONLINE ONLINE rac1
ora….SM2.asm application ONLINE ONLINE rac2
ora….C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd   application ONLINE ONLINE rac2
ora.rac2.ons   application ONLINE ONLINE rac2
ora.rac2.vip   application ONLINE ONLINE rac2
ora.test.db    application ONLINE ONLINE rac2
ora….t1.inst application ONLINE ONLINE rac1
ora….t2.inst application ONLINE ONLINE rac2

Note : That’s why it’s recommended to take backup of root.sh after fresh install. Subsequent patches can
overwrite root.sh script.

This is described on Metalink Note: 399482.1

About these ads

5 Comments »

  1. Hi.

    You forgot to add dependencies between ASM and database instances…

    odenysenko

    Comment by odenysenko — January 16, 2011 @ 7:02 pm

  2. You put really great efforts because this type of documents are very few on Net.

    Comment by Jyoti M Misra — January 19, 2011 @ 3:09 pm

  3. Jagjeet,

    Just to mention that things are quite different when the Voting Disk and OCR would be on the ASM as what we have in 11.2. Other than that, it was a good write up.

    Aman….

    Comment by Aman.... — September 22, 2011 @ 12:15 pm

  4. Hi Jagjeet,

    Thanks for sharing ideas, Good work!!

    We specially request can you please shed some light on bench marking stuff on Exadata or can be a normal database also where we can learn some differnet pros and cons or ideas about bench marking stuff…

    Thanks for your help.

    Keep going…

    Thanks

    Mohammed.

    Comment by Mohammed — June 29, 2012 @ 3:45 am

  5. Just right with this write-up, I personally assume this site wants a lot more consideration. I’ll most likely end up being once more to understand a lot more, thank you for which info.

    Comment by Legal Help — August 13, 2013 @ 10:14 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Silver is the New Black Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 41 other followers

%d bloggers like this: