Oracle RAC Interview questions and answers

What are the Oracle Clusterware processes for 10g on Unix and Linux?

Cluster Synchronization Services (ocssd): Manages cluster node membership and runs as the oracle user. Failure of this process results in a cluster restart.

Cluster Ready Services (crsd): The crsd process manages cluster resources (which could be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on) based on the resource configuration information stored in the OCR. This includes start, stop, monitor, and failover operations. This process runs as the root user.

Event Manager Daemon (evmd): A background process that publishes the events that crsd creates.

Process Monitor Daemon (OPROCD): This process monitors the cluster and provides I/O fencing. OPROCD performs its check and then stops running; if it wakes up later than the expected time, it resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. On Linux platforms, OPROCD uses the hangcheck timer.

RACG (racgmain, racgimon): Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

What are the Oracle database background processes specific to RAC?

• LMS: Global Cache Service Process

• LMD: Global Enqueue Service Daemon

• LMON: Global Enqueue Service Monitor

• LCK0: Instance Enqueue Process

To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

What are the Oracle Clusterware components?

Voting Disk: Oracle RAC uses the voting disk to manage cluster membership by way of a health check, and to arbitrate cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR): Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.

How do you troubleshoot a node reboot?

Check the following Metalink notes:

Note 265769.1 Troubleshooting CRS Reboots
Note 559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

How do you back up the OCR?

Oracle Clusterware backs up the OCR automatically. The default backup location is: $ORA_CRS_HOME/cdata/<clustername>/

To display the backups:
#ocrconfig -showbackup
To restore a backup:
#ocrconfig -restore <backup_file_name>

With Oracle RAC 10g Release 2 or later, you can also export the OCR contents:
#ocrconfig -export <file_name> -s online
and use the -import option to restore the contents.
With Oracle RAC 11g Release 1, you can take a manual backup of the OCR with the command:
# ocrconfig -manualbackup
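
The OCR commands above can be collected in a small helper, for example when scripting routine backups. This is only a sketch: the function name `ocrconfig_args` and the file path in the usage line are hypothetical, and the helper only builds the argument list (it does not run anything). Running ocrconfig itself requires root on a cluster node.

```python
from typing import List, Optional

# Operations from the answer above that take a file name, and those that do not.
NEEDS_FILE = {"restore", "export", "import"}
NO_FILE = {"showbackup", "manualbackup"}


def ocrconfig_args(action: str, file_name: Optional[str] = None) -> List[str]:
    """Build the argument list for one ocrconfig operation (hypothetical helper)."""
    if action in NO_FILE:
        return ["ocrconfig", f"-{action}"]
    if action in NEEDS_FILE:
        if not file_name:
            raise ValueError(f"ocrconfig -{action} needs a file name")
        args = ["ocrconfig", f"-{action}", file_name]
        if action == "export":
            # Export while the clusterware stack is online, as in the text above.
            args += ["-s", "online"]
        return args
    raise ValueError(f"unsupported action: {action}")


# Assumed export file path, for illustration only.
print(" ".join(ocrconfig_args("export", "/backup/ocr_export.dmp")))
```

The resulting list can be passed to `subprocess.run` by a script that runs as root.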

How do you back up the voting disk?

#dd if=voting_disk_name of=backup_file_name

How do I identify the voting disk location?

#crsctl query css votedisk

How do I identify the OCR file location?

Check /var/opt/oracle/ocr.loc or /etc/ocr.loc (depending on the platform)
or
#ocrcheck
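
If you need the OCR location inside a script, the ocr.loc file can be read as simple key=value pairs. The sketch below assumes the usual layout with an `ocrconfig_loc` entry; the sample content and device paths are illustrative, not taken from a real cluster.

```python
from typing import Dict


def parse_ocr_loc(text: str) -> Dict[str, str]:
    """Parse the key=value lines of an ocr.loc file into a dictionary."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Illustrative file content (assumed key names and device paths).
sample = """ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
local_only=FALSE
"""

print(parse_ocr_loc(sample)["ocrconfig_loc"])
```

On a real node you would pass in the contents of /etc/ocr.loc or /var/opt/oracle/ocr.loc instead of the sample string.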

Is ssh required for normal Oracle RAC operation?

"ssh" is not required for normal Oracle RAC operation. However, "ssh" should be enabled for Oracle RAC installation and patch set installation.

What is SCAN?

Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change their connection configuration when you add or remove nodes in the cluster.


What is the purpose of the private interconnect?

Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

Why do we have a Virtual IP (VIP) in Oracle RAC?

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to another node, and the new node re-ARPs the world, announcing a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which sends error RST packets back to the clients. As a result, the clients get errors immediately.

What do you do if you see GC CR BLOCK LOST in the top 5 Timed Events in an AWR report?

This is most likely due to a fault in the interconnect network.
Check the output of netstat -s.
If you see "fragments dropped" or "packet reassemblies failed", work with your system administrator to find the fault in the network.
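
The check can be scripted by pulling the relevant counters out of the netstat -s text. This is a minimal sketch: the function name and the sample output are made up for illustration, and the exact counter wording differs between platforms (Linux, for example, prints "packet reassembles failed"), so the patterns may need adjusting.

```python
import re
from typing import Dict

# Substrings that mark the IP reassembly counters of interest.
PATTERNS = ("fragments dropped", "packet reassembles failed", "packet reassemblies failed")


def reassembly_counters(netstat_output: str) -> Dict[str, int]:
    """Return the fragment/reassembly failure counters found in netstat -s output."""
    counters: Dict[str, int] = {}
    for line in netstat_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue
        count, label = int(match.group(1)), match.group(2).strip()
        if any(pattern in label for pattern in PATTERNS):
            counters[label] = count
    return counters


# Illustrative output, not captured from a real system.
sample_output = """Ip:
    1200 total packets received
    34 fragments dropped after timeout
    7 packet reassembles failed
"""

print(reassembly_counters(sample_output))
```

These counters are cumulative since boot, so compare two samples taken some minutes apart; only counters that keep increasing point to an ongoing problem.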

How many nodes are supported in a RAC Database?

Oracle RAC 10g Release 2 supports 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.

srvctl cannot start an instance and returns the errors PRKP-1001 and CRS-0215, but sqlplus can start it on both nodes. How do you identify the problem?

Set the environment variable SRVM_TRACE to true and start the instance with srvctl. You will now get a detailed error stack.
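
When the instance is started from a script, the same tracing can be enabled for a single srvctl call without changing the shell environment. The sketch below only prepares the command and environment; the helper name `traced_srvctl_start` and the database and instance names are hypothetical.

```python
import os
from typing import Dict, List, Optional, Tuple


def traced_srvctl_start(
    db_name: str,
    instance_name: str,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the srvctl command and an environment with SRVM_TRACE enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["SRVM_TRACE"] = "true"  # makes srvctl print the detailed error stack
    argv = ["srvctl", "start", "instance", "-d", db_name, "-i", instance_name]
    return argv, env


# Hypothetical database and instance names.
argv, env = traced_srvctl_start("orcl", "orcl1")
print(" ".join(argv), "with SRVM_TRACE =", env["SRVM_TRACE"])
```

On a cluster node, `subprocess.run(argv, env=env)` would then run srvctl with tracing turned on for that call only.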

What is the purpose of the ONS daemon?

The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware as part of the nodeapps. There is one ONS daemon started per clustered node.
The Oracle Notification Service daemon receives a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forwards those events to application subscribers and to the local listeners.

This facilitates:

a. the FAN (Fast Application Notification) feature, which allows applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permits load balancing across different RAC nodes depending on the load on each node. The RDBMS MMON process creates an advisory for the distribution of work every 30 seconds and forwards it via racgimon and ONS to listeners and applications.

27 comments:

  1. How many Ethernet cards and how many IPs are needed in RAC?
    What is the master daemon in RAC?

    Replies
    1. 2 NICs and 3 IPs for each node in RAC for 10g. For 11g, 3 extra IPs for SCAN.

  2. When does a node become the "MASTER NODE"?

    Replies
    1. The node with the lowest node number becomes the master node, and dynamic remastering of the resources takes place.

      To find the master node for a particular resource, query the MASTER_NODE column of v$ges_resource.

      To find out which node is the master node, search the ocssd.log file for "master node number".

    2. When the first master node fails, the surviving node with the lowest node number becomes the master node.

  3. What is dynamic remastering?
    When does dynamic remastering happen?

    Replies
    1. Dynamic remastering is the ability to move the ownership of a resource from one instance to another instance in RAC. Dynamic resource remastering is used to implement resource affinity for increased performance. Resource affinity optimizes the system in situations where update transactions are being executed in one instance. When activity shifts to another instance, the resource affinity moves to that instance accordingly. If activity is not localized, resource ownership is hashed across the instances.

      In 10g, dynamic remastering happens at the file and object level. The criteria for remastering are very stringent: one instance must touch the resource more than 50 times as often as the other instance within a particular period (say 10 minutes). The touch ratio and the period can be tuned with the _gc_affinity_limit and _gc_affinity_time parameters.

  4. good questions and post in depth

  5. Why do we maintain an odd number of voting disks?

    Replies
    1. In Oracle RAC, a node must be able to access more than half of the voting disks at any time. For example, if you have five voting disks configured, then a node must be able to access at least three of the voting disks at any time. If a node cannot access the minimum required number of voting disks, it is evicted, or removed, from the cluster.

      4 disks are no more highly available than 3 disks: with 3 disks a node needs 2, and with 4 disks it needs 3, so once we lose 2 disks the cluster fails with either 4 voting disks or 3 voting disks.
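
The "more than half" rule can be checked with a few lines of arithmetic. This is a small sketch with made-up function names, not an Oracle API; it only shows why an even number of voting disks adds no extra fault tolerance.

```python
def required_votes(voting_disks: int) -> int:
    """Smallest number of voting disks that is more than half of the total."""
    if voting_disks < 1:
        raise ValueError("at least one voting disk is required")
    return voting_disks // 2 + 1


def tolerated_failures(voting_disks: int) -> int:
    """How many voting disks can be lost before a node falls below the majority."""
    return voting_disks - required_votes(voting_disks)


for disks in (3, 4, 5):
    print(disks, "disks: need", required_votes(disks),
          "and tolerate", tolerated_failures(disks), "failure(s)")
```

Three and four disks both tolerate a single failure, so the fourth disk buys nothing; five disks are needed to survive two failures.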

    2. An odd number of disks avoids split brain. When the nodes in a cluster cannot talk to each other, they race to lock the voting disks, and whichever node locks more disks survives. If the number of disks is even, each node might lock 50% of the disks (2 out of 4), and then there is no way to decide which node to evict.
      When the number is odd, one node always holds more disks than the other, so the cluster can evict the node holding fewer.
      Thanks
      krishan

  6. A node must be able to access more than half of the voting disks at any time.
    For example, if you have three voting disks configured, then a node must be able to
    access at least two of the voting disks at any time. If a node cannot access the minimum required number of voting disks, it is evicted, or removed, from the cluster.

  7. Excellent sharing. Kindly provide us more questions and answers specially for RMAN with RAC.

  8. Very good blog and all the questions and answers...

    Q.1 Can you explain a bit about checkpoints and the local and remote listeners?

    Q.2 Regarding DBMS Scheduler jobs in a RAC database, I have observed that all scheduled jobs run from one instance only. If a job is run manually from another instance, it runs from that instance.
    Does the instance_stickiness parameter have anything to do with this?

    Replies
    1. In a RAC database, we usually see both local and remote listeners. The remote listener is the SCAN listener, which acts as a load balancer.

  9. It's really the one and only blog on the entire web that covers RAC FAQs.

    The article "When exactly during the installation process are clusterware components created?"
    is really very good, but it's for 10g. It would be great if you updated the blog for 11gR2.

  10. Any idea about lock monitoring in RAC? Can someone explain?

  11. Hi,

    What is the meaning of "re-arps the world" for a VIP?

  12. Hey, I feel that you have a lot of knowledge about moving services.
    Would you please share some knowledge about global relocation services?

  13. How many nodes are supported in 11g RAC?

  15. How do you find out how many nodes are used in a RAC setup?

  16. You can use the olsnodes/lsnodes command to find out the number of nodes in Oracle RAC.
