RAC uses cluster interconnects to transfer blocks between the nodes participating in the cluster. A block transfer occurs when a user session connected to one instance requests a block held in the cache of another instance. This RAC feature, which transfers blocks from the cache of one instance to the cache of the requesting instance, is called Cache Fusion.
Oracle introduced Cache Fusion with Oracle 9i. Prior to Oracle 9i, under Oracle Parallel Server (OPS), data was shared between users by forcing the instance holding the data to first write it to disk so that the requesting instance could then read it. With Cache Fusion, when users on one instance request data held in the cache of another instance, the holding instance transfers the data across the cluster interconnect, avoiding the disk write and read entirely. Disk I/O is significantly slower than cache transfers over the cluster interconnects.
The performance of the cluster interconnects is crucial to the performance of the RAC cluster, and more specifically to the movement of cached data between instances. Interconnect performance is measured as the average time a block takes to reach the requesting node, i.e., the time from the moment an instance requests a block to the moment the requesting instance receives it. As with any application or database instance, occasional spikes of user activity are expected. However, when this average remains high for an extended period of time, it could indicate a correctable performance degradation of the database cluster.
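The measurement described above can be sketched in Python. The sketch assumes the Oracle 9i statistic names `global cache cr block receive time` (reported in centiseconds) and `global cache cr blocks received`, sampled twice, for example from V$SYSSTAT; the snapshot values and the 5 ms threshold in the usage example are hypothetical. To separate occasional spikes from sustained degradation, the hypothetical `SustainedHighDetector` class reports a problem only when every interval in a sliding window exceeds the threshold.

```python
"""Average GCS CR block receive time, computed from two statistic snapshots.

Assumptions: Oracle 9i statistic names, with the receive time reported in
centiseconds. Snapshot values and the threshold used below are hypothetical.
"""
from collections import deque
from typing import Deque, Dict

RECEIVE_TIME = "global cache cr block receive time"  # centiseconds (assumed)
BLOCKS_RECEIVED = "global cache cr blocks received"


def avg_cr_receive_ms(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Average CR block receive time in milliseconds between two snapshots."""
    time_cs = after[RECEIVE_TIME] - before[RECEIVE_TIME]
    blocks = after[BLOCKS_RECEIVED] - before[BLOCKS_RECEIVED]
    if blocks == 0:
        return 0.0  # no blocks received in this interval
    return 10.0 * time_cs / blocks  # 1 centisecond = 10 ms


class SustainedHighDetector:
    """Hypothetical helper: flags only when every interval in the window is high."""

    def __init__(self, threshold_ms: float, window: int) -> None:
        self.threshold_ms = threshold_ms
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, avg_ms: float) -> bool:
        """Record one interval's average; return True if degradation is sustained."""
        self.samples.append(avg_ms)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and min(self.samples) > self.threshold_ms


if __name__ == "__main__":
    snap1 = {RECEIVE_TIME: 1_000, BLOCKS_RECEIVED: 5_000}
    snap2 = {RECEIVE_TIME: 1_600, BLOCKS_RECEIVED: 8_000}
    print(avg_cr_receive_ms(snap1, snap2))  # 600 cs * 10 / 3000 blocks = 2.0

    detector = SustainedHighDetector(threshold_ms=5.0, window=3)
    print([detector.add(ms) for ms in [2.0, 9.0, 3.0, 6.0, 7.0, 8.0]])
    # [False, False, False, False, False, True]
```

Requiring every sample in the window to exceed the threshold, rather than the window mean, keeps a single large spike from triggering an alert; the window length is the knob that defines what counts as an "extended period".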
A high average GCS CR block receive time indicates possible interconnect performance issues. Tuning and monitoring the following areas may help improve the performance of the cluster interconnects:
- Ensure that dedicated private interconnects are configured between the instances participating in the cluster for Cache Fusion traffic.
- The average GCS CR block receive time can also be inflated by a high value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter. Depending on this setting, a requesting process can request several blocks in a single read and may therefore have to wait longer. In a RAC environment, size this parameter based on the interconnect latency and the packet sizes defined by the hardware vendor, taking operating system limitations into account. On certain platforms, high receive times could also be related to bug 2475236, which caused CR request timeouts because the UDP parameters were hard-coded in the Oracle code.
- Monitor process scheduling priorities and run queue lengths at the operating system level. Sustained run queue backlogs help identify whether additional processors are required.
- The LMS background process (Global Cache Service, GCS) on the instance serving the requests may not be able to keep up with requests from other instances. The LMD background process (Global Enqueue Service, GES) then has to queue new requests, increasing its backlog and delaying responses. The LMS process accepts requests on a first-in, first-out (FIFO) basis. By default, Oracle creates one LMS process for every two CPUs on the system. Oracle does not provide a direct method to tune the number of LMS processes; one available method is to increase them by modifying the hidden parameter _LM_DLMD_PROCS (see the disclaimer below regarding hidden parameters).
- Multiple cluster interconnects can be configured and allocated to Oracle using the CLUSTER_INTERCONNECTS parameter. This parameter overrides the default interconnect selected at the operating system level with a preferred network for cluster traffic and, on certain platforms, disables the high availability feature of RAC.
- Evaluate and reduce the number of full table scans on the database. Queries that retrieve data through full table scans can move large numbers of blocks between instance caches and thus place a significant load on the cluster interconnects.
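The effect of block and packet sizes on the interconnect can be estimated with back-of-the-envelope arithmetic. The sketch below is a simplification under stated assumptions: each block is shipped as one UDP datagram whose payload is exactly the database block size (Oracle's own message header is ignored), IPv4 and UDP headers are 20 and 8 bytes, and every block of a multiblock read, such as those issued by full table scans, has to come from another instance's cache. The function names are hypothetical.

```python
"""Back-of-the-envelope packet count for cache fusion block transfers.

Simplifying assumptions (illustrative only): each block travels as one UDP
datagram carrying exactly block_size bytes (Oracle's message header is
ignored), IPv4 headers are 20 bytes, UDP headers are 8 bytes, and every block
of a multiblock read is shipped from a remote cache.
"""
import math

IPV4_HEADER = 20
UDP_HEADER = 8


def fragments_per_block(block_size: int, mtu: int) -> int:
    """IPv4 fragments needed to carry one block-sized UDP datagram."""
    ip_payload = block_size + UDP_HEADER
    if ip_payload + IPV4_HEADER <= mtu:
        return 1  # fits in a single, unfragmented packet
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def packets_per_multiblock_read(block_size: int, mbrc: int, mtu: int) -> int:
    """Packets on the wire if all mbrc blocks of one read come from other caches."""
    return mbrc * fragments_per_block(block_size, mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):  # standard Ethernet vs. jumbo frames
        print(mtu, fragments_per_block(8192, mtu),
              packets_per_multiblock_read(8192, mbrc=16, mtu=mtu))
    # 1500 6 96
    # 9000 1 16
```

Under these assumptions, an 8 KB block needs six IP fragments on a standard 1500-byte MTU, so a 16-block read could put 96 packets on the wire, while a 9000-byte (jumbo frame) MTU carries each block in a single packet.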
Note: On certain platforms, such as Sun, enabling the CLUSTER_INTERCONNECTS parameter could disable the high availability feature of RAC.
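Before setting CLUSTER_INTERCONNECTS, the value can be sanity-checked. The sketch below assumes the UNIX-style syntax of colon-separated IPv4 addresses (for example `192.168.10.1:192.168.11.1`); the addresses and helper names are hypothetical, and the checks (private addresses only, no duplicates) are illustrative rather than rules enforced by Oracle.

```python
"""Sanity-check a CLUSTER_INTERCONNECTS value (illustrative checks only).

Assumes UNIX-style syntax: colon-separated IPv4 addresses, optionally quoted.
"""
import ipaddress
from typing import List


def parse_cluster_interconnects(value: str) -> List[ipaddress.IPv4Address]:
    """Split the parameter value into IPv4 addresses; raises ValueError if malformed."""
    parts = value.strip().strip("'\"").split(":")
    return [ipaddress.IPv4Address(part.strip()) for part in parts]


def check_cluster_interconnects(value: str) -> List[str]:
    """Return warnings; an empty list means no problems were found."""
    warnings = []
    seen = set()
    for addr in parse_cluster_interconnects(value):
        if not addr.is_private:  # cache fusion traffic belongs on a private network
            warnings.append(f"{addr} is not a private address")
        if addr in seen:
            warnings.append(f"{addr} is listed more than once")
        seen.add(addr)
    return warnings


if __name__ == "__main__":
    print(check_cluster_interconnects('"192.168.10.1:192.168.11.1"'))  # []
    print(check_cluster_interconnects("192.168.10.1:8.8.8.8"))
    # ['8.8.8.8 is not a private address']
```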
Disclaimer: Hidden parameters (those that begin with an underscore), such as _LM_DLMD_PROCS, should be modified with caution and only after first consulting Oracle Support. Oracle may also change or remove such parameters without notice in subsequent releases.
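The LMS backlog described in the list above can be illustrated with a toy queueing model. This is not Oracle's implementation: it assumes discrete time ticks and a fixed number of requests each LMS-like worker completes per tick, and it derives the worker count from the one-LMS-per-two-CPUs default quoted above, rounding down with a minimum of one (the rounding is an assumption). Requests are served first-in, first-out; the function names are hypothetical.

```python
"""Toy FIFO model of LMS request backlog (not Oracle internals).

Assumptions: discrete time ticks; each LMS-like worker completes at most
per_worker requests per tick; worker count follows the one-per-two-CPUs
default quoted in the text, rounded down with a minimum of one (assumed).
"""
from collections import deque
from typing import Deque, List


def default_lms_count(cpus: int) -> int:
    """Hypothetical helper: one LMS process per two CPUs, at least one."""
    return max(1, cpus // 2)


def simulate_backlog(arrivals: List[int], cpus: int, per_worker: int) -> List[int]:
    """Return the number of requests still queued at the end of each tick."""
    capacity = default_lms_count(cpus) * per_worker
    queue: Deque[int] = deque()  # holds the tick at which each request arrived
    backlog = []
    for tick, count in enumerate(arrivals):
        queue.extend([tick] * count)
        for _ in range(min(len(queue), capacity)):
            queue.popleft()  # FIFO: the oldest request is served first
        backlog.append(len(queue))
    return backlog


if __name__ == "__main__":
    bursts = [8, 12, 12, 12, 6, 4]
    print(simulate_backlog(bursts, cpus=4, per_worker=5))  # [0, 2, 4, 6, 2, 0]
    print(simulate_backlog(bursts, cpus=8, per_worker=5))  # [0, 0, 0, 0, 0, 0]
```

With four CPUs (two workers, capacity 10 requests per tick), bursts of 12 requests leave a growing backlog that drains only once arrivals fall below capacity; doubling the CPU count in this model removes the backlog entirely.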
While single-instance performance remains important, in a RAC or other Oracle clustered environment, applications and users share multiple instances connected to the same physical copy of the database, so the cluster interconnects play a critical role in the performance of these systems.