What is an Oracle Database Cluster? It is a grid of interconnected instances on separate nodes with one shared storage, and this is the main concept of any clustered system. But there is a single point of failure: the storage. If we lose the storage, our nodes become useless. The best solution is to distribute the data across the nodes. We could use replication for this purpose, but it would be a pain to maintain. Starting with Oracle 11g R2, there are new possibilities within the ASM technology that help us build a highly available system, or Stretch Cluster. In this configuration the shared storage, based on ASM, has a disk group that consists of disks from the local storages of the cluster member nodes. In other words, each node in the cluster shares its local disks with the others, every node can access the disks of every other node, and the data is equally distributed and mirrored between these nodes. In this case, if we lose one node together with its storage, the other two continue to function without interruption, because its data has mirrored copies on the other nodes, thanks to ASM mirroring. The main point is that such a system can be built from cheap servers with local disk storage.
It was also possible to do this in the 10g version, but it was very inconvenient because of the following issues:
- The risk that, because of a network or other failure, connectivity to a disk would be lost
- Huge network traffic
- OCR and Voting Disk mirroring and redundancy
In 10g ASM, when we lose connectivity to a disk (the data itself is intact), ASM automatically drops that disk. Then, after we bring it back, a rebalance operation redistributes the mirrors, and all of these operations are very resource consuming. Also, in 10g an instance can read data only from the primary ASM extent, never from its mirrored copy, so when one instance reads data that resides on the local disk of another node, huge network I/O is generated! We must also consider OCR and Voting Disk mirroring and redundancy: we need to dedicate one disk from each node to store mirrors of these files, which adds additional overhead and complexity to the configuration.
Three new features of 11g R2 solve these issues:
- The disk group attribute DISK_REPAIR_TIME
- The new ASM instance initialization parameter asm_preferred_read_failure_groups
- The ability to store OCR and Voting Disk files on an ASM disk group and benefit from its mirroring capabilities
Now we can set the disk group attribute DISK_REPAIR_TIME, which defaults to 3.6 hours. If we lose connectivity to a disk, it is taken offline but not dropped: ASM waits up to DISK_REPAIR_TIME, and if the disk comes back within this time frame, only the changes made during its absence are synchronized to it. This is much faster than dropping the disk and adding it back. The new ASM instance initialization parameter asm_preferred_read_failure_groups specifies a comma-separated list of failure groups in the format <DISKGROUP NAME>.<FAILUREGROUP NAME>. With it we can declare the local disks of a node as its preferred disks to read from, because in 11g an instance can read not only from the primary ASM data extent but also from its mirror copy. The problem of OCR and Voting Disk mirroring is also solved: we can now store these files directly on an ASM disk group and benefit from its mirroring capabilities. The files are mirrored across the failure groups and tolerate the loss of the same number of disks as the underlying disk group.
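As a sketch, the first two features could be configured like this (the disk group name DATA, the failure group name FGROUP1 and the SID '+ASM1' are assumptions for illustration; the ALTER SYSTEM must be run with each ASM instance pointing at its own local failure group):

```sql
-- Keep an unreachable disk offline for up to 4 hours instead of dropping it.
ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '4h';

-- On the ASM instance of the first node, prefer reads from its local failure group.
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.FGROUP1' SID='+ASM1';
```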
The optimal number of nodes for a Stretch Cluster is 3, because ASM supports at most 3-way mirroring (high redundancy). Let's assume we have three nodes: Node1, Node2 and Node3. To build a 3-node Stretch Cluster, first of all we need to share their disks: each node must have access to the disk devices of the others. You can use, for example, the ATA over Ethernet (AoE) network protocol to share the disks over the interconnect network. Ask your system administrator to perform this task for you. The main schema of the cluster will look like this:
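As an illustration only, sharing a local disk with AoE might look like the following (the shelf/slot numbers and the interface eth1 are assumptions; this requires the vblade and aoetools packages and root privileges, and I have not tested it in this configuration):

```sh
# On Node2: export the local disk /dev/disk/d1 as AoE shelf 0, slot 2 over eth1.
vbladed 0 2 eth1 /dev/disk/d1

# On Node1: load the AoE initiator module and discover exported devices.
modprobe aoe
aoe-discover
# The disk from Node2 should then appear locally as /dev/etherd/e0.2.
```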
If we have three cheap servers that individually cannot provide the desired level of performance and availability, we can build on top of them a system with the highest level of availability possible. As you can see from the figure, each node has a local storage disk /dev/disk/d1. After sharing the disks over Ethernet using the AoE protocol, each server has access to the disks of the others: Node1 sees its local /dev/disk/d1 plus /dev/dsk/d1n2 from Node2 and /dev/dsk/d1n3 from Node3; Node2 sees its local /dev/disk/d1 plus /dev/dsk/d1n1 from Node1 and /dev/dsk/d1n3 from Node3; Node3 sees its local /dev/disk/d1 plus /dev/dsk/d1n1 from Node1 and /dev/dsk/d1n2 from Node2. Now, after installing 11g R2 Grid Infrastructure, we can create an ASM disk group DGROUP1 consisting of these three disks. DGROUP1 is created in high redundancy protection mode, and each disk is placed in its own failure group: FGROUP1, FGROUP2 and FGROUP3 respectively. To avoid network I/O overhead we set the asm_preferred_read_failure_groups parameter on each instance, which forces the instances to read data from the local copies of the mirrored ASM extents. As a result, we have a system that can survive the crash of 2 of the 3 nodes, because each node holds its own copy of the data extents; as we can see from the figure, each local storage has extents of the green, blue and red colors.
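The setup described above could be sketched as follows from Node1 (the disk paths follow the naming used in the figure; the ASM SIDs '+ASM1' through '+ASM3' are assumptions):

```sql
-- Create a high-redundancy (3-way mirrored) disk group, one disk per failure group.
CREATE DISKGROUP DGROUP1 HIGH REDUNDANCY
  FAILGROUP FGROUP1 DISK '/dev/disk/d1'
  FAILGROUP FGROUP2 DISK '/dev/dsk/d1n2'
  FAILGROUP FGROUP3 DISK '/dev/dsk/d1n3';

-- Point each ASM instance at the failure group built from its own local disk.
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DGROUP1.FGROUP1' SID='+ASM1';
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DGROUP1.FGROUP2' SID='+ASM2';
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DGROUP1.FGROUP3' SID='+ASM3';
```

With this in place, each instance reads the mirror copies stored on its own local disk and the interconnect is used mainly for writes, which must still go to all three failure groups.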
If we add one remote physical standby to this database, we will have a truly deathless system. Also, please note that this is just theory: I have not tested this configuration. If you have done any tests with this kind of configuration, please share your observations in the comments.
(c) Aychin Gasimov, 08/2011, Azerbaijan Republic