Aychin's Oracle RDBMS Blog

Only for Advanced level Professionals

Stretch Cluster or Highly Available System

What is an Oracle Database Cluster? It is a grid of interconnected instances on separate nodes with one shared storage, and that is the main concept of any clustered system. But it has a single point of failure: the storage. If we lose the storage, our nodes are useless. The best solution is to distribute the data across the nodes. We could use replication for this purpose, but it is a pain to maintain.

As of Oracle 11g R2, ASM offers new capabilities that help us build a Highly Available System, or Stretch Cluster. In such a system the shared storage is an ASM diskgroup made up of the local disks of the cluster member nodes. In other words, each node in the cluster shares its local disks with the others, and each node can access the others' disks. The data is evenly distributed and mirrored between these nodes. If we lose one node together with its storage, the remaining nodes keep working without interruption, because its data has mirrored copies on the other nodes, thanks to ASM mirroring. The main point is that such a system can be built from cheap servers with local disk storage.

This was also possible in 10g, but the following issues made it very inconvenient:

  • The risk that a network failure, or some other failure, cuts the connection to a disk
  • Heavy network traffic
  • OCR and Voting Disk mirroring and redundancy

In 10g ASM, when we lose connectivity to a disk (not the data on it), ASM automatically drops that disk. After we bring it back, a rebalance operation runs to redistribute the mirrors. All of these operations are very resource consuming. Also, in 10g an instance can read data only from the primary ASM extent and not from its mirrored copy, so when an instance reads data whose primary extent resides on the local disk of another node, a lot of network I/O is generated. We must also consider OCR and Voting Disk mirroring and redundancy: we need to share one disk from each node to store the mirrors of these files, which adds overhead and complexity to the configuration.
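The cost of the primary-only read rule can be illustrated with a small Python sketch. It assumes one disk per node and primary extents placed round-robin over three nodes. That layout, the extent count and the function name are illustrative assumptions, not how ASM actually places extents.

```python
def remote_primary_reads(primary_owner_by_extent, reading_node):
    """Count extents whose primary copy lives on another node's disk."""
    return sum(1 for owner in primary_owner_by_extent if owner != reading_node)


# Hypothetical layout: 12 extents, primaries placed round-robin over 3 nodes.
NODES = ["Node1", "Node2", "Node3"]
primaries = [NODES[i % len(NODES)] for i in range(12)]

# Under the 10g rule every read goes to the primary extent,
# so each of these reads from Node1 has to cross the network.
remote = remote_primary_reads(primaries, "Node1")
print(f"{remote} of {len(primaries)} extent reads leave Node1")
```

Under these assumptions roughly two thirds of the reads issued by any node go over the network, which is the traffic problem described above.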

Three features available in 11g R2 solve these issues (the first two were already introduced in 11g R1):

  • The diskgroup attribute DISK_REPAIR_TIME
  • The new ASM instance initialization parameter asm_preferred_read_failure_groups
  • OCR and Voting Disk files can now be stored in an ASM diskgroup and benefit from its mirroring capabilities

We can now set the diskgroup attribute DISK_REPAIR_TIME, which defaults to 3.6 hours. If we lose the connection to a disk, the disk is taken offline but not dropped. The system waits for the DISK_REPAIR_TIME interval, and if the disk comes back within this time frame, only the changes made while it was absent are synchronized to it. This is much faster than dropping the disk and adding it back.

The new ASM instance initialization parameter asm_preferred_read_failure_groups takes a comma-separated list of failure groups in the format <DISKGROUP NAME>.<FAILUREGROUP NAME>. With it we can mark the local disks of a node as the preferred disks to read from, so in 11g an instance can read not only from the primary ASM extent but also from its mirror copy.

The problem with OCR and Voting Disk mirroring is solved as well: we can now store these files directly in an ASM diskgroup and benefit from its mirroring capabilities. These files are mirrored across the failure groups and tolerate the loss of the same number of disks as the underlying diskgroup.
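The DISK_REPAIR_TIME behaviour can be modelled in a few lines of Python. This is only a sketch of the decision described in this post, not ASM code: the function name and the returned strings are made up for illustration, and treating an outage of exactly DISK_REPAIR_TIME as still repairable is an assumption.

```python
from datetime import timedelta

# Default DISK_REPAIR_TIME from the post: 3.6 hours, which is 216 minutes.
DEFAULT_REPAIR_TIME = timedelta(minutes=216)


def action_for_outage(outage, repair_time=DEFAULT_REPAIR_TIME):
    """Return what happens to a disk that was unreachable for `outage`.

    Assumption: an outage exactly equal to repair_time still counts as repaired.
    """
    if outage <= repair_time:
        # Disk stayed offline but was not dropped: only changed extents are copied.
        return "resync changed extents"
    # Timer expired: the disk is dropped and a full rebalance follows.
    return "drop disk and rebalance"


print(action_for_outage(timedelta(minutes=30)))
print(action_for_outage(timedelta(hours=4)))
```

A short network glitch ends in the cheap branch, while only an outage longer than the configured window falls back to the expensive 10g-style drop and rebalance.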

The optimal number of nodes for a Stretch Cluster is 3, because ASM supports at most 3-way mirroring (high redundancy). Let's assume that we have 3 nodes: Node1, Node2 and Node3. To build a 3-node Stretch Cluster we first need to share their disks, so that each node has access to the disk devices of the others. You can use, for example, the ATA over Ethernet (AoE) network protocol, which shares disks over the interconnect network. Ask your system administrator to perform this task for you. The main schema of the cluster looks like this:

[Figure: Stretch Cluster]

If we have 3 cheap servers that individually cannot provide the desired level of performance and availability, we can build on top of them a system with the highest level of availability possible. As you can see from the figure, each node has a local storage disk /dev/disk/d1. After sharing them over Ethernet using the AoE protocol, each server has access to the disks of the others. On Node1 these are its local /dev/disk/d1, /dev/dsk/d1n2 from Node2 and /dev/dsk/d1n3 from Node3. On Node2 these are its local /dev/disk/d1, /dev/dsk/d1n1 from Node1 and /dev/dsk/d1n3 from Node3. On Node3 these are its local /dev/disk/d1, /dev/dsk/d1n1 from Node1 and /dev/dsk/d1n2 from Node2.

Now, after installing 11g R2 Grid Infrastructure, we can create the ASM diskgroup DGROUP1 consisting of these 3 disks. DGROUP1 is created as a high redundancy diskgroup, and each disk goes into its own failure group: FGROUP1, FGROUP2 and FGROUP3 respectively. To avoid network I/O overhead we set the asm_preferred_read_failure_groups parameter on each instance, which makes the instance read from its local copies of the mirrored ASM extents. As a result, the data can survive the crash of 2 of the 3 nodes, because each node holds its own copy of every data extent: as the figure shows, each local storage has extents of green, blue and red colors.
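To make the configuration above concrete, here is a small Python sketch that prints the ASM statements it would translate to. The diskgroup name, failure group names and disk paths come from the text. The ASM instance names +ASM1 to +ASM3 are assumptions, and the statements are as untested as the configuration itself.

```python
# DGROUP1, FGROUP1..3 and the disk paths come from the walkthrough above.
# The ASM SIDs (+ASM1..+ASM3) are assumptions; the post does not name the instances.
DISKGROUP = "DGROUP1"

# Disk paths as seen from Node1, one failure group per node.
DISKS_SEEN_FROM_NODE1 = {
    "FGROUP1": "/dev/disk/d1",   # Node1 local disk
    "FGROUP2": "/dev/dsk/d1n2",  # Node2 disk shared over AoE
    "FGROUP3": "/dev/dsk/d1n3",  # Node3 disk shared over AoE
}

# Each ASM instance prefers the failure group that holds its node's local disk.
LOCAL_FAILGROUP_BY_SID = {"+ASM1": "FGROUP1", "+ASM2": "FGROUP2", "+ASM3": "FGROUP3"}


def create_diskgroup_sql(diskgroup, disks):
    """Build a CREATE DISKGROUP statement with one failure group per disk."""
    failgroups = "\n".join(
        f"  FAILGROUP {fg} DISK '{path}'" for fg, path in disks.items()
    )
    return f"CREATE DISKGROUP {diskgroup} HIGH REDUNDANCY\n{failgroups};"


def preferred_read_sql(diskgroup, sid, failgroup):
    """Build the ALTER SYSTEM statement that sets the preferred failure group of one instance."""
    return (
        "ALTER SYSTEM SET asm_preferred_read_failure_groups = "
        f"'{diskgroup}.{failgroup}' SCOPE=BOTH SID='{sid}';"
    )


if __name__ == "__main__":
    print(create_diskgroup_sql(DISKGROUP, DISKS_SEEN_FROM_NODE1))
    for sid, failgroup in LOCAL_FAILGROUP_BY_SID.items():
        print(preferred_read_sql(DISKGROUP, sid, failgroup))
```

The CREATE DISKGROUP statement would be run once, from Node1 in this sketch, since the disk paths are written as Node1 sees them. The ALTER SYSTEM statements set a different preferred failure group for each ASM instance.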

If we add one remote Physical Standby to this database, we get a practically indestructible system. Please note that this is just theory: I have not tested this configuration. If you have run tests with this kind of configuration, please share your observations in the comments.


(c) Aychin Gasimov, 08/2011, Azerbaijan Republic


8 responses to “Stretch Cluster or Highly Available System”

  1. fthib January 3, 2012 at 17:42

    Hello,

    thanks for this use case of the asm_preferred_read_failure_groups parameter.
    But how do you ensure that the data is written to the local group?
    For now we cannot force the striping direction?

    I had seen that a parameter like ASM_PREFERRED_FAILURE_GROUPS will exist, but I am not sure.

    Thanks a lot.

  2. goran July 7, 2012 at 12:12

    Hi Aychin,

    Thanks for a nice post … very interesting topic.

    Did you already build this POC case?

    Any load tests done? If so, did you measure how much additional network traffic ASM mirroring causes?

    It would be great if you could share some load figures.

    What concerns me is ASM mirroring over the interconnect … if mirroring saturates the network bandwidth, it will lead to node evictions.

    Did you consider using a dedicated network for ASM mirroring, or maybe a VLAN?

    regards,
    goran

    • aychin July 17, 2012 at 12:07

      Hi, sorry for delayed response.

      No, I didn’t build this case, it is just theory that must be tested. And it would be very good if someone
      tested this case and posted the results here.

      And yes, sure, it would be better to separate ASM mirroring, which can be done at the OS layer. It would be perfect to have two
      interconnect networks, one for the cluster interconnect and one for sharing disks.

      Best Regards,
      Aychin

  3. gurbanadigozalov March 4, 2013 at 06:39

    Hi Aychin, thank you for your post. I think this theory holds if the number of nodes is 2 with normal redundancy,
    or 3 with high redundancy.

    For example, if we have 6 nodes and 6 DGs:

    A      B      C      D      E      F
    AM,CF  BM,DF  CM,EF  DM,AF  EM,FF  FM,BF

    A   node name
    AM  main DG of node A
    CF  failgroup of the DG from node C

    In this case the chance that a node reads from a local disk is 20%.
