How to Confuse ASM

Don’t try this one at home, folks.

I was adding a disk to an existing diskgroup, and I think I must have got a little bit excited: I enabled one node in the cluster to see the new device, but forgot about the second node and blindly proceeded to add the device to the diskgroup without the second node having access to it.

Definitely my bad, but ASM got itself into quite a tangle. This is my attempt to add the device to the diskgroup:

SQL> alter diskgroup DATA4 add disk 'ORCL:VOL7';

alter diskgroup DATA4 add disk 'ORCL:VOL7'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide

The corresponding entry in the ASM alert log was as follows:

Wed Aug 13 12:31:04 2008
SQL> alter diskgroup DATA4 add disk 'ORCL:VOL7'
Wed Aug 13 12:31:04 2008
NOTE: reconfiguration of group 4/0xf0384f6d (DATA4), full=1
Wed Aug 13 12:31:04 2008
NOTE: initializing header on grp 4 disk VOL7
NOTE: cache opening disk 2 of grp 4: VOL7 label:VOL7
NOTE: PST update: grp = 4
NOTE: requesting all-instance disk validation for group=4
Wed Aug 13 12:31:04 2008
NOTE: disk validation pending for group 4/0xf0384f6d (DATA4)
SUCCESS: validated disks for 4/0xf0384f6d (DATA4)
Wed Aug 13 12:31:05 2008
NOTE: requesting all-instance membership refresh for group=4
Wed Aug 13 12:31:05 2008
NOTE: membership refresh pending for group 4/0xf0384f6d (DATA4)
SUCCESS: refreshed membership for 4/0xf0384f6d (DATA4)
Wed Aug 13 12:31:08 2008
WARNING: offlining disk 2.3915956128 (VOL7) with mask 0x3
NOTE: PST update: grp = 4, dsk = 2, mode = 0x6
NOTE: PST update: grp = 4, dsk = 2, mode = 0x4
NOTE: cache closing disk 2 of grp 4: VOL7
NOTE: PST update: grp = 4
NOTE: requesting all-instance membership refresh for group=4
Wed Aug 13 12:31:14 2008
NOTE: membership refresh pending for group 4/0xf0384f6d (DATA4)
NOTE: cache closing disk 2 of grp 4: VOL7

So this node has in fact initialised the header information on the new disk, but has failed to properly add the disk to the diskgroup.

Now here is the killer: once I had given the other node access to the new device as well, I could not simply come along and add the device to the diskgroup again:


SQL> alter diskgroup DATA4 add disk 'ORCL:VOL7';

alter diskgroup DATA4 add disk 'ORCL:VOL7'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk 'ORCL:VOL7' belongs to diskgroup "DATA4"

Oh great, it is already part of the diskgroup, so that must be job done. However, checking V$ASM_DISK_STAT reveals:

GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST NAME
------------ ----------- ------- ------------ ------- ------------------------------
           0           6 CLOSED  MEMBER       ONLINE
           1           0 CACHED  MEMBER       ONLINE  VOL1
           2           0 CACHED  MEMBER       ONLINE  VOL2
           3           0 CACHED  MEMBER       ONLINE  VOL3
           4           0 CACHED  MEMBER       ONLINE  VOL4

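So my disk (in this case disk number 6) has a HEADER_STATUS of MEMBER but a MOUNT_STATUS of CLOSED, and it has no name. Its GROUP_NUMBER is also 0, which is what ASM reports for a disk that is not part of any mounted diskgroup.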

Hmm. So let's try dropping this disk then:


SQL> alter diskgroup DATA4 drop disk 'ORCL:VOL7';

alter diskgroup DATA4 drop disk 'ORCL:VOL7'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "ORCL:VOL7" does not exist in diskgroup "DATA4"

Whoops! So my new device both belongs to diskgroup DATA4 and does not exist in DATA4.

Now, is it just me, or is there something contradictory about those two statements?

The disk has been only partially added, and to get things back into a sane state we need to clear the ASM metadata from the device. The way to do this is with the dd command, overwriting the start of the device with zeros:

dd if=/dev/zero of=/dev/asm1 bs=8192 count=1000
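That dd command irreversibly overwrites the first 1000 blocks of 8192 bytes (about 8 MB) of whatever path it is given, so a typo in the device name is costly. Below is a minimal shell sketch that wraps the same dd invocation in a helper and then confirms the region reads back as zeros. The helper name wipe_asm_header is hypothetical, it assumes a dd and cmp that support conv=notrunc and cmp -n (GNU and BSD both do), and /dev/asm1 is just the device from my example, so substitute your own and try it against a scratch file first.

```shell
#!/bin/sh
# Sketch only: wipe_asm_header is a hypothetical helper, not an Oracle tool.
# It repeats the dd command above (1000 blocks of 8192 bytes from /dev/zero)
# and then checks that the overwritten region reads back as zeros.
# WARNING: everything in that region of the target is destroyed.

BS=8192
COUNT=1000

wipe_asm_header() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: wipe_asm_header <device-or-file>" >&2
        return 2
    fi

    # conv=notrunc only matters for regular files (such as a test image):
    # it stops dd from truncating the file after the zeroed region.
    dd if=/dev/zero of="$target" bs="$BS" count="$COUNT" conv=notrunc 2>/dev/null || return 1

    # Compare the first BS*COUNT bytes of the target with /dev/zero;
    # cmp exits 0 only if every byte in that range is zero.
    cmp -n "$((BS * COUNT))" "$target" /dev/zero >/dev/null
}
```

Run it as `wipe_asm_header /dev/asm1 && echo "ASM header cleared"`; a non-zero exit means either dd failed or the region did not read back as zeros.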

It’s also recommended to run a rebalance command:

SQL> alter diskgroup DATA4 rebalance power 11;

Once this rebalance has completed (check V$ASM_OPERATION), and assuming you have ensured that all nodes in your cluster can now see the new device, you can add it to the diskgroup as you originally intended.
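Waiting for the rebalance can be scripted. This is a minimal shell sketch under a few assumptions: sqlplus is on the PATH, ORACLE_SID points at the local ASM instance, and OS authentication as SYSDBA works (the usual way into a 10g ASM instance; 11g also offers SYSASM). The names asm_pending_ops and wait_for_rebalance and the PENDING_CMD override are hypothetical. The query counts the rows in V$ASM_OPERATION for the named diskgroup by joining to V$ASM_DISKGROUP on GROUP_NUMBER.

```shell
#!/bin/sh
# Sketch only: function names and PENDING_CMD are hypothetical.
# Assumes sqlplus is on PATH and ORACLE_SID is set to the local ASM instance.

# Print the number of V$ASM_OPERATION rows for diskgroup $1.
asm_pending_ops() {
    sqlplus -s "/ as sysdba" <<EOF
set heading off feedback off pagesize 0
select count(*)
  from v\$asm_operation o, v\$asm_diskgroup g
 where o.group_number = g.group_number
   and g.name = upper('$1');
exit
EOF
}

# Poll every $2 seconds (default 30) until no operations remain for diskgroup $1.
# Set PENDING_CMD to use a different counting command (e.g. for a dry run).
wait_for_rebalance() {
    dg="$1"
    interval="${2:-30}"
    while :; do
        # Strip the padding sqlplus puts around the number.
        n=$(${PENDING_CMD:-asm_pending_ops} "$dg" | tr -d '[:space:]')
        case "$n" in
            ''|*[!0-9]*)
                # Anything non-numeric (e.g. an ORA- error) stops the loop.
                echo "unexpected output: $n" >&2
                return 1
                ;;
        esac
        [ "$n" -eq 0 ] && return 0
        echo "$dg: $n operation(s) still running" >&2
        sleep "$interval"
    done
}
```

For example, `wait_for_rebalance DATA4 30 && echo "DATA4 rebalance finished"` polls every 30 seconds and returns once V$ASM_OPERATION is empty for DATA4.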

Oracle claim this is not a bug, but it seems such a stupid thing to do, and leaving the disk in a limbo state, both part of and not part of the diskgroup, does not seem like a sensible outcome to me.

I believe 11g may behave more sensibly in this regard.

15 Comments

  1. Luca

     /  August 20, 2008

    Hi Jason,

    good post! This has happened to me too and it’s indeed quite an annoying ‘feature’ of ASM.

    Cheers,
    Luca

  2. jarneil

     /  August 20, 2008

    Hi Luca,

    Thanks for reading!

    It just seems such a dumb thing for ASM to do, you can’t imagine that this was “designed” to happen!

    cheers,

    jason.

    • deep

       /  October 16, 2009

      Hi Jason,

      Thank you so much for this wonderful post.

      I faced the same issue and this workaround absolutely worked for me.

      Pradeep

  3. Neil

     /  November 16, 2009

    Sorry to say that this is still a “feature” in 11g…..

    • jarneil

       /  November 17, 2009

      Hi Neil,

      Thanks for sharing! Was that 11gR2?

      jason.

      • Jeff

         /  July 19, 2010

        I can confirm that Oracle has indeed left this lovely “feature” in place in 11gR2. Why improve on perfection, right? ;)

        Anyway, thanks Jason for the article!

  4. Thank You :)
    Excellence

  5. Aha… the rebalance command resolved the issue. Wonderful sharing of the alter diskgroup DATA4 rebalance power 11 command.

  6. Vijay

     /  August 20, 2010

    Thanks, Jason.

    It was indeed a great post. Many Thanks.

    What I really liked about it is the presentation style (simple yet lucid).

    We do have a disk (in our DATA disk-group) that’s currently in “limbo” state!

    Needless to say, it must be fixed, and I will give it a shot using the information given in this post.

    Regards,
    Vijay

  7. Mary

     /  July 27, 2011

    Hi Jason
    This post is a lifesaver! NOTHING on Metalink about this or how to fix it. BTW – 11.1.0.7 ASM has exactly the same ‘undocumented feature’ as 10g.

  8. mel

     /  July 28, 2011

    This works to resolve the issue in 11g:

    Use the option FORCE of the “create diskgroup” or “alter diskgroup add disk” to enforce overriding the previous content of the disk.

    I think I’d use the dd command as a last resort.

    • Mike T

       /  January 25, 2012

      I hit this problem, and attempted FORCE in 11.1.0.7. No dice. Had to do the dd.

  9. OJ

     /  October 4, 2011

    Hi Jason,

    Thanks a lot! The post may be 3 years old, but it still helped to resolve our issue. Thanks again! :D

  10. nobody

     /  February 16, 2012

    Thanks for this Jason. Another perfect example of why Oracle is the best db in the world! ROFL.
