How to Confuse ASM

Don’t try this one at home, folks.

I was adding a disk to an existing diskgroup, and I think I must have got a little bit excited: I enabled one node in the cluster to see the new device, but I forgot about the second node and blindly went ahead and added the device to the diskgroup without the second node having access to it.

Definitely my bad, but ASM got itself into quite a tangle. This was my attempt to add the device to the diskgroup:

SQL> alter diskgroup DATA4 add disk 'ORCL:VOL7';


alter diskgroup DATA4 add disk 'ORCL:VOL7';
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide

The corresponding entry in the ASM alert log was:

Wed Aug 13 12:31:04 2008
SQL> alter diskgroup DATA4 add disk 'ORCL:VOL7'
Wed Aug 13 12:31:04 2008
NOTE: reconfiguration of group 4/0xf0384f6d (DATA4), full=1
Wed Aug 13 12:31:04 2008
NOTE: initializing header on grp 4 disk VOL7
NOTE: cache opening disk 2 of grp 4: VOL7 label:VOL7
NOTE: PST update: grp = 4
NOTE: requesting all-instance disk validation for group=4
Wed Aug 13 12:31:04 2008
NOTE: disk validation pending for group 4/0xf0384f6d (DATA4)
SUCCESS: validated disks for 4/0xf0384f6d (DATA4)
Wed Aug 13 12:31:05 2008
NOTE: requesting all-instance membership refresh for group=4
Wed Aug 13 12:31:05 2008
NOTE: membership refresh pending for group 4/0xf0384f6d (DATA4)
SUCCESS: refreshed membership for 4/0xf0384f6d (DATA4)
Wed Aug 13 12:31:08 2008
WARNING: offlining disk 2.3915956128 (VOL7) with mask 0x3
NOTE: PST update: grp = 4, dsk = 2, mode = 0x6
NOTE: PST update: grp = 4, dsk = 2, mode = 0x4
NOTE: cache closing disk 2 of grp 4: VOL7
NOTE: PST update: grp = 4
NOTE: requesting all-instance membership refresh for group=4
Wed Aug 13 12:31:14 2008
NOTE: membership refresh pending for group 4/0xf0384f6d (DATA4)
NOTE: cache closing disk 2 of grp 4: VOL7

So this node did in fact initialise the header on the new disk, but it then offlined the disk and closed it again, so the disk never properly made it into the diskgroup.

Now here is the killer: once I had given the other node access to the new device as well, I could not simply come along and add the device to the diskgroup again:


SQL> alter diskgroup DATA4 add disk 'ORCL:VOL7';

alter diskgroup DATA4 add disk 'ORCL:VOL7'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk 'ORCL:VOL7' belongs to diskgroup "DATA4"

Oh great, it is already part of the diskgroup, so that must be job done. However, checking V$ASM_DISK_STAT reveals:

GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST NAME
------------ ----------- ------- ------------ ------- ------------------------------
           0           6 CLOSED  MEMBER       ONLINE
           1           0 CACHED  MEMBER       ONLINE  VOL1
           2           0 CACHED  MEMBER       ONLINE  VOL2
           3           0 CACHED  MEMBER       ONLINE  VOL3
           4           0 CACHED  MEMBER       ONLINE  VOL4

So my disk (disk number 6 in this case) has a HEADER_STATUS of MEMBER but a MOUNT_STATUS of CLOSED, it has no NAME, and its GROUP_NUMBER is 0, which means it is not part of any mounted diskgroup.
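If you want to spot a disk in this state without eyeballing the listing, the combination above (MEMBER, CLOSED, no name) is easy to filter for. The sketch below is a hypothetical helper, not an Oracle tool. It assumes the V$ASM_DISK_STAT columns were selected in the same order as in the listing (GROUP_NUMBER, DISK_NUMBER, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, NAME) and that the output is whitespace-separated, as SQL*Plus prints it.

```shell
# find_limbo_disks: hypothetical helper, not part of Oracle or ASMLib.
# Reads a V$ASM_DISK_STAT listing on stdin with the columns
#   GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS MODE_STATUS NAME
# and prints the DISK_NUMBER of every row whose header says MEMBER
# but which is CLOSED and has no NAME (the "limbo" state).
find_limbo_disks() {
  awk '
    # A row with an empty NAME column has only five whitespace-separated
    # fields; header and dashed separator lines never match CLOSED/MEMBER.
    $3 == "CLOSED" && $4 == "MEMBER" && NF == 5 { print $2 }
  '
}
```

For example, you could spool the output of select group_number, disk_number, mount_status, header_status, mode_status, name from v$asm_disk_stat; to a file and run find_limbo_disks < disks.lst (the file name is just an illustration). Counting fields to detect the missing NAME is crude, and it will give wrong answers if you select extra columns or change their order.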

Hmm. So let's try dropping the disk from the diskgroup then:


SQL> alter diskgroup DATA4 drop disk 'ORCL:VOL7';

alter diskgroup DATA4 drop disk 'ORCL:VOL7'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "ORCL:VOL7" does not exist in diskgroup "DATA4"

Whoops! So my new device both belongs to diskgroup DATA4 and does not exist in DATA4.

Now, is it just me, or is there something contradictory about those two statements?

The disk has been partially added, and to get things back into a sane state we need to clear the ASM metadata from the device. One way to do this is to overwrite the start of the device with zeros using dd:

dd if=/dev/zero of=/dev/asm1 bs=8192 count=1000
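Because that command is destructive, I would wrap it in something that at least refuses to run without an explicit, existing target. This is a minimal sketch; the function name wipe_asm_header is made up for illustration. It uses the same bs and count as the command above, so it zeroes the first 8192 × 1000 bytes (roughly 8 MB), and it adds conv=notrunc so that it behaves the same on a regular file as on a block device, which lets you try it on a scratch file before pointing it at the real disk.

```shell
# wipe_asm_header: hypothetical wrapper around the dd command above.
# DESTRUCTIVE: overwrites the first 8192 * 1000 bytes of the target with
# zeros, removing the ASM header left behind by the failed ADD DISK.
# Only ever point it at the new, unused device, and run it on one node.
wipe_asm_header() {
  dev=$1
  if [ -z "$dev" ]; then
    echo "usage: wipe_asm_header DEVICE" >&2
    return 2
  fi
  if [ ! -e "$dev" ]; then
    echo "wipe_asm_header: $dev does not exist" >&2
    return 1
  fi
  # conv=notrunc stops dd truncating a regular file (e.g. a test image);
  # it makes no difference when the target is a block device.
  dd if=/dev/zero of="$dev" bs=8192 count=1000 conv=notrunc
}
```

Usage would be wipe_asm_header /dev/asm1, with the device name taken from your own system.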

It’s also recommended to run a rebalance command:

SQL> alter diskgroup DATA4 rebalance power 11;

Once this rebalance has completed (check V$ASM_OPERATION), and assuming you have made sure that every node in the cluster can now see the new device, you can add it to the diskgroup as you originally intended.
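Rather than re-running the query by hand, you could poll V$ASM_OPERATION until it returns no rows for the diskgroup. This is a sketch under a few assumptions: both function names are hypothetical, group number 4 is DATA4 (as in the alert log above), and the script runs on an ASM node with ORACLE_SID already set to the local ASM instance, connecting as / as sysdba.

```shell
# wait_until_quiet: hypothetical helper. Runs the given command every
# $POLL_SECONDS seconds (default 30) and returns 0 once it prints nothing.
# Returns 1 if the command itself fails.
wait_until_quiet() {
  while :; do
    out=$("$@") || return 1
    if [ -z "$out" ]; then
      return 0
    fi
    printf '%s\n' "$out"          # show progress while waiting
    sleep "${POLL_SECONDS:-30}"
  done
}

# rebalance_rows: hypothetical query command. Prints one line per ASM
# operation still running for group 4, and nothing once it has finished.
rebalance_rows() {
  sqlplus -s "/ as sysdba" <<'EOF'
set heading off feedback off pagesize 0
select operation, state, sofar, est_work, est_minutes
  from v$asm_operation
 where group_number = 4;
exit
EOF
}
```

You would then run wait_until_quiet rebalance_rows before re-adding the disk. The here-document delimiter is quoted so that the shell does not try to expand $asm_operation as a variable.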

Oracle claim this is not a bug, but it seems such a stupid thing to do, and leaving the disk in limbo, both part of and not part of the diskgroup, does not seem like a sensible outcome to me.

I believe 11g may behave more sensibly in this regard.

15 thoughts on “How to Confuse ASM”

  1. Hi Jason,

    good post! This has happened to me too and it’s indeed quite an annoying ‘feature’ of ASM.

    Cheers,
    Luca

  2. Hi Luca,

    Thanks for reading!

    It just seems such a dumb thing for ASM to do, you can’t imagine that this was “designed” to happen!

    cheers,

    jason.

    • Hi Jason,

      I really thank you for this wonderful post.

      I have faced the same issue and this workaround absolutely worked for me.

      Pradeep

      • I can confirm that Oracle has indeed left this lovely “feature” in place in 11gR2. Why improve on perfection, right? 😉

        Anyway, thanks Jason for the article!

  3. Thanks, Jason.

    It was indeed a great post. Many Thanks.

    What I really liked about it is the presentation style (simple yet lucid).

    We do have a disk (in our DATA disk-group) that’s currently in “limbo” state!

    Needless to say, it must be fixed, and I will give it a shot using the information given in this post.

    Regards,
    Vijay

  4. Hi Jason
    This post is a lifesaver! NOTHING on metalink about this or how to fix it. BTW – 11.1.0.7 ASM has exactly the same ‘undocumented feature’ as 10g.

  5. This works to resolve the issue in 11g:

    Use the option FORCE of the “create diskgroup” or “alter diskgroup add disk” to enforce overriding the previous content of the disk.

    I think I’d use the dd command as a last resort.
