Keep Disks in your Diskgroup the same size

In my production instances, I have only ever used ASM with external redundancy. Hey, I’m paying expensive fees for fancy hardware RAID, I might as well use it. I therefore tend to present to ASM a LUN which is normally a RAID 10 stripe set. As far as ASM is concerned this is one large disk and it does not have to worry about failure groups.

Well, that is the ideal, but as we know we don’t live in a perfect world. On one of the production instances, thanks to “hand me down” hardware, I created two LUNs of differing sizes. Everything was running along happily until one of the LUNs ran out of space.

I had thought ASM was meant to distribute extents based on the size of the disks in the diskgroup. For example, if your diskgroup was made up of two disks, one of 60GB and one of 120GB, the 120GB disk would contain twice as many extents (Allocation Units) as the 60GB disk. This would ensure that one disk in the diskgroup never filled up while the other still had plenty of space. Well, it seems that this does not necessarily work perfectly in practice.

So I have a diskgroup, let’s call it DATA4, made up of two disks, VOL4 and VOL5. When you look at V$ASM_DISKGROUP, this diskgroup has lots of lovely free space:

SQL> select group_number, name, total_mb, free_mb
from V$ASM_DISKGROUP;

GROUP_NUMBER NAME                 TOTAL_MB    FREE_MB
------------ ------------------ ---------- ----------
           4 DATA4                  220391      24501

So if you just looked at that view, you would be hard pushed to explain why you could not allocate space in your diskgroup. However, if your diskgroup is made up of multiple disks, take a look at the following view:

SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           4 VOL4                               124660      24501
           4 VOL5                                95731          0

Oh great! All the available space in the diskgroup is on one of its disks. ASM is not clever enough to then allocate new extents only to that disk; it just keeps on doing its effectively round-robin distribution of extents, which means you will get an ORA-15041 error saying the diskgroup space is exhausted. And you’ll be convinced that it ain’t so if you just look at V$ASM_DISKGROUP.
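
To spot this kind of imbalance before it bites, it can help to express the free space on each disk as a percentage of that disk's size. The following is only a minimal sketch built on the same V$ASM_DISK_STAT columns used above; the pct_free alias is an illustrative name of my choosing, and group number 4 is simply the one from this example:

```sql
-- Per-disk free space as a percentage of that disk's size.
-- A disk whose pct_free sits far from the others in the same
-- group suggests the diskgroup is out of balance.
select group_number,
       name,
       total_mb,
       free_mb,
       round(free_mb / total_mb * 100, 1) as pct_free  -- hypothetical alias
from   v$asm_disk_stat
where  group_number = 4      -- DATA4 in this example
and    total_mb > 0          -- skip disks that report no size
order  by name;
```

With the figures shown above, this should report roughly 19.7 for VOL4 and 0 for VOL5, which makes the problem much harder to miss than the diskgroup-level totals do.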

Thankfully, there is help at hand in the rebalance process. I had thought a rebalance was only required when the storage had physically changed, e.g. when adding a new disk, but a rebalance also evens out where the data is stored across the existing disks:

SQL> alter diskgroup DATA4 rebalance;

You can control the speed of the rebalance using the POWER clause. The rebalance took 41 minutes at power 1, and once it completed I saw the following in V$ASM_DISK_STAT:

SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           4 VOL4                               124660      13859
           4 VOL5                                95731      10642

Bingo! I can now allocate new extents in my diskgroup and I have not increased the storage available by 1 byte.
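
For reference, the power level mentioned above is set with a POWER clause on the rebalance statement, and progress can be watched in V$ASM_OPERATION. This is a rough sketch rather than a recommendation: the value 4 is an arbitrary choice for illustration, and a higher power generally finishes sooner at the cost of more I/O load on the storage.

```sql
-- Rebalance DATA4 at a higher power than the power 1 used above.
-- The value 4 is illustrative only; pick one that suits your system.
alter diskgroup DATA4 rebalance power 4;

-- Watch the running rebalance. Once the operation has finished,
-- this view no longer returns a row for it.
select group_number, operation, state, power,
       sofar, est_work, est_minutes
from   v$asm_operation;
```

Comparing sofar against est_work gives a feel for how far along the rebalance is, and est_minutes is ASM's own estimate of the time remaining.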

Definitely, it will save you pain if you keep all disks in your diskgroup the same size.


6 thoughts on “Keep Disks in your Diskgroup the same size”

  1. Jason, could you share the version of both ASM and database, and the compatibility setting of the ASM diskgroup?

    If the allocation is truly blind round robin, it really is a major flaw in the design, because it means that with disks of different sizes some of the disk space will not be reachable until the diskgroup is rebalanced.

  2. Hi Frits,

    Yep, I realised this was an oversight in the article. Both the database and ASM are version 10.2.0.3, running on RHEL 4 U3 x86-64. The compatibility setting is 10.1.0.0.0 in both the compatibility and database_compatibility columns of v$asm_diskgroup.

    Which I believe to be the default even with 10gR2.

    Basically that is what is happening to us: the diskgroup is filling up even though it has a good 25GB free (around 10-15%). As you can see, one disk/lun in the diskgroup, being slightly smaller, has filled up completely, while the other disk/lun has all the free space.

    This certainly is not working as advertised.

  3. Yes, 10.1.0.0.0 is the default for both, even in version 11.1.0.6.0.

    I am planning to test a scenario with non-equal sized disks in a diskgroup.

    The Oracle documentation states that allocation of extents in ASM is done relative to the total(!) disk size, which means that if you have unequally sized disks, they will fill up at relatively the same rate.

    That suggests that in your scenario a disk was added or resized, and the rebalance needed after that either did not happen or was aborted. Could you elaborate on that?

  4. Hi Frits,

    I should mention ASMLib is also in the mix here:

    I have the 2.6.9-34 Linux kernel (and the asm version 2.0.3-1 for that kernel). asmsupport is also 2.0.3-1, while asmlib is 2.0.2-1.

    What happened, and this goes back to August 2006, was that the disk group was created and then I immediately added the 2nd lun to it (note the disk group was NOT created with both luns in the same create disk group statement, but with a create followed by an add).

    Note there was NO data on the luns before I added the 2nd lun to the disk group.

    The asm alert log states that AFTER the 2nd disk was added a rebalance of the disk group occurred. The alert log claims this rebalance completed successfully. It does seem likely that it has not managed to communicate the correct sizes of the luns.

    Yes on the docs, I read that and was slightly peeved. Looking at the free_mb when the 2nd lun filled up, it has not quite treated both luns as the same size, as there is 100GB on one but only 95GB on the other, close though.

  5. The penny has only just dropped. The rebalance has not actually moved data from the 2nd lun. It has just increased the actual size of the lun. I have not changed the size of the lun. It would seem the first rebalance when the disk group first had the 2nd lun added did not calculate the size correctly.
