ASM Mirroring

May 12, 2008

ASM can provide RAID-like protection for your data. It provides redundancy via failure groups: when you create a disk group, you assign each disk in the group to a particular failure group, and ASM then mirrors the data between these failure groups. The idea is that you keep disks that depend on the same hardware in the same failure group, so that the failure of a particular component does not impact the availability of your disk group.

ASM actually mirrors at the extent level. When an extent is allocated there is a primary extent and a secondary extent (essentially a primary copy and a mirror copy), and these are allocated in different failure groups. The above diagram shows primary extents in purple and secondary extents in red (each square represents an extent).

By default, ASM always reads the primary copy of an extent. At first I thought this was a big limitation and would reduce the I/O bandwidth available, but of course ASM mirroring ensures the primary copies are spread across the failure groups so that the I/O is spread over the maximum number of drives.
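To make the primary/secondary idea concrete, here is a toy sketch in Python. It is not ASM's real placement algorithm: the rotation rule, the function name place_extents and the failure group names FG1 and FG2 are all assumptions made up for illustration. It only shows the two properties described above: the two copies of an extent never share a failure group, and the primary copies (the ones that get read) end up spread across the groups.

```python
def place_extents(n_extents, failure_groups):
    """Toy placement: the primary copy rotates across failure groups and
    the mirror copy goes to the next group, so the two never coincide."""
    names = list(failure_groups)
    if len(names) < 2:
        raise ValueError("mirroring needs at least two failure groups")
    placement = []
    for i in range(n_extents):
        primary = names[i % len(names)]
        secondary = names[(i + 1) % len(names)]
        placement.append((primary, secondary))
    return placement


# Hypothetical two-failure-group disk group with six extents.
layout = place_extents(6, ["FG1", "FG2"])
primaries_per_group = {
    fg: sum(1 for primary, _ in layout if primary == fg) for fg in ["FG1", "FG2"]
}
print(layout)
print(primaries_per_group)
```

With the reads going to primary copies only, an even count of primaries per failure group is what keeps all the drives busy.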

Frits Hoogland pointed out to me this may still be less efficient than traditional mirroring in that ASM has no chance of optimising a read based on which underlying physical device has the closest disk head position.

To be fair, Oracle themselves do state that external redundancy is preferred unless you have particular requirements that can only be met by a software RAID solution. Oh, and yeah, the 11g asm_preferred_read_failure_groups parameter seems like a real win for extended clusters, if that is your bag.


ASM Extents

May 6, 2008

Every ASM disk is divided into allocation units (au). ASM files are stored as extents, and an extent consists of one or more allocation units, though variable-sized extents only arrived in 11g. The ASM instance provides the RDBMS instance with an extent map, which the RDBMS instance then uses when doing I/O.

The diagram above is meant to show the extents of a pair of ASM files distributed amongst the available drives in a disk group. Essentially this is the algorithm that ASM uses to maximise the I/O performance - spread all data across the disks in a disk group.

When you create a disk group in 11g you can specify the size of the allocation unit to be from 1MB to 64MB, the size doubling between these limits. That is you can set the size of the au for a disk group to be one of 1, 2, 4, 8, 16, 32, or 64MB.

Clearly, the larger the au size chosen, the fewer extents it takes to map a file of a given size. Larger au are therefore beneficial for large data files, and they cut down on the SGA memory required to track the extent map. Each individual extent resides on a single disk.

Extents come in three sizes: 1 au, 8 au, and 64 au. The size a given extent gets depends on how many extents the file already has: the first 20,000 extents are 1 au each, at the 20,000-extent threshold the size increases to 8 au, and at 40,000 extents it increases again to 64 au. Again, this is designed to benefit larger data files, which need fewer extents to be tracked.
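The effect of those thresholds on the size of the extent map can be worked through with a small Python sketch. The function name extents_for_file is made up for illustration, and the tier table simply encodes the 20,000 and 40,000 thresholds quoted above; treat it as arithmetic under those assumptions rather than a description of ASM internals.

```python
def extents_for_file(file_au):
    """Extents needed to map a file of file_au allocation units, assuming
    1-au extents for the first 20,000 extents, 8-au extents for the next
    20,000, and 64-au extents after that."""
    tiers = [(20_000, 1), (20_000, 8), (None, 64)]  # (extents in tier, au per extent)
    extents = 0
    remaining = file_au
    for tier_extents, au_per_extent in tiers:
        if remaining <= 0:
            break
        needed = -(-remaining // au_per_extent)  # ceiling division
        if tier_extents is not None:
            needed = min(needed, tier_extents)
        extents += needed
        remaining -= needed * au_per_extent
    return extents


# With a 1MB au, 100,000 au is a data file of a little under 100GB.
print(extents_for_file(10_000))   # small file: one extent per au
print(extents_for_file(100_000))  # 20,000 one-au extents plus 10,000 eight-au extents
```

Under these assumptions the 100,000 au file needs 30,000 extents rather than 100,000, which is the saving in tracked extents that the variable sizes are meant to give.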

You can see how the extents are allocated between disks in a disk group by looking at the X$KFFXP view:


SQL> select count(*), group_kffxp, disk_kffxp
from X$KFFXP
group by group_kffxp, disk_kffxp
order by group_kffxp;

This will show you how many au have been allocated to each disk. If you have a healthy, balanced system, each disk in a disk group should have a similar number of au.

A very useful script for looking at all this is available on Metalink: look for Note 351117.1, on diagnosing ASM space issues. It is well worth a look.


ASM and disk “hot spots”

May 1, 2008

I’ve seen repeated in various locations that ASM somehow has the ability to move data around in response to how much I/O is occurring on each of the disks that ASM is managing. The theory goes that by doing this, ASM is able to balance the I/O amongst all the drives, thus giving your RDBMS instance(s) that are using ASM the absolute tip-top I/O performance that could possibly be achieved given your hardware limitations.

Sounds great? Trouble is, it just is not true. This idea has gained a bit of traction in the community, and I’m sure many people think ASM is perhaps more clever than it actually is. Whether this is due to marketing terminological inexactitude, I’ll leave up to the reader to decide.

The only metric ASM uses when determining where data should be located is the capacity of the disks in a disk group. ASM’s goal in placing data is to ensure every drive is filled to the same amount. Therefore, if you have a disk group made up of disks of equal size, each disk will receive the same amount of data. The theory is that by spreading the data evenly across the drives you will achieve good I/O performance, as every drive is likely to be serving the same number of I/O requests.

ASM does expose some data on how much I/O each disk in a disk group is performing, via V$ASM_DISK_STAT:

SQL> select group_number, disk_number, read_time, write_time, bytes_read, bytes_written
from V$ASM_DISK_STAT;

GROUP_NUMBER DISK_NUMBER  READ_TIME WRITE_TIME BYTES_READ BYTES_WRITTEN
------------ ----------- ---------- ---------- ---------- -------------
           4           0 14910505.7 4626148.24 4.1821E+13    1.4998E+12
           4           1 14965432.3 5324739.98 4.1833E+13    1.6264E+12

There are two disks in this disk group, which are actually of equal size. They have both read and written a similar quantity of data, though it is not exactly equal. The cumulative write time shows a bigger discrepancy than the read time.
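To put rough numbers on "similar but not exactly equal", the figures from the query output can be compared column by column. This is just arithmetic on the values shown above; the helper name pct_diff and the choice to express the gap as a percentage of the smaller value are my own assumptions.

```python
def pct_diff(a, b):
    """Gap between two values as a percentage of the smaller one."""
    return abs(a - b) / min(a, b) * 100


# Values copied from the V$ASM_DISK_STAT output for disks 0 and 1.
disks = {
    0: {"read_time": 14910505.7, "write_time": 4626148.24,
        "bytes_read": 4.1821e13, "bytes_written": 1.4998e12},
    1: {"read_time": 14965432.3, "write_time": 5324739.98,
        "bytes_read": 4.1833e13, "bytes_written": 1.6264e12},
}

skew = {col: round(pct_diff(disks[0][col], disks[1][col]), 2) for col in disks[0]}
for col, pct in skew.items():
    print(f"{col:14s} {pct:6.2f}%")
```

The read columns come out well under one percent apart, while write time differs by roughly 15 percent, which matches the impression given by eyeballing the table.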

Basically, the point is that for equal sized disks in a disk group the ASM algorithm of distributing data according to capacity works reasonably well.

But consider if you had different sized disks in a disk group. A larger disk gets more data. Is a larger disk actually quicker at returning that data? Well, probably not, and the larger the discrepancy in sizes the larger the skew of I/O there will be.
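A minimal sketch of why this matters, assuming data is placed strictly in proportion to capacity and that every megabyte is equally likely to be read (both simplifications, and the disk names and sizes are invented):

```python
def io_share(disk_sizes_mb):
    """Share of the data, and so of uniformly random I/O, that each disk
    receives when data is placed in proportion to capacity."""
    total = sum(disk_sizes_mb.values())
    return {name: size / total for name, size in disk_sizes_mb.items()}


equal = io_share({"DISK_A": 100_000, "DISK_B": 100_000})
unequal = io_share({"DISK_A": 100_000, "DISK_B": 300_000})
print(equal)    # each disk serves half of the requests
print(unequal)  # the larger disk serves three quarters of them
```

If DISK_B is no faster than DISK_A, it is now handling three times the requests with the same number of spindles behind it, and that is your hot spot.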

Maybe one day ASM will have the ability to shift data based on the I/O activity of the underlying drives, but until then make sure all the disks you have in a disk group are of the same size (oh, and the same performance characteristics). That way you’ll protect yourself from any I/O hot spots that ASM won’t quite save you from yet!


Dropping a disk in an ASM Disk Group

April 29, 2008

One of the big selling points of ASM is the ability to reconfigure the storage online. Previously I’ve blogged about expanding a disk in a disk group. Another useful feature of ASM is to use it to migrate from one set of disks in a disk group to another, or indeed from one storage array to another.

This is basically achievable because ASM distributes data evenly across all disks in a disk group. Assuming you have enough space, you can happily drop a disk from a disk group and ASM will seamlessly migrate the data to the remaining disks in the disk group.


SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;

GROUP_NUMBER      NAME          TOTAL_MB    FREE_MB
------------ ---------------- ---------- ----------
	   1      VOL1		61439	     61187
	   2      VOL2		61439	     61164
	   3      VOL3		61439	     61164
	   4      VOL4	       409594	    310962
	   4      VOL5	       153597	     95240

So, we see here that VOL4 and VOL5 are two disks (luns) in disk group 4. Previously I had expanded VOL4 and this now has enough capacity to encompass all the data resident on this disk group. I am now safe to drop VOL5 and this is an online operation:

SQL> alter diskgroup DATA4 drop disk VOL5;

Diskgroup altered.

This alter diskgroup command essentially shuffles extents from the disk you are removing and distributes them to the remaining disks in your disk group. While the operation is continuing you can check V$ASM_OPERATION for the progress you are making:

SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES
------------ ----- ---- ----- ------ ----- -------- -------- -----------
           4 REBAL RUN      1      1   100    42234     1007          41

Most of the columns here are self-explanatory. The SOFAR column tells you the number of allocation units (au) that have been moved so far, while EST_WORK and EST_RATE are also expressed in au and au per minute respectively.
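The three columns appear to hang together in the obvious way. The sketch below assumes that EST_MINUTES is simply the remaining work divided by the rate, rounded down; that formula is my assumption, not something taken from the documentation, but it does reproduce the sample row above.

```python
def est_minutes(sofar, est_work, est_rate):
    """Minutes left for a rebalance, assuming remaining au / (au per minute),
    rounded down to a whole minute."""
    remaining_au = est_work - sofar
    return remaining_au // est_rate


# Values from the v$asm_operation row above: SOFAR, EST_WORK, EST_RATE.
print(est_minutes(100, 42234, 1007))  # 41, matching EST_MINUTES in the output
```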

Once the rebalance has moved all the Allocation Units the disk is removed from the disk group:


SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;

GROUP_NUMBER      NAME          TOTAL_MB    FREE_MB
------------ ---------------- ---------- ----------
	   1      VOL1		61439	     61187
	   2      VOL2		61439	     61164
	   3      VOL3		61439	     61164
	   4      VOL4	       409594	    252006

Dropping a disk from a disk group seemed to work as advertised. The real benefit, of course, comes when what you are dropping is not just a single disk but a lun representing a whole storage array: then this has real potential for allowing you to upgrade storage, or even migrate to a different storage platform entirely.
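The "assuming you have enough space" check is worth doing before issuing the drop. Here is a small Python sketch of that arithmetic using the TOTAL_MB and FREE_MB figures for disk group 4 from before the drop; the function name can_drop is hypothetical, and the check ignores any redundancy or rebalance overhead.

```python
def can_drop(disks, victim):
    """disks maps disk name -> (total_mb, free_mb). Returns (ok, used_mb):
    ok is True when the free space on the other disks can absorb the data
    currently held on the victim."""
    total_mb, free_mb = disks[victim]
    used_mb = total_mb - free_mb
    spare_mb = sum(free for name, (_, free) in disks.items() if name != victim)
    return used_mb <= spare_mb, used_mb


# Disk group 4 before the drop, from V$ASM_DISK_STAT.
group4 = {"VOL4": (409594, 310962), "VOL5": (153597, 95240)}
ok, used_mb = can_drop(group4, "VOL5")
print(ok, used_mb)
print(can_drop(group4, "VOL4"))  # the other way round there is not enough room
```

VOL5 holds 58357 MB, which fits comfortably into VOL4's free space, and the drop in VOL4's FREE_MB after the rebalance (310962 to 252006) is close to that figure.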


Expanding an ASM disk

April 22, 2008

One of the major advantages of ASM is the ability to reconfigure the storage online. In theory you should be able to add disks, remove disks, and resize disks all the while your ASM and RDBMS instances just keep humming along.

However, I don’t think it is actually possible to expand a lun without downtime if you are using ASMLib. Part of the problem seems to be that with ASMLib you have to create a partition and certainly on RedHat 4 Update 3, using kernel 2.6.9-34.ELsmp, to change a partition table required that ASM was not using the disk that the partition table was residing on.

Recently I found this out the hard way when I attempted to increase the size of a lun that was being used by ASM. Expanding the lun on the storage was fairly straightforward on the EMC Clariion on which the data was residing.

I’m not really sure if this is the best way of mapping OS device -> ASM disk:

[jason@bdb ~]$ sudo /etc/init.d/oracleasm querydisk VOL4
Disk "VOL4" is a valid ASM disk on device [8, 1]

I believe this to be the major and minor number of the device, so you can look in /dev to see what device this corresponds to:

[jason@bdb ~]$ ls -l /dev/sda1
brw-rw---- 1 root disk 8, 1 Apr 3 16:29 /dev/sda1

Or indeed thanks to Charles Kim you can run the querydisk the opposite way round:

[jason@bdb ~]$ sudo /etc/init.d/oracleasm querydisk /dev/sda1
Disk "/dev/sda1" is marked an ASM disk with the label "VOL4"
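Rather than eyeballing ls -l output, the major and minor numbers can be matched programmatically. The following Python sketch uses only the standard os and stat modules; the function names are my own, and it assumes the device nodes live directly under /dev as they do on the system above.

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) for a block device node such as /dev/sda1."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def find_block_devices(major, minor, dev_dir="/dev"):
    """List the block device nodes in dev_dir with the given numbers."""
    matches = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        try:
            if device_numbers(path) == (major, minor):
                matches.append(path)
        except (ValueError, OSError):
            continue  # not a block device, or not readable
    return matches


if __name__ == "__main__":
    # [8, 1] is what querydisk reported for VOL4 above.
    print(find_block_devices(8, 1))
```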

I cannot see any V$ASM view where this mapping from ASM disk -> OS device is exposed; perhaps ASMLib is getting in the way here. What I can say is that ASM disk VOL4 maps to /dev/sda1, which is a partition on /dev/sda. I then increased the size of the lun that this device was created from.

Then comes the scary part: getting the OS to see the increased lun. This was running on RHEL 4 Update 3, and after a reboot I could see the following via fdisk:

[jason@bdb ~]$ sudo /sbin/fdisk /dev/sda

Command (m for help): p

Disk /dev/sda: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       32635   262140606   83  Linux

So the OS can see that the device /dev/sda now has 52216 cylinders but only 32635 (which was the original lun size) have been allocated to the partition /dev/sda1. Now you can actually just delete the partition and recreate it without losing any data:

Command (m for help): d
Selected partition 1

This has deleted the partition:

Command (m for help): p

Disk /dev/sda: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

Now you have to recreate it:

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-52216, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-52216, default 52216):
Using default value 52216

Now we can see the /dev/sda1 partition is up to the full capacity of the underlying lun:

Command (m for help): p

Disk /dev/sda: 429.4 GB, 429496729600 bytes
255 heads, 63 sectors/track, 52216 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       52216   419424988+  83  Linux
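As a sanity check, the Blocks figures fdisk reports can be reproduced from the cylinder counts and the geometry it prints (255 heads, 63 sectors/track, 512-byte sectors). The only assumption in this Python sketch is that partition 1 starts at sector 63, one track in, which was the usual fdisk default of the time; the function name is made up.

```python
SECTOR_BYTES = 512
SECTORS_PER_CYLINDER = 255 * 63  # heads * sectors/track = 16065
FIRST_SECTOR = 63                # assumption: partition 1 starts one track in


def partition_blocks(end_cylinder):
    """1K blocks in a partition running from sector 63 to the end of
    end_cylinder. A trailing .5 is the half block fdisk shows as '+'."""
    sectors = end_cylinder * SECTORS_PER_CYLINDER - FIRST_SECTOR
    return sectors * SECTOR_BYTES / 1024


print(partition_blocks(32635))  # 262140606.0, the original partition
print(partition_blocks(52216))  # 419424988.5, shown by fdisk as 419424988+
```

Both values agree with the fdisk listings, so the recreated partition starts at the same place as the old one and simply ends later, which is why no data is lost.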

Don’t forget to write the changes out:


Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

I actually had ASM shut down at this point because fdisk had previously stated the device was busy (when trying to write the new partition table) and that the kernel would still use the old partition:


WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

Once ASM (and obviously the RDBMS instance relying on this ASM instance) was down, I was able to write the partition table. Without changing the partition table, ASM would not recognise that the luns had been increased. After this partition table was written, getting ASM to increase what it thought was the size of disk was quite simple:


SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           1 VOL1                                61439      61187
           2 VOL2                                61439      61164
           3 VOL3                                61439      61164
           4 VOL4                               255996     157374
           4 VOL5                               153597      95230

SQL> alter diskgroup DATA4 resize all rebalance power 4;

Diskgroup altered.

SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           1 VOL1                                61439      61187
           2 VOL2                                61439      61164
           3 VOL3                                61439      61164
           4 VOL4                               409594     310962
           4 VOL5                               153597      95240

So what am I saying? Well, it seems that with ASMLib it is hard to resize a disk completely online with ASM, due to the fact that the Linux partition table cannot be rewritten while ASM has the device open.

Perhaps this is a disadvantage of ASMLib compared to running without it, though for my money ASMLib seems to be the favoured Oracle solution; certainly I think it is pushed in the documentation.


Some thoughts on the April CPU

April 17, 2008

Ok, so everyone and their granny knows that the latest and greatest Critical Patch Update has appeared. But there are a couple of things that might be missed.

First, I took out a documentation bug, bug number 6764071, on the last CPU. This was because Oracle stated at one point users in a RAC system could access the system while the post-install procedures were being executed, but the post-install instructions state quite clearly that you need a startup upgrade even in a RAC environment - hence no access to the system. I’ve already blogged about this, but a few days ago Oracle confirmed that the doc bug had been amended.

Of course, Oracle repeat the same documentation bug from January in the new April CPU:

I got the feedback on the 14th, so it seems it will be included in the next CPU patch. The sentence will be changed to:
“Users can continue to access the database during the post-installation steps, except during the one-time view recompilation.”

The second point is that the fixes introduced in the April CPU are actually included in the 10.2.0.4 patchset, metalink Note:552248.1 states the following:

1.3 Database 10.2.0.4 Patch Set

The Database 10.2.0.4 Patch Set includes the CPUApr2008 content.

I find this quite interesting on two levels. First, if you need to do testing for the CPU, you might as well just do the testing for the patchset instead, and jump right to 10.2.0.4 (if your apps can live with it). Secondly, is it not a bit worrying that the 10.2.0.4 patchset has been available for a month now, but Oracle have only now let on that these critical vulnerabilities exist and that they have been fixed for that length of time?

Is that really good security? Yes, I know it’s all about the quarterly cycle, but is it not more important to give customers as much information as possible?

Also, it is almost like the left hand does not know what the right hand is doing, because if you look at the metalink document 10.2.0.4 Patch Set - List of Bug Fixes by Problem Type, you will see the January CPU mentioned right at the top in a section about Security Alerts Issues fixed, but there is no mention of the April CPU - but surely they must have known this when they were making the patchset!

I’m sure the instructions for the April CPU are gospel and that you will be protected if you upgrade to 10.2.0.4, but it hardly gives you a warm glow of confidence, does it now?


It’s a bug!

April 15, 2008

Well it seems like the issue whereby my disk group was showing free space on 1 disk in the diskgroup but out of space on the other disk has turned out to be a bug. The details of the problem are available in the last post.

Oracle support are now saying that we are being hit by Bug 4380450, which is to do with:

Unbalanced space usage if diskgroup has only two disks both of different sizes.

This bug is known to affect 10.2.0.3 (the version I was running) and is marked as fixed in 10.2.0.4 and 11.1.0.6. The disk group must have exactly 2 disks and the disks must be of different sizes; it is then possible that you will have unbalanced data allocation, which in turn can lead to one of the drives filling up, at which point you can no longer allocate space from this disk group. If the data were balanced you would still have storage capacity available.

The workarounds on this bug are to perform a manual rebalance or use disks that are the same size - advice which I would thoroughly agree with!


Keep Disks in your Diskgroup the same size

April 10, 2008

In my production instances, I have only ever used ASM with external redundancy. Hey, I’m paying expensive fees for fancy hardware RAID, I might as well use it. I therefore tend to present to ASM a LUN which is normally a RAID 10 stripe set. As far as ASM is concerned this is one large disk and it does not have to worry about failure groups.

Well, that is the ideal, but as we know we don’t live in a perfect world. On one of the production instances, due to using “hand me down” hardware, I have created 2 LUNs, and these are of differing sizes. Everything was running along happily until one of the LUNs ran out of space.

I had thought ASM was meant to distribute extents based on the size of the disks in the diskgroup, so for example, if your diskgroup was made up of 2 disks and one was say 60GB and one was 120GB the 120GB would contain twice as many extents (Allocation Units) as the 60GB disk. This would ensure that one disk in the diskgroup was never filled up while the other one still had plenty of space. Well, it seems that this does not necessarily work perfectly in practice.

So I have a diskgroup, let’s call it DATA4, and it is made up of two disks, VOL4 and VOL5. When you look at V$ASM_DISKGROUP, this diskgroup has lots of lovely free space:

SQL> select group_number, name, total_mb, free_mb
from V$ASM_DISKGROUP;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           4 DATA4                              220391      24501

So if you just looked at that view you would be hard pushed to explain why you could not allocate space in your diskgroup. However, if your diskgroup is made up of multiple disks, take a look at the following view:

SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;
GROUP_NUMBER NAME			      TOTAL_MB	  FREE_MB
------------ ------------------------------ ---------- ----------
	   4 VOL4				124660	    24501
	   4 VOL5				 95731	        0

Oh great! All the available space in the diskgroup is on one of the disks in the diskgroup. ASM is not clever enough to then just allocate new extents to this disk; it will keep on doing its effectively round-robin distribution of extents, which means you will get an ORA-15041 error saying the diskgroup space is exhausted. And you’ll be convinced that it ain’t so if you just look at V$ASM_DISKGROUP.

Thankfully, there is help at hand to fix this in the rebalance process. I had thought a rebalance was only required when the storage had physically changed, i.e. adding a new disk, but a rebalance basically evened out where the data was stored:

SQL> alter diskgroup DATA4 rebalance;

You can set a variable level of speed for the rebalance using the power syntax. After the rebalance completed (it took 41 minutes at power 1), I saw the following in V$ASM_DISK_STAT:

SQL> select group_number, name, TOTAL_MB, FREE_MB
from V$asm_disk_stat;
GROUP_NUMBER NAME			      TOTAL_MB	  FREE_MB
------------ ------------------------------ ---------- ----------
	   4 VOL4				124660	    13859
	   4 VOL5				 95731	    10642

Bingo! I can now allocate new extents in my diskgroup and I have not increased the storage available by 1 byte.
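The post-rebalance numbers are exactly what you get if the group's free space is shared out in proportion to each disk's size, i.e. every disk filled to the same percentage. A short Python sketch of that calculation, using the pre-rebalance figures from V$ASM_DISK_STAT (the function name balanced_free is my own):

```python
def balanced_free(disks):
    """disks maps disk name -> (total_mb, free_mb). Returns the free MB each
    disk would have if all disks were filled to the same percentage."""
    group_total = sum(total for total, _ in disks.values())
    group_free = sum(free for _, free in disks.values())
    return {
        name: round(group_free * total / group_total)
        for name, (total, _) in disks.items()
    }


# DATA4 before the rebalance: VOL5 is full, VOL4 holds all the free space.
before = {"VOL4": (124660, 24501), "VOL5": (95731, 0)}
print(balanced_free(before))  # {'VOL4': 13859, 'VOL5': 10642}
```

That matches the FREE_MB values seen after the rebalance, with both disks at roughly 11 percent free. Comparing the actual FREE_MB per disk against this kind of target is a quick way to spot a diskgroup that needs a manual rebalance before it bites you.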

Definitely, it will save you pain if you keep all disks in your diskgroup the same size.


Comparing ASM with ZFS

April 9, 2008

I’ll be presenting on the topic of ASM and ZFS at the forthcoming UKOUG UNIX SIG, on the 20th May. I’m really looking forward to the presentation as it will be the first presentation I have given in the style of Presentation Zen. There will be no bullet points in the slides; indeed the slides themselves will be meaningless on their own, though they will be an appropriate accompaniment to the words I’ll be delivering.

I’m currently writing a Word document that will be the take-away document for the presentation. So, I’d like to ask anyone popping by the blog what kind of stuff they would like to see in a presentation on ASM and ZFS. Are there any topics you feel are not addressed all that often that should go into a talk on this subject?

Below is the general outline of the presentation:

This presentation describes Oracle’s ASM and Sun’s ZFS file systems.  I will tell a little bit of their history and how they actually work.

I will also compare and contrast the file systems, giving an understanding of the benefits of each.

The idea for the presentation came about while I was watching one of the Chief designers of ZFS, Sun’s Bill Moore, give a talk on ZFS. I was of course impressed with the functionality of the file system, though I had heard quite a lot about it prior to this. What I found unexpectedly in the talk that really intrigued me, was that the language Bill was using and some of the concepts expounded on in the presentation would be familiar to a DBA audience.

I was also struck by some of the similarities between ASM and ZFS – they have some unique features in common – what I mean by that, is that there are some advantages a software RAID solution (which both of them are) have over hardware RAID.

I had been running ASM in production for around two years by this time (December 2007) and, I suspect like a lot of DBAs had in some ways treated ASM like a black box. I knew enough to install it and operate it, but knew very little about how it actually worked. In some ways I think Oracle are greatly responsible for this state of affairs, as the stunning lack of documentation available regarding ASM has only bred a lack of understanding.

To be fair, I think Oracle have partly addressed this issue with the 11g documentation set, which now includes a “Storage Administrator” guide. However, they really have only partly addressed this, in that this guide still does not really tell you very many details on how ASM actually works. There is, though, an ASM book: “Oracle Automatic Storage Management” by Nitin Vengurlekar, Murali Vallath, and Rich Long. This book covers the gap in the explanation of how ASM actually works.

The boundary of responsibility for storage administration has become increasingly blurred within organisations with the adoption of ASM. I think this means DBAs more than ever (though, you could argue it should always have been the case) need to understand storage concepts to be fully in a position to extract the maximum benefit from their storage.

Here I will present some of the ideas behind both ASM and ZFS giving some insight of the benefits of both storage solutions and some of the features they have in common as well as where they differ.

I hope that sounds interesting enough for a presentation, and if you have any tips on what to include I’ll consider every one received.


Managing Datafiles on a Standby using ASM

April 1, 2008

I encountered a curious failure in a Data Guard environment that seems interesting enough to distribute to a wider audience. The system was running 10.2.0.3 on Linux, with the datafiles stored in ASM. This was recorded recently in the RDBMS instance alert log:

MRP0: Background Media Recovery terminated with error 1237
Sun Mar 23 09:57:19 2008
Errors in file /opt/oracle/product/admin/STANDBY/bdump/standby1_mrp0_27165.trc:
ORA-01237: cannot extend datafile 35
ORA-01110: data file 35: '+DATA4/standby/datafile35.dbf'
ORA-17505: ksfdrsz:1 Failed to resize file to size 1624704 blocks
ORA-15041: diskgroup space exhausted

While in the ASM instance alert log I found the following:

Sun Mar 23 09:57:18 2008
WARNING: allocation failure on disk VOL5 for file 286 xnum 12693

At first sight you might think this is an obvious case of the diskgroup filling up and that more space needs to be allocated to it. However when I checked how much free space was available I saw:

SQL> select name, total_mb, free_mb, usable_file_mb from v$asm_diskgroup;

NAME				 TOTAL_MB    FREE_MB USABLE_FILE_MB
------------------------------ ---------- ---------- --------------
DATA1				    61439      29766	      29766
DATA2				    10239	3356	       3356
DATA3				    10239	3356	       3356
DATA4				   220391      25077	      25077
FRA				    68197      67959	      67959

So as far as the V$ASM_DISKGROUP view was concerned there really was enough storage space to allocate to this datafile. Note that this datafile was already a considerable size, so the amount it was extending by was nothing compared to the 25GB free. Much scratching of heads ensued, and I started wondering whether fragmentation could be responsible, but then I looked at the contents of the affected diskgroup using asmcmd.

I spotted that there was a datafile that had been removed a couple of weeks previously from the primary. At the same time as that datafile, several others had been removed, and all of these were gone. And then we remembered that the first drop tablespace command had not included the including contents and datafiles clause. We have standby_file_management set to auto, and this worked perfectly for files that were automatically removed on the primary. It did not work for the datafile that was removed manually, as you’d probably expect. Running rm within ASMCMD logged the following kind of thing into the alert log of the ASM instance:

SQL> alter diskgroup 'DATA4' drop file '+DATA4/STANDBY/removed_datafile.dbf'

Once this datafile was removed the standby could continue processing happily and whatever caused it to fail to extend the datafile was now not causing it a problem. The real question is why I could not allocate the space when the diskgroup was not really full?

I think what happened was that the RDBMS instance thought the space allocated to the datafile that was being used by the dropped tablespace was now available for use again, and the database instance tried to extend a tablespace into this space but found it was in fact still occupied by the datafile that had not been removed.

Clearly having the database clean up datafiles automatically is a really useful feature, and this becomes doubly so in the case of a dataguard environment. Certainly, I think it is a good idea to drop a tablespace with the including contents and datafiles clause.