ASM and disk “hot spots”

I’ve seen it repeated in various places that ASM somehow has the ability to move data around in response to how much I/O is occurring on each of the disks it manages. The theory goes that by doing this, ASM is able to balance the I/O amongst all the drives, thus giving the RDBMS instance(s) using ASM the absolute tip-top I/O performance that could possibly be achieved given your hardware limitations.

Sounds great? Trouble is, it just is not true. This idea has gained a bit of traction in the community, and I’m sure many people think ASM is perhaps more clever than it actually is. Whether this is due to marketing terminological inexactitude, I’ll leave up to the reader to decide.

The only metric ASM uses when determining where data should be located is the capacity of the disks in a disk group. ASM’s goal in placing data is to ensure every drive is filled to the same percentage. Therefore, if the disks in a disk group are of equal size, each will receive the same amount of data. The theory is that by spreading the data evenly across the drives you achieve good I/O performance, as every drive is likely to be serving a similar number of I/O requests.
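To make the capacity rule concrete, here is a minimal Python sketch. It is not ASM's actual allocation code; the function names and the disk sizes are made up for illustration, and it simply assumes data is split in proportion to disk capacity so that every disk ends up filled to the same percentage.

```python
def place_by_capacity(disk_sizes_gb, data_gb):
    """Split data_gb across disks in proportion to each disk's capacity.

    Hypothetical helper: it models only the "fill every disk to the same
    percentage" goal, not ASM's real allocation-unit placement.
    """
    total = sum(disk_sizes_gb)
    if data_gb > total:
        raise ValueError("data does not fit in the disk group")
    return [data_gb * size / total for size in disk_sizes_gb]


def fill_percent(disk_sizes_gb, placed_gb):
    """Percentage of each disk that is in use after placement."""
    return [100.0 * used / size for used, size in zip(placed_gb, disk_sizes_gb)]


# Two equal disks: each holds the same amount of data.
equal_sizes = [500, 500]
equal_placed = place_by_capacity(equal_sizes, 600)

# One disk twice the size of the other: it holds twice the data,
# yet both disks are filled to the same percentage.
mixed_sizes = [500, 1000]
mixed_placed = place_by_capacity(mixed_sizes, 600)

print(equal_placed, fill_percent(equal_sizes, equal_placed))
print(mixed_placed, fill_percent(mixed_sizes, mixed_placed))
```

Note that nothing in this calculation looks at I/O activity; capacity is the only input.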

ASM does expose some data on how much I/O each disk in a disk group has performed, via V$ASM_DISK_STAT:

SQL> select group_number, disk_number, read_time, write_time, bytes_read, bytes_written
     from V$ASM_DISK_STAT;

GROUP_NUMBER DISK_NUMBER  READ_TIME WRITE_TIME BYTES_READ BYTES_WRITTEN
------------ ----------- ---------- ---------- ---------- -------------
           4           0 14910505.7 4626148.24 4.1821E+13    1.4998E+12
           4           1 14965432.3 5324739.98 4.1833E+13    1.6264E+12

There are two disks in this disk group, which are actually of equal size. Both have read and written a similar quantity of data, though the figures are not exactly equal. The write times show a bigger discrepancy than the read times.
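As a quick check of those discrepancies, the short Python sketch below takes the figures from the query output above and expresses the gap between the two disks as a percentage of the smaller value. The `relative_gap` helper is my own naming, not anything Oracle provides.

```python
# Figures copied from the V$ASM_DISK_STAT output above: (disk 0, disk 1).
stats = {
    "read_time": (14910505.7, 14965432.3),
    "write_time": (4626148.24, 5324739.98),
    "bytes_read": (4.1821e13, 4.1833e13),
    "bytes_written": (1.4998e12, 1.6264e12),
}


def relative_gap(a, b):
    """Gap between two values as a percentage of the smaller one."""
    low, high = sorted((a, b))
    return 100.0 * (high - low) / low


gaps = {name: relative_gap(*pair) for name, pair in stats.items()}

for name, gap in gaps.items():
    print(f"{name:14s} {gap:6.2f}%")
```

On these numbers the bytes read differ by well under 0.1%, the bytes written by roughly 8%, and the write times by roughly 15%, while the read times are within about half a percent of each other.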

Basically, the point is that for equal sized disks in a disk group the ASM algorithm of distributing data according to capacity works reasonably well.

But consider if you had different sized disks in a disk group. A larger disk gets more data. Is a larger disk actually quicker at returning that data? Well, probably not, and the larger the discrepancy in sizes the larger the skew of I/O there will be.
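A rough way to see the skew is sketched below in Python. It rests on one loud assumption: every stored gigabyte is equally likely to be accessed, so a disk's share of the I/O equals its share of the capacity. The function names and disk sizes are hypothetical, and this is an illustration of the argument, not a measurement.

```python
def io_share_by_capacity(disk_sizes_gb):
    """Expected fraction of I/O per disk, assuming I/O follows data volume."""
    total = sum(disk_sizes_gb)
    return [size / total for size in disk_sizes_gb]


def skew_vs_even(disk_sizes_gb):
    """Ratio of each disk's expected I/O share to an even 1/n share.

    1.0 means the disk does its fair share; 1.5 means 50% more than that.
    """
    even = 1.0 / len(disk_sizes_gb)
    return [share / even for share in io_share_by_capacity(disk_sizes_gb)]


# Equal disks: every disk is expected to do an equal share of the work.
same_size = skew_vs_even([500, 500])

# A 500 GB disk next to a 1500 GB disk: the larger disk is expected to
# serve three times the I/O of the smaller one.
mixed_size = skew_vs_even([500, 1500])

print(same_size)
print(mixed_size)
```

Unless the larger disk is also proportionally faster, it becomes the hot spot, and the wider the size gap the worse the imbalance.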

Maybe one day ASM will have the ability to shift data based on the I/O activity of the underlying drives, but until then, make sure all the disks you have in a disk group are of the same size (oh, and have the same performance characteristics). That way you’ll protect yourself from any I/O hot spots that ASM won’t quite save you from yet!


8 Comments

  1. Exactly. I am preparing a presentation about ASM for an Oracle DBA conference in the Netherlands, and have been investigating the inner workings of ASM. The I/O tuning is done using the AU/stripe allocation policy. This means there is NO magic I/O tuning done.

    Another point is that Oracle ASM sees the ‘disks’ it is given (which are actually partitions on a local device) as individual entities. Is that assumption true?

    Most servers I encounter are connected to one storage box (either SAN or NAS). Out of that storage box, ‘slices’ are given to the server. There is a big chance (and most of the time it is true) that the slices use the same pool of physical disks. This means you risk getting “I/O encounters”, which could lead to longer I/O times.
    (To be honest, I have not proven this issue by measurement.)

  2. jarneil

     /  May 1, 2008

    Hi Frits,

    Thanks for reading!

    I hope you are going to give that talk in the UK sometime?

    I’m giving a talk to the UKOUG on 20th May on ASM and ZFS, more of an overview than a technical deep dive though.

    A lot of DBAs have no idea how their storage has been carved up; they are given a set of LUNs and just have to get on with it. I bet you are right, there are probably a lot of storage horrors out there.

    I’m lucky, I am the storage admin for my company, so I know exactly how my LUNs are carved up. In production I never present multiple LUNs to ASM carved from the same disks.

    cheers,

    jason.

  3. Jun Erroba

     /  May 14, 2008

    Thank you for highlighting a very important design shortcoming of ASM when it comes to I/O load balancing, where it really matters, as opposed to “disk space utilization balancing”, for lack of a better term. I believe that the prevalent misconception amongst DBAs, of whom I counted myself one until your article forced me to look again at the docs, is partly Oracle’s fault, as the snippet below from the 11g ASM doc clearly demonstrates. The clause “this ensures load balancing” seems to allude to I/O request load, while the next sentence discounts that presumption.

    “When all of the files are evenly dispersed, all of the disks are evenly filled to the same percentage; this ensures load balancing. Rebalancing does not relocate data based on I/O statistics nor is rebalancing started as a result of statistics. ASM rebalancing operations are controlled by the size of the disks in a disk group.”

  4. jarneil

     /  May 15, 2008

    Hi Jun,

    No doubt it is Oracle’s fault – what is going to sell more? I/O load balancing or disk space balancing?

    I know how I would sell it if I worked for Oracle marketing!

    Glad to be of help, btw!

    jason.

  5. avargasil

     /  January 25, 2009

    ASM does stripe data in allocation-unit-sized pieces across all the ASM disks provided by the administrator. That means Oracle files will have their AUs allocated in a round-robin fashion across all physical devices, thus spreading the I/O on these files.
    Still, we as DBAs need to calculate the throughput our database needs and size the storage to provide that throughput.
    So it is not a matter of blaming anyone; just do the math and size the infrastructure for the required load.

  6. jarneil

     /  January 26, 2009

    Hello Alejandro,

    Thanks for reading.

    I think the point I was making was that some DBAs mistakenly believe that ASM can dynamically rebalance data in response to I/O hotspots, and I was trying to point out that this is incorrect. I had seen a number of people ask this question on the OTN forum, and this posting was an attempt to get the correct information out there!

    As we know, ASM does not dynamically rebalance data. Sure, the algorithm for data placement is an attempt to avoid I/O hotspots, and I’m sure it works fine; it’s just that it does not rebalance dynamically!

    jason.

