Exadata Uses Hardware RAID? – You Bet it Does!

I was reading an excellent post on Exadata hardware by Frits Hoogland and managed to get myself completely bamboozled by the fact that on an Exadata V2 all you see on the compute nodes is one solitary disk device, with a few partitions created on it.

Now it does not take much to work out that a setup built on a single disk would not be a great selling point for a high-end piece of kit, so how do the compute nodes achieve acceptable levels of disk availability and resilience?

Well, the traditional answer to increasing disk availability is hardware RAID, and Exadata is no exception in reaching for this solution. Specifically, the compute nodes use the PCI Express card pictured opposite as a hardware RAID controller. This is an LSI MegaRAID controller.

You can see that it is installed with the following command:


[root@db01 ~]# lspci -v
0d:00.0 RAID bus controller: LSI Logic / Symbios Logic Unknown device 0079 (rev 03) 
        Subsystem: LSI Logic / Symbios Logic Unknown device 9263 
        Flags: bus master, fast devsel, latency 0, IRQ 66
       .
       .
       .


This was run on a V2 compute node and uses the lspci command to display all the attached PCI cards. I’ve filtered out everything but the LSI MegaRAID controller.
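If you would rather not pick through the whole lspci listing by eye, the filtering can be scripted. The following is only a sketch: the function name find_raid_controllers is my own invention, and it works on text copied from the listing above rather than calling lspci itself.

```python
import re

# Sample text copied from the lspci -v listing above. On a live system the
# same text could be captured from the lspci command instead.
SAMPLE_LSPCI = """\
0d:00.0 RAID bus controller: LSI Logic / Symbios Logic Unknown device 0079 (rev 03)
        Subsystem: LSI Logic / Symbios Logic Unknown device 9263
        Flags: bus master, fast devsel, latency 0, IRQ 66
"""


def find_raid_controllers(lspci_text):
    """Return (pci_address, description) for each 'RAID bus controller' line.

    Hypothetical helper, not part of any Exadata or LSI tooling.
    """
    pattern = re.compile(r"^(\S+)\s+RAID bus controller:\s+(.*\S)\s*$")
    found = []
    for line in lspci_text.splitlines():
        match = pattern.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    for address, description in find_raid_controllers(SAMPLE_LSPCI):
        print(address, description)
```

The indented Subsystem and Flags lines are skipped because the pattern only matches lines that begin with a PCI address.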

So we now know we have one of these in our Exadata system, but how do we know it is actually doing anything of use for us? I still see only the following in /proc/partitions:

[root@db01 ~]# cat /proc/partitions 
major minor  #blocks  name

   8     0  285155328 sda 
   8     1   62910508 sda1 
   8     2   16771860 sda2 
   8     3  205471350 sda3


This matches up with what df reports:

[root@db01 ~]# df -h 
Filesystem            Size  Used Avail Use% Mounted on 
/dev/sda1              60G   22G   35G  39% / 
/dev/sda3             193G  130G   54G  71% /u01
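To check that the two listings really do agree, remember that /proc/partitions reports sizes in 1 KiB blocks, while df -h prints powers of 1024. Here is a rough sketch of the conversion; the helper name partition_sizes_gib is made up for illustration and the input is the text from the listing above.

```python
# Sample text copied from the /proc/partitions listing above.
SAMPLE_PARTITIONS = """\
major minor  #blocks  name

   8     0  285155328 sda
   8     1   62910508 sda1
   8     2   16771860 sda2
   8     3  205471350 sda3
"""

KIB_PER_GIB = 1024 * 1024  # /proc/partitions sizes are in 1 KiB blocks


def partition_sizes_gib(text):
    """Map each device name to its size in GiB, rounded to one decimal.

    Hypothetical helper for illustration only.
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields and a numeric major number, which
        # skips both the header row and the blank line.
        if len(fields) == 4 and fields[0].isdigit():
            sizes[fields[3]] = round(int(fields[2]) / KIB_PER_GIB, 1)
    return sizes


if __name__ == "__main__":
    for name, gib in partition_sizes_gib(SAMPLE_PARTITIONS).items():
        print(name, gib)
```

sda1 comes out at 60.0 GiB, in line with the 60G from df. sda3 comes out at 196.0 GiB against the 193G df shows; I am assuming the difference is filesystem overhead.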

Let’s use lsscsi to check what SCSI devices are attached to the system:

[root@db01 ~]# lsscsi 
[0:2:0:0]    disk    LSI      MR9261-8i        2.12  /dev/sda

So the LSI controller is what presents the /dev/sda device, on which the 3 partitions are created.

Let’s use dmesg to check how many drives were found at boot time:

[root@db01 ~]# dmesg 
           .
           .
SCSI subsystem initialized
megasas: 00.00.04.38 Fri. Jan. 14 12:24:32 EDT 2011
megasas: 0x1000:0x0079:0x1000:0x9263: bus 13:slot 0:func 0
GSI 21 sharing vector 0x42 and IRQ 21
ACPI: PCI Interrupt 0000:0d:00.0[A] -> GSI 24 (level, low) -> IRQ 66
PCI: Setting latency timer of device 0000:0d:00.0 to 64

 gen2: instance->base_addr = df2fc000<6>megasas: FW now in Ready state
megasas: cpx is not supported.
megasas_init_mfi: fw_support_ieee=0<6>scsi0 : LSI SAS based MegaRAID driver
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Type:   Direct-Access                      ANSI SCSI revision: 06
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Type:   Direct-Access                      ANSI SCSI revision: 06
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Type:   Direct-Access                      ANSI SCSI revision: 06
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Type:   Direct-Access                      ANSI SCSI revision: 06
  Vendor: LSI       Model: MR9261-8i         Rev: 2.12
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 1169686528 512-byte hdwr sectors (598880 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back, no read (daft)
SCSI device sda: 570310656 512-byte hdwr sectors (291999 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back, no read (daft)
 sda: sda1 sda2 sda3
sd 0:2:0:0: Attached scsi disk sda
         .
         .

I’ve edited the output from this for brevity. So dmesg seems to be reporting that 4 hard drives and the LSI RAID controller have been found.
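Counting the Vendor lines and checking the reported size can also be scripted. This is a sketch only, working on lines copied from the dmesg listing above; the helper names are my own. It also confirms that the last size dmesg reports for sda (570310656 sectors of 512 bytes) is the same 285155328 blocks of 1 KiB that /proc/partitions shows.

```python
import re
from collections import Counter

# Relevant lines copied from the dmesg listing above.
SAMPLE_DMESG = """\
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Vendor: HITACHI   Model: H103030SCSUN300G  Rev: A2A8
  Vendor: LSI       Model: MR9261-8i         Rev: 2.12
SCSI device sda: 1169686528 512-byte hdwr sectors (598880 MB)
SCSI device sda: drive cache: write back, no read (daft)
SCSI device sda: 570310656 512-byte hdwr sectors (291999 MB)
"""


def count_devices(dmesg_text):
    """Count (vendor, model) pairs seen on 'Vendor:' lines (hypothetical helper)."""
    pattern = re.compile(r"^\s*Vendor:\s+(\S+)\s+Model:\s+(\S+)")
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = pattern.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def last_sector_count(dmesg_text, device="sda"):
    """Return the last sector count reported for the device, or None."""
    pattern = re.compile(r"^SCSI device (\w+): (\d+) 512-byte hdwr sectors")
    last = None
    for line in dmesg_text.splitlines():
        match = pattern.match(line)
        if match and match.group(1) == device:
            last = int(match.group(2))
    return last


def sectors_to_kib(sectors, sector_bytes=512):
    """Convert a sector count to 1 KiB blocks, the unit /proc/partitions uses."""
    return sectors * sector_bytes // 1024


if __name__ == "__main__":
    print(count_devices(SAMPLE_DMESG))
    print(sectors_to_kib(last_sector_count(SAMPLE_DMESG)))
```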

We can use the MegaCli64 command to interrogate the LSI RAID controller. MegaCli64 accepts a huge number of options for querying the controller; here is a view of the Logical Device my controller has control of:

[root@db01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL


Adapter 0 -- Virtual Drive Information: 
Virtual Disk: 0 (Target Id: 0) 
Name: 
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3 
Size:271.945 GB 
State: Optimal 
Stripe Size: 1.0 MB 
Number Of Drives:3 
Span Depth:1 
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU 
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU 
Access Policy: Read/Write 
Disk Cache Policy: Disabled 
Encryption Type: None 
Number of Dedicated Hot Spares: 1 
    0 : EnclId - 252 SlotId - 3

Exit Code: 0x00

You can see from this that the controller is aware of 4 physical drives: 3 in the virtual disk, plus 1 marked as a dedicated hot spare. This matches up quite nicely with the 4 drives dmesg says were found at boot time.
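The drive count is just the drives in the virtual disk plus the dedicated hot spares. As a sketch, the key fields can be pulled out of the "Key: Value" lines of the LDInfo listing; summarize_virtual_disk is a name I have made up, and it only handles the fields that appear in the output shown above.

```python
import re

# Relevant lines copied from the MegaCli64 -LDInfo listing above.
SAMPLE_LDINFO = """\
Virtual Disk: 0 (Target Id: 0)
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:271.945 GB
State: Optimal
Number Of Drives:3
Number of Dedicated Hot Spares: 1
"""


def summarize_virtual_disk(ldinfo_text):
    """Extract RAID level, state, and drive counts (hypothetical helper)."""
    fields = {}
    for line in ldinfo_text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    level = re.search(r"Primary-(\d+)", fields["RAID Level"])
    in_set = int(fields["Number Of Drives"])
    spares = int(fields["Number of Dedicated Hot Spares"])
    return {
        "raid_level": int(level.group(1)),
        "state": fields["State"],
        "drives_in_set": in_set,
        "hot_spares": spares,
        "total_drives": in_set + spares,
    }


if __name__ == "__main__":
    print(summarize_virtual_disk(SAMPLE_LDINFO))
```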

The 3 drives in the virtual disk form a RAID5 set. You can also see some nice information about the configuration of the controller with the MegaCli64 -CfgDsply option:


[root@db01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL

============================================================================== 
Adapter: 0 
Product Name: LSI MegaRAID SAS 9261-8i 
Memory: 512MB 
BBU: Present 
Serial No: SV04004860 
============================================================================== 
Number of DISK GROUPS: 1

DISK GROUPS: 0 
Number of Spans: 1 
SPAN: 0 
Span Reference: 0x00 
Number of PDs: 3 
Number of VDs: 1 
Number of dedicated Hotspares: 1 
Virtual Disk Information: 
Virtual Disk: 0 (Target Id: 0) 
Name: 
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3 
Size:271.945 GB 
State: Optimal 
Stripe Size: 1.0 MB 
Number Of Drives:3 
Span Depth:1 
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU 
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU 
Access Policy: Read/Write 
Disk Cache Policy: Disabled 
Encryption Type: None 
Physical Disk Information: 
Physical Disk: 0 

I’ve edited this for brevity; the full output goes on to show information on the physical drives. This output is nice in that you can see how much memory the controller has, the number of physical drives in the RAID set, and whether there is a hot spare.

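It is also worth noting the cache policy on the controller. The default policy is WriteBack, which means the controller acknowledges a write as soon as the data is resident in the controller’s cache, rather than waiting for it to be written out to disk. Note, though, that the current policy displayed here is WriteThrough, where the write is only acknowledged once it has reached disk. With “No Write Cache if Bad BBU” set, that usually suggests the controller does not currently consider its battery backup unit healthy (it may simply be charging or in a learn cycle).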
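The MegaCli64 output lists both a Default and a Current cache policy, and the two are not guaranteed to agree, so it is worth comparing them rather than reading only one. A small sketch of that comparison, using the two lines from the listing above (the function names are hypothetical):

```python
# The two cache policy lines copied from the MegaCli64 output above.
SAMPLE_POLICY_LINES = """\
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
"""

POLICY_KEYS = ("Default Cache Policy", "Current Cache Policy")


def cache_policies(text):
    """Return {'Default': [...], 'Current': [...]} from the policy lines."""
    policies = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in POLICY_KEYS:
            label = key.split()[0]  # "Default" or "Current"
            policies[label] = [item.strip() for item in value.split(",")]
    return policies


def write_mode_differs(policies):
    """True when the current write mode is not the configured default."""
    # The write mode (WriteBack or WriteThrough) is the first item listed.
    return policies["Default"][0] != policies["Current"][0]


if __name__ == "__main__":
    found = cache_policies(SAMPLE_POLICY_LINES)
    print(found["Default"][0], found["Current"][0], write_mode_differs(found))
```

For the output in this post, the default write mode is WriteBack and the current one is WriteThrough, so the check reports a difference.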

I think you can see that there is no need to be bamboozled and that Exadata really does use hardware RAID to increase the availability of the drives within the compute nodes.


8 thoughts on “Exadata Uses Hardware RAID? – You Bet it Does!”

  1. Mmmm, interesting post, particularly these words ….

    This is an LSI MegaRAID controller.

    You might want to speak to your client’s DBAs about the joy that baby caused over several days 😉

    (All now resolved, of course)

  2. Hi Doug,

    I’ve no doubt those guys have many a tale to tell; you can tell they have suffered.

    Really they belong in a benevolent home now.

    jason.

  3. Multi-million dollar solutions comprised of hundreds-dollar critical components. It’s a tough situation and Oracle is not unique in this regard…well, other than their chest-thumping and self-proclaimed better-than-everyone-at-everything-all-the-time-no-matter-what mantra.

    • Hi Kevin,

      Thanks for reading!

      Yep, I can’t stand the “better-than-everyone-at-everything-all-the-time-no-matter-what mantra”, I’ve always preferred the best tool for the job approach. But recently convincing bean counters that taking a best of breed approach is the way forward for an enterprise has been hard.

      The “One throat to choke” argument is quite persuasive for the people who sign the cheques, as opposed to the people who have to use the technologies.

      jason.

  4. Hi Jason,

    I’m a regular reader of your blog!

    The pre-configured nature of Exadata has become the single largest value add. That, however, is mostly a reflection of the fact that mere mortals cannot handle the software complexity of Real Application Clusters. The software is just simply too complex. Way too complex.

    Now, as for the “One throat to choke” value proposition, I see it from two angles. Sure, there is no possibility for a three-way concall whilst hovering over the cadaver. That, to some, is a value proposition because it eliminates finger-pointing. On the other hand, one throat to choke is also the same as a single hand to hold as you are dangling off a cliff with shark-infested water several hundred feet below. We know for certain the shark-infested water is a reality. The thing that should frighten intelligent people is having *only one* party interested in helping you–especially when that one party is Oracle. If there are a reasonable number of suppliers involved the customer has greater odds of actually talking to a) someone who actually knows something about information technology and/or b) someone who actually cares!

    I’m going to sound like a shill on this matter, but I am totally convinced that VCE is both an excellent model and a platform with suppliers that genuinely care about the misery you are enduring due to the complexity of the whole stack.
