I was reading an excellent post on Exadata hardware by Frits Hoogland and I managed to get myself completely bamboozled by the fact that on an Exadata V2 all you see on the compute nodes is one solitary disk device, and a couple of partitions created from it.
Now it does not take much to work out that a setup created on a single disk would not really be a great selling point for a high-end piece of kit, so how do the compute nodes have acceptable levels of disk drive availability and resilience?
Well, the traditional answer to increase disk availability is to use hardware RAID, and Exadata is no exception in reaching for this solution. Specifically the compute nodes are using the PCI Express card pictured opposite as a hardware RAID controller. This is an LSI MegaRAID controller.
You can see that it is installed with the following command:
[root@db01 ~]# lspci -v
0d:00.0 RAID bus controller: LSI Logic / Symbios Logic Unknown device 0079 (rev 03)
Subsystem: LSI Logic / Symbios Logic Unknown device 9263
Flags: bus master, fast devsel, latency 0, IRQ 66
.
.
.
This was run on a V2 compute node and uses the lspci command to display all the attached PCI cards. I’ve filtered out everything but the LSI MegaRAID controller.
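If you would rather not trim the lspci output by hand, a small filter can do it. This is only a sketch: the function name raid_controllers is my own invention, and it assumes the usual lspci -v layout where each device starts with an unindented header line followed by indented detail lines.

```shell
# Read "lspci -v" output on stdin and print only the devices whose
# header line mentions a RAID bus controller, plus their detail lines.
# Assumes header lines start in column 0 with a hex bus address and
# detail lines are indented (the normal lspci -v layout).
raid_controllers() {
    awk '
        /^[0-9a-fA-F]/ { show = ($0 ~ /RAID bus controller/) }  # new device header
        show                                                     # print while inside a match
    '
}

# Usage on a live system:
#   lspci -v | raid_controllers
```

Because the function only reads stdin, you can also feed it a saved copy of the lspci output from another node.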
So we now know we have one of these in our Exadata system, but how do we know it is actually doing anything of use for us? I still see the following in /proc/partitions:
[root@db01 ~]# cat /proc/partitions
major minor  #blocks  name
   8     0  285155328 sda
   8     1   62910508 sda1
   8     2   16771860 sda2
   8     3  205471350 sda3
This obviously matches up with what we see when looking at df.
[root@db01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              60G   22G   35G  39% /
/dev/sda3             193G  130G   54G  71% /u01
Let's use lsscsi to check what SCSI devices are attached to the system:
[root@db01 ~]# lsscsi
[0:2:0:0]    disk    LSI      MR9261-8i        2.12  /dev/sda
So the LSI controller is presenting the /dev/sda device, on which the 3 partitions are created.
Let's use dmesg to check how many drives were found at boot time:
[root@db01 ~]# dmesg
.
.
SCSI subsystem initialized
megasas: 00.00.04.38 Fri. Jan. 14 12:24:32 EDT 2011
megasas: 0x1000:0x0079:0x1000:0x9263: bus 13:slot 0:func 0
GSI 21 sharing vector 0x42 and IRQ 21
ACPI: PCI Interrupt 0000:0d:00.0[A] -> GSI 24 (level, low) -> IRQ 66
PCI: Setting latency timer of device 0000:0d:00.0 to 64
gen2: instance->base_addr = df2fc000<6>megasas: FW now in Ready state
megasas: cpx is not supported.
megasas_init_mfi: fw_support_ieee=0<6>scsi0 : LSI SAS based MegaRAID driver
Vendor: HITACHI Model: H103030SCSUN300G Rev: A2A8
Type: Direct-Access ANSI SCSI revision: 06
Vendor: HITACHI Model: H103030SCSUN300G Rev: A2A8
Type: Direct-Access ANSI SCSI revision: 06
Vendor: HITACHI Model: H103030SCSUN300G Rev: A2A8
Type: Direct-Access ANSI SCSI revision: 06
Vendor: HITACHI Model: H103030SCSUN300G Rev: A2A8
Type: Direct-Access ANSI SCSI revision: 06
Vendor: LSI Model: MR9261-8i Rev: 2.12
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 1169686528 512-byte hdwr sectors (598880 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back, no read (daft)
SCSI device sda: 570310656 512-byte hdwr sectors (291999 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back, no read (daft)
sda: sda1 sda2 sda3
sd 0:2:0:0: Attached scsi disk sda
.
.
I’ve edited the output from this for brevity. So dmesg seems to be reporting that 4 hard drives and the LSI RAID controller have been found.
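Rather than counting the Vendor lines by eye, you can let awk do it. This is a rough sketch: count_scsi_vendors is a name I made up, and it assumes the older "Vendor: ... Model: ..." boot message format shown above (newer kernels print these lines differently).

```shell
# Read dmesg output on stdin and print how many SCSI devices were
# reported per vendor, based on the "Vendor: <name>" boot messages.
count_scsi_vendors() {
    awk '
        /Vendor:/ {
            # the vendor name is the field right after the "Vendor:" tag
            for (i = 1; i < NF; i++)
                if ($i == "Vendor:") n[$(i + 1)]++
        }
        END { for (v in n) print v, n[v] }
    ' | sort
}

# Usage on a live system:
#   dmesg | count_scsi_vendors
```

Against the output above this should report 4 devices for HITACHI and 1 for LSI.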
We can use the MegaCli64 command to interrogate the LSI RAID controller. MegaCli64 has a huge number of options for querying the controller; here is a view of the Logical Device my controller has control of:
[root@db01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:271.945 GB
State: Optimal
Stripe Size: 1.0 MB
Number Of Drives:3
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disabled
Encryption Type: None
Number of Dedicated Hot Spares: 1
0 : EnclId - 252 SlotId - 3
Exit Code: 0x00
You can see from this that the controller is aware of 4 physical drives: 3 in the virtual disk plus 1 marked as a dedicated hot spare. This matches up quite nicely with the 4 drives dmesg says were found at boot time.
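The drive arithmetic can be pulled straight out of the -LDInfo output. This is a minimal sketch, assuming a single-span virtual disk like the one shown here; count_array_drives is a hypothetical helper name, not part of MegaCli.

```shell
# Read "MegaCli64 -LDInfo -LALL -aALL" output on stdin and print the
# number of drives in the array, the dedicated hot spares, and their sum.
# Assumes single-span virtual disks, as in the output above.
count_array_drives() {
    awk -F':' '
        /^[ \t]*Number Of Drives/               { drives += $2 }
        /^[ \t]*Number of Dedicated Hot Spares/ { spares += $2 }
        END { printf "array=%d spares=%d total=%d\n", drives, spares, drives + spares }
    '
}

# Usage on a live system:
#   /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | count_array_drives
```

For the output above this gives array=3 spares=1 total=4, which is the 4 drives seen at boot.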
The drives are in a RAID5 set. You can also see some nice information about the configuration of the controller with the CfgDsply MegaCli64 option:
[root@db01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL
==============================================================================
Adapter: 0
Product Name: LSI MegaRAID SAS 9261-8i
Memory: 512MB
BBU: Present
Serial No: SV04004860
==============================================================================
Number of DISK GROUPS: 1
DISK GROUPS: 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 3
Number of VDs: 1
Number of dedicated Hotspares: 1
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:271.945 GB
State: Optimal
Stripe Size: 1.0 MB
Number Of Drives:3
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disabled
Encryption Type: None
Physical Disk Information:
Physical Disk: 0
I’ve edited for brevity, but this output then goes on to show some information on the Physical Drives. This output is nice in that you can see how much memory the controller has. You also see the number of physical drives in the RAID set and whether there is a hot spare.
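As a sanity check, the Size the controller reports can be tied back to the sector count dmesg printed for sda and the block count in /proc/partitions. The two helper names below are my own, and the sketch assumes the "GB" printed by MegaCli64 is really binary gigabytes, which is what the numbers here suggest.

```shell
# Convert a count of 512-byte sectors (as printed by dmesg) into the
# 1 KiB blocks used by /proc/partitions.
sectors_to_kib() {
    echo $(( $1 / 2 ))
}

# Convert a count of 512-byte sectors into binary gigabytes (GiB),
# to three decimal places.
sectors_to_gib() {
    awk -v s="$1" 'BEGIN { printf "%.3f\n", s * 512 / (1024 * 1024 * 1024) }'
}

sectors_to_kib 570310656   # compare with the sda line in /proc/partitions
sectors_to_gib 570310656   # compare with "Size:" in the MegaCli64 output
```

The first call prints 285155328, the #blocks value for sda, and the second prints 271.945, the Size of the virtual disk, so all three tools are describing the same device.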
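It is also worth noting the Cache Policy that the controller is using. The default policy is WriteBack, which means the controller will acknowledge the write as soon as the data is resident in the cache of the controller, rather than waiting for it to be written out to disk. Note though that the current policy displayed here is WriteThrough. With "No Write Cache if Bad BBU" set, the controller typically drops back to WriteThrough whenever it does not consider the battery healthy, for example during a learn cycle, so the state of the BBU is worth checking.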
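In the output above the Default and Current cache policies do not agree (WriteBack versus WriteThrough), which is easy to miss when scanning by eye. Here is a small sketch that flags the difference; check_cache_policy is a hypothetical name and it only compares the first item on each policy line, the write mode.

```shell
# Read MegaCli64 -LDInfo (or -CfgDsply) output on stdin and compare the
# write mode in "Default Cache Policy" with the one in "Current Cache Policy".
check_cache_policy() {
    awk -F': *' '
        /^[ \t]*Default Cache Policy/ { split($2, d, ","); def = d[1] }
        /^[ \t]*Current Cache Policy/ {
            split($2, c, ",")
            if (c[1] == def) print "OK: " c[1]
            else print "MISMATCH: default=" def " current=" c[1]
        }
    '
}

# Usage on a live system:
#   /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL | check_cache_policy
```

A mismatch is not necessarily a fault, but it is a good prompt to go and look at the battery status on the controller.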
I think you can see that there is no need to be bamboozled and that Exadata really does use hardware RAID to increase the availability of the drives within the compute nodes.


Doug Burns
/ October 31, 2011
Mmmm, interesting post, particularly these words ….
This is an LSI MegaRAID controller.
You might want to speak to your client’s DBAs about the joy that baby caused over several days
(All now resolved, of course)
jarneil
/ October 31, 2011
Hi Doug,
I’ve no doubt those guys have many a tale to tell, you can tell those guys have suffered.
Really they belong in a benevolent home now.
jason.
Doug Burns
/ October 31, 2011
I’ll search for a benevolent home that would take them ….
kevinclosson
/ November 3, 2011
Multi-million dollar solutions comprised of hundreds-dollar critical components. It’s a tough situation and Oracle is not unique in this regard…well, other than their chest-thumping and self-proclaimed better-than-everyone-at-everything-all-the-time-no-matter-what mantra.
jarneil
/ November 3, 2011
Hi Kevin,
Thanks for reading!
Yep, I can’t stand the “better-than-everyone-at-everything-all-the-time-no-matter-what mantra”, I’ve always preferred the best tool for the job approach. But recently convincing bean counters that taking a best of breed approach is the way forward for an enterprise has been hard.
The “One throat to choke” argument is quite persuasive for the people who sign the cheques, as opposed to the people who have to use the technologies.
jason.
kevinclosson
/ November 3, 2011
Hi Jason,
I’m a regular reader of your blog!
The pre-configured nature of Exadata has become the single largest value add. That, however, is mostly a reflection of the fact that mere mortals cannot handle the software complexity of Real Application Clusters. The software is just simply too complex. Way too complex.
Now, as for the “One throat to choke” value proposition I see it from two angles. Sure, there is no possibility for a three-way concall whilst hovering over the cadaver. That, to some, is a value proposition because it eliminates finger-pointing. On the other hand, one throat to choke is also the same as a single hand to hold as you are dangling off a cliff with shark-infested water several hundred feet below. We know for certain the shark-infested water is a reality. The thing that should frighten intelligent people is having *only one* party interested in helping you–especially when that one party is Oracle. If there are a reasonable number of suppliers involved the customer has greater odds of actually talking to a) someone who actually knows something about information technology and/or b) someone who actually cares!
I’m going to sound like a shill on this matter, but I am totally convinced that VCE is both an excellent model and a platform with suppliers that genuinely care about the misery you are enduring due to the complexity of the whole stack.