We have already seen that the compute nodes in an Exadata system use hardware RAID to offer increased availability and serviceability for their disk drives. What about the Storage Cells themselves?
At this point you are quite possibly thinking I’ve gone a bit nuts. Everyone knows Exadata uses ASM to offer highly resilient storage with all the benefits that ASM brings to the table, and everyone knows you don’t need hardware RAID to have these benefits.
So surely an Exadata Storage Cell does not use hardware RAID, right?
Storage Cell Hardware
So how can you tell you are working on a Storage Cell, as opposed to a compute node? Well, let’s check what dmidecode reports:
[root@cel01 ~]# dmidecode -s system-product-name
SUN FIRE X4275 SERVER
This is actually a V2 box; the X2-2 box is different in a couple of ways, starting with the model name:
[root@cel01 ~]# dmidecode -s system-product-name
SUN FIRE X4270 M2 SERVER
The X4270 M2 can actually take either 24 2.5″ drives or 12 3.5″ drives. Currently only the 12-disk option is available.
The schematic for this server is above. Basically, it is a 2U box that can take up to 12 drives. In Exadata, these storage cells run Linux:
[root@cel01 ~]# uname -r
2.6.18-22.214.171.124.3.el5
However, they have our old friend the LSI MegaRAID controller installed:
[root@cel01 ~]# lsscsi -v
[0:2:0:0] disk LSI MR9261-8i 2.12 /dev/sda
  dir: /sys/bus/scsi/devices/0:2:0:0 [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:0/0:2:0:0]
[0:2:1:0] disk LSI MR9261-8i 2.12 /dev/sdb
  dir: /sys/bus/scsi/devices/0:2:1:0 [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:1/0:2:1:0]
[0:2:2:0] disk LSI MR9261-8i 2.12 /dev/sdc
  dir: /sys/bus/scsi/devices/0:2:2:0 [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:2/0:2:2:0]
.
.
I’ve abbreviated the output to just the first 3 drives; the full output shows all 12, and the flash cards as well. Ok, so it’s pretty clear there is an LSI MegaRAID MR9261-8i card, just like in the compute nodes.
Let’s take a look at what our old friend is doing in the storage cell:
[root@cel01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -aALL
System
  OS Name (IP Address) : Not Recognized
  OS Version : Not Recognized
  Driver Version : Not Recognized
  CLI Version : 8.00.23
Hardware
  Controller
    ProductName : LSI MegaRAID SAS 9261-8i(Bus 0, Dev 0)
    SAS Address : 500605b00250ef70
    FW Package Version: 12.12.0-0048
    Status : Optimal
  BBU
    BBU Type : Unknown
    Status : Healthy
  Enclosure
    Product Id : HYDE12
    Type : SES
    Status : OK
    Product Id : SGPIO
    Type : SGPIO
    Status : OK
  PD
    Connector : Port 0 - 3<Internal><Encl Pos 0 >: Slot 11
    Vendor Id : SEAGATE
    Product Id : ST360057SSUN600G
    State : Online
    Disk Type : SAS,Hard Disk Device
    Capacity : 557.861 GB
    Power State : Active
    Connector : Port 0 - 3<Internal><Encl Pos 0 >: Slot 10
    Vendor Id : SEAGATE
    Product Id : ST360057SSUN600G
    State : Online
    Disk Type : SAS,Hard Disk Device
    Capacity : 557.861 GB
    Power State : Active
    Connector : Port 0 - 3<Internal><Encl Pos 0 >: Slot 9
    Vendor Id : SEAGATE
    Product Id : ST360057SSUN600G
    State : Online
    Disk Type : SAS,Hard Disk Device
    Capacity : 557.861 GB
    Power State : Active
    .
    .
    .
Storage
  Virtual Drives
    Virtual drive : Target Id 0 ,VD name
    Size : 557.861 GB
    State : Optimal
    RAID Level : 0
    Virtual drive : Target Id 1 ,VD name
    Size : 557.861 GB
    State : Optimal
    RAID Level : 0
    Virtual drive : Target Id 2 ,VD name
    Size : 557.861 GB
    State : Optimal
    RAID Level : 0
    .
    .
    .
Again, output chopped after 3 drives for brevity. Basically we have 12 Physical Drives mapped to 12 Virtual Drives, all with RAID level 0. But each RAID 0 virtual drive spans only a single physical drive, so nothing is actually striped across disks.
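The one-to-one mapping can be checked mechanically rather than by eye. Below is a minimal Python sketch, assuming the summary has been captured as text with one field per line (the sample embeds only the three virtual drives shown above). The function name `virtual_drives` is made up for illustration and is not part of MegaCli.

```python
import re

# Sample lines copied from the -ShowSummary output above
# (first three virtual drives only).
SUMMARY = """\
Virtual drive : Target Id 0 ,VD name
Size : 557.861 GB
State : Optimal
RAID Level : 0
Virtual drive : Target Id 1 ,VD name
Size : 557.861 GB
State : Optimal
RAID Level : 0
Virtual drive : Target Id 2 ,VD name
Size : 557.861 GB
State : Optimal
RAID Level : 0
"""


def virtual_drives(text):
    """Return a list of (target_id, raid_level) tuples found in the text."""
    drives = []
    target = None
    for line in text.splitlines():
        m = re.match(r"\s*Virtual drive\s*:\s*Target Id (\d+)", line)
        if m:
            # Remember which virtual drive the following fields belong to.
            target = int(m.group(1))
            continue
        m = re.match(r"\s*RAID Level\s*:\s*(\d+)", line)
        if m and target is not None:
            drives.append((target, int(m.group(1))))
            target = None
    return drives


vds = virtual_drives(SUMMARY)
all_raid0 = all(level == 0 for _, level in vds)
print(vds)
print("all RAID 0:", all_raid0)
```

On a real cell you would feed the full captured output into `virtual_drives` and expect 12 entries, all at level 0.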
You can even see that the LSI RAID controller has the same 512MB battery backed cache:
[root@cel01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -cfgDsply -aALL
==============================================================================
Adapter: 0
Product Name: LSI MegaRAID SAS 9261-8i
Memory: 512MB
BBU: Present
Serial No: SV03902812
==============================================================================
Number of DISK GROUPS: 12
DISK GROUP: 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 1
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 557.861 GB
State : Optimal
Stripe Size : 1.0 MB
Number Of Drives : 1
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Disabled
Encryption Type : None
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 20
Slot Number: 0
Device Id: 19
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 557.861 GB [0x45bb9000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5000c50028c59721
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST360057SSUN600G08051047E1P6N9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
.
.
Output chopped after 1 drive, as it does not get any more interesting. You can see that again the drives are in WriteBack mode, which means a write is acknowledged once the data is in the controller cache, rather than when it is physically on disk. Again, you’ve got to make sure your batteries are good to give yourself some protection on power failure.
Of course, RAID 0 gives no protection in the event of a hard disk failure, but it is still true to say that an Exadata Storage Cell is using hardware RAID.
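The battery dependency is visible in the policy string itself: "No Write Cache if Bad BBU" means the controller stops caching writes when the battery is bad. A small Python sketch, assuming you have the "Current Cache Policy" line as text, splits it into its settings and checks that write-back is guarded in that way. The helper names are hypothetical, chosen only for this example.

```python
# "Current Cache Policy" line copied from the -cfgDsply output above.
POLICY_LINE = ("Current Cache Policy: WriteBack, ReadAheadNone, Direct, "
               "No Write Cache if Bad BBU")


def parse_cache_policy(line):
    """Split a cache policy line into its comma-separated settings."""
    _, _, value = line.partition(":")
    return [item.strip() for item in value.split(",")]


def writeback_is_guarded(settings):
    """True when write-back is on and is dropped if the BBU goes bad."""
    return ("WriteBack" in settings
            and "No Write Cache if Bad BBU" in settings)


settings = parse_cache_policy(POLICY_LINE)
print(settings)
print("write-back guarded by BBU:", writeback_is_guarded(settings))
```

The same check could be run against every virtual drive's policy line on a cell. This sketch does not query the battery state itself, which would still need checking separately.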
Joel Goodman has written an excellent account of how two of the 12 drives, the system disks, are used to create the various O/S devices.
We can see the differences between a system drive and a non-system drive with the following:
[root@cel01 /]# fdisk -l
Disk /dev/sda: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16          16        8032+  83  Linux
/dev/sda3              17       69039   554427247+  83  Linux
/dev/sda4           69040       72824    30403012+   f  W95 Ext'd (LBA)
/dev/sda5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sda6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sda7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sda8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sda9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sda10          72433       72521      714861   fd  Linux raid autodetect
/dev/sda11          72522       72824     2433816   fd  Linux raid autodetect
So this is one of the two system drives, while a non-system drive has the following:
[root@cel01 /]# fdisk -l /dev/sdc
Disk /dev/sdc: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table
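As an aside, fdisk reports these drives as 598.9 GB while MegaCli reported 557.861 GB. The two figures describe the same device; they differ only in units. The short Python sketch below works through the arithmetic from the "Coerced Size" sector count in the MegaCli output, assuming 512-byte sectors (the value in the fdisk "Units" line).

```python
SECTOR_BYTES = 512            # assumed sector size, as in the fdisk "Units" line
coerced_sectors = 0x45BB9000  # "Coerced Size" sector count from -cfgDsply

size_bytes = coerced_sectors * SECTOR_BYTES
size_gib = size_bytes / 2**30   # binary units, which MegaCli labels "GB"
size_gb = size_bytes / 10**9    # decimal units, which fdisk uses

print(size_bytes)          # byte count, to compare with the fdisk header
print(f"{size_gib:.3f}")   # to compare with MegaCli's 557.861 GB
print(f"{size_gb:.3f}")    # to compare with fdisk's 598.9 GB
```

The byte count comes out at 598999040000, exactly the figure in the fdisk header. Dividing by 2^30 gives the 557.861 that MegaCli shows, and dividing by 10^9 gives just under 599, which fdisk appears to truncate to 598.9.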
So from all these partitions on the system drives, we then use mdadm to create software RAID devices by combining partitions from each system drive:
[root@cel01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md6              9.9G  4.4G  5.0G  47% /
tmpfs                  12G     0   12G   0% /dev/shm
/dev/md8              2.0G  618M  1.3G  33% /opt/oracle
/dev/md4              116M   52M   59M  47% /boot
/dev/md11             2.3G   88M  2.1G   4% /var/log/oracle
And we can see that these /dev/md devices are made up from the /dev/sd[a-b] devices:
[root@cel01 ~]# mdadm -Q -D /dev/md6
/dev/md6:
        Version : 0.90
  Creation Time : Fri Dec 31 14:08:30 2010
     Raid Level : raid1
     Array Size : 10482304 (10.00 GiB 10.73 GB)
  Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 6
    Persistence : Superblock is persistent

    Update Time : Fri Nov 11 16:42:07 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 87891e9e:e9bb6307:1e49e958:271166fe
         Events : 0.4

    Number   Major   Minor   RaidDevice State
       0       8        6        0      active sync   /dev/sda6
       1       8       22        1      active sync   /dev/sdb6
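To confirm which physical disks back an md device without reading the table by hand, the member rows can be parsed. This is a minimal Python sketch, assuming the device table from mdadm -D has been captured as text. The helper names `member_partitions` and `parent_disk` are invented for this example.

```python
import re

# Device table rows copied from the mdadm -D output above.
MDADM_ROWS = """\
0 8 6 0 active sync /dev/sda6
1 8 22 1 active sync /dev/sdb6
"""


def member_partitions(text):
    """Return the /dev/sdXN member partitions named in the mdadm rows."""
    return re.findall(r"/dev/sd[a-z]+\d+", text)


def parent_disk(partition):
    """Strip the trailing partition number: /dev/sda6 -> /dev/sda."""
    return re.sub(r"\d+$", "", partition)


members = member_partitions(MDADM_ROWS)
disks = sorted({parent_disk(p) for p in members})
print(members)
print(disks)
```

For /dev/md6 this yields partition 6 on each of /dev/sda and /dev/sdb, which lines up with the two 10482381-block "Linux raid autodetect" partitions in the fdisk listing and the raid1 level reported by mdadm.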
So while the Exadata storage server does indeed have a hardware RAID capability, the O/S on the storage cell is given higher availability by utilising mdadm software RAID. This allows the unused space on the system drives to still be used in the ASM diskgroups.