Exadata flash storage is provided by the Sun Flash Accelerator F20 PCIe card shown above. Four of these cards are installed in every Exadata storage cell. A documentation set is available to peruse.
First, we can see these devices using lsscsi:
[root@cel01 ~]# lsscsi |grep MARVELL
[8:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdn
[8:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdo
[8:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdp
[8:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdq
[9:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdr
[9:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sds
[9:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdt
[9:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdu
[10:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdv
[10:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdw
[10:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdx
[10:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdy
[11:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdz
[11:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdaa
[11:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdab
[11:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdac
You can see they are bunched into 4 groups of 4, on SCSI hosts 8, 9, 10 and 11. This is because each of the 4 cards carries 4 flash modules (FMODs), so on every Exadata storage cell the flash is presented as 16 separate devices.
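To make the grouping explicit, here is a minimal Python sketch that buckets lsscsi lines by their SCSI host number (the first field inside the brackets). The function name group_by_host and the SAMPLE text are my own illustration, and the sample only holds the first two cards from the output above; it assumes the lsscsi line format is exactly as shown.

```python
import re
from collections import defaultdict

# Lines copied from the lsscsi output above (first two cards only).
SAMPLE = """\
[8:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdn
[8:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdo
[8:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdp
[8:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdq
[9:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdr
[9:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sds
[9:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdt
[9:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdu
"""

# [host:channel:target:lun] ... /dev/name
LINE_RE = re.compile(r"^\[(\d+):\d+:\d+:\d+\]\s+.*?(/dev/\S+)\s*$")


def group_by_host(text):
    """Return {scsi_host_number: [device, ...]} for lsscsi-style lines."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return dict(groups)


if __name__ == "__main__":
    for host, devices in sorted(group_by_host(SAMPLE).items()):
        print(host, len(devices), devices)
```

Fed the full 16-line listing, each of the four hosts should come back with four devices, one per flash module on that card.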
We can also use the flash_dom command:
[root@cel01 ~]# flash_dom -l
Aura Firmware Update Utility, Version 1.2.7
Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved..
U.S. Government Rights - Commercial Software. Government users are subject
to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements.
Use is subject to license terms.
This distribution may include materials developed by third parties.
Sun, Sun Microsystems, the Sun logo, Sun StorageTek and ZFS are trademarks
or registered trademarks of Sun Microsystems, Inc. or its subsidiaries,
in the U.S. and other countries.
HBA# Port Name Chip Vendor/Type/Rev MPT Rev Firmware Rev IOC WWID Serial Number
1. /proc/mpt/ioc0 LSI Logic SAS1068E C0 105 011b5c00 0 5080020000fe34c0 465769T+1130A405XA
Current active firmware version is 011b5c00 (1.27.92)
Firmware image's version is MPTFW-01.27.92.00-IT
x86 BIOS image's version is MPTBIOS-6.26.00.00 (2008.10.14)
FCode image's version is MPT SAS FCode Version 1.00.49 (2007.09.21)
D# B___T Type Vendor Product Rev Operating System Device Name
1. 0 0 Disk ATA MARVELL SD88SA02 D20Y /dev/sdn [8:0:0:0]
2. 0 1 Disk ATA MARVELL SD88SA02 D20Y /dev/sdo [8:0:1:0]
3. 0 2 Disk ATA MARVELL SD88SA02 D20Y /dev/sdp [8:0:2:0]
4. 0 3 Disk ATA MARVELL SD88SA02 D20Y /dev/sdq [8:0:3:0]
2. /proc/mpt/ioc1 LSI Logic SAS1068E C0 105 011b5c00 0 5080020000fe3440 465769T+1130A405X7
Current active firmware version is 011b5c00 (1.27.92)
Firmware image's version is MPTFW-01.27.92.00-IT
x86 BIOS image's version is MPTBIOS-6.26.00.00 (2008.10.14)
FCode image's version is MPT SAS FCode Version 1.00.49 (2007.09.21)
D# B___T Type Vendor Product Rev Operating System Device Name
1. 0 0 Disk ATA MARVELL SD88SA02 D20Y /dev/sdr [9:0:0:0]
2. 0 1 Disk ATA MARVELL SD88SA02 D20Y /dev/sds [9:0:1:0]
3. 0 2 Disk ATA MARVELL SD88SA02 D20Y /dev/sdt [9:0:2:0]
4. 0 3 Disk ATA MARVELL SD88SA02 D20Y /dev/sdu [9:0:3:0]
.
.
The output above has been edited for brevity. You can also have a look at the devices themselves, such as /proc/mpt/ioc1, on the filesystem.
We can also of course look at these devices via cellcli:
CellCLI> list physicaldisk where diskType='FlashDisk'
FLASH_1_0 1113M086V3 normal
FLASH_1_1 1113M086V4 normal
FLASH_1_2 1113M086V0 normal
FLASH_1_3 1113M086UY normal
FLASH_2_0 1113M0892K normal
FLASH_2_1 1113M086TR normal
FLASH_2_2 1113M0891P normal
FLASH_2_3 1113M0892L normal
FLASH_4_0 1113M086UP normal
FLASH_4_1 1113M086UQ normal
FLASH_4_2 1113M086UT normal
FLASH_4_3 1113M086UN normal
FLASH_5_0 1113M08AGJ normal
FLASH_5_1 1112M07V6U normal
FLASH_5_2 1113M08AKJ normal
FLASH_5_3 1113M08AH5 normal
Again they are presented as 4 lots of 4, with a diskType of FlashDisk. Looking at the detail of one of the flash disks:
CellCLI> list physicaldisk where diskType='FlashDisk' detail
name: FLASH_5_3
diskType: FlashDisk
errCmdTimeoutCount: 0
errHardReadCount: 0
errHardWriteCount: 0
errMediaCount: 0
errOtherCount: 0
errSeekCount: 0
luns: 5_3
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20Y
physicalInsertTime: 2011-12-07T19:00:02+00:00
physicalInterface: sas
physicalSerial: 1113M08AH5
physicalSize: 22.8880615234375G
sectorRemapCount: 0
slotNumber: "PCI Slot: 5; FDOM: 3"
status: normal
I’ve edited the above down to just the detail for the FLASH_5_3 device, basically the last FDOM slot on the highest numbered PCI slot. You can see that the size of each FDOM is 22.8880615234375G, which multiplied by 16 gives 366.21G.
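The arithmetic is easy to check with a short Python sketch. The constant and function names are my own; the figures are the physicalSize reported above and the 4 cards of 4 FDOMs per cell.

```python
# physicalSize of one FDOM as reported by CellCLI above.
FDOM_SIZE_GB = 22.8880615234375
CARDS_PER_CELL = 4
FDOMS_PER_CARD = 4


def total_flash_gb(fdom_size_gb=FDOM_SIZE_GB,
                   cards=CARDS_PER_CELL,
                   fdoms_per_card=FDOMS_PER_CARD):
    """Raw flash per storage cell: size of one FDOM times the number of FDOMs."""
    return fdom_size_gb * cards * fdoms_per_card


if __name__ == "__main__":
    print(f"{total_flash_gb():.2f}G per cell")  # prints 366.21G per cell
```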
We can also look at the lun level:
CellCLI> list lun where id='5_3' detail
name: 5_3
cellDisk: FD_15_cel01
deviceName: /dev/sdy
diskType: FlashDisk
id: 5_3
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 22.8880615234375G
overProvisioning: 100.0
physicalDrives: FLASH_5_3
status: normal
You can see each lun has a celldisk name associated with it, and a sensible naming convention. Finally drilling down into the celldisk detail:
CellCLI> list celldisk where name='FD_15_cel01' detail
name: FD_15_cel01
comment:
creationTime: 2012-01-10T10:13:06+00:00
deviceName: /dev/sdy
devicePartition: /dev/sdy
diskType: FlashDisk
errorCount: 0
freeSpace: 0
id: 8ddbd2c8-8446-4735-8948-d8aea5744b35
interleaving: none
lun: 5_3
size: 22.875G
status: normal
The final point of interest on the flash cards is the white part, middle top on the card. That is the Energy Storage Module (ESM), and it has a set lifetime. According to the F20 docs, on a V2 its lifetime was expected to be 3 years. You can monitor the health and lifetime of your modules with the following ipmitool command:
[root@cel01 ~]# for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do ipmitool sunoem cli "show /SYS/MB/$RISER/F20CARD/UPTIME"; done
Connected. Use ^D to exit.
-> show /SYS/MB/RISER1/PCIE1/F20CARD/UPTIME
/SYS/MB/RISER1/PCIE1/F20CARD/UPTIME
Targets:
Properties:
type = Power Unit
ipmi_name = PCIE1/F20/UP
class = Threshold Sensor
value = 9844.000 Hours
upper_nonrecov_threshold = 26220.000 Hours
upper_critical_threshold = 25806.000 Hours
upper_noncritical_threshold = 25254.000 Hours
lower_noncritical_threshold = N/A
lower_critical_threshold = N/A
lower_nonrecov_threshold = N/A
alarm_status = cleared
Commands:
cd
show
-> Session closed
Disconnected
I’ve edited the output above down to just one riser card, to prevent boredom. You are looking to ensure that the value, here 9844.000 Hours, is less than the upper_noncritical_threshold, which in this case it is. If the value is greater than that threshold, have the ESM replaced.
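If you would rather not eyeball four sets of output, the comparison can be scripted. The following is a minimal Python sketch, assuming the properties are printed as "name = value" lines exactly as in the output above; the function names and the "ok"/"replace" labels are my own, not anything ipmitool or ILOM provides.

```python
import re

# Properties copied from the ipmitool output above for RISER1/PCIE1.
SAMPLE = """\
-> show /SYS/MB/RISER1/PCIE1/F20CARD/UPTIME
 Properties:
 type = Power Unit
 class = Threshold Sensor
 value = 9844.000 Hours
 upper_nonrecov_threshold = 26220.000 Hours
 upper_critical_threshold = 25806.000 Hours
 upper_noncritical_threshold = 25254.000 Hours
 lower_noncritical_threshold = N/A
 alarm_status = cleared
"""


def parse_properties(text):
    """Collect 'name = value' lines into a dict; other lines are ignored."""
    props = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def hours(value):
    """Turn '9844.000 Hours' into 9844.0; return None for values like 'N/A'."""
    match = re.match(r"([0-9.]+)\s+Hours", value)
    return float(match.group(1)) if match else None


def esm_status(text):
    """Return 'ok' while uptime is below upper_noncritical_threshold, else 'replace'."""
    props = parse_properties(text)
    uptime = hours(props["value"])
    limit = hours(props["upper_noncritical_threshold"])
    return "ok" if uptime < limit else "replace"


if __name__ == "__main__":
    print(esm_status(SAMPLE))
```

For scale, the 25254 hour threshold works out at a little under three years of continuous uptime, which lines up with the 3 year lifetime mentioned above.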
So far I’ve found the flash cards on both V2 and X2-2 to be very reliable. I’d be interested in hearing other thoughts on their reliability.


Andy Colvin (@acolvin) / March 15, 2012
From our experience, replacements of flash cards have been few and far between, especially when compared to the loss of hard drives. Out of all the systems I’ve worked on, most have lost at least one disk over time, and I could count on one hand the number that needed a flash card replacement. Also, it’s nice that you get a spare flash card in case one goes out.
jarneil / March 15, 2012
Hi Andy,
thanks for confirming! Oldest V2’s I’m managing are at their 2nd birthday. Hope the cards see out their 3rd!
kevinclosson / March 15, 2012
These devices are very reliable in current generation. I remember the shaky days though!
jarneil / March 15, 2012
Kevin, I’ve heard some interesting stories on fusion I/O from circa 2 years ago and it sounded a bit challenging on the reliability.
kevinclosson / March 16, 2012
Hi Jason,
All systems have bugs.
I’m not a Fusion I/O expert. I have never touched a system with fusion on it (although my friends at Fusion I/O would have it another way if possible).
I pointed out that these cards are pretty reliable these days. I shouldn’t think my earlier reply should spawn one of those but-so-and-so-suck-too sort of threads.
My view on flash is quite simple: applying it as a cache is a fad. And, yes, I am aware that EMC has a product in this space (VFCache). That fact doesn’t curb my viewpoint. The word “fad” is not so pejorative as it may sound. VME was a fad. Are there any systems that still support a VME bus for main bus or even peripheral attach? Nope.
It is also my view that Exadata architecture is a fad. It took me a few years of toiling with the technology to come to that conclusion, but as my recent posts show I can make a pretty good argument in favor of using plain old Oracle Database (+RAC) for extreme high-bandwidth query processing. Do I say Oracle Database is a fad? No. It remains a very good technology that can scale to exploit high bandwidth storage. The problem with Exadata is the fact that it does not possess as favorable scalability characteristics as RAC *without* Exadata and I’ve made that point very clear on more than one occasion. It’s really quite simple. If you chop off filtration, relegate it to a set of servers on the other side of a miserably slow (compared to a system bus), low bandwidth IB data path separate from joins/agg/sort, you have a bottleneck. I don’t like bottlenecks. Never did. Never will. Sure, today’s Exadata offers more in-bound data bandwidth to the RAC grid than you’d get if you attached low-bandwidth conventional storage. That should be obvious. But it is quite simple with today’s technology to attach ample conventional storage (data flow) to totally obliterate host CPUs processing complex queries. And, in DW/BI, all that really matters is plumbing sufficient data flow to busy up the CPUs you can afford to license (RAC).
Time will tell but one thing is for certain. I wouldn’t be typing these words if Larry Ellison hadn’t squandered the war chest on Sun. The face of Exadata would be ***entirely*** different. It would remain to this day (what it still actually is) a software solution (cellsrv) portable to pretty much any system with a C++ compiler. Best of breed competitors would battle to have the best Exadata implementation. Oracle would still have partners, Oracle customers would have choice (and less strong-arm sales tactics to suffer through), and Oracle’s quarterly earnings calls would not be so, um, uncomfortable. But most importantly we would likely have never seen an advertisement claiming an Exadata rack is the world’s “First OLTP Machine.”
And that, as they say, is that.
ashminder ubhi / March 16, 2012
Watch out for this bug:
Bug 13454147 : FLASH CARDS DISAPPEAR AFTER 6 MONTHS OF UPTIME
Niall Litchfield (@nlitchfield) / March 22, 2012
I guess there’s an upside to Exadata PatchMadness then
jarneil / March 22, 2012
Only upside I’ve seen is the overtime payments.
A few squashed bugs for the customer as well. Let’s not talk about the newly introduced ones though.