Exadata X2-2 Compute Node Drive Protection

There are indeed a few differences between an Exadata V2 and an X2-2 box, and one of them is that the X2-2 compute nodes build LVM logical volumes on top of the hardware RAID device, which itself is built using the same technology as the V2.

First off, how can we tell we are working on an X2 system as opposed to a V2? You can tell from the output of the following command:


[root@db01 ~]# dmidecode -s system-product-name

SUN FIRE X4170 M2 SERVER  

The key to the above is that the X2 is the X4170 M2 model, while the V2 output will not have the M2 part, though it is still an X4170.
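
For comparison, a V2 node should report the model without the M2 suffix, something along these lines (illustrative output, not captured from a real V2):

[root@db01 ~]# dmidecode -s system-product-name

SUN FIRE X4170 SERVER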

The M2 still has the exact same LSI MegaRAID controller. There is a nice way of summarising your configuration:

[root@db01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -aALL
                                     
System
        OS Name (IP Address)       : Not Recognized
        OS Version                 : Not Recognized
        Driver Version             : Not Recognized
        CLI Version                : 8.00.23

Hardware
        Controller
                 ProductName       : LSI MegaRAID SAS 9261-8i(Bus 0, Dev 0)
                 SAS Address       : 500605b00292ed90
                 FW Package Version: 12.12.0-0048
                 Status            : Optimal
        BBU
                 BBU Type          : Unknown
                 Status            : Healthy
        Enclosure
                 Product Id        : SGPIO           
                 Type              : SGPIO
                 Status            : OK

        PD 
                Connector          : Port 0 - 3<Internal>: Slot 3 
                Vendor Id          : HITACHI 
                Product Id         : H103030SCSUN300G
                State              : Dedicated HotSpare
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Active

                Connector          : Port 0 - 3<Internal>: Slot 2 
                Vendor Id          : HITACHI 
                Product Id         : H103030SCSUN300G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Active

                Connector          : Port 0 - 3<Internal>: Slot 0 
                Vendor Id          : HITACHI 
                Product Id         : H103030SCSUN300G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Active

                Connector          : Port 0 - 3<Internal>: Slot 1 
                Vendor Id          : HITACHI 
                Product Id         : H103030SCSUN300G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Active

Storage

       Virtual Drives
                Virtual drive      : Target Id 0 ,VD name DBSYS
                Size               : 557.75 GB
                State              : Optimal
                RAID Level         : 5 


Exit Code: 0x00

This nicely shows you the state of the Physical Drives (PD), shows you have a HotSpare, and also shows the Virtual Drive created on top of them, along with the RAID level in use. Remember, this is just the same as a V2. This summary command is not available in the MegaCli-5.00 version of the MegaCli rpm, but it is available in version 8.00.
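
If you want to check which version of the rpm a node has, a quick rpm query should do it (the exact package name may vary between image versions):

[root@db01 ~]# rpm -qa | grep -i megacli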

However, a simple df -h on an X2 shows quite a difference from a V2:

[root@db01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       30G  6.8G   22G  25% /
/dev/sda1             124M   36M   82M  31% /boot
/dev/mapper/VGExaDb-LVDbOra1
                       99G   20G   74G  22% /u01
tmpfs                  81G  615M   80G   1% /dev/shm

Now this is quite different. As usual, the LSI RAID controller presents a single device, /dev/sda:

[root@db01 ~]# fdisk -l

Disk /dev/sda: 598.8 GB, 598879502336 bytes
255 heads, 63 sectors/track, 72809 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          16      128488+  83  Linux
/dev/sda2              17       72809   584709772+  8e  Linux LVM

So two partitions are created on top of this device: one is presented as the /boot partition, while the other is used for LVM.

[root@db01 ~]# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VGExaDb
  PV Size               557.62 GB / not usable 1.64 MB
  Allocatable           yes 
  PE Size (KByte)       4096
  Total PE              142751
  Free PE               103327
  Allocated PE          39424
  PV UUID               KYm3PX-4V3W-9T5L-QZBC-mW0n-jEDJ-QrJszz

This output ties in quite nicely with the fdisk output: one PV (Physical Volume), /dev/sda2, on which the VG (Volume Group) VGExaDb has been created.
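
As a sanity check, the PE (Physical Extent) size multiplied by the Total PE count gives the PV size: 142751 extents × 4096 KB ≈ 557.62 GB, which also ties in with the 584709772 blocks fdisk reported for /dev/sda2.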

[root@db01 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VGExaDb/LVDbSys1
  VG Name                VGExaDb
  LV UUID                pJFeiN-Kqa4-VMpS-YYH0-BrY4-baKd-XivOZE
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                30.00 GB
  Current LE             7680
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Name                /dev/VGExaDb/LVDbSwap1
  VG Name                VGExaDb
  LV UUID                fnP95e-qCR9-Z2PT-faHR-HRI6-ReMX-btfeCU
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                24.00 GB
  Current LE             6144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1
   
  --- Logical volume ---
  LV Name                /dev/VGExaDb/LVDbOra1
  VG Name                VGExaDb
  LV UUID                LbuHDM-GSeK-fTRJ-Mgti-xDqz-siK5-yWEj4g
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                100.00 GB
  Current LE             25600
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

We can see the Volume Group VGExaDb has multiple Logical Volumes (LV) created upon it. These logical volumes are mapped to the /dev/mapper devices that you can see in the df output:

[root@db01 ~]# ls -ltar /dev/VGExaDb/LVDb*
lrwxrwxrwx 1 root root 28 Sep 14 11:57 /dev/VGExaDb/LVDbSys1 -> /dev/mapper/VGExaDb-LVDbSys1
lrwxrwxrwx 1 root root 29 Sep 14 11:57 /dev/VGExaDb/LVDbSwap1 -> /dev/mapper/VGExaDb-LVDbSwap1
lrwxrwxrwx 1 root root 28 Sep 14 11:57 /dev/VGExaDb/LVDbOra1 -> /dev/mapper/VGExaDb-LVDbOra1
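
If you want to tie these back to the 253:N block device numbers shown by lvdisplay, dmsetup lists the underlying device-mapper devices; the output should look roughly like this (illustrative):

[root@db01 ~]# dmsetup ls
VGExaDb-LVDbOra1        (253, 2)
VGExaDb-LVDbSwap1       (253, 1)
VGExaDb-LVDbSys1        (253, 0)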

I’m not really sure what using LVM gives you over creating simple partitions, but there is a large chunk of unallocated space that could be used to extend the LVs if you wanted.

In fact you can tell how much free space you have with the following:

[root@db01 ~]# vgdisplay -s
  "VGExaDb" 557.62 GB [154.00 GB used / 403.62 GB free]

You can use lvextend to give this free space to the LVs, but this is not something that can be done while Oracle is still running on the node in question.
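
As a rough sketch of what that would look like once Oracle is stopped (illustrative size, and assuming the /u01 filesystem is ext3, in which case resize2fs grows it to fill the extended LV):

[root@db01 ~]# lvextend -L +50G /dev/VGExaDb/LVDbOra1
[root@db01 ~]# resize2fs /dev/VGExaDb/LVDbOra1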

There you have it: the X2-2 compute nodes have LVM logical volumes created on top of the same old LSI MegaRAID hardware RAID device.


11 thoughts on “Exadata X2-2 Compute Node Drive Protection”

  1. In my opinion, the advantage of using LVM is obvious: any logical volume can be enlarged with the free space available in the volume group, whilst only the last partition can be enlarged without LVM.

    Of course you can move the entire partition, but that requires much more effort than simply growing the logical volume. If you need more space for any logical volume, it’s quite probable the others need space too. Trying to arrange that without LVM will be messy. And it’s simple to extend the volume group with an additional disk, which can then be used for the logical volumes, etc.

  2. Hi Frits,

    Sure, I can see that LVM gives more flexibility, but why leave an ocean of free space in the volume group? Also, you need to stop the node to extend / and you need to stop the instance/GI to extend /u01. Seems nuts to me.

    Is adding an additional disk supported?

    jason.

  3. Yes, I’ve been puzzled by how often LVM is used on server builds. Maybe it’s because it’s the default in the EL installer, but for a standardised server installation (especially in a clustered or VM environment) do you ever need to resize, e.g. root? Unless you have some fancy need for LVM (say, for a DR approach) I don’t see the point of putting another layer between the applications and the disks when you’ve got a hardware RAID controller. Alternatively, ditch the hardware RAID and do the RAID in software (e.g. ZFS).

    Interesting post Jason – keep up the good work!

    • Hi Simon,

      Yep, at first I had assumed the LVM was there because they had ditched the LSI MegaRAID in the X2. As for resizing, the boxes I’ve seen do have oceans of space to use for resizing, but why not just set up from the outset with that space in a filesystem?

      jason.

  4. Good evening – now that it comes to LVM I’d like to jump in, but it’s not necessarily Exadata related.

    I like to build all the servers with LVM for the Oracle binaries, and possibly for the root volume group as well.

    Software releases get bigger and bigger with every release cycle and I like to be able to extend my mount points if needed. Now add out-of-place upgrades and you might be really short on space one day. The nice thing about LVM though is that you can add an extra LUN and extend the volume group that way. I have successfully resized LVM-based file systems, including root, online, and love the flexibility it gives me. Leaving a lot of free space is something I often see with the internal RAID controllers in HP servers (cciss), but since you don’t do anything else with it, that doesn’t matter too much.

    Interesting to see how it’s done with the X2-2!

    Martin

  5. Starting with Exadata 11.2.1.3.1, LVM is used to configure the partitions for the file systems on the compute nodes. It doesn’t matter whether it is an X2 or a V2.

  6. Hi John,

    I noticed that about LVM and 11.2.1.3.1 in the documentation; however, I have access to an Exadata V2 that is at 11.2.2.2.0 and it most definitely does not have LVM. So perhaps if the initial install was at 11.2.1.3.1 or higher it would have had LVM, but you can have a V2 at a version higher than 11.2.1.3.1 and still not have LVM.

    jason.

  7. Yes, it is dependent on the version that the compute node was imaged with. 11.2.1.3.1 included quite a few changes. It introduced out of place patching, OEL 5.5 instead of 5.3, and LVM on the compute nodes. Because the minimal (formerly convenience) packs do not reimage the compute nodes, these big changes don’t usually take effect. You can check your factory image by running the imagehistory command.

    As for LVM, I like that they leave the space open, so that you can decide if you need to use it for LVM snapshots, additional filesystems, or expand /u01. While I generally end up adding it to /u01, I have had to do other things with the space for various customers.

    Unfortunately, Oracle doesn’t support modifying the hardware in any way. This includes adding hard drives to the compute nodes.

    • Hi Andy,

      Thanks for reading!

      Yes, the V2’s I have seen were imaged at earlier than 11.2.1.3.1.

      I take your point on the flexibility introduced by LVM – giving the customer a choice is a *good* thing, but I wonder how many of your customers have done nothing with the space and left it unutilised?

      cheers,

      jason.

  8. I believe that space holds the Solaris x86 image, and you actually need to get it back using the reclaim.sh script. So when you chose Linux over Solaris, the free space was created and your existing PV was extended. However, extending the LVs and either extending your filesystem or adding a new one is left up to you to do as you please. You might want to keep logs, /var etc. on the space so freed up…
