Exadata Batteries

Andy Colvin has a good post highlighting the importance of making sure your batteries are operating with enough charge to ensure that the drive policy is in writeback as opposed to writethrough.

I just wanted to add a small addendum to that post. I have seen severe issues when the MegaRAID controller goes into writethrough mode. This is particularly crucial on the compute nodes: under some circumstances it can lead to disk corruption on the compute node's drives. I have felt the pain of this leading to a so-called Bare Metal Restore of the affected node.

I’ve also had the pleasure of being involved with the replacement of around 50 Exadata V2 batteries in the last couple of months. This is almost certainly due to the age of the batteries. The batteries are due to be replaced in all Exadatas after 2 years, but these batteries just failed to go the distance.

One of the MegaCLI commands Andy highlighted provides a wealth of information:


[root@db01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0

BBU status for Adapter: 0

BatteryType: iBBU08 
Voltage: 4040 mV 
Current: 0 mA 
Temperature: 50 C

BBU Firmware Status:

  Charging Status              : None 
  Voltage                      : OK 
  Temperature                  : OK 
  Learn Cycle Requested        : No 
  Learn Cycle Active           : No 
  Learn Cycle Status           : OK 
  Learn Cycle Timeout          : No 
  I2c Errors Detected          : No 
  Battery Pack Missing         : No 
  Battery Replacement required : No 
  Remaining Capacity Low       : No 
  Periodic Learn Required      : No 
  Transparent Learn            : No

Battery state:

GasGuageStatus: 
  Fully Discharged        : No 
  Fully Charged           : No 
  Discharging             : No 
  Initialized             : Yes 
  Remaining Time Alarm    : No 
  Remaining Capacity Alarm: Yes 
  Discharge Terminated    : No 
  Over Temperature        : No 
  Charging Terminated     : No 
  Over Charged            : No

Relative State of Charge: 100 % 
Charger System State: 1 
Charger System Ctrl: 0 
Charging current: 0 mA 
Absolute state of charge: 0 % 
Max Error: 0 %

BBU Capacity Info for Adapter: 0

Relative State of Charge: 100 % 
Absolute State of charge: 87 % 
Remaining Capacity: 1341 mAh 
Full Charge Capacity: 1353 mAh 
Run time to empty: Battery is not being discharged 
Average time to empty: 161 min 
Average Time to full: Battery is not being charged 
Cycle Count: 2 
Max Error: 0 % 
Remaining Capacity Alarm: 0 mAh 
Remaining Time Alarm: 0 Min


BBU Design Info for Adapter: 0

Date of Manufacture: 06/02, 2011 
Design Capacity: 1530 mAh 
Design Voltage: 4100 mV 
Specification Info: 0 
Serial Number: 2080 
Pack Stat Configuration: 0x0000 
Manufacture Name: LS36681 
Device Name: bq27541 
Device Chemistry: LPMR 
Battery FRU: N/A


BBU Properties for Adapter: 0

Auto Learn Period: 2592000 Sec 
Next Learn time: 384645185 Sec 
Learn Delay Interval:0 Hours 
Auto-Learn Mode: Enabled

Exit Code: 0x00

This is from a V2 that has had its battery replaced. The first thing to highlight is the battery type:

BatteryType: iBBU08

Earlier batteries were the 07 model, and they perhaps did not last the full 2 years before preventative maintenance was due. I’d be extra vigilant if you have the 07 model. It will show as iBBU on a storage cell and unknown on a compute node.
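
As a minimal sketch of that check, the Python below classifies the BatteryType line from the -AdpBbuCmd output. The function name and the returned messages are my own illustration, and matching "unknown" case-insensitively is an assumption, since the exact capitalisation on a compute node is not shown here.

```python
import re

def battery_generation(bbu_output: str) -> str:
    """Classify the BatteryType line as described in the post."""
    match = re.search(r"^BatteryType:[ \t]*(\S*)", bbu_output, re.MULTILINE)
    if match is None:
        return "no BatteryType line found"
    battery_type = match.group(1)
    if battery_type == "iBBU08":
        return "iBBU08 (newer model)"
    # Per the post, the 07 model reports iBBU on a storage cell
    # and unknown on a compute node (case is an assumption).
    if battery_type.lower() in ("ibbu", "unknown"):
        return "probably the older 07 model"
    return f"unrecognised type: {battery_type}"

print(battery_generation("BatteryType: iBBU08 \nVoltage: 4040 mV \n"))
```

Feeding it the captured output of the MegaCli64 command above should be enough; it does not run the command itself.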

Next up is the temperature:

Temperature: 50 C

You really want to ensure this is under 55C. If it is not, either there is something wrong with the environment (use ipmitool to check the ambient temperature) or the battery is overheating.
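
If you want to script the temperature check, something along these lines should work on the captured output. It is only a sketch: the function names are hypothetical, and the 55 C limit is simply the figure quoted above.

```python
import re

MAX_SAFE_TEMP_C = 55  # threshold quoted in the post

def bbu_temperature_c(bbu_output: str):
    """Return the battery temperature in Celsius, or None if not found."""
    # Anchoring at the start of the line skips the indented
    # "Temperature : OK" line in the firmware status section.
    match = re.search(r"^Temperature:[ \t]*(\d+)[ \t]*C", bbu_output, re.MULTILINE)
    return int(match.group(1)) if match else None

def temperature_ok(temp_c: int) -> bool:
    """True when the temperature is under the 55 C limit."""
    return temp_c < MAX_SAFE_TEMP_C

sample = "Temperature: 50 C\n  Temperature                  : OK \n"
temp = bbu_temperature_c(sample)
print(temp, temperature_ok(temp))
```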

You can tell if your battery is charging from either the:

Charging Status : None

or the

Average Time to full: Battery is not being charged

These outputs would show charging and a time to full if the battery were charging. One possible reason for a low battery charge is a learn cycle.
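
A rough sketch of reading those two fields in Python follows. Since the exact text MegaCli prints while charging is not shown here, the sketch only recognises the idle values seen in the output above and reports anything else as not idle. The helper names are hypothetical.

```python
import re

def field(bbu_output: str, name: str):
    """Return the value of the first 'name: value' line, or None."""
    pattern = rf"^[ \t]*{re.escape(name)}[ \t]*:[ \t]*(.*?)[ \t]*$"
    match = re.search(pattern, bbu_output, re.MULTILINE)
    return match.group(1) if match else None

def charging_report(bbu_output: str) -> dict:
    """Collect both charging indicators and flag the idle case."""
    status = field(bbu_output, "Charging Status")
    time_to_full = field(bbu_output, "Average Time to full")
    # Only the idle strings from the output above are treated as known.
    idle = status == "None" and time_to_full == "Battery is not being charged"
    return {"status": status, "time_to_full": time_to_full, "idle": idle}

sample = (
    "  Charging Status              : None \n"
    "Average Time to full: Battery is not being charged \n"
)
print(charging_report(sample))
```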

Charge capacity determines whether the writeback or writethrough mode is in use:

Full Charge Capacity: 1353 mAh

This is a relatively new battery and has a good amount of charge.

As the full charge capacity approaches 700 mAh, you may want to take proactive action and schedule a battery replacement.
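
To make that concrete, here is a small sketch that pulls the two capacity figures out of the output and applies the 700 mAh figure. The percentage of design capacity is my own addition for context rather than a documented criterion, and the function names are hypothetical.

```python
import re

REPLACE_BELOW_MAH = 700  # figure suggested in the post

def capacity_mah(bbu_output: str, label: str):
    """Return the mAh value on the line starting with 'label:', or None."""
    pattern = rf"^{re.escape(label)}:[ \t]*(\d+)[ \t]*mAh"
    match = re.search(pattern, bbu_output, re.MULTILINE)
    return int(match.group(1)) if match else None

def capacity_summary(bbu_output: str) -> dict:
    """Summarise full charge capacity against design capacity."""
    full = capacity_mah(bbu_output, "Full Charge Capacity")
    design = capacity_mah(bbu_output, "Design Capacity")
    if full is None or design is None:
        raise ValueError("capacity lines not found in output")
    return {
        "full_mah": full,
        "design_mah": design,
        "percent_of_design": round(100 * full / design, 1),
        "schedule_replacement": full < REPLACE_BELOW_MAH,
    }

sample = "Full Charge Capacity: 1353 mAh \nDesign Capacity: 1530 mAh \n"
print(capacity_summary(sample))
```

With the values from the output above, the battery holds roughly 88% of its design capacity and is nowhere near the replacement threshold.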

You also want to ensure the Max Error is low:

Max Error: 0 %

The last thing I’m going to highlight is the battery chemistry:

Device Chemistry: LPMR

While on an X2-2 it displays:

Device Chemistry: LION

Both display the bq27541 as the Device Name; this is the device used for determining the charge level. Apart from the Device Chemistry line, there appears to be little difference between the output on a V2 and an X2.

Just to re-emphasise: keep an eye on your batteries and make sure your MegaRAID controller is in writeback!
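
For the writeback check itself, a sketch like the following could be run over the logical drive information from MegaCli (for example the output of -LDInfo -LAll -aAll). Note the assumptions: the "Current Cache Policy" line format in the sample is from memory rather than from the output above, and the function names are hypothetical.

```python
import re

def current_write_policies(ldinfo_output: str) -> list:
    """Return the write policy from each 'Current Cache Policy' line."""
    # Assumed line format: "Current Cache Policy: WriteBack, ReadAheadNone, ..."
    return re.findall(
        r"^[ \t]*Current Cache Policy[ \t]*:[ \t]*(\w+)",
        ldinfo_output,
        re.MULTILINE,
    )

def all_writeback(ldinfo_output: str) -> bool:
    """True only if at least one drive is listed and all are WriteBack."""
    policies = current_write_policies(ldinfo_output)
    return bool(policies) and all(p == "WriteBack" for p in policies)

# Illustrative sample, not captured from a real system.
sample = (
    "Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU\n"
    "Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU\n"
)
print(current_write_policies(sample), all_writeback(sample))
```

Looking at the current policy rather than the default is the point: the default can still say WriteBack while the controller has dropped to WriteThrough.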


7 Comments

  1. Have you checked the elastic bands in the compute nodes recently? There’s probably an iLO command to return the Young’s modulus value. Note this is especially important on Exalogic Elastic Cloud machines too… ;-)

    Reply
  2. Wonderful work! This is the type of info that should be shared around the web. Shame on Google for not positioning this post higher!

    Reply
  3. How long does it take to physically replace the battery?

    Reply
    • jarneil

       /  October 7, 2013

      30 minutes or thereabouts. I find the longest part of the procedure is shutting down the instances/server.

      Reply