Human Error

Taking up Doug’s human error challenge, here are a couple of incidents I have witnessed. Note that I was only a witness, not a protagonist.

There once was a datacenter that required additional cooling, and while this was being installed (in the roof of the datacenter) a lot of dust was being generated. A bright spark thought it would be a great idea to cover a RAC cluster in dust sheets to protect it from getting too dusty. Unfortunately, this person forgot that servers need to suck in cool air to keep them from meltdown. Admittedly, it was not quite a meltdown, but the shiny RAC cluster did overheat, causing all nodes to shut down. That is a good indication that RAC does not necessarily increase your availability by much. Thankfully, the servers had shut themselves down cleanly and did come back.

Then there was a nice shiny new datacenter that a company was rightly proud of, so it decided to conduct guided tours. A DBA was showing people round one day and decided to point out where a RAC cluster (why is it always the cluster that gets it?) was situated, except this DBA got too close to the node he was pointing at. At first the lights of the node were still on, but a minute later there was a bit less noise in the datacenter, and one tour was brought to a swift conclusion.


4 thoughts on “Human Error”

  1. I am the bright spark that Jason is accusing of human error. While he is correct that a dust sheet was placed over the RAC cluster and restricted its airflow, I would like to point out a few things in my defense.

    1. The project in question was actually to remove old aircon units from the ceiling so that the room could be made fireproof.
    2. The datacenter in question was not really a datacenter; it was more a room that had just become full of computers. It was not designed as a datacenter.
    3. The company management decided (against my advice) that they didn’t want to build a new computer room and instead wanted me to make the existing room better cooled and fireproof.
    4. The company’s preferred contractors were fools who would not listen to the simplest of instructions.
    5. I foolishly thought that they were competent and understood the importance of the equipment in the room.
    6. My instructions to them were to place a dust sheet over the top of the two rows of racks and not over the front of a rack.
    7. Management did not take seriously my warnings about the risk of dust being generated and of workmen being in the room unsupervised.
    8. Management did not think it was necessary for an SA to be present in the room while workmen were working in there.
    9. The old aircon units that were being removed should never have been installed in the first place. They were not suitable for the job they were doing.

    I did make mistakes, and if I did it again I would do things differently (this was not the only thing that was done wrong). I would treat the workmen like idiots and double-check every step of their work. I would insist on an SA being in the room at all times. I would insist on any changes to the plan being approved by me first, not just carried out at the whim of the contractor. I would also expect a great deal more of my management when raising real and serious issues about the risks involved.

  2. Hey John, thanks for stopping by and clearing that up for us. There was a no-names, no-shame thing going on, but now that you have outed yourself, never mind.

    Someone has already suggested that you perhaps protest too much?

    Best to let the past go. This was just a bit of fun posting; no reputations were meant to be traduced in the making of it.

  3. Yes, that’s right, no names. Imagine what would happen to my golden reputation if it ever got out that it was me who was responsible for turning off the database server in the other incident described by Jason.

  4. Howdy Pat,

    Glad to see you dropping by. I fear you may have let the cat out of the bag, Mr. Deputy Co-chair of the Unix SIG.

    Part of me thinks your antics indicate that a DBA should never get closer to the kit than a SQL*Plus terminal.

    We won’t mention the developer server you powered off by accident when investigating another issue.
