When Oracle Patching Goes Wrong – OUI-67124

Sometimes things never quite go according to plan. You can test, and test again, in a UAT or dev environment, but just sometimes something comes out of left field when you roll a change into production. Just such an issue appeared recently when I was rolling a patch into production.

This was Exadata BP 11, but the issue is not really Exadata-specific. The bundle patch was being applied with the opatch auto command and it all appeared to be going well: no indication of a problem appeared in the window where I was applying the patch. However, when I checked how many patches were installed in the GI home, instead of seeing the following:

db01(oracle):+ASM1:oracle$ /u01/app/oracle/product/ lsinventory  |grep -i applied

Patch  12914289     : applied on Sat Nov 12 14:14:39 GMT 2011 
Patch  12421404     : applied on Sat Nov 12 14:12:01 GMT 2011 
Patch  12902308     : applied on Sat Nov 12 13:03:27 GMT 2011

I found only the 12902308 patch applied to the GI home. I knew this bundle patch should leave 3 patches applied to the GI home, so something had clearly gone awry.
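As a quick sanity check, the applied-patch count can be pulled straight out of the inventory listing. A minimal sketch, where $GI_HOME is a placeholder for your Grid Infrastructure home:

```shell
# Hypothetical sketch: count the one-off patches OPatch reports as applied.
# $GI_HOME stands in for your Grid Infrastructure home path.
$GI_HOME/OPatch/opatch lsinventory | grep -c 'applied on'
```

After a successful BP 11 application to the GI home this should report 3.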

Looking into the log file for the patch application eventually revealed the following:

 The following warnings have occurred during OPatch execution: 
 1) OUI-67303: 
 Patches [   12419090 ] will be rolled back.

 2) OUI-67124:Copy failed from '/u01/app/oracle/BP11/12902308/12421404/files/bin/crsctl.bin' to '/u01/app/oracle/product/'...

 3) OUI-67124:ApplySession failed in system modification phase... 'ApplySession::apply failed: Copy failed from '/u01/app/oracle/BP11/12902308/12421404/files/bin/crsctl.bin' to '/u01/app/oracle/product/'...

So now we could see where the issue occurred, but we still needed to work out how to fix it. Checking the file with ls, everything looked fine, and the permissions seemed good too.

I’d also like to point out at this point that opatch auto is meant to take care of shutting down the GI stack cleanly, essentially automating the application of the patch to both the GI and RDBMS homes.

fuser to the rescue

My last idea was to check whether any processes were using this file. A simple ps gave no clue that anything was running from this $ORACLE_HOME; there were lots of processes owned by the oracle user, but nothing was obviously running from the GI home. One excellent way of finding out whether a process is using a particular file or filesystem is fuser. I ran this and saw the following:

fuser -c /u01/app/oracle/product/

/u01/app/oracle/product/  1106c  2569c  3493c  4348c  4865c  5863c  5887c  6666c  6739c  7036c  7230c  7299c  7303c  8411  8428c  8487c  8545c  8642c  9462c  9754c 10634c 10710c 11278c 11413ce 11919c 12344 12907c 13550c 13674c 14992 15166c 15480c 15987 16282c 16421c 16982c 17390c 17500c 17860c 17932c 18162c 18373c 18667c 19065c 19980c 20017c 20019c 20115c 20139c 20441c 20594c 20942c 21202c 21305c 21761c 21825c 24599c 24792c 

Ouch! That is a lot of processes using this filesystem. Looking at a few, they seemed to be ssh processes owned by oracle, connecting off to other servers. It seemed a bit of a pain to go through them all, killing each one individually, and this is where fuser comes to the rescue again!
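Before reaching for the kill option, it can be worth mapping the PIDs back to commands. fuser writes the bare PIDs to stdout (the filename and access codes like c and e go to stderr), so they are easy to feed to ps. A minimal sketch, with the GI home path as a stand-in:

```shell
# List the PID, owning user and command line for every process fuser
# reports against the filesystem (path is a stand-in for the GI home).
for pid in $(fuser -c /u01/app/oracle/product 2>/dev/null); do
    ps -p "$pid" -o pid=,user=,args=
done
```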

fuser -ck /u01/app/oracle/product/

The -k flag kills all the processes accessing the filesystem. Now I could try manually applying the missing patches:

Manually Patching GI

This is well documented but bears repeating: when you attempt to manually apply (or indeed roll back) a patch to the GI home, you have to unlock the home as root. You need to run the following:

# /u01/app/oracle/product/ -unlock

Now, as the oracle user, you descend to your patch directory and apply the patch with a simple opatch apply (or napply). Once you have applied all the patches to the GI home, you need to lock the GI home again, once more running as root:

# /u01/app/oracle/product/ -patch

Avoiding these steps is certainly one advantage of opatch auto; I just wish it made it a bit more obvious when it fails to apply every patch to a home!
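For 11.2 Grid Infrastructure, the unlock and lock steps are normally driven by the rootcrs.pl script under the GI home. The paths and patch directory below are assumptions for illustration only; always check the patch README for your release:

```shell
# Assumed layout for an 11.2 GI home; all paths are placeholders.
GI_HOME=/u01/app/11.2.0/grid

# As root: unlock the GI home so its files can be modified
$GI_HOME/crs/install/rootcrs.pl -unlock

# As oracle: apply the outstanding patch from the staged patch directory
cd /u01/app/oracle/BP11/12902308/12421404
$GI_HOME/OPatch/opatch apply -local

# As root again: re-lock the home and bring the stack back up
$GI_HOME/crs/install/rootcrs.pl -patch
```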

Comments

  1. Bundle Patch 11? 11! I won’t even say a word about that.

    I do know from past experience, though, how many changes are crammed into these bundles. Are Exadata customers just taking the bundles as a matter of practice, or are your customers in specific need of a fix contained in this bundle?

    Nice blog post by the way. One very difficult thing to get right for Oracle (or anyone) is the begin-state for patch application. In my experience it seems a lot of vendor-side testing of patch application starts from a freshly booted configuration. What was the start-state specified in the patch release notes? I’m just curious.

  2. Hi Kevin,

    Yep, there are a surprising number of bug fixes per BP. This particular environment takes the BP as routine each quarter.

    So, this was applied via opatch auto. The instructions mention nothing about shutting anything down, though we happened to have the database down.



    • Routinely patching would ordinarily be utterly foolish, but in the case of Exadata it is actually the opposite.

      Exadata bundle patches are an unfortunate necessity. Everything from wrong SQL results to data corruption / data loss and other such excruciating bugs get addressed in these bundles and, most importantly, it takes a bundle patch to fix bad patches in previous bundle patches. That’s just the state of the technology. I do wish Oracle would stop “releasing” new features in bundle patches though. Yes, Oracle treats bundle patches as release vehicles for new features where Exadata is concerned. That’s pretty frightening from a conventional viewpoint, but likely not as frightening as running old software in production that Oracle does not have running anywhere in their labs (e.g., BP3 or some other such vintage). Exadata customers need to be running the same software as what Oracle has in labs. That also means a rush to adopt 12(whatever) will also be the unconventional wisdom. You have to basically stop thinking out of the box when using Exadata.

  3. >and no indication of an issue appeared in the window where I was applying the patch
    Ouch! That is bad. A DBA might just “sign off”: “BP APPLIED SUCCESSFULLY WITHOUT ERRORS” (meaning that opatch did not fail).

    Does the documentation / readme provide clear instructions about what to do before applying the patch and what to do after having executed opatch apply ?

  4. Hello Jason,
    In response to your tweet:
    In our case of patching the cells too, we sailed into some rough weather.

    After applying the minimal pack patch in the db nodes, the nodes went into kernel panic loops.

    The lvm.conf in the initrd (2.6.18-238) was diagnosed as incorrect. The fix was to copy a proper lvm.conf from the original initrd image and rebuild the image; basically, the lvm.conf from a known-good initrd was copied into the latest one.

    Performed Actions
    1. extract the initrds
    # cd /boot
    # mkdir x y
    # cd x
    # zcat ../initrd-2.6.18-238*.img | cpio -idmv
    # cd ../y
    # zcat ../initrd-2.6.18-194*.img | cpio -idmv
    # cd ..

    2. replace the lvm.conf
    # mv x/etc/lvm/lvm.conf{,.orig}
    # cp y/etc/lvm/lvm.conf x/etc/lvm/

    3. rebuild a new initrd
    # cd x
    # find ./ | cpio -H newc -o | gzip -9 > /boot/initrd-new.img

    4. modify the grub setting
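    Step 4 presumably means pointing the boot entry at the rebuilt image. A hypothetical grub.conf stanza (kernel and device names are placeholders) would change only the initrd line:

```
title Oracle Linux (2.6.18-238)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-238 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-new.img
```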
