Sometimes things just don't go according to plan. You can test, and test again, in a UAT or dev environment, but just occasionally something comes out of left field when you roll it into production. Just such an issue appeared when I was rolling a patch into production.
This was Exadata BP 11, though the problem is not really Exadata-specific. The bundle patch was being applied with the opatch auto command, and everything appeared to be going well: nothing in the window where I was applying the patch indicated a problem. But when I checked how many patches were installed in the GI home, instead of seeing the following:
db01(oracle):+ASM1:oracle$ /u01/app/oracle/product/188.8.131.52/grid/OPatch/opatch lsinventory |grep -i applied
Patch 12914289 : applied on Sat Nov 12 14:14:39 GMT 2011
Patch 12421404 : applied on Sat Nov 12 14:12:01 GMT 2011
Patch 12902308 : applied on Sat Nov 12 13:03:27 GMT 2011
I found only patch 12902308 applied to the GI home. This bundle patch should have left three patches applied there, so something had clearly gone awry.
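A quick scripted check along these lines can catch this sort of thing immediately after patching, rather than relying on eyeballing the output. This is just a sketch: the expected count (three here) comes from the bundle patch README, and the example home path is illustrative.

```shell
# Minimal sketch: count the patches opatch lsinventory reports as applied
# and compare against what the bundle patch README says to expect.
# The expected count and home path in the usage example are assumptions.

count_applied() {
    # one "applied on" line per installed patch in lsinventory output
    grep -ci "applied on"
}

# Example usage (not run here; $GRID_HOME is a placeholder):
#   expected=3
#   actual=$("$GRID_HOME/OPatch/opatch" lsinventory | count_applied)
#   [ "$actual" -eq "$expected" ] || echo "WARN: $actual of $expected patches applied"
```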
Looking into the log file for the patch application eventually revealed the following:
--------------------------------------------------------------------------------
The following warnings have occurred during OPatch execution:
1) OUI-67303: Patches [ 12419090 ] will be rolled back.
2) OUI-67124:Copy failed from '/u01/app/oracle/BP11/12902308/12421404/files/bin/crsctl.bin' to '/u01/app/oracle/product/184.108.40.206/grid/bin/crsctl.bin'...
3) OUI-67124:ApplySession failed in system modification phase... 'ApplySession::apply failed: Copy failed from '/u01/app/oracle/BP11/12902308/12421404/files/bin/crsctl.bin' to '/u01/app/oracle/product/220.127.116.11/grid/bin/crsctl.bin'...
So now we can see where the issue occurred, but we still needed to work out how to fix it. Checking the file with ls, everything looked fine, and the permissions seemed good too.
I'd also point out at this point that opatch auto is meant to take care of shutting down the GI stack cleanly, essentially automating the application of the patch to both the GI and RDBMS homes.
fuser to the rescue
The last idea was to check whether any processes were using this file. A simple ps gave no clue that anything was running from this $ORACLE_HOME: there were lots of processes owned by the oracle user, but nothing obviously running from the GI home. An excellent way of finding out whether a process is using a particular file or filesystem is fuser. I ran it and saw the following:
fuser -c /u01/app/oracle/product/18.104.22.168/grid/bin/crsctl.bin
/u01/app/oracle/product/22.214.171.124/grid/bin/crsctl.bin: 1106c 2569c 3493c 4348c 4865c 5863c 5887c 6666c 6739c 7036c 7230c 7299c 7303c 8411 8428c 8487c 8545c 8642c 9462c 9754c 10634c 10710c 11278c 11413ce 11919c 12344 12907c 13550c 13674c 14992 15166c 15480c 15987 16282c 16421c 16982c 17390c 17500c 17860c 17932c 18162c 18373c 18667c 19065c 19980c 20017c 20019c 20115c 20139c 20441c 20594c 20942c 21202c 21305c 21761c 21825c 24599c 24792c
Ouch! That is a lot of processes using this executable. Looking at a few, they seemed to be ssh processes owned by oracle, off to other servers. Going through them all and killing each one individually seemed a bit of a pain, and this is where fuser comes to the rescue again!
fuser -ck /u01/app/oracle/product/126.96.36.199/grid/bin/crsctl.bin
The -k flag kills all the processes accessing the file (with SIGKILL by default, so use it with care). With those out of the way, I could try manually applying the missing patches.
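As a side note, before killing everything wholesale it can be worth seeing what those PIDs actually belong to. A minimal sketch, assuming Linux's psmisc fuser, which writes the PIDs to stdout and everything else (the filename and the access-code letters such as c and e) to stderr; the helper below also strips the letters if you are parsing the combined display output:

```shell
# Hedged sketch: turn fuser's displayed "1106c 2569c 11413ce ..." list
# into bare PIDs suitable for ps or kill. With psmisc fuser the letters
# go to stderr anyway, so "2>/dev/null" usually leaves clean PIDs, but
# stripping them makes the helper safe either way.

strip_codes() {
    tr ' ' '\n' | sed 's/[a-z]*$//' | grep -v '^$'
}

# Example usage (path illustrative, not run here):
#   fuser -c /u01/.../grid/bin/crsctl.bin 2>/dev/null \
#       | strip_codes | xargs -r ps -o pid=,user=,args= -p
```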
Manually Patching GI
This is well documented but bears repeating: when you attempt to manually apply a patch (or indeed roll one back) in the GI home, you have to unlock the home as root. You need to run the following:
# /u01/app/oracle/product/188.8.131.52/grid/crs/install/rootcrs.pl -unlock
Now, as the oracle user, descend into your patch directory and apply the patch with a simple opatch apply (or napply). Once you have applied all the patches to the GI home, you need to lock the GI home again, once more as root:
# /u01/app/oracle/product/184.108.40.206/grid/crs/install/rootcrs.pl -patch
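Pulling those steps together, here is a dry-run sketch of the sequence. The GRID_HOME value and patch directory are placeholders, and the run wrapper only echoes each command; to use this for real you would execute the commands instead, as the right users (root for the rootcrs.pl steps, the GI software owner for opatch):

```shell
# Dry-run sketch of the manual GI patching sequence (all paths are
# placeholders). 'run' prints rather than executes, so nothing here
# touches a real system.

GRID_HOME=${GRID_HOME:-/u01/app/oracle/product/VERSION/grid}   # placeholder

run() { echo "+ $*"; }   # dry-run: print, don't execute

manual_gi_patch() {
    patch_dir=$1
    run "$GRID_HOME/crs/install/rootcrs.pl" -unlock   # as root
    run cd "$patch_dir"                               # as oracle
    run "$GRID_HOME/OPatch/opatch" apply              # as oracle
    run "$GRID_HOME/crs/install/rootcrs.pl" -patch    # as root again
}
```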
Avoiding these steps is certainly one advantage of opatch auto; I just wish it made it a bit more obvious when it fails to apply every patch to a home!