I thought I understood testing. Before I run anything in my production environment, I am strict about testing it in a non-production environment first. It does not matter where the change comes from; it always goes into test first. This naturally includes changes at the database level, not just those inside a particular schema.
When I have a set of instructions to take the database from one particular state to another, or to install a particular feature, I don't test just half the steps. If I have a sequence of steps, I test the entire sequence.
Recently, I've been working on a project to increase the protection mode of a Data Guard environment from Maximum Performance to Maximum Availability. This is a 10gR2 environment, so I pulled up the 10gR2 Data Guard documentation. To me, the steps seemed pretty clear. Let me highlight step 1:
Step 1 If you are upgrading the protection mode, perform this step.
Perform this step only if you are upgrading the protection mode (for example, from maximum performance to maximum availability mode). Otherwise, go to Step 3.
Assume this example is upgrading the Data Guard configuration from the maximum performance mode to the maximum availability mode. Shut down the primary database and restart it in mounted mode:
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
It's clear, right? To upgrade the protection mode, you have to shut down the instance and restart it in mount mode. I would not just run this in production; I'd always want to test these steps in test infrastructure that mirrors my production setup. The question is, would anyone test upgrading the protection mode but skip this step? Would it really occur to someone to think, "I wonder if I can just skip this first step and keep my instance up and running?"
It did not occur to me, but then I read the (excellent) Oracle Data Guard 11g Handbook by Larry Carpenter et al. It's pretty explicit that you don't need to shut down your instance!
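As a rough sketch of what that looks like (not a transcript of the original session), the mode change is issued against the open primary. This assumes the redo transport destination is already configured for synchronous transport and that the standby has standby redo logs in place:

```sql
-- Sketch only: run on the primary while it is OPEN, with no SHUTDOWN / STARTUP MOUNT.
-- Assumes the standby destination already uses LGWR SYNC AFFIRM.

-- Confirm the instance is open rather than mounted.
SELECT status FROM v$instance;

-- Raise the protection mode from maximum performance to maximum availability.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
```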
The above was run on a 10.2.0.4 instance. I'd already set log_archive_dest_n to LGWR SYNC mode. One thing to note: you must explicitly set AFFIRM here as well. Using LGWR SYNC alone is not good enough, because NOAFFIRM is the default, and that leaves the protection_level in continual resynchronization.
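For illustration, a destination setting along those lines might look like the following. The destination number (2), the service name `stby_tns`, and the unique name `stby` are hypothetical placeholders, not values from my environment:

```sql
-- Sketch only: destination number, service name and DB_UNIQUE_NAME are placeholders.
-- AFFIRM is spelled out because LGWR SYNC alone leaves the 10gR2 default of NOAFFIRM.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stby_tns LGWR SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stby'
  SCOPE=BOTH SID='*';

-- Check what the destination is actually using.
SELECT dest_id, archiver, transmit_mode, affirm, status
FROM   v$archive_dest
WHERE  dest_id = 2;
```

SID='*' applies the change to every instance, which is what you would want on RAC.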
Resynchronization occurs when you first increase the protection_mode, or after a network outage. While it lasts, your configuration is effectively running in maximum performance: until the protection_level actually reports maximum availability, the potential for data loss exists.
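The difference between the mode you asked for and the level you are actually getting can be checked on the primary. A minimal query for that:

```sql
-- protection_mode  = the mode that has been configured
-- protection_level = the protection currently in effect
-- While the standby is catching up, protection_level should show RESYNCHRONIZATION
-- rather than MAXIMUM AVAILABILITY.
SELECT protection_mode, protection_level
FROM   v$database;
```

Only when both columns agree are you really running at the level you configured.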
This really does contradict the documentation, so it has been a very useful find for me: just following the documentation would have meant taking downtime on my RAC cluster. It has always been the case that you can lower the protection mode without incurring downtime. Note that going all the way to MAXIMUM PROTECTION still requires the database to be in the mounted state.