Monitoring the Progress of an 11gR2 Standby Database

It is the obvious question when running a Standby Database, how do I know how up-to-date my Standby is? I’ve been working on some Dataguard switchover testing on an 11gR2 system and was investigating this question once more. The first and obvious place to look into is V$DATAGUARD_STATS:


SQL> select * from v$dataguard_stats;

NAME				 VALUE								  UNIT		                     TIME_COMPUTED			    DATUM_TIME
-------------------------------- ----------------------------------------- -------------------------------- ------------------------------
transport lag			 +00 00:00:00					day(2) to second(0) interval	 11/14/2010 20:00:19		11/14/2010 20:00:18
apply lag			     +00 00:00:00					day(2) to second(0) interval	 11/14/2010 20:00:19		11/14/2010 20:00:18
apply finish time		 +00 00:00:00.000				day(2) to second(3) interval	 11/14/2010 20:00:19
estimated startup time		 28							second			                 11/14/2010 20:00:19

Assuming you have no lag either transport or apply, then you ensure that the time_computed and the datum_time columns are up-to-date with what the wall clock time is. If all that is correct then you can be confident that your standby is receiving and applying redo from the primary. However there is something deeply unsatisfying about relying on a value that is only reporting 0.

The recommended check to determine the state of your standby before performing a failover, is to check the V$ARCHIVE_DEST_STATUS view on the primary. For 11gR2 there is a new column on this view called GAP_STATUS. This does not appear before 11gR2 (Not in the 11gR1 docs):


SQL> select status, gap_status 
   from V$archive_dest_status
   where dest_id=3;

STATUS	  GAP_STATUS
--------- ------------------------
VALID	  NO GAP

So that can give you some confidence that you are going to switchover to a destination that is up-to-date, but still no hard numbers on where exactly the standby has recovered up to.

If you check the current_scn from V$DATABASE you’ll find it far behind the primary. The documentation for current_scn here has also changed from earlier versions and gives a nice clue to look at V$RECOVERY_PROGRESS to check the SCN of the last applied redo:


SQL> select * from v$recovery_progress;

02-11-10 09:52:22 Media Recovery						   Log Files			    Files	    870 	 0
02-11-10 09:52:22 Media Recovery						   Active Apply Rate		    KB/sec	    679 	 0
02-11-10 09:52:22 Media Recovery						   Average Apply Rate		    KB/sec	    114 	 0
02-11-10 09:52:22 Media Recovery						   Maximum Apply Rate		    KB/sec	  12327 	 0
02-11-10 09:52:22 Media Recovery						   Redo Applied 		    Megabytes	 139335 	 0
02-11-10 09:52:22 Media Recovery						   Last Applied Redo		    SCN+Time	      0 	 0 16-11-10 21:17:50 SCN: 7179024793 
02-11-10 09:52:22 Media Recovery						   Active Time			    Seconds	 489607 	 0
02-11-10 09:52:22 Media Recovery						   Apply Time per Log		    Seconds	    560 	 0
02-11-10 09:52:22 Media Recovery						   Checkpoint Time per Log	    Seconds	      0 	 0
02-11-10 09:52:22 Media Recovery						   Elapsed Time 		    Seconds	1250728 	 0

The Last Applied Redo for this system which is in real time apply tracks the wall time exceedingly closely and gives greater feedback of redo application on the standby.

With 11.2.0.2 there is the prospect that V$ARCHIVE_DEST on the primary will track which scn’s have been applied on the standby with the applied_scn column which is new in 11.2.0.2, and does not appear in 11.2.0.1. This will give a straightforward mechanism of determining how far your standby is lagging behind your primary.

Advertisements

3 thoughts on “Monitoring the Progress of an 11gR2 Standby Database

  1. Hi

    Thanks for the post – really useful information!

    It’s encouraging that Oracle has made some improvements in the area of Data Guard monitoring. The V$DATAGUARD_STATS and V$ARCHIVE_GAP dynamic performance views are an unreliable source of information in 11gR1. See Bug 7119382 and Bug 10072528 respectively.

    The way I check that Data Guard is upto date on a Physical Standby is to query V$ARCHIVED_LOG on the Standby. If no rows are returned from the following query, then this is a good indication all is well.

    alter session set nls_date_format=’YYYY-MON-DD HH24:MI:SS’;
    SELECT SEQUENCE#, THREAD#, FIRST_TIME, NEXT_TIME, APPLIED FROM V$ARCHIVED_LOG
    where applied = ‘NO’
    ORDER BY THREAD#, SEQUENCE#;

    Additionally, if you have Data Guard Broker configured, the following command should return SUCCESS.

    DGMGRL> show configuration

    Thanks
    John

  2. Hi Jason,

    With “LGWR ASYNC”, is it possible to find how much redo is written to the SRLor the standby is lagging behind by how much (not in terms of apply or archive)?

    Thanks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s