FAL failing to resolve archive gaps

I had been having problems with FAL failing to resolve archive gaps. The classic error message on my 10.2.0.3 physical standby looked something like this:

FAL[client]: Failed to request gap sequence
GAP – thread 1 sequence 11402-11403
DBID 2878580510 branch 589635294
FAL[client]: All defined FAL servers have been attempted.
————————————————————-
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
————————————————————-

Now, a FAL (Fetch Archive Log) request is triggered by the MRP process detecting a gap in the available archives to apply. The server to contact to fulfil the request is determined by the FAL_SERVER parameter which you set to an appropriate tnsnames alias. This essentially enables your standby to contact the primary database and get it to ship the appropriate missing log using the arc process.

While the above situation was logging the above on the standby, no indication of any issues were being logged on the primary. First thought of course was connectivity, but that was easily disproved.

So to get more information I deployed extra logging on both the primary and standby, using the log_archive_trace parameter. This parameter has the ability to trace a number of background processes and is thus useful on both the primary and the standby.

In fact it was the arcn process on the primary that really showed what was wrong, as the trace continually had the following lines:

FAL[server]: Duplicate request detected (dbid 0 branch 127 thread 1 sequence 11402 dest STANDBY)
FAL[server]: Request not queued in this case
FAL[server]: Duplicate request detected (dbid 0 branch 127 thread 1 sequence 11403 dest STANDBY)
FAL[server]: Request not queued in this case
*** 2008-06-03 11:57:18.229
FAL[subfal]: Waiting on response from destination for thread 1 sequence 10890.

So, what is this actually telling us? The standby is asking for 11402, and 11403 and the primary knows the standby is after these, however the primary also thinks the standby is after 10890, but the standby having already applied this does not care about receiving this archived log.

Essentially this log is blocking the FAL process from working properly.

Turns out this is a bug situation. We were hitting bug:5576816 with an associated MetaLink Note: 5576816.8. Seems like it’s related to having max_connections set on the log_archive_dest parameter (which we did).

As an attempted workaround I tried setting the log_archive_max_processes down from 6 to 1, with the following shown in the primary alert log:

Shutting down archive processes
ALTER SYSTEM SET log_archive_max_processes=1 SCOPE=BOTH;
ARCH shutting down
ARC4: Archival stopped
ARCH shutting down
ARC3: Archival stopped
ARCH shutting down
ARC2: Archival stopped
ARCH shutting down
ARC1: Archival stopped
ARC5: Becoming the ‘no FAL’ ARCH
ARC5: Becoming the ‘no SRL’ ARCH
ARC5: Becoming the heartbeat ARCH
ARCH shutting down
ARC5: Archival stopped
ARC0: Becoming the ‘no FAL’ ARCH
ARC0: Becoming the ‘no SRL’ ARCH
ARC0: Becoming the heartbeat ARCH

I then killed (OS kill that is) the final running arc background process – note this is safe according Oracle support, and pmon detects it is dead and restarts the arc process:

ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=276, OS id=24202
ARC0: Becoming the ‘no FAL’ ARCH
ARC0: Becoming the ‘no SRL’ ARCH
ARC0: Becoming the heartbeat ARCH

This did not work. When the arc0 process restarted it still was thinking it should be sending the 10890 archive log. The only solution was to bounce the instance, thankfully with it being a RAC primary there was no real service outage.

I won’t be caring about FAL next week, as I’m off to the seaside for a week.

16 thoughts on “FAL failing to resolve archive gaps”

Mak says:

August 24, 2009 at 6:55 pm

Thanks for your analysis. I have 10204, we have not set the max connections and still I am getting the duplicate request message in the trace file. Time for me to log a SR

Thanks
Mak

Reply
Tim Williams says:

November 28, 2009 at 1:37 am

You would not believe what worked for me. Found the suggestion in another forum. You kill the arc_ processes on your primary. They will respawn and when they do gap resolution will work. Amazing. I’ve been trying to fix this for 2 weeks and I happened to stumble across the solution.

Reply
- Alan Kendall says:
  
  May 6, 2013 at 6:59 pm
  
  Besides killing the arc processes, I have been fixing this problem with
  On Primary in SQLPLUS:
  SQL> CREATE OR REPLACE DIRECTORY asm_files
  AS ‘+NFL_DATA_01/csesbp/archivelog/2013_05_03/’ ;
  SQL> CREATE OR REPLACE DIRECTORY DSK_FILES AS ‘/u01/alan’;
  SQL> exec DBMS_FILE_TRANSFER.COPY_FILE ( ‘asm_files’ , ‘thread_2_seq_52097.269.814250165’ , ‘dsk_files’ , ‘thread_2_seq_52097.269.814250165.arc’ );
  On primary OS:
  /u01/alan> gzip *.arc
  On Standby OS:
  /u01/alan>scp 10.196.50.20:/u01/alan/*.gz .
  /u01/alan>gunzip *.arc
  On Standby RMAN:
  RMAN> catalog archivelog ‘/u01/alan/’thread_2_seq_52097.269.814250165.arc’;
  
  And the alertlog shows that the replication immediately starts up and continues.
  
  Reply
Ben Shi says:

December 11, 2009 at 4:21 pm

Yes, I followed the suggestion by Tim Willams, and I tried. Amazingly, it works.

Here is what I did on primary side:
$ ps -ef|grep arc|grep
$ kill -9

Then you will see, oracle automatically spawn all those archive processes, and go to your standby side, you should see logs are being received and media recovered.

Thanks, Tim.

Reply
Olaf says:

March 11, 2010 at 9:44 am

You saved my day! Our standby was out of sync and wasn’t able to resync via FAL. Restarting the primary was impossible. I orakilled ARC0 and the standby caught up within minutes.

Thank you!
Olaf

Reply
Kshitij Joy says:

March 24, 2010 at 3:06 pm

Guys

I can also recommend that killing the ARC process is the best bet, i too tried it because my Standby was not catching up and Primary was trying to send the Archivelog which Standby had already applied.

So, as soon as you kill the ARC, Oracle spawned a new ARC process and the FAL automatically fetched all the pending archivelogs.

Reply
kafui says:

March 31, 2010 at 5:10 pm

how is the arc ocesses spotted and killed on windows envirpronment?

Reply
kafui says:

March 31, 2010 at 5:10 pm

how is the arc processes spotted and killed on windows environment?

Reply
- jarneil says:
  
  March 31, 2010 at 6:16 pm
  
  I don’t tend to use windows. You might want to check out orakill
  
  jason..
  
  Reply
- Vijay Parmar says:
  
  February 20, 2012 at 6:11 am
  
  Thanks for your help Jarneil, for windows you can get the sid, serial using following select stmt.
  select * from v$session where type = ‘BACKGROUND’ and program like ‘%ARC0%’;
  
  confirm and kill the process.
  
  Earlier I used to FTP the file and give following stmt.
  alter database register or replace logfile ”;
  
  Thanks
  -Vijay Parmar
  
  Reply
Pingback: OraExplorer » Oracle 10.2.0.3 Gap Resolution of Physical Standby Appears to Hang ~
a collado says:

May 31, 2011 at 3:55 pm

whew! thank you so much jarneil. this also saved my day, killing the ARC process on the Primary. we’re 500+ logs behind and been babysitting the Standby DB, manually shipping some missing logs that the FAL cannot request from Prod.

killing the ARC process rocks.. we have only 2 archiver process, and the ora_arc1, i assumed to be the DEST_2 pair, so i just left the ora_arc0, fearing it might cause unnecessary effect to the Primary.

thanks again.

Reply
justanotherday says:

September 28, 2011 at 11:27 am

Another thumbs up to this tip from a grateful admin.

Reply
Martin Meyer says:

October 20, 2011 at 10:59 am

Excellent thanks! Saved me a few anxious hours. The following Unix command (for SID=abcprd) worked for me

ps -ef|grep arc|grep casdrp|grep -v grep|awk ‘{print $2}’|xargs kill -9

A few minutes later the standby alert.log showed:
Thu Oct 20 12:54:46 2011
Fetching gap sequence in thread 1, gap sequence 120862-120882

Reply
Damodar Reddy says:

October 26, 2011 at 3:44 pm

Wonderful! This works. My standby too was not in sync with primary for the last few days. Thankfully, came accross this thread and killed the primary arc processes. They respawned and the standby started to fetch and apply the gap sequence.

Reply
Evgeniy says:

February 8, 2013 at 10:18 am

Amazing! It really works. Thank you Jarneil so much. Really don’t know what to do before. And now everything ok. Actually have some fear to kill arc_N processes (I have 2) on the primary db, because it’s a billing system. But take a decision and now happy and ready for my weekend.
Thanks.

Reply

	OCI Bare Metal Prici… on OCI VM Shape Pricing
	OCI Annual Universal… on OCI Discount Levels
	Thyss Sineas on Expanding an ASM disk
	Kumar Ramalingam on Keep Disks in your Diskgroup t…
	EDUARDO JUVENAL DE S… on Installing ODI Developer Studi…

jarneil

The thoughts of Jason Arneil

Menu

FAL failing to resolve archive gaps

16 thoughts on “FAL failing to resolve archive gaps”

Leave a comment Cancel reply

Menu

Share this:

Related

16 thoughts on “FAL failing to resolve archive gaps”

Leave a comment Cancel reply