Advanced Queuing Dequeue Freeze

The Issue

We have been running an Advanced Queuing solution for a while now, and with a frustrating regularity we were seeing messages building up in our queue table and the dequeue process seeming to think that there was no work for it to do. These were messages that had a time delay set before being ready for dequeuing. Messages queued without a delay were processed at all times.

This was a RAC system running 10.2.0.4

Perhaps the most frustrating thing about this problem was that Oracle Support effectively refused to look into the problem because we had the parameter aq_tm_processes set. Aparently the recomendation in 10g is to have this parameter unset. Of course if this parameter is currently set you can’t actually unset it without a bounce of the database, which is not something you can automatically perform.

The Workaround

While awaiting the chance to bounce the database and alter this parameter we hit on a workaround to get the queue processing again. It basically involved setting the aq_tm_processes value to 0, seeing that all the queue process monitors shutdown and then set the aq_tm_process parameter back to the original value:

SQL> alter system set aq_tm_processes=0;

ps -ef|grep qmn
ps -ef|grep q00

There should be no background processes found. Once these processes have died it is a simple case of starting them back up again:

SQL> alter system set aq_tm_processes=1;

So fairly straightforward workaround but this was happening with a tedious regularity.

Towards a Permanent Fix

Eventually we took some downtime and unset this parameter with a database bounce. We could then perform some debugging with Oracle Support, as it did not take long for the issue to reappear. Essentially we needed to trace the background processes when the dequeueing process had frozen. This is fairly easy with oradebug:

First find the pid to trace:

[jason@test]$ ps -ef|grep qmn
oracle    6669     1  0 Jan22 ?        00:00:00 ora_qmnc_TEST

SQL> select pid from v$process where spid=6669;

       PID
----------
	14
SQL> oradebug setorapid 14
Unix process pid: 6669, image: oracle@test.jason.org.uk (QMNC)

SQL> oradebug unlimit
Statement processed.

SQL> oradebug Event 10046 trace name context forever, level 12
Statement processed.

Once you have gathered enough debugging information you can turn off the oradebug with the following:

SQL> oradebug Event 10046 trace name context off
Statement processed.

In the trace file of one of the queue monitor processes we could see the following:

*** SERVICE NAME:\(SYS$BACKGROUND) 2009-01-14 14:58:37.219
*** SESSION ID:\(1120.5) 2009-01-14 14:58:37.219
Received ORADEBUG command 'unlimit' from process Unix process pid: 26034, image:
Received ORADEBUG command 'Event 10046 trace name context forever, level 12' from process Unix process pid: 26034, image:
WAIT #0: nam='Streams AQ: waiting for time management or cleanup tasks' ela= 4467314 p1=0 p2=0 p3=0 obj#=-1 tim=1203071408002413
*** 2009-01-14 15:03:31.795
WAIT #0: nam='Streams AQ: waiting for time management or cleanup tasks' ela= 283204298 p1=0 p2=0 p3=0 obj#=-1 tim=1203071691206827
*** 2009-01-14 15:08:31.801
WAIT #0: nam='Streams AQ: waiting for time management or cleanup tasks' ela= 104963843 p1=0 p2=0 p3=0 obj#=-1 tim=1203071984181007
*** 2009-01-14 15:13:31.806
WAIT #0: nam='Streams AQ: waiting for time management or cleanup tasks' ela= 292974027 p1=0 p2=0 p3=0 obj#=-1 tim=1203072277155198
*** 2009-01-14 15:18:31.812
WAIT #0: nam='Streams AQ: waiting for time management or cleanup tasks' ela= 292973898 p1=0 p2=0 p3=0 obj#=-1 tim=1203072570129257
*** 2009-01-14 15:23:31.817
WAIT #0: nam='Streams AQ: waiting for time management or cleanup tasks' ela= 292973852 p1=0 p2=0 p3=0 obj#=-1 tim=1203072863103293
*** 2009-01-14 15:28:31.823

The Solution

Now be aware this issue had dragged on for quite a few months by now. As soon as Oracle saw the trace file the support engineer managed to diagnose immediately what the issue was – a bug 7150994 . Unbelievably, to be tripped up by this bug you have to have upgraded to 10.2.0.4 AND be running RAC. This bug appears to affect 10.2.0.4 only. Patch 6600051 is available to fix this problem.

What I think was the most disappointing thing off all is that our symptoms, version, and the fact we were running RAC matched so closely to this bug, that Oracle Support should have been able to point this bug out to us without us suffering a few months of delays before alighting on the solution.

I fully suspect that it was a change in support personnel rather than us providing additional information that enabled the solution to be found

Advertisements

4 thoughts on “Advanced Queuing Dequeue Freeze

  1. Jason,

    Timely post, thanks. I’m starting a project with an E-Business Suite service that may culminate in migration to a 10.2.0.4 RAC system, and since it’s E-Business Suite, AQ figures into the equation. Thanks for saving me some pain in advance. 🙂

  2. Hi John,

    Well that is a bit of luck for you! If you are going to be relying on AQ messages with 10.2.0.4 and RAC then yep, definitely check out patch 6600051.

    jason.

  3. You dare speak ill of Oracle Support! BLASPHEME!

    We have had the worst trouble with Oracle Support lately. I usually give up when they ask me for the same information that I’ve already provided 3 times, including in the original SR creation.

    I definitely think they overcharge for that “service”.

    • Hi Don,

      I’m sure you too have had the great support analyst who really does know his stuff and helps resolve a problem quickly, trouble is it all seems so random and I’d say at least 50% of the time it’s someone that makes me want to commit murder.

      The worst example I have experienced was when I mentioned localhost, as in 127.0.0.1, and the support analyst did not what that was. Very depressing.

      jason.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s