Improving an Index Range Scan

I recently had the opportunity to perform a little bit of tuning work, and I thought I would share it here. I think tuning is one of the more satisfying things about being a DBA, where you can actually see that you have made a concrete change for the better.

The query in question was quite simple, of the form:


SQL> select count(*) from ns where id = :B1 and deleted is null

The table is of a reasonable size, with around 40 million rows. This is an OLTP environment. Yes, there was an index on the id column, and indeed the optimizer was choosing it:



-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |    11 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                |          |     1 |    11 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID  | NS       |     1 |    11 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN            | IX_ID_NS |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DELETED" IS NULL)
   3 - access("ID"=TO_NUMBER(:B1))

It’s all Skew-Whiff

So far, not all that interesting. However, this data is extremely skewed. The id column is not a primary key, and the distribution of ids looks like this:



SQL> select min(id), max(id), median(id), avg(id)
       from ns;

   MIN(ID)    MAX(ID) MEDIAN(ID)    AVG(ID)
---------- ---------- ---------- ----------
         1      10774          1 4.64528267

That does not really do justice to how skewed this data is. So I've counted up the number of times each particular id value appears in the table, then aggregated how often each count appears (a query sketch for this follows below). A little snippet of the result set should explain the data:



         1   26816448
         2    2064686
         3     219876
         4      34708
.
.
.
    135931          1
    136166          7
    136167          1
    140111          3

So what are these numbers saying? Well, the most popular number of times an id value appears is 1, and there are 26 million id values that appear just the once. Next is the count of id values that appear twice in the table, and so on.

You can see from this that there are 3 id values that each appear 140,111 times in the table.
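For reference, counts like the ones above can be produced with a double aggregation along the following lines. This is just a sketch against the ns table described above; the aliases cnt and num_ids are mine:

SQL> select cnt, count(*) as num_ids        -- how many ids share each count
       from (select id, count(*) as cnt     -- rows per id value
               from ns
              group by id)
      group by cnt
      order by cnt;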

Plotting this on a logarithmic scale looks like this:

[Figure: frequency of each id occurrence count, log scale]

So how is this affecting the query?

Well, as stated, the simple range scan is just fine for the 26 million id values that appear just the once, but whenever a query supplies an id that appears far more frequently, the index range scan is not a great idea:



SQL> select count(*) from ns where id=16318609 and deleted is null;

  COUNT(*)
----------
    108559

Elapsed: 00:02:30.35

Two and a half minutes for an OLTP system is a bit of a problem. Looking at the extended plan information, we can see the issue:


--------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE                |          |      1 |      1 |      1 |00:01:36.12 |   64371 |  56298 |
|*  2 |   TABLE ACCESS BY INDEX ROWID  | NS       |      1 |      1 |    102K|00:01:35.80 |   64371 |  56298 |
|*  3 |    INDEX RANGE SCAN            | IX_ID_NS |      1 |      1 |    132K|00:00:05.83 |     458 |    444 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("DELETED" IS NULL)
   3 - access("ID"=16318609)

The above shows, for each operation, how many rows the optimizer estimated it would get (E-Rows) against the actual number of rows found when the database had to do the work for real (A-Rows). You can see the statistics are out by around five orders of magnitude for this particular chosen id value.
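As an aside, row-source statistics like these can be captured by re-running the statement with the gather_plan_statistics hint and then pulling the last execution's plan from the cursor cache. A sketch, assuming the statement can be re-run:

SQL> select /*+ gather_plan_statistics */ count(*)
       from ns
      where id = 16318609 and deleted is null;

SQL> -- report estimated vs actual rows for the last execution
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));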

However, even if the optimizer knew the correct number of rows exactly, the only other plan available is a full table scan, which is not all that great on a 40M-row table.
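For what it's worth, the estimate itself could be improved by gathering a histogram on the id column so the optimizer can see the skew. A sketch (the bucket count here is my assumption, and, as noted, a better estimate alone would not unlock a better plan):

SQL> begin
       -- histogram on id only; 254 is the classic maximum bucket count
       dbms_stats.gather_table_stats(
         ownname    => user,
         tabname    => 'NS',
         method_opt => 'for columns id size 254');
     end;
     /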

Improving the response time

Hopefully I took some of Jonathan Lewis's advice and considered how best to get the data. With this simple query, the best implementation really is to avoid going back to the table at all, so creating a fatter index on both the id and deleted columns seemed to be the way to do it:



SQL> create index id_removed on ns(id, deleted) parallel 3 nologging;

SQL> select count(*) from ns where id=16318609 and deleted is null;

  COUNT(*)
----------
    108559

Elapsed: 00:00:00.54

That is a pretty pleasing improvement of more than two orders of magnitude in the response time of this query.
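To check that the new plan really does avoid the table, the statement can be run back through explain plan; the output should no longer show a TABLE ACCESS BY INDEX ROWID step, just the range scan of the new index. A sketch:

SQL> explain plan for
       select count(*) from ns where id = :B1 and deleted is null;

SQL> select * from table(dbms_xplan.display);

One housekeeping note, which is also the substance of Martin's first comment below: creating the index with parallel 3 leaves the index with that parallel degree, which can itself invite parallel plans for later queries. It can be reset once the build is done; a sketch:

SQL> alter index id_removed noparallel;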

Of course, the best response time of all would have been not to have to run the query at all, but that's another story 😉


5 thoughts on “Improving an Index Range Scan”

  1. All the time I was reading your post, the first thing that struck me was the query “SQL> select count(*) from ns where id = :B1 and deleted is null” and the following statement, “This is an OLTP environment”.
    Then I read your last statement of the post and bingo… :)
    While your solution is fantastic, it is a human tendency not to bother about the real problem (which is answering the question: do we really need count(*) in an OLTP environment?) once this (possible) band-aid does the job.

    • Hi Narenda,

      Thanks for reading!

      Absolutely, I always ask my developers “do you really need to do this?”, but sometimes you just don't have access to the code, and then the band-aid is at least mitigating the sub-optimal situation.

  2. Hi Jason,

    I wonder if by creating the index in parallel you also induced the optimiser to use parallel processing in answering the query using the index?… I don’t suppose you can still run the query through explain plan to test that?

    Martin

  3. Jason,
    have you ever thought of switching to an index-organized table (on id, deleted and maybe something else), and if yes, what were the results?
    I don't know your data, but ids rarely change, so rows should not move too often within your IOT structure?
    One more step could be partitioning on ‘deleted’. Partition pruning AND IOT might fit very well (if there are no other drawbacks).
    Martin
