Thursday, May 14, 2015

New Oracle Top-N query options

What is the best way to get the Top-N values from a table in Oracle? In the past the answer to this question has always been a little convoluted. There are a couple of clear ways to perform this task but they can feel kludgey when explaining them to an inexperienced Oracle developer or analyst. The processes work, but they don't seem as clear cut as they should.

With 12c Oracle has added a feature to allow for Top-N queries with FETCH FIRST|NEXT|PERCENT clauses. This approach appears more elegant than the "old" ways since it is easier to explain. Also there is some nice functionality with this syntax to allow for offsetting the values returned, including duplicate values, and even allowing to fetch the top X percent of values. See the Oracle documentation for examples.

In this post I want to do a quick comparision between the new Top-N approach and some older approaches. In the example I ran one case of the old approach ran much better when a particular index was involved, so it goes to show that something "new" isn't always "better". 

As with any feature, be sure to investigate the performance against representative amounts of data as you develop.

In the following example I'll query against big_tab which is a copy of the dba_objects table. After initial tests I'll unhide an index and see that only one query uses that index.

Note, all of the queries return the same data so there are no issues seen with functionality.

The tests were using the parameters
compatible                12.1.0.0.0
optimizer_features_enable 12.1.0.1


The FETCH FIRST clause runs in 2.4 seconds and does 48,190 logical reads. This is comparable to the other methods investigated which included a RANK() clause and the older subquery with rownum<=10.

Option 1.
SELECT /*+ gather_plan_statistics*/  owner, object_id FROM op.big_tab
ORDER BY object_id DESC    

FETCH FIRST 10 ROWS ONLY

Option 2.
select /*+ gather_plan_statistics */ owner, object_id
from
   (select  owner, object_id, rank() over (order by object_id desc ) as rank
   from big_tab )
where rank <=10

Option 3.
select /*+ gather_plan_statistics */  stage.owner, stage.object_id   
from ( select owner, object_id from op.big_tab order by object_id desc) stage
where rownum <=10


The detailed execution plans are listed out in the following sections. These show the details for Actual time and Buffer Gets to allow for comparing behavior.

Execution plan for Option 1.

SQL_ID  7thwrzawu40gz, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics*/  owner, object_id FROM op.big_tab
ORDER BY object_id DESC     FETCH FIRST 10 ROWS ONLY

Plan hash value: 4205637774

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |      1 |        |     10 |00:00:02.40 |   48190 |  48185 |       |       |          |
|*  1 |  VIEW                    |         |      1 |   2866K|     10 |00:00:02.40 |   48190 |  48185 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|         |      1 |   2866K|     10 |00:00:02.40 |   48190 |  48185 | 80896 | 80896 |71680  (0)|
|   3 |    TABLE ACCESS FULL     | BIG_TAB |      1 |   2866K|   2866K|00:00:08.15 |   48190 |  48185 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("OBJECT_ID") DESC )<=10)

 
Execution plan for Option 2

SQL_ID  4y49s6tk4crha, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_id from (select
owner, object_id, rank() over (order by object_id desc ) as rank from
big_tab ) where rank <=10

Plan hash value: 4205637774

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |      1 |        |     10 |00:00:02.47 |   48190 |  48185 |       |       |          |
|*  1 |  VIEW                    |         |      1 |   2866K|     10 |00:00:02.47 |   48190 |  48185 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|         |      1 |   2866K|     11 |00:00:02.47 |   48190 |  48185 | 95232 | 95232 |83968  (0)|
|   3 |    TABLE ACCESS FULL     | BIG_TAB |      1 |   2866K|   2866K|00:00:08.52 |   48190 |  48185 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RANK"<=10)
   2 - filter(RANK() OVER ( ORDER BY INTERNAL_FUNCTION("OBJECT_ID") DESC )<=10)


Execution plan for Option 3

SQL_ID  7x3nv0d1r475r, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */  stage.owner, stage.object_id   
from ( select owner, object_id FROM op.big_tab ORDER BY object_id DESC)
stage   where rownum <=10

Plan hash value: 2431194888

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |      1 |        |     10 |00:00:02.54 |   48198 |  48187 |       |       |          |
|*  1 |  COUNT STOPKEY          |         |      1 |        |     10 |00:00:02.54 |   48198 |  48187 |       |       |          |
|   2 |   VIEW                  |         |      1 |   2866K|     10 |00:00:02.54 |   48198 |  48187 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY|         |      1 |   2866K|     10 |00:00:02.54 |   48198 |  48187 | 80896 | 80896 |71680  (0)|
|   4 |     TABLE ACCESS FULL   | BIG_TAB |      1 |   2866K|   2866K|00:00:13.03 |   48198 |  48187 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)


Note that when an index was created on the object_id field Options 1 and 2 did not utilize the index. Option 3 (with the subquery), however, did utilize the index as we can see below. This is an interesting result that should be kept in mind as these types of queries are developed.

Execution plan for Option 3 (utilizing index)

SQL_ID  7x3nv0d1r475r, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */  stage.owner, stage.object_id   
from ( select owner, object_id FROM op.big_tab ORDER BY object_id DESC)
stage   where rownum <=10

Plan hash value: 1151062491

------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     10 |00:00:00.01 |       4 |      4 |
|*  1 |  COUNT STOPKEY                |         |      1 |        |     10 |00:00:00.01 |       4 |      4 |
|   2 |   VIEW                        |         |      1 |     10 |     10 |00:00:00.01 |       4 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| BIG_TAB |      1 |   2866K|     10 |00:00:00.01 |       4 |      4 |
|   4 |     INDEX FULL SCAN DESCENDING| BIG_IDX |      1 |     10 |     10 |00:00:00.01 |       3 |      3 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)



In conclusion, in most cases it looks like the new FETCH FIRST query will run just as well as the pre-12c methods for collecting the TOP-N queries. In this test, the subquery method (Option 3) was able to utilize an index the others could not, so if there are performance consideration you should check the plans against production level amounts of data.