This post will test that out with a couple of examples. I will look at accuracy and performance to evaluate the functionality. The first test will be of a full scan, the second test will use the SAMPLE clause, and the final test will use the new APPROX_COUNT_DISTINCT function.
To simplify the test I will create a copy of the DBA_OBJECTS table called BIG_TABLE. The query used will group by OWNER while performing a count on distinct OBJECT_ID. To evaluate the accuracy I will reduce the output of 38 rows to 7 key rows.
As a note, when I evaluate the execution plan I will be using the gather_plan_statistics hint to get ALLSTATS from dbms_xplan.display_cursor. When I execute different queries I often use an additional string in the hint to help search v$sql for the SQL_TEXT I want to review. That explains why you will see hints like /*+ gather_plan_statistics a*/.
Queries
The queries evaluated were the following:A. Full Scan
select /*+ gather_plan_statistics a*/ owner, count(distinct object_id)
from big_table
group by owner
order by owner;
B. Using sample size of 20%
select /*+ gather_plan_statistics b*/ owner, count(distinct object_id)
from big_table sample (20)
group by owner
order by owner;
C. Using new feature
select /*+ gather_plan_statistics c*/ owner, approx_count_distinct (object_id)
from big_table
group by owner
order by owner;
The results will show that even though the sample size query ran faster than APPROX_COUNT_DISTINCT, the results of the latter were more accurate. The performance of the full scan had a run time of twice that of APPROX_COUNT_DISTINCT, so the performance was also much improved with the new function.
Detail Results
The details are provided here for anybody interested. Following this I will have a Summary of this information.A:
OWNER COUNT(DISTINCTOBJECT_ID)
-------------------- ------------------------
APEX_050000 3600
BI 8
HR 34
LBACSYS 237
PUBLIC 37125
SYS 42110
XFILES 141
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 07q279qdm8pgp, child number 0
-------------------------------------
select /*+ gather_plan_statistics a*/ owner, count(distinct object_id)
from big_table group by owner order by owner
Plan hash value: 2085152455
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 38 |00:00:00.30 | 1577 | 1565 |
| 1 | SORT GROUP BY | | 1 | 90869 | 38 |00:00:00.30 | 1577 | 1565 |
| 2 | VIEW | VM_NWVW_1 | 1 | 90869 | 93680 |00:00:00.27 | 1577 | 1565 |
| 3 | HASH GROUP BY | | 1 | 90869 | 93680 |00:00:00.24 | 1577 | 1565 |
| 4 | TABLE ACCESS FULL| BIG_TABLE | 1 | 90869 | 93680 |00:00:00.18 | 1577 | 1565 |
------------------------------------------------------------------------------------------------------------
B:
OWNER COUNT(DISTINCTOBJECT_ID)
-------------------- ------------------------
APEX_050000 741
HR 4
LBACSYS 49
PUBLIC 7455
SYS 8508
XFILES 30
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 729uapczb0dd2, child number 0
-------------------------------------
select /*+ gather_plan_statistics b*/ owner, count(distinct object_id)
from big_table sample (20) group by owner order by owner
Plan hash value: 1425374893
------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 37 |00:00:00.06 | 1577 | 1565 |
| 1 | SORT GROUP BY | | 1 | 3635 | 37 |00:00:00.06 | 1577 | 1565 |
| 2 | VIEW | VM_NWVW_1 | 1 | 3635 | 18904 |00:00:00.05 | 1577 | 1565 |
| 3 | HASH GROUP BY | | 1 | 3635 | 18904 |00:00:00.05 | 1577 | 1565 |
| 4 | TABLE ACCESS SAMPLE| BIG_TABLE | 1 | 3635 | 18904 |00:00:00.03 | 1577 | 1565 |
------------------------------------------------------------------------------------------------------------
C:
OWNER APPROX_COUNT_DISTINCT(OBJECT_ID)
-------------------- --------------------------------
APEX_050000 3630
BI 8
HR 34
LBACSYS 236
PUBLIC 37338
SYS 42118
XFILES 140
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 3bmz5b1a9uzqh, child number 0
-------------------------------------
select /*+ gather_plan_statistics c*/ owner, approx_count_distinct
(object_id) from big_table group by owner order by owner
Plan hash value: 3184991183
------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 38 |00:00:00.14 | 1577 | 1565 |
| 1 | SORT GROUP BY APPROX| | 1 | 90869 | 38 |00:00:00.14 | 1577 | 1565 |
| 2 | TABLE ACCESS FULL | BIG_TABLE | 1 | 90869 | 93680 |00:00:00.09 | 1577 | 1565 |
--------------------------------------------------------------------------------------------------------
Summary
To summarize the information seenPerformance
Method | Run time |
Count Full | 0.30 |
Count Sample | 0.06 |
Count Approx | 0.14 |
Accuracy
Owner | Count full | Count sample | Count sample*5 | Count Approx |
APEX_050000 | 3600 | 741 | 3705 | 3630 |
BI | 8 | 0 | 0 | 8 |
HR | 34 | 4 | 20 | 34 |
LBACSYS | 237 | 49 | 245 | 236 |
PUBLIC | 37125 | 7455 | 37275 | 37338 |
SYS | 42110 | 8508 | 42540 | 42118 |
XFILES | 141 | 30 | 150 | 140 |
Conclusion
The new function runs faster than the full scan and it is more accurate than using the sample feature. With the 20% sample, the BI records were not even seen. If you multiple the 20% sample by 5 you can get an estimate of the full table values with this method, but the values are not as close as the APPROX_COUNT_DISTINCT.If you need to get a rough estimate of counts in a table and you are using a version >= 12.1.0.2 then you should consider this feature. If exact numbers are still needed then you will need to perform a full scan.