Thursday, May 5, 2016

Tips for travel (update)

This is a followup to a Tips for Travel post I had created several years ago.

Since then there have been technology changes I want to address, and I have accumulated additional advice that I think makes for helpful updates to the prior tips.

Technology updates
I still use my iPhone heavily when I travel. Apps like CamScanner and Maps continue to be some of the most important items on my phone. One update is that Maps now provides audio prompts, which is much safer than trying to check the phone while driving.

Some other apps that are useful include the apps provided by airline, hotel, and rental car companies. I often use these apps to make travel changes without having to find a computer or connection for my laptop. With the airline apps I will check in with my phone and save the boarding pass to my Wallet app. This reduces the paper I have to print and carry.

The other apps I use most often are Audible, SiriusXM, and Podcasts. I have a subscription with Audible that provides me with two audiobooks every month. A typical book runs around 15 hours, which gives you plenty of entertainment while you are traveling. We have Sirius radio in our car, so I downloaded the Sirius app, which allows me to listen to music, talk radio, and so on while I travel. Note that if you are not connected to wifi, the Sirius app will use quite a bit of data. Finally, there are several podcasts I listen to on a regular basis. The latest episodes can be downloaded to your device while you are connected to wifi and then listened to at your leisure to save data usage.

Lastly, I am a fan of Amazon Prime. With my Kindle device I can download movies and TV shows I've purchased, and with a Prime membership I can also download movies or shows that are available for streaming. This lets you watch something when you are not connected to wifi, such as on a plane or sitting in an airport. I have noticed that my Bose in-ear headphones let me hear the shows much better, especially on a plane. Also, as a Prime member I can use the Kindle lending library to read some books for free. The selection is pretty weak, but you can probably find something that will do in a pinch.

Travel updates
Enroll in TSA PreCheck. Most airports have different lines for PreCheck which will allow you to get to your gate much faster than going through regular security. If you travel internationally then you should look into the Global Entry program, but for domestic flights PreCheck is all you need.

Accept the fact that you will have flight delays. Between weather issues, mechanical problems, and cascading delays, it's almost surprising that flights leave on time at all. When you want to get home, or if you are in a hurry to get to a client's site, delays can be very frustrating, but you have to remind yourself to accept the things you cannot change. Stressing about these delays only hurts you. Given that, you should always check your flight connection times to ensure you give yourself enough time to reach your next gate.

I don't have memberships to the airline clubs, but with their credit cards you can get a one-time pass for a reasonable amount of money. If I know I have a multi-hour delay, I will sometimes take advantage of that feature to spend the time in a club. The free house drinks, free wifi, snacks, and comfortable chairs make those delays pass much faster. With a good audiobook or a downloaded movie on my Kindle, the time goes by even faster.

Saturday, April 2, 2016

APPROX_COUNT_DISTINCT overview

Oracle has a new function, introduced with 12.1.0.2, called APPROX_COUNT_DISTINCT. This SQL function allows the user to get a rough estimate of the number of distinct values in a group. The feature is supposed to run more efficiently than the existing methods.

This post will test that out with a couple of examples. I will look at accuracy and performance to evaluate the functionality. The first test will be of a full scan, the second test will use the SAMPLE clause, and the final test will use the new APPROX_COUNT_DISTINCT function.

To simplify the test I will create a copy of the DBA_OBJECTS table called BIG_TABLE. The query used will group by OWNER while performing a count on distinct OBJECT_ID. To evaluate the accuracy I will reduce the output of 38 rows to 7 key rows.
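
The setup DDL is not shown here; a minimal sketch of it, assuming a straight CTAS copy of DBA_OBJECTS plus a statistics gather, would be:

-- Hedged sketch of the test setup: copy DBA_OBJECTS into BIG_TABLE
create table big_table as select * from dba_objects;

-- Gather statistics so the optimizer estimates reflect the copied data
exec dbms_stats.gather_table_stats(user, 'BIG_TABLE')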

As a note, when I evaluate the execution plan I will be using the gather_plan_statistics hint to get ALLSTATS from dbms_xplan.display_cursor. When I execute different queries I often use an additional string in the hint to help search v$sql for the SQL_TEXT I want to review. That explains why you will see hints like /*+ gather_plan_statistics a*/.
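
As a sketch of that workflow (the exact commands are not shown in this post), the lookup and plan display might look like this:

-- Locate the cursor using the marker string embedded in the hint
select sql_id, child_number
from v$sql
where sql_text like 'select /*+ gather_plan_statistics a*/%';

-- Pull the runtime statistics for that cursor
select *
from table(dbms_xplan.display_cursor('&sql_id', 0, 'ALLSTATS LAST'));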

Queries

The queries evaluated were the following:

A. Full Scan
select /*+ gather_plan_statistics a*/ owner, count(distinct object_id)
from big_table
group by owner
order by owner;

B. Using sample size of 20%
select /*+ gather_plan_statistics b*/ owner, count(distinct object_id)
from big_table sample (20)
group by owner
order by owner;

C. Using new feature
select /*+ gather_plan_statistics c*/ owner, approx_count_distinct (object_id)
from big_table
group by owner
order by owner;

The results will show that even though the sampled query ran faster than APPROX_COUNT_DISTINCT, the results of the latter were far more accurate. The full scan had a run time roughly twice that of APPROX_COUNT_DISTINCT, so the new function also delivers a clear performance improvement.

Detail Results

The details are provided here for anybody interested; a summary of this information follows in the next section.

A:
OWNER                COUNT(DISTINCTOBJECT_ID)
-------------------- ------------------------
APEX_050000                              3600
BI                                          8
HR                                         34
LBACSYS                                   237
PUBLIC                                  37125
SYS                                     42110
XFILES                                    141


PLAN_TABLE_OUTPUT                                                               
--------------------------------------------------------------------------------
SQL_ID  07q279qdm8pgp, child number 0                                           
-------------------------------------                                           
select /*+ gather_plan_statistics a*/ owner, count(distinct object_id)          
from big_table group by owner order by owner                                    
                                                                                
Plan hash value: 2085152455                                                     
                                                                                
----------------------------------------------------------------------------------------------------------- 
| Id  | Operation            | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |      1 |        |     38 |00:00:00.30 |    1577 |   1565 |     
|   1 |  SORT GROUP BY       |           |      1 |  90869 |     38 |00:00:00.30 |    1577 |   1565 |    
|   2 |   VIEW               | VM_NWVW_1 |      1 |  90869 |  93680 |00:00:00.27 |    1577 |   1565 |      
|   3 |    HASH GROUP BY     |           |      1 |  90869 |  93680 |00:00:00.24 |    1577 |   1565 |    
|   4 |     TABLE ACCESS FULL| BIG_TABLE |      1 |  90869 |  93680 |00:00:00.18 |    1577 |   1565 |       
------------------------------------------------------------------------------------------------------------
                                                                                


B:
OWNER                COUNT(DISTINCTOBJECT_ID)
-------------------- ------------------------
APEX_050000                               741
HR                                          4
LBACSYS                                    49
PUBLIC                                   7455
SYS                                      8508
XFILES                                     30



PLAN_TABLE_OUTPUT                                                               
--------------------------------------------------------------------------------
SQL_ID  729uapczb0dd2, child number 0                                           
-------------------------------------                                           
select /*+ gather_plan_statistics b*/ owner, count(distinct object_id)          
from big_table sample (20) group by owner order by owner                        
                                                                                
Plan hash value: 1425374893                                                     
                                                                                
------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |      1 |        |     37 |00:00:00.06 |    1577 |   1565 |     
|   1 |  SORT GROUP BY         |           |      1 |   3635 |     37 |00:00:00.06 |    1577 |   1565 | 
|   2 |   VIEW                 | VM_NWVW_1 |      1 |   3635 |  18904 |00:00:00.05 |    1577 |   1565 |    
|   3 |    HASH GROUP BY       |           |      1 |   3635 |  18904 |00:00:00.05 |    1577 |   1565 | 
|   4 |     TABLE ACCESS SAMPLE| BIG_TABLE |      1 |   3635 |  18904 |00:00:00.03 |    1577 |   1565 |     
------------------------------------------------------------------------------------------------------------



C:
OWNER                APPROX_COUNT_DISTINCT(OBJECT_ID)
-------------------- --------------------------------
APEX_050000                                      3630
BI                                                  8
HR                                                 34
LBACSYS                                           236
PUBLIC                                          37338
SYS                                             42118
XFILES                                            140



PLAN_TABLE_OUTPUT                                                               
--------------------------------------------------------------------------------
SQL_ID  3bmz5b1a9uzqh, child number 0                                           
-------------------------------------                                           
select /*+ gather_plan_statistics c*/ owner, approx_count_distinct              
(object_id) from big_table group by owner order by owner                        
                                                                                
Plan hash value: 3184991183                                                     
                                                                                
------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |      1 |        |     38 |00:00:00.14 |    1577 |   1565 |      
|   1 |  SORT GROUP BY APPROX|           |      1 |  90869 |     38 |00:00:00.14 |    1577 |   1565 |   
|   2 |   TABLE ACCESS FULL  | BIG_TABLE |      1 |  90869 |  93680 |00:00:00.09 |    1577 |   1565 |     
------------------------------------------------------------------------------------------------------------

Summary

To summarize the information above:

Performance

Method         Run time (s)
------------   ------------
Count Full             0.30
Count Sample           0.06
Count Approx           0.14

Accuracy

Owner         Count Full   Count Sample   Count Sample*5   Count Approx
-----------   ----------   ------------   --------------   ------------
APEX_050000         3600            741             3705           3630
BI                     8              0                0              8
HR                    34              4               20             34
LBACSYS              237             49              245            236
PUBLIC             37125           7455            37275          37338
SYS                42110           8508            42540          42118
XFILES               141             30              150            140


Conclusion

The new function runs faster than the full scan and is more accurate than using the sample feature. With the 20% sample, the BI records were not seen at all. If you multiply the 20% sample counts by 5 you can get an estimate of the full table values with that method, but the values are not as close as those from APPROX_COUNT_DISTINCT.

If you need a rough estimate of the distinct counts in a table and you are on version 12.1.0.2 or higher, then you should consider this feature. If exact numbers are needed, then you will still have to perform a full scan.

Sunday, March 13, 2016

Hotsos Symposium 2016 Wrap-up


Another Hotsos Symposium is in the books. There were so many great sessions that it is hard to outline everything quickly. In addition to the presentations, there was a wonderful Training Day by Richard Foote.

The topic of indexes (Richard Foote's training day material) is far more vast than we realize when doing day-to-day work in Oracle. Richard went through a great deal of information on indexing. Over the course of his day-long session he debunked many widely held myths about indexes. For example, is it best with a composite index to have the most selective values at the start of the index? Are you sure about that...?

This basic information is critical to supporting an Oracle system. A long-standing belief of mine is that people should spend more time on the fundamentals of database design and performance rather than going down rabbit holes of features or techniques that are often not really needed. Many times I see people using these rarely needed techniques to resolve issues that often have much simpler fixes.

Throughout the conference there was a mix of presentations from veteran presenters and new participants. The usual big names, including Tanel Poder, Kerry Osborne, Jeff Smith, et al., provided the high-quality presentations we are used to seeing from them. As usual for the Hotsos Symposium, the content was performance-centric; however, it covered a wide variety of topics.

The training day wrapped up the Symposium, but it opened with Kellyn Pot'Vin-Gorman and ThatJeffSmith on the importance of a social media presence. Their presentation reminded me of how infrequently I have been blogging and interacting in the social media space lately. As a result of their wonderful presentation, I will be working on doing more than just a few blog posts a year.

In addition to the increased social media activity I also plan on being more active with presentations and articles. I have done quite a bit in the past, but have taken a little bit of a break lately.

I guess that is the sign of a good conference... being more energized when it is over.

Thursday, May 14, 2015

New Oracle Top-N query options

What is the best way to get the Top-N values from a table in Oracle? In the past the answer to this question has always been a little convoluted. There are a couple of clear ways to perform this task, but they can feel kludgey when explaining them to an inexperienced Oracle developer or analyst. The processes work, but they don't seem as clear-cut as they should.

With 12c Oracle has added a feature that allows for Top-N queries with FETCH FIRST|NEXT|PERCENT clauses. This approach appears more elegant than the "old" ways since it is easier to explain. There is also some nice functionality with this syntax for offsetting the values returned, including duplicate (tied) values, and even fetching the top X percent of values. See the Oracle documentation for examples.
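
As a quick sketch of those variations (they are not part of the timed tests below; the table and column names follow the later examples):

-- Skip the first 10 rows, then return the next 10
select owner, object_id from op.big_tab
order by object_id desc
offset 10 rows fetch next 10 rows only;

-- Keep any rows that tie with the 10th value
select owner, object_id from op.big_tab
order by object_id desc
fetch first 10 rows with ties;

-- Return the top 1 percent of rows
select owner, object_id from op.big_tab
order by object_id desc
fetch first 1 percent rows only;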

In this post I want to do a quick comparison between the new Top-N approach and some older approaches. In the example, one case of the old approach ran much better when a particular index was involved, so it goes to show that something "new" isn't always "better".

As with any feature, be sure to investigate the performance against representative amounts of data as you develop.

In the following example I'll query against big_tab, which is a copy of the dba_objects table. After the initial tests I'll unhide an index and see that only one of the queries uses it.
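
The index DDL itself is not shown in this post; a hedged sketch of what "unhide" implies (an invisible index made visible for the second round), using the BIG_IDX name that appears in the final plan:

-- Sketch only: create the index invisible so the initial tests ignore it
create index big_idx on op.big_tab (object_id) invisible;

-- "Unhide" it for the follow-up test
alter index big_idx visible;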

Note that all of the queries return the same data, so there are no functional differences between them.

The tests were run with the following parameters:
compatible                12.1.0.0.0
optimizer_features_enable 12.1.0.1


The FETCH FIRST clause runs in 2.4 seconds and does 48,190 logical reads. This is comparable to the other methods investigated, which included a RANK() clause and the older subquery with rownum <= 10.

Option 1.
SELECT /*+ gather_plan_statistics*/  owner, object_id FROM op.big_tab
ORDER BY object_id DESC    

FETCH FIRST 10 ROWS ONLY

Option 2.
select /*+ gather_plan_statistics */ owner, object_id
from
   (select  owner, object_id, rank() over (order by object_id desc ) as rank
   from big_tab )
where rank <=10

Option 3.
select /*+ gather_plan_statistics */  stage.owner, stage.object_id   
from ( select owner, object_id from op.big_tab order by object_id desc) stage
where rownum <=10


The detailed execution plans are listed out in the following sections. These show the details for Actual time and Buffer Gets to allow for comparing behavior.

Execution plan for Option 1.

SQL_ID  7thwrzawu40gz, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics*/  owner, object_id FROM op.big_tab
ORDER BY object_id DESC     FETCH FIRST 10 ROWS ONLY

Plan hash value: 4205637774

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |      1 |        |     10 |00:00:02.40 |   48190 |  48185 |       |       |          |
|*  1 |  VIEW                    |         |      1 |   2866K|     10 |00:00:02.40 |   48190 |  48185 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|         |      1 |   2866K|     10 |00:00:02.40 |   48190 |  48185 | 80896 | 80896 |71680  (0)|
|   3 |    TABLE ACCESS FULL     | BIG_TAB |      1 |   2866K|   2866K|00:00:08.15 |   48190 |  48185 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("OBJECT_ID") DESC )<=10)

 
Execution plan for Option 2

SQL_ID  4y49s6tk4crha, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_id from (select
owner, object_id, rank() over (order by object_id desc ) as rank from
big_tab ) where rank <=10

Plan hash value: 4205637774

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |         |      1 |        |     10 |00:00:02.47 |   48190 |  48185 |       |       |          |
|*  1 |  VIEW                    |         |      1 |   2866K|     10 |00:00:02.47 |   48190 |  48185 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|         |      1 |   2866K|     11 |00:00:02.47 |   48190 |  48185 | 95232 | 95232 |83968  (0)|
|   3 |    TABLE ACCESS FULL     | BIG_TAB |      1 |   2866K|   2866K|00:00:08.52 |   48190 |  48185 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RANK"<=10)
   2 - filter(RANK() OVER ( ORDER BY INTERNAL_FUNCTION("OBJECT_ID") DESC )<=10)


Execution plan for Option 3

SQL_ID  7x3nv0d1r475r, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */  stage.owner, stage.object_id   
from ( select owner, object_id FROM op.big_tab ORDER BY object_id DESC)
stage   where rownum <=10

Plan hash value: 2431194888

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |      1 |        |     10 |00:00:02.54 |   48198 |  48187 |       |       |          |
|*  1 |  COUNT STOPKEY          |         |      1 |        |     10 |00:00:02.54 |   48198 |  48187 |       |       |          |
|   2 |   VIEW                  |         |      1 |   2866K|     10 |00:00:02.54 |   48198 |  48187 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY|         |      1 |   2866K|     10 |00:00:02.54 |   48198 |  48187 | 80896 | 80896 |71680  (0)|
|   4 |     TABLE ACCESS FULL   | BIG_TAB |      1 |   2866K|   2866K|00:00:13.03 |   48198 |  48187 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)


Note that when an index was created on the object_id field, Options 1 and 2 did not utilize the index. Option 3 (with the subquery), however, did utilize the index, as we can see below. This is an interesting result that should be kept in mind as these types of queries are developed.

Execution plan for Option 3 (utilizing index)

SQL_ID  7x3nv0d1r475r, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */  stage.owner, stage.object_id   
from ( select owner, object_id FROM op.big_tab ORDER BY object_id DESC)
stage   where rownum <=10

Plan hash value: 1151062491

------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     10 |00:00:00.01 |       4 |      4 |
|*  1 |  COUNT STOPKEY                |         |      1 |        |     10 |00:00:00.01 |       4 |      4 |
|   2 |   VIEW                        |         |      1 |     10 |     10 |00:00:00.01 |       4 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| BIG_TAB |      1 |   2866K|     10 |00:00:00.01 |       4 |      4 |
|   4 |     INDEX FULL SCAN DESCENDING| BIG_IDX |      1 |     10 |     10 |00:00:00.01 |       3 |      3 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)



In conclusion, in most cases it looks like the new FETCH FIRST query will run just as well as the pre-12c methods for collecting Top-N results. In this test, the subquery method (Option 3) was able to utilize an index the others could not, so if performance is a consideration you should check the plans against production-level amounts of data.

Tuesday, March 11, 2014

Urge to Purge Part 3

Another way to identify tables that can be purged is to look for a high number of similarly named tables. The query below will identify tables with similar names; note that you may need to adjust the numbers to better isolate such tables. Some examples of cases like this from Oracle's E-Business Suite are listed below.

The query to use:
select substr(table_name, 1, 15), count(1)
from dba_tables
group by substr(table_name, 1, 15) having count(1) > 30
order by 2 desc;
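
Once a prefix stands out, a natural follow-up is to check how much space those tables are consuming. A hedged sketch (the prefix shown is just an example taken from the list below):

select owner, count(*) as table_count, round(sum(bytes)/1024/1024) as mb
from dba_segments
where segment_type = 'TABLE'
and segment_name like 'GL_POSTING_INTERIM%'
group by owner
order by mb desc;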

In Oracle E-Business Suite there are several cases where interim and temp tables are not being cleaned out by the application. Below are a few cases that depend on the module used and the current release of the applications.

There are a couple of groups of temporary tables that are not cleaned out by Oracle. These tables can be removed manually, but review the notes indicated below before doing so. The tables may not be causing any significant issues; they are just consuming space on the system. They can also slow down gathering object-level statistics if there are a large number of them.

A.            FA_JOURNALS_INTERIM_xxx
Review note FA_JOURNALS_INTERIM Tables are Not Dropped after Successful Create Journal Entry and Journal Import (Doc ID 757666.1)

B.            GL_CONS_INTERFACE_xxxx
Review note Many Interim Tables and Indexes are filling up the GL and GL_NDX tablespaces (Doc ID 287534.1)

C.            GL_POSTING_INTERIM_xxx
Review note GL_POSTING_INTERIM_xxxx Tables are Not Being Dropped after Successful Posting (Doc ID 364040.1)

D.            XLA_GLT_xxxx tables not being cleaned up by Oracle
Review note XLA_GLT_XXXX Temporary Tables For Purge Program 'Purge Temporary Journal Import Interface Tables' Not Dropped (Doc ID 1158934.1)

Tuesday, August 27, 2013

Tuning Advice - Be Sure to Check the Bind Variables

An occasional problem I encounter is with queries doing too much work due to the bind variables used. When looking at queries performing a high number of reads, part of the investigation should include a review of all of the bind variables passed into the query.

Recall from older posts that the top queries can be identified in AWR/Statspack reports or by running a query against the data dictionary such as:

select buffer_gets, elapsed_time, cpu_time, executions, sql_id, sql_text
from (select buffer_gets, elapsed_time, cpu_time, executions, sql_id, sql_text,
             rank() over (order by buffer_gets desc) as rank
      from v$sqlstats
      where last_active_time > (sysdate - 1/12))
where rank <= 20;

A trace with binds and waits will give you information about the bind variables that were used for a query. You can also query v$sql_bind_capture. If you have the appropriate license you can query dba_hist_sqlbind to get the bind variables for older queries.

select child_number, position, name, datatype_string, value_string
from v$sql_bind_capture
where sql_id = '&SQL_ID_from_1'
order by child_number, position;
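
For older executions, a similar query against dba_hist_sqlbind (licensing permitting) might look like this sketch:

select snap_id, position, name, datatype_string, value_string
from dba_hist_sqlbind
where sql_id = '&SQL_ID_from_1'
order by snap_id, position;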

When the bind variables have been identified, check how often those values occur in the table columns being queried. A heavily skewed value could represent a programming issue if too much data is being read or retrieved. Also review any date variables that cause the query to read multiple years' worth of data.
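
As a sketch of that check (the table and column names below are placeholders, not from a real system), count how often each value appears in the filtered column and compare the captured bind values against the heavy hitters:

-- Placeholder names: substitute the table and column from the problem query
select status_code, count(*)
from some_app_table
group by status_code
order by count(*) desc;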

Sometimes the fix for a query performing high reads is a programming change to alter how those queries are called. If bad variables are passed to a query, you will be judged for the performance issues they create.

(Image: the Smith family seeing a query using bad variables.)

Tuesday, June 11, 2013

Urge to Purge part 2

Part 1 of this blog post discussed using standard purge functions to reduce overhead on the system. A query to identify the largest objects was provided as a starting point for purge investigations.

A couple of other items are worth reviewing. Often systems will have backup tables created to assist with troubleshooting. These tables should be removed when they are no longer needed.

One way to locate backup tables is to run a query like the following. It may not catch all tables, depending on the naming standards, but it is a good place to start. The list that gets returned may include valid tables, so be sure to check whether the tables are still being used. You can also edit this query to use other possible strings such as TEMP, TMP, ARCH, ARCHIVE, HIST, BACK, OLD, etc.; a combined variation is sketched after the query below.

select * from dba_segments
where owner not in ('SYS','SYSTEM')
and segment_name like '%BKP%'
and segment_type='TABLE'
order by bytes desc;
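
A hedged variation on the query above that checks several of those strings in one pass (adjust the pattern list to your own naming standards):

select owner, segment_name, round(bytes/1024/1024) as mb
from dba_segments
where owner not in ('SYS','SYSTEM')
and segment_type = 'TABLE'
and regexp_like(segment_name, 'BKP|BACKUP|TEMP|TMP|ARCH|HIST|OLD')
order by bytes desc;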

Another place to review involves the log files for materialized views. Sometimes these logs can become quite large. There are a few options to reclaim the space; review the Oracle notes on cleaning up materialized view logs. If you are using E-Business Suite, look up the view name in My Oracle Support, as there may be additional steps needed to reclaim the space. The following query will list the materialized view logs ordered by size. Focus on only the largest objects.

select *
from dba_segments
where segment_name like 'MLOG$_%'
order by bytes desc;
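
One of the options described in the Oracle notes is purging the log for a given master table. A hedged sketch (the master table name is a placeholder; check the notes for prerequisites before running anything like this):

-- Purge rows from the materialized view log that are no longer needed
exec dbms_mview.purge_log('APPS.SOME_MASTER_TABLE')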