Tuesday 12 June 2007

dbms_stats

There are so many combinations of parameters, how does one decide the most optimum parameters when using dbms_stats?

One method I found useful is checking the accuracy of num_distinct_rows against time taken for gathering statistics. You want good accurate stats but produced in a short time.

First compile this Function to automatically and accurately calculate the number of distinct rows:

create or replace function pn_distinct(p_table in varchar2, p_col in varchar2)
return number
is
q_text varchar2(4000);
v_records number;

begin

q_text := 'select count(distinct ' p_col ') from 'p_table ;execute immediate(q_text) into v_records;
return v_records;

end;

Run the dbms_stat tweaking with the different parameters e.g. to collect stats on all tables in schema that begin with DWH in the example below;

create or replace procedure dwh_gather_stats
as
cursor c1 is
select table_name
from user_tables
where table_name like 'DWH%';

begin

For c1rec in c1 loop
dbms_stats.gather_table_stats(ownname=>'BI_USER',
tabname=>c1rec.table_name,
estimate_percent=>10,
method_opt=>'for all columns size auto',
cascade=>true);

end loop;

end;

record the time taken with each set of parameters.

and then run the following query and decide on a trade off between accuracy of the stats and time taken to gather the stats by comparing the num_distinct and actual_distinct columns. You can use this method to compare dbms_stats against analyze as well.


select p.table_name,p.last_analyzed,t.COLUMN_NAME,t.NUM_DISTINCT,
pn_distinct(p.table_name,t.column_name) actual_distinct
from user_tab_columns T,
user_tables P
where p.table_name like 'DWH%'
and p.table_name = t.TABLE_NAME

No comments: