There are so many combinations of parameters, how does one decide the most optimum parameters when using dbms_stats?
One method I found useful is checking the accuracy of num_distinct_rows against time taken for gathering statistics. You want good accurate stats but produced in a short time.
First compile this Function to automatically and accurately calculate the number of distinct rows:
create or replace function pn_distinct(p_table in varchar2, p_col in varchar2)
return number
is
q_text varchar2(4000);
v_records number;
begin
q_text := 'select count(distinct ' p_col ') from 'p_table ;execute immediate(q_text) into v_records;
return v_records;
end;
Run the dbms_stat tweaking with the different parameters e.g. to collect stats on all tables in schema that begin with DWH in the example below;
create or replace procedure dwh_gather_stats
as
cursor c1 is
select table_name
from user_tables
where table_name like 'DWH%';
begin
For c1rec in c1 loop
dbms_stats.gather_table_stats(ownname=>'BI_USER',
tabname=>c1rec.table_name,
estimate_percent=>10,
method_opt=>'for all columns size auto',
cascade=>true);
end loop;
end;
record the time taken with each set of parameters.
and then run the following query and decide on a trade off between accuracy of the stats and time taken to gather the stats by comparing the num_distinct and actual_distinct columns. You can use this method to compare dbms_stats against analyze as well.
select p.table_name,p.last_analyzed,t.COLUMN_NAME,t.NUM_DISTINCT,
pn_distinct(p.table_name,t.column_name) actual_distinct
from user_tab_columns T,
user_tables P
where p.table_name like 'DWH%'
and p.table_name = t.TABLE_NAME
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment