COLLECT STATISTICS is a Teradata sql command that scans
columns and indexes of a table and records demographics of the data.
COLLECT STATISITICS is used to provide the Teradata
Optimizer with as much information on data as possible. The Optimizer uses this
information to determine how many rows exist and which rows qualify for given
values.
Collecting statistics can improve the execution of a sql.
The optimizer can have more details about each column or index, and therefore
determine a better join plan for resolving the query.
COLLECT STATISTICS COMMANDS SHOULD ONLY BE USED WHEN A
SPECIFIC SQL (OR GROUP OF SQLS) IS BENEFITTED.
Because, Collecting
statistics may not improve a sql’s efficiency. Therefore, the processing time used to
collect the statistics was wasted, and this could be a large amount of processing time.
If a sql’s performance is shown to improve from a specific
collected statistic, then the statistic should be initially collected and then
refreshed on a regular basis.
To possibly avoid spoolspace errors in the application
design phase, some common problem columns are listed below in "Collect
Statistics - Problem Columns".
With COLLECT STATISTICS:
- COLLECT
STATISTICS can make the usage of these more efficient:
Primary
indexes
Secondary
indexes
Join
Columns or base tables
Join
Indexes
Qualifying
columns
-Statistics are especially informative if index values are distributed unevenly.
- When a
query uses conditionals based on non-unique index values, then Teradata uses statistics to determine whether indexing or a full search of all table rows is more efficient.
- If
Teradata determines that indexing is the best method, then it uses the
statistics to determine whether spooling or building a bitmap would be the most efficient method of qualifying the data rows.
Without COLLECT STATISTICS:
- The
Optimizer assumes:
Non-unique
indexes are highly non-unique. (Lots of rows per value).
Non-Index
columns are even more non-unique than non-unique indexes. (Lots of rows
per value)
- Teradata
derives row counts from a random AMP sample for:
Small
tables (less than 1000 rows per amp),Unevenly
distributed tables (skewed row distribution due to PI).
Random
amp sample:
Look at data from 1 amp of table, and from
this, estimate the total rows in the table.
Random
amp samples may not represent the true total number of rows in the table
because the rows in the table may not be distributed evenly. This occurs often
with small tables. Asof 9/2000, per table, the random amp sample uses the same amp for each sql or
query.