Teradata SQL Reference: Derived Tables

Derived tables can be a powerful technique to produce efficient queries, but they can also cause major performance problems when used in inappropriate situations.

Here are some guidelines for the use of derived tables:

Never use a derived table to simply restrict the records of a large table prior to joining it to some other table. Doing this prevents the optimizer from using statistics on the table when it is subsequently joined to another table, since the derived table is pulled into a spool file, and this spool file will not have statistics available to the optimizer to prepare downstream joins.

Do not use a derived table composed of a query which contains the same tables that are joined outside the derived table unless you have to perform aggregation or some other operation within the derived query that cannot be performed against those tables in the base query.

A permissible example would be a derived table which gets the keys of the latest records in a table (e.g. max(Txn_Date)) and is joined to the same table in the base query to get the latest records.

Use of a derived table may be appropriate when it significantly reduces the complexity/increases the readability of a query.

An example is the use of a derived table in from clause of an update statement. This is the recommended way to write an update.

General approach should be use of temporary tables instead of derived table if the expected dataset or involved table(s) have more then 250K records (1000 rows AMP * 240 AMPs).

Use Derived Tables in Updates:

It can significantly reduce query complexity and improve performance and readability if updates are written with from clause as a derived table. This is particularly useful when there are many table involved in the query. For example
Instead of:
UPDATE TB1
FROM
Table1 TB1
, Table2 TB2
, Table3 TB3
, Table4 TB4

SET TB1.COL3 = TB4.COL3

WHERE TB1.COL1 = TB2.COL1
AND TB1.COL2 = TB3.COL2
AND TB2.COL1 = TB3.COL1
AND TB2.COL1 = TB4.COL1
AND TB2.COL4 = 123 ;

This is preferred:
UPDATE Table1
FROM
( SELECT
TB2.COL1
, TB3.COL2
, TB4.COL3
FROM
Table2 TB2
INNER JOIN
Table3 TB3
ON TB2.COL1 = TB3.COL1
INNER JOIN
Table4 TB4
ON TB2.COL1 = TB4.COL1
WHERE TB2.COL4 = 123
) XXX

SET Table1.COL3 = XXX.COL3

WHERE Table1.COL1 = XXX.COL1
AND Table1.COL2 = XXX.COL2;

Whilst the code may be longer, the select which obtains the rows to be updated can be prototyped in isolation of the update, and therefore can also be individually tuned separate to the update. Logic errors that might be hidden in an all-in-one update statement become much more visible when written as a derived query.

Derived tables can be a powerful technique to produce efficient queries, but inappropriate usage can cause performance problems.

Guidelines for the use of derived tables:

Use of a derived table may be appropriate when it significantly reduces the complexity/increases the readability of a query. An example is the use of a derived table in from clause of an update statement. This is the recommended way to write an update ( Use Derived Tables in Updates above).
The general approach should be use of temporary tables instead of derived table if the expected dataset or involved table(s) have more then 250K records (1000 rows/AMP * 240 AMPs).

Derived Tables

1 comment: