Teradata NULL functions


The following are the NULL functions in Teradata sql:

1) NULLIFZERO
This will change a zero value to a NULL value.

2) NULLIFSPACES
This will change a space value to a NULL value.

3) ZEROIFNULL
This will change a NULL value to a zero.

4) NULLIF = 0
This will change a zero value to a NULL.
ex: define c1(decimal (5,2), NULLIF = 0);

5) NULLIF = ' '
This will change a space value to a NULL.
ex: define c1(char (2), NULLIF = ' ');

6) NULLIF = '?'
This will change a '?' value to a NULL.
ex: define c1(char (2), NULLIF = '?');

7) NULLIF = '40616263'XC
The '40616263'XC is a hexadecimal representation of '@abc'. Therefore, this will change a '@abc' value to a NULL.
ex: define c1(char (4), NULLIF = '40616263'XC );

8) CASE WHEN :c1 = '?' THEN NULL ELSE :c1 END
This will change a '?' to a NULL.

9) NULLIF(:c1,'?')  
This will change a '?' to a NULL. This is an abbreviation of the case command.

10) CASE WHEN c1 IS NOT NULL THEN c1 ELSE 0 END
This changes a null value to a zero.


Teradata data types

Teradata data types can be categorized as follows.

1) Character data:
This is character data that can be fixed or variable length.
The data types can be CHAR, VARCHAR, CHAR VARYING, LONG VARCHAR.

2) Arithmetic Numeric data:
This is numeric data that can have arithmetic operations performed on it.
This numeric data is usually money or quantity values.
The data types can be DATE, DECIMAL, NUMERIC, BYTEINT, SMALLINT, INTEGER, REAL, DOUBLE PRECISION, or FLOAT.
The arithmetic operations can be:
SUM(t1.c1), MAX(t1.c1), t1.c1 + t2.c1, etc.

3) Non-Arithmetic Numeric data:
This is numeric data that will not have arithmetic operations performed on it.
This numeric data is usually code or id values, and usually, is not money or quantity values.
The data types can be DATE, DECIMAL, NUMERIC, BYTEINT, SMALLINT, INTEGER, REAL, DOUBLE PRECISION, or FLOAT.

4) Byte  data:
This is bit oriented data, with hexadecimal orientation of values.
This data type is not currently used in the EDW application tables.
The data types can be BYTE, VARBYTE.
This data will not be covered as part of this NULL topic.

5) Graphic data:
This is Japanese Kanji oriented data, with a mixture of single and multiple bytes representing the Japanese language.
This data type is not currently used in the EDW application tables.
The data types can be GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC.
This data will not be covered as part of this NULL topic.


Teradata Architecture

Teradata Architecture diagram



Here the basic and most important component is PE (Parsing Engine) and PE can support up to 120 sessions.

1) Session Control : It checks authorization of users before processing any SQL query.

2) Parser : It checks syntax of the SQL query and access rights to the user to access various database objects referred in the SQL query.

3) Optimizer : Determines the most effective way by creating a plan or execution steps to follow  in order to perform actions on database objects as per SQL query submitted by user and scans the SQL query to determine where to place locks.

4) Dispatcher : It controls the sequence in which steps are executed and  It passes the execution steps to AMP by using BYNET. Dispatcher also plays an important role by combining all the responses received and send it to user.

The Next component is BYNET. It is used for communication between PE's and AMP's (Access Module Processor). There are two BYNET's available in any TERADATA environment: BYNET-0 & BYNET-1. Two BYNET's allows for continuous sending and receiving messages between PE's and AMP's.

AMP's can be considered as the worker in TERADATA Architecture. All the calculations and actual execution takes place in AMP only.Each AMP has it's own dedicated VDISK (Virtual DISK) to which it queries and process the steps planned by Optimizer. AMP's work only on their own VDISK and do not have access to other AMP's VDISK. Once the AMP perform the steps, it send back response to PE via BYNET where all the responses from various AMP's is collected and sent back to user.


Drop Table and Create Table Commands in Teradata SQL

Drop Table and Create Table commands  will address the following problems:

1. Reduce update activity on the DD/D (Data Dictionary/Directory), and improve overall system throughput.
2. Resolve the occasional deadlock problems which occur from simultaneous DD/D updates. (This could cause your application to be idle for multiple hours.)

The drop table command should be replaced with the “DELETE FROM table ALL” command. The delete command will perform better than the drop command, and without the DD/D updates.

The create command should only be executed when needed. The create table command should only be executed for table alterations or recovery situations.

There are exceptions to this requirement:
1. The fastload error tables must always be dropped, so that fastload can re-create them.
2. The DSS(Decision support system) Query tool creates tables for intermediate data storage.


Disadvantages of Compression

·          Does not compress varchar columns.
·   There should not be any spoolspace usage differences, because where compressed values are converted back into spoolspace, there should have been a spool file anyway. The spoolspace usage differences may not be fully understood and tested yet.


Advantages of Compression

The below are the advantages of data compression in Teradata.

·      Compression results in shorter rows which are generally more performant.
·      Column compression provides the following capacity and performance benefits:
              Reduced capacity
            Reduced I/O traffic
            Moderate CPU savings
·    The I/O savings correlates to the percentage of data compressed out of a row.
·    Table backups use the compressed format, so backups will also be faster.


Compression in teradata

Here are some rules and guidelines of Teradata's data compression via the COMPRESS phrase:

1) COMPRESS:
COMPRESS is not a complex data reducing algorithm. It does not reduce repeating characters (or bits), or character (or bit) patterns within rows, columns, blocks or cylinders.

COMPRESS practically eliminates the data storage of nulls (or 256 constant values) for fixed-length, non-primary index columns.


2) Rules for compression:
·         Column must be fixed-length, 255 characters or less, and not part of the primary index.
·          The following types of data can be compressed:
          Nulls, Zeros, Blanks, Any numeric data type,
        DATE (expressed as COMPRESS (DATE 'yyyy-mm-dd')),
        CHARACTER(up to 255 characters).
·          The default setting is no compression.
·          Additional rules have been specified in the documentation.
·        Only FIXED WIDTH data types can be compressed. VARIABLE WIDTH (VARCHAR) data types cannot be compressed.
·   The compress column values are case specific. The value "PRODUCT" does not get compressed, if "product" is the specified compression value.
·     Compression values are stored in the table header, and in V2R5, there is a limit of 255 values, and a header size limit of 64KB.
·      There are 2 forms of compression:
    - Teradata RDBMS V2R4.1 (current version): The compress attribute  
      compresses the nulls, and 1 column value.
      - Teradata RDBMS V2R5 (next version): The compress attribute compresses 
      the nulls, plus up to 256 column values.
·         The syntax for the compression attribute is as follows:
            CREATE SET TABLE tbl1,
            ( col1    CHAR(1)  COMPRESS,    /* Compresses nulls */
              col2    CHAR(1)  COMPRESS 'product' )  /* Compresses nulls plus the   
              value 'product' */
            PRIMARY INDEX ...;

3) Guidelines for Compression:
·         Only compress columns when compression affects enough of the rows in the table to make compression worthwhile.
·         Measure the potential savings with the following calculation:
       CC * CW , where CC is the count of potential compressed nulls (or values) 
     for the column, and CW is the column width.
        Sqls for calculating CC:
                  - Get actual null count:
                              select count(*) from tbl1 where col1 IS NULL;
               - Get actual value list by value count:
                        select col1, count(*) from tbl1 group by 1 order by 1 desc;
·          Measure the actual savings with the following calculation:
       (TB - TA) / TA, where TB is the tablesize before the compression, and TA 
     is the tablesize after the compression.
·          Presence bits:                                                                                                                                 For every data column, there will be either 0, 1, or 2 Presence Bits:                
      1) For each 8 columns compressable, add 1 byte of compression presence bits per row.
     2) And, for each 8 columns nullable, add 1 byte of nullable presence bits per row.
·         Convert VARCHAR to Fixed:      
     Since a VARCHAR cannot be compressed, then it could be converted to a fixed length CHAR, and then compressed. Weigh these differences: A VARCHAR column contains a 2 byte VLI (variable length indicator) for each value, and a CHAR column contains the same number of characters per value.
·          Implement the compressions on the test system first (DW TEST (t62tst)), and then migrate the changes to the production system (DW EDW (t61edw)).


Collect Statistics Syntax in Teradata

The following are the Collect Statistics Syntaxes in Teradata.

“COLLECT STATISTICS ON tablename COLUMN columnname;” will collect statistics  on a column.

                .
“COLLECT STATISTICS ON tablename INDEX (columnname)” will collect statistics  on an index.
                    
“COLLECT STATISTICS ON tablename INDEX (col1, col2, ...)” will collect statistics  on multiple columns of an index.                      

“HELP STATISTICS tablename;” will display the number of distinct values of the columns. 
                        
“COLLECT STATISTICS tablename;” refreshes (recollects) the table statistics.

 “DROP STATISTICS ON tablename ... ;" will drop the statistics.