Amp Offline in Teradata Database

The Teradata database system is impacted in the following ways when an amp is offline:

            1) System Performance                                                
            2) Table Updates/Offline Recovery
            3) Table Rebuild
            4) Spoolspace
            5) ASF Archives
            6) ASF Restore/Copy
            7) Amp Usage Statistics
            8) 2nd Amp Risk
            9) Fastload Configuration Change

1) System Performance

The system performance is impacted because the processing power of the down amp is missing. The application processing normally done on the down amp must be absorbed by the other amps in the system. This is accomplished with clustering.

When an amp is down, a table's primary copy of the data on that amp is not available, but the fallback copy is available on the other amps within the cluster. In our environment, the cluster size is 4 amps, so the table data from the down amp is spread across the other 3 amps. This means that when an amp is offline, the access to the table data on the other 3 online amps in the cluster is increased.

The amount of processing increase in the other 3 amps can be determined as follows: Assume that the system processing is spread evenly across all 4 original amps, so each amp handles 1/4 of the cluster's work. If 1 amp is removed, the same work must be done by 3 amps, so each handles 1/3 of it. Going from 1/4 to 1/3 is an increase of 33% on each of the other 3 amps.

Therefore, when an amp is offline, each of the 3 remaining amps in the cluster carries about 33% more work (equivalently, the cluster has lost 25% of its processing capacity). If this particular cluster has amps that are very highly utilized, then the overall system performance degradation would be noticeable.
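
The arithmetic can be checked with a short Python sketch. It assumes, as in the reasoning above, that work is spread evenly across the amps of a cluster. The function names are illustrative only and are not part of any Teradata utility.

```python
def per_amp_load_increase(cluster_size: int, amps_down: int = 1) -> float:
    """Fractional increase in work on each amp that stays online.

    Assumes the cluster's work was spread evenly across all amps
    before the failure and is spread evenly across the survivors after.
    """
    online = cluster_size - amps_down
    if online <= 0:
        raise ValueError("at least one amp must remain online in the cluster")
    # Each amp's share goes from 1/cluster_size to 1/online.
    return cluster_size / online - 1.0


def cluster_capacity_lost(cluster_size: int, amps_down: int = 1) -> float:
    """Fraction of the cluster's processing power that is missing."""
    return amps_down / cluster_size


if __name__ == "__main__":
    # The 4-amp cluster described in this tip, with 1 amp offline.
    print(f"extra load per online amp: {per_amp_load_increase(4):.0%}")
    print(f"cluster capacity lost:     {cluster_capacity_lost(4):.0%}")
```

The same functions show why smaller clusters are hit harder: with a hypothetical 2-amp cluster, the single surviving amp would have to do twice its normal work.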

2) Table Updates/Offline Recovery

Table data can be updated while an amp is down. The data slated for the down amp is directed to the other 3 amps in the cluster instead, where a recovery journal ("dbc.changedrowjournal") records the changes. When the amp is brought back online, the table updates stored on the other 3 amps must be applied to the returning amp. This procedure is called "offline recovery" or "catch-up".

TOS single-threads each logon's "offline recovery".

3) Table Rebuild

If the down amp's portion of the table data is not available when the amp is brought back online, then it is rebuilt with a procedure called "table rebuild". This procedure reads the fallback copy of the table data from the 3 online amps in the cluster, and creates both the primary and fallback copies of the table data on the new amp.

4) Spoolspace

The DBA sets the spoolspace limit per logon, but TOS sets the spoolspace limit per amp (See "Spoolspace" Tip). When an amp is down, the spoolspace limit per amp remains the same, but the spoolspace limit per logon is decreased by 1 amp's worth. Therefore, on the 3 online amps, the processing will increase, and the spoolspace usage will increase, but the spoolspace limit per amp will remain the same.
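
A small Python sketch of the spoolspace arithmetic follows. It assumes that the per-amp limit is simply the per-logon limit divided evenly by the number of amps configured in the system; that division rule, the function name, and the example figures are assumptions made for illustration and are not stated in this tip.

```python
def spool_limits(logon_limit: int, total_amps: int, amps_down: int = 0):
    """Return (per_amp_limit, effective_logon_limit).

    Assumption: the per-amp limit is the DBA's per-logon limit divided
    evenly by the number of amps configured in the system.
    """
    if total_amps <= 0 or not 0 <= amps_down < total_amps:
        raise ValueError("need 0 <= amps_down < total_amps")
    # The per-amp limit does not change when an amp goes offline.
    per_amp_limit = logon_limit // total_amps
    # The logon can only use spool on the amps that are still online.
    effective_logon_limit = per_amp_limit * (total_amps - amps_down)
    return per_amp_limit, effective_logon_limit


if __name__ == "__main__":
    # Hypothetical figures: 400 units of spool for a logon on a 4-amp system.
    print(spool_limits(400, 4))               # all amps online
    print(spool_limits(400, 4, amps_down=1))  # one amp offline
```

Under these assumptions, a query that fit comfortably within its spoolspace with all amps online can hit the unchanged per-amp limit once the down amp's share of the work lands on the online amps.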

5) ASF Archives

When an amp is down, the ASF archive will use the fallback copy. If indexes are also archived, then NUSIs (non-unique secondary indexes) will be skipped, and a warning message is displayed. The message also includes the number of the amp that is down.

6) ASF Restore/Copy

ASF Restore does not work when an amp is down. ASF Copy will work and use the online amps in a cluster.

7) Amp Usage Statistics

When the "table rebuild" is executed for the amp, the dbc.acctg table (dbc.ampusage view) is also rebuilt. But this dictionary table does not have fallback, therefore the cpu and diskio statistics for each logon on that amp are lost. This is seen in the Opermenu utility as an unusually high number of logons accessing the system during the hour.

8) 2nd Amp Risk

During the time that 1 amp is down in a cluster, there is a very small risk that a 2nd amp could also go down in the same cluster.

If a 2nd amp goes down within the same cluster as the down amp, the outcome depends on whether the table data is recoverable. If it is recoverable, system processing is halted, and the system is restarted after the amp is fixed. If it is not recoverable, the table data of the entire system must be restored before the system is restarted.

9) Fastload Configuration Change

Fastload cannot continue when an amp goes offline while the fastload is executing. One of the recovery options is to wait until the original configuration is attained, but this will probably never be used.

