The Teradata database system is impacted in the following
ways when an amp is offline:
1) System Performance
2) Table Updates/Offline Recovery
3) Table Rebuild
4) Spoolspace
5) ASF Archives
6) ASF Restore/Copy
7) Amp Usage Statistics
8) 2nd Amp Risk
9) Fastload Configuration Change
1) System Performance
The system performance is impacted because the processing
power of the down amp is missing. The application processing normally done on
the down amp must be absorbed by the other amps in the system. This is
accomplished with clustering.
When an amp is down, a table's primary copy of the data on
that amp is not available, but the fallback copy is available on the other amps
within the cluster. In our environment, the cluster size is 4 amps, so the
table data from the down amp is spread across the other 3 amps. This means that
when an amp is offline, the access to the table data on the other 3 online amps
in the cluster is increased.
The processing increase on the other 3 amps can be
determined as follows: assume that the system processing is spread evenly
across all 4 original amps. If 1 amp is removed, the same processing must be
done by the other 3 amps, so the processing on each of them increases by 33%
(from 1/4 to 1/3 of the cluster's work).
Therefore, when an amp is offline, each online amp in the cluster carries
about 33% more work, while the cluster as a whole has lost 25% of its processing
capacity. If this particular cluster has amps that are already very highly
utilized, the overall system performance degradation would be noticeable.
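The arithmetic for a 4-amp cluster can be checked with a short sketch. It assumes, as the text does, that work is spread evenly across the amps in a cluster. The function name offline_amp_impact is hypothetical and is not part of any Teradata tool.

```python
from fractions import Fraction


def offline_amp_impact(cluster_size: int):
    """Return (extra load per online amp, cluster capacity lost) when one
    amp in a cluster of `cluster_size` amps is offline.

    Assumption: the cluster's work is spread evenly across its amps.
    """
    if cluster_size < 2:
        raise ValueError("a cluster needs at least 2 amps to absorb a down amp")
    online = cluster_size - 1
    # Each online amp goes from 1/cluster_size to 1/online of the work.
    extra_load = Fraction(cluster_size, online) - 1
    # The cluster loses one amp's share of its processing power.
    capacity_lost = Fraction(1, cluster_size)
    return extra_load, capacity_lost


extra, lost = offline_amp_impact(4)
print(f"extra load per online amp: {float(extra):.0%}")  # 33%
print(f"cluster capacity lost:     {float(lost):.0%}")   # 25%
```

The same function shows why smaller clusters are hit harder: with a cluster size of 2, the single remaining amp would carry 100% more work.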
2) Table Updates/Offline Recovery
Table data can be updated while an amp is down. The updates
slated for the down amp are directed to the other 3 amps in the cluster instead,
where they are recorded in a recovery journal ("dbc.changedrowjournal").
When the amp is brought back online, the table updates stored on the other
3 amps must be applied to it. This procedure is called "offline recovery"
or "catch-up".
TOS single-threads each logon's "offline recovery".
3) Table Rebuild
If the down amp's portion of the table data is not available
when the amp is brought back online, then it would be rebuilt with a procedure
called "table rebuild". This procedure reads the fallback copy of the
table data from the 3 online amps in the cluster, and creates both the primary
and fallback copy of the table data on the new amp.
4) Spoolspace
The DBA sets the spoolspace limit per logon, but TOS enforces
the spoolspace limit per amp (See "Spoolspace" Tip). When an amp is
down, the spoolspace limit per amp remains the same, so the effective spoolspace
limit per logon is decreased by 1 amp's worth. On the 3 online amps in the
cluster, the processing will increase and the spoolspace usage will increase,
but the spoolspace limit per amp will remain the same, so a query is more
likely to reach its spoolspace limit.
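A rough sketch of the spoolspace arithmetic follows. It assumes the per-amp limit is the logon's limit divided evenly by the number of amps in the system, which the text above does not state explicitly. The function name and the figures (an 8-amp system, an 800 MB logon limit) are hypothetical.

```python
def spool_limits(logon_limit_bytes: int, total_amps: int, amps_down: int = 0):
    """Return (per-amp spool limit, effective logon spool limit).

    Assumption: the per-amp limit is the logon limit divided evenly
    across all configured amps, and it does not change when an amp is down.
    """
    if not 0 <= amps_down < total_amps:
        raise ValueError("amps_down must be between 0 and total_amps - 1")
    per_amp = logon_limit_bytes // total_amps
    # Only the online amps can hold spool, each up to the unchanged per-amp limit.
    effective = per_amp * (total_amps - amps_down)
    return per_amp, effective


# Hypothetical system: 8 amps, 800 MB spool limit for the logon.
print(spool_limits(800_000_000, 8))               # all amps online
print(spool_limits(800_000_000, 8, amps_down=1))  # one amp offline
```

With one amp offline, the per-amp figure is unchanged while the effective logon limit drops by exactly one amp's share.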
5) ASF Archives
When an amp is down, the ASF archive will use the fallback
copy. If indexes are also archived, then NUSIs (non-unique secondary indexes)
will be skipped, and a warning message is displayed. The message also includes the
number of the amp that is down.
6) ASF Restore/Copy
ASF Restore does not work when an amp is down. ASF Copy will
work and use the online amps in a cluster.
7) Amp Usage Statistics
When the "table rebuild" is executed for the amp,
the dbc.acctg table (dbc.ampusage view) is also rebuilt. But this dictionary
table does not have fallback, therefore the cpu and diskio statistics for each
logon on that amp are lost. This is seen in the Opermenu utility as an
unusually high number of logons accessing the system during the hour.
8) 2nd Amp Risk
During the time that 1 amp is down in a cluster, there is a
very small risk that a 2nd amp could also go down in the same cluster.
Within the same cluster as the down amp: If a 2nd amp goes
down, but the table data is recoverable, then the system processing would be
halted, and after the amp is fixed, the system would be restarted. But if the
table data is not recoverable, then the table data of the entire system would
be restored, and then the system would be restarted.
9) Fastload Configuration Change
Fastload cannot continue when an amp goes offline while the
fastload is executing. One of the recovery options is to wait until the original
configuration is attained, but this will probably never be used.