TPUMP Basics

TPump is a data loading utility that helps you maintain (update, delete, insert, and atomically upsert) the data in your Teradata Database. It is used to keep target tables updated continuously, which helps you achieve near-real-time data in the data warehouse.
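An atomic upsert means "update the row if it exists, otherwise insert it" as a single operation. In a TPump script this is normally requested with the DO INSERT FOR MISSING UPDATE ROWS option on the DML label. The sketch below only models the semantics in Python. A plain dict stands in for a table keyed by primary index value, and the function name upsert is hypothetical.

```python
from typing import Any, Dict

# Hypothetical stand-in for a table: primary index value -> row (column -> value).
Table = Dict[int, Dict[str, Any]]


def upsert(table: Table, key: int, updates: Dict[str, Any], new_row: Dict[str, Any]) -> str:
    """Apply `updates` if a row with this primary index exists, otherwise insert `new_row`."""
    if key in table:
        table[key].update(updates)   # UPDATE path: the row was found
        return "updated"
    table[key] = dict(new_row)       # INSERT path: the update matched no row
    return "inserted"


if __name__ == "__main__":
    orders: Table = {}
    print(upsert(orders, 1, {"qty": 5}, {"qty": 5}))  # first time: row is missing
    print(upsert(orders, 1, {"qty": 7}, {"qty": 7}))  # second time: row exists
    print(orders)
```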
Concurrency: MultiLoad is limited to a maximum of 15 instances running concurrently. TPump does not impose this limit.

TPump uses row hash locks rather than table-level locks. This allows users to run queries while TPump is running. It also means that a TPump job can be stopped instantaneously.
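To show why row hash locks let queries run alongside TPump, here is a toy lock manager in Python that locks individual row hashes instead of a whole table. It is an assumption-laden sketch. MD5 is used only as a stand-in, because Teradata computes row hashes with its own algorithm. The class and function names are hypothetical.

```python
import hashlib
from typing import Dict


def row_hash(primary_index_value: str) -> int:
    # Stand-in for Teradata's row hash: a 32-bit value derived from the
    # primary index value. Teradata's real hashing algorithm is not MD5.
    digest = hashlib.md5(primary_index_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class RowHashLockManager:
    """Toy lock manager: locks are held per row hash, not per table."""

    def __init__(self) -> None:
        self._held: Dict[int, str] = {}  # row hash -> owner holding the lock

    def acquire(self, owner: str, primary_index_value: str) -> bool:
        h = row_hash(primary_index_value)
        holder = self._held.get(h)
        if holder is None or holder == owner:
            self._held[h] = owner
            return True
        return False  # another owner holds this row hash, so wait or retry

    def release(self, owner: str, primary_index_value: str) -> None:
        h = row_hash(primary_index_value)
        if self._held.get(h) == owner:
            del self._held[h]


if __name__ == "__main__":
    locks = RowHashLockManager()
    print(locks.acquire("tpump", "cust-1"))  # TPump locks one row hash
    print(locks.acquire("query", "cust-2"))  # a query on another row is not blocked
    print(locks.acquire("query", "cust-1"))  # only the same row hash conflicts
```

With a table-level lock, the second call would also have been refused. That is the difference the paragraph above describes.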
Instead of updating the Teradata Database overnight or in batches throughout the day, TPump updates information in near real time, acquiring data from the client system with low processor utilization. It does this through a continuous feed of data into the data warehouse rather than through traditional batch updates. Continuous updates result in more accurate, timely data.
TPump provides a dynamic throttling feature that lets you specify the number of statements run per minute, and even alter the throttling minute by minute.

TPump’s main attributes are:
  • Simple, hassle-free setup – does not require staging of data, intermediary files, or special hardware.
  • Efficient, time-saving operation – jobs can continue running in spite of database restarts, dirty data, and network slowdowns. Jobs restart without intervention.
  • Flexible data management – accepts a wide variety of data formats from many data sources, including direct feeds from other databases.

Resource Consumption: TPump has a built-in resource governing facility. This allows the operator to specify how many updates occur (the statement rate) minute by minute, and then change the statement rate, while the job continues to run. Thus, this facility can be used to increase the statement rate during windows when TPump is running by itself, but then decrease the statement rate later on, if users log on for ad hoc query access.
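The statement-rate idea can be sketched as a simple throttle that spaces statements evenly and accepts a new rate while it is running. This is an illustration of the concept only, not how TPump implements its rate control internally. The StatementThrottle class is hypothetical, and it takes a caller-supplied clock value so the behaviour is easy to follow.

```python
class StatementThrottle:
    """Spaces statements evenly so that at most `statements_per_minute` run per minute.

    The rate can be changed at any time, as with TPump's statement rate.
    """

    def __init__(self, statements_per_minute: int) -> None:
        self.set_rate(statements_per_minute)
        self._next_allowed = 0.0  # earliest time (seconds) the next statement may start

    def set_rate(self, statements_per_minute: int) -> None:
        if statements_per_minute <= 0:
            raise ValueError("statement rate must be positive")
        self._interval = 60.0 / statements_per_minute

    def schedule(self, now: float) -> float:
        """Return the time at which the next statement may start, and reserve that slot."""
        start = max(now, self._next_allowed)
        self._next_allowed = start + self._interval
        return start


if __name__ == "__main__":
    throttle = StatementThrottle(60)                    # 60 statements/minute: one per second
    print([throttle.schedule(0.0) for _ in range(3)])
    throttle.set_rate(120)                              # users are idle, so speed up mid-job
    print([throttle.schedule(0.0) for _ in range(2)])
```

Raising the rate during a quiet window and lowering it again when ad hoc users log on is then a single set_rate call. The job itself keeps running throughout.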

The TPump task acquires data from client files and applies it to target tables through INSERT, UPDATE, or DELETE statements that specify the full primary index.
  • TPump examines all commands and statements for a task, from the BEGIN LOAD command through the END LOAD command, before actually executing the task.
  • After all commands and statements involved in a given task have been processed and validated by TPump, the task is executed.
  • Optionally, TPump supports data serialization for a given row, which guarantees that if a row insert is immediately followed by a row update, the insert is processed first. This is done by hashing records to a given session.
  • TPump supports bulletproof restartability using time-based checkpoints. Frequent checkpoints make restarting easier, but at the cost of checkpointing overhead.
  • TPump supports upsert logic similar to MultiLoad.
  • TPump uses macros to minimize network overhead. Before TPump begins a load, it sends statements to the Teradata Database to create equivalent macros for every insert/update/delete statement used in the job script. Short execute-macro requests, rather than lengthy text requests, are then executed iteratively during the job run.
  • TPump supports error treatment options, similar to MultiLoad.
  • TPump runs as a single process.
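The serialization option from the list above works by hashing records to a given session. This can be sketched in Python: every record with the same key lands in the same session queue, so an insert that precedes an update for that row is still processed first. CRC-32 and the helper names assign_session and distribute are hypothetical stand-ins, not TPump internals.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (operation, primary index value)


def assign_session(primary_index_value: str, session_count: int) -> int:
    # Same key -> same session, so statements for one row stay in order.
    return zlib.crc32(primary_index_value.encode("utf-8")) % session_count


def distribute(records: List[Record], session_count: int) -> Dict[int, List[Record]]:
    """Route records to per-session queues, preserving input order within each queue."""
    queues: Dict[int, List[Record]] = defaultdict(list)
    for op, key in records:
        queues[assign_session(key, session_count)].append((op, key))
    return dict(queues)


if __name__ == "__main__":
    feed = [("INSERT", "cust-1"), ("INSERT", "cust-2"), ("UPDATE", "cust-1")]
    for session, queue in sorted(distribute(feed, 4).items()):
        print(session, queue)
```

Without this routing, the UPDATE for cust-1 could be sent on a different session than the INSERT and reach the database first.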
