Hashing Algorithm

When the primary index value of a row is passed to the hashing algorithm, the output is called the “row hash”. The row hash is the logical storage address of the row and identifies the AMP that owns it. The “table id” plus the row hash identifies the cylinder and data block, and is used for distribution, placement and retrieval of the row. How evenly the data is distributed across the AMPs depends on how unique the row hash values are: the more distinct primary index values a table has, the more evenly its rows spread out.
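The effect of row hash uniqueness on distribution can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the real hashing algorithm is internal to the database, so `zlib.crc32` stands in for it, a plain modulo stands in for the mapping from row hash to AMP, and `NUM_AMPS`, `row_hash` and `amp_for` are hypothetical names.

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # hypothetical number of AMPs in the system


def row_hash(primary_index_value) -> int:
    """Stand-in hash (CRC32), NOT the database's real hashing algorithm."""
    return zlib.crc32(repr(primary_index_value).encode("utf-8"))


def amp_for(hash_value: int, num_amps: int = NUM_AMPS) -> int:
    """Simplified assumption: pick the AMP with a modulo over the row hash."""
    return hash_value % num_amps


# 1000 rows with distinct primary index values
unique_keys = range(1000)
# 1000 rows that all share one primary index value
skewed_keys = [7] * 1000

# Count how many rows land on each AMP in both cases
unique_spread = Counter(amp_for(row_hash(k)) for k in unique_keys)
skewed_spread = Counter(amp_for(row_hash(k)) for k in skewed_keys)

print("distinct values :", dict(unique_spread))
print("one shared value:", dict(skewed_spread))
```

With distinct values the rows are spread over several AMPs, while the rows that share a single value all produce the same row hash and therefore pile up on one AMP.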

The “table id” is a sequential number assigned when a table is created. A re-created table is assigned a new number, so the table id changes whenever a table is re-created.

“Hash code redistribution” is used in join operations. It applies when the foreign key (join column) of one table (say, table A) is joined to the primary index of another table (say, table B). For each table A row, the row hash of the foreign key value is calculated. The table A row is then sent to the AMP dictated by that row hash, which is the same AMP that holds table B’s row for that row hash. After redistribution, the matching rows of both tables sit on the same AMP and can be joined locally.
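The redistribution step can be sketched as follows. This is only an illustration under stated assumptions: `zlib.crc32` and the modulo are stand-ins for the real hashing algorithm and AMP mapping, and the tables, column names (`emp_id`, `dept_id`) and `NUM_AMPS` are made up for the example.

```python
import zlib

NUM_AMPS = 4  # hypothetical number of AMPs


def amp_for(value) -> int:
    """Stand-in: CRC32 as the row hash, modulo as the AMP mapping."""
    return zlib.crc32(repr(value).encode("utf-8")) % NUM_AMPS


# Table B (dept_id, name): dept_id is its primary index
table_b = [(10, "Sales"), (20, "Support"), (30, "Research")]
# Table A (emp_id, dept_id): dept_id is the foreign key (join column)
table_a = [(1, 10), (2, 20), (3, 10), (4, 30)]

# Table B rows already live on the AMP chosen by the hash of their primary index
amps_b = {amp: [] for amp in range(NUM_AMPS)}
for dept_id, name in table_b:
    amps_b[amp_for(dept_id)].append((dept_id, name))

# Redistribution: hash the foreign key of each table A row and send the row
# to the AMP that hash dictates
redistributed_a = {amp: [] for amp in range(NUM_AMPS)}
for emp_id, dept_id in table_a:
    redistributed_a[amp_for(dept_id)].append((emp_id, dept_id))

# Each AMP can now join its own rows without looking at any other AMP
joined = []
for amp in range(NUM_AMPS):
    local_b = dict(amps_b[amp])
    for emp_id, dept_id in redistributed_a[amp]:
        if dept_id in local_b:
            joined.append((emp_id, dept_id, local_b[dept_id]))

print(sorted(joined))
```

Because both tables are hashed on the same value with the same function, every table A row ends up next to its table B match, so the per-AMP loop finds all four pairs.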

“Join column hash code sequence” is the result of a sort. The rows of one table (say, table A) are sorted by the row hash of their foreign key (join column). The sorted rows are then matched, in sequence, against the rows of the other table (say, table B) on the same AMP.
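The sort-then-match step on a single AMP might look like the sketch below. It is a simplified illustration, not the database's actual join code: `zlib.crc32` stands in for the real hashing algorithm, `merge_join` and `hash_key` are hypothetical names, and the join column of table B is assumed to hold unique values.

```python
import zlib


def row_hash(value) -> int:
    """Stand-in hash (CRC32), NOT the database's real hashing algorithm."""
    return zlib.crc32(repr(value).encode("utf-8"))


def hash_key(value):
    """Sort key: row hash first, the value itself as a tie-breaker."""
    return (row_hash(value), value)


def merge_join(a_rows, b_rows):
    """Join table A rows (emp_id, dept_id) to table B rows (dept_id, name).

    Assumes dept_id is unique in table B. Both inputs are put into
    row-hash sequence and then matched in a single forward pass.
    """
    a_sorted = sorted(a_rows, key=lambda r: hash_key(r[1]))
    b_sorted = sorted(b_rows, key=lambda r: hash_key(r[0]))

    result = []
    j = 0  # current position in the sorted table B rows
    for emp_id, dept_id in a_sorted:
        # Skip table B rows whose hash sequence position is before this A row
        while j < len(b_sorted) and hash_key(b_sorted[j][0]) < hash_key(dept_id):
            j += 1
        # Do not advance j on a match: the next A row may have the same key
        if j < len(b_sorted) and b_sorted[j][0] == dept_id:
            result.append((emp_id, dept_id, b_sorted[j][1]))
    return result


rows_a = [(1, 10), (2, 20), (3, 10), (5, 99)]
rows_b = [(10, "Sales"), (20, "Support"), (30, "Research")]
print(sorted(merge_join(rows_a, rows_b)))
```

Since both sides are in the same hash sequence, each table is read only once; the table A row with `dept_id` 99 has no partner in table B and is simply skipped.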

