Skew Factor

Skew factor is a measure of how unevenly a table's data is distributed among the AMPs.

With a NUPI (non-unique primary index) we generally get duplicate values. Rows with the same primary index value produce the same row hash, so they are all loaded onto the same AMP. The more duplicate values there are, the more uneven the data distribution becomes.

As a result, one AMP stores more data and another AMP stores less. When we access the full table, the AMP with more data takes longer to retrieve its rows and makes the other AMPs wait until it finishes, which wastes processing capacity.
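
The effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: MD5 stands in for Teradata's real row hash algorithm, the function names (amp_for, rows_per_amp, skew_factor) are hypothetical, and the formula (1 - average rows per AMP / maximum rows per AMP) * 100 is one commonly quoted way of expressing skew factor, not something the text above defines.

```python
import hashlib
from collections import Counter


def amp_for(value, num_amps):
    """Map a primary index value to an AMP number.

    Assumption: MD5 is only a stand-in for Teradata's row hash.
    What matters is that equal values always map to the same AMP.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_amps


def rows_per_amp(pi_values, num_amps):
    """Count how many rows land on each AMP."""
    counts = Counter(amp_for(v, num_amps) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


def skew_factor(counts):
    """Skew as a percentage: 0 means even, higher means more skewed.

    Assumption: uses (1 - average / maximum) * 100.
    """
    peak = max(counts)
    if peak == 0:
        return 0.0
    average = sum(counts) / len(counts)
    return (1 - average / peak) * 100


if __name__ == "__main__":
    num_amps = 4

    # Mostly unique primary index values: rows spread across the AMPs.
    unique_values = list(range(1000))
    print(rows_per_amp(unique_values, num_amps), skew_factor(unique_values and rows_per_amp(unique_values, num_amps)))

    # Heavily duplicated NUPI value: every row hashes to one AMP.
    duplicate_values = ["SAME_VALUE"] * 1000
    print(rows_per_amp(duplicate_values, num_amps), skew_factor(rows_per_amp(duplicate_values, num_amps)))
```

In the duplicated case all 1000 rows sit on a single AMP, so that AMP alone determines how long a full table scan takes while the other three sit idle.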

In this situation we should avoid full table scans.

