Wednesday, January 15, 2014

We looked at Teradata's PE tasks. We will now look into what the AMP does.
The AMP maintains the following:
storing and retrieving rows from their assigned disks.
locking the tables and the rows
sort rows and do all aggregation
handle all join processing
handle all space management and space accounting
The AMP executes the plan from the PE. With a where clause there's only 1 AMP retrieve.
By using the keyword EXPLAIN, we can see the parsing engine's plan. A Non-unique Primary Index will have duplicates grouped together on the same AMP. 64 columns can be grouped together in a multicolumn primary index.
A table can have one primary index. If the table is specified with NO PRIMARY INDEX, it will receive no primary index. A full table scan is likely on a table with no primary index.
 When we type EXPLAIN with a sql query on a table with no primary index, it can show the full table scan by utilizing all-AMPs retrieve. When the primary index is omitted, the first column is used by default.
The NoPI tables are designed to be staging tables. Data from a mainframe or server can be loaded onto Teradata quickly with perfect distribution. Data is moved from staging tables and not queried so an index is not necessary.
In Teradata, a row's primary index value is hashed and the output (row hash) is stored with the  Row_ID. Since the index creates the same hash value, the row hash retrieves the same row again. There is only one Hashing Formula.
The HashMap determines which AMP will own the Row. The RowHash is used to lookup the AMP in the HashMap. Different rows are hashed to different AMPs. The HashMap gives a global perspective of all the rows.
If we were to create an index on a column that allowed duplicate values, Each AMP will place a uniqueness value as a subidentifier after the RowHash to track duplicate values.
The Row-Hash and the uniqueness value make up the Row-ID.
 For the unique primary index, the Row-ID is 1. Each AMP sorts their rows by the Row-ID.

No comments:

Post a Comment