Wednesday, January 15, 2014

We mentioned hashing of rows in Teradata. We also sort the rows based on this row id. AMPs sort their rows by Row-ID to Group like Data . Since the Row-IDs are sorted, the AMPs use binary search. NULL values all hash to the same AMP. Note that this means data is skewed. So handling NULLs can be avoided by having Unique Primary Index or a multi-column primary index. A Unique Primary Index will spread the data perfectly evenly. A multi-column can overcome the data skewing by NULL values.
 A full table scan is performed by the participation of all AMPs. All AMPs read all of their rows because there is no Primary Index.
When data is loaded, each AMP holds a portion of the rows for every table. When queries are run, each AMP simultaneously retrieves the rows. The table header and data rows are stored on each AMP. The table header is created on each AMP when the table is created.
Each AMP stores the rows they own inside a data block. Data is read into each AMPs cache called the First System Generating Cache (FSG Cache). When the header and the data is in memory, the AMP can read and process.
When a proper index is chosen, a single AMP needs to move only one block of data into the FSG cache. When the rows are inserted into the block and the block grows, the block is split. If a block moves into the FSG Cache
Synchronized Scan (Sync Scan) is another feature. When there are multiple users reading different blocks, the system shares reads between multiple scans at the block level. New query joins scans at the current scan point.
 The system wraps around after reading all the blocks so the late comer gets the blocks they missed. Scan could have also started before the query ran.
Teradata has a measure for data temperature i.e how hot or frequently used it is. The hottest data is kept in memory automatically. There is no DBA intervention or application changes required. While FSG Cache is used to move data from disk temporarily, this intelligent memory doesn't purge the data in memory but keeps it there. The data can stay there for days.
Both are complimentary to each other. Most frequently queried tables, dictionaries and anything that requires sub-second data are all valid candidates.  Data deemed very hot stays in each AMPs intelligent memory.
The physical database is also designed to help with queries.
First, the primary index is chosen such that it is most often used by users to find rows.
Second, the primary index will reduce the data transfers from disk.
Third, the secondary indexes will limit the number of AMPs used and data blocks transferred. Upto 32 secondary indexes can be used.
Fourth, the tables are partitioned to sort the rows horizontally.
Fifth, the tables can be partitioned vertically so that the transfer of columnar blocks can be efficient.
Sixth, the modeled tables can be denormalized.
 

No comments:

Post a Comment