Cluster computing

We now look at Partition Primary Index tables in the Teradata server.
Each table in Teradata has a primary Index unless its a NoPI table. Partitioning tells the AMPs to sort the table rows by partition first, and then sort the rows of a table with the Row-ID. Partitioning is a way to prevent FULL table scans.
Partitions can be specified with Range_N options such as for creating a partition each day/week/month/year.
The query would look like CREATE TABLE Order_Table_PPI
(
Order_Number INTEGER NOT NULL,
Order_Date Date
) PRIMARY INDEX (Order_Number)
Partition by Range_N(OrderDate BETWEEN date '2013-01-01' AND DATE '2013-12-31'
EACH INTERVAL '7' DAY);
this would create about 52 partitions. Or the partitions could be based on each partition size.
Partitions can be of arbitrary number. Teradata supports a very very large number.
Older data and newer data in PPI can be combined.
Partitions can be nested too. The top partition is usually the only one that can be altered. Upto 15 levels of Range_N or Case_N partitions can be nested.
A primary index must be non-unique on all partitioned tables unless the primary index column is also used to define the partition. Most PPI tables are defined as NUPI.
Character based PPI are also common.
Alter Table Order_Table to current with insert into Order_table_History
Columnar table partitions are also used often. Columnar tables must be NoPI tables. Since the partitions are along columns, the columns are placed inside their own container. All containers have the same amount of rows. Each container looks like a small table for I/O purposes. Columnar tables make sense when users query only certain data.
In a columnar table when a row is deleted, it is not physically deleted but just marked deleted.
Columnar can move just one container to memory. Containers match up perfectly to rebuild a row.
Containers can have more than one columns.

Cluster computing

Thursday, January 16, 2014

No comments:

Post a Comment