Cluster computing

In this post, we talk about temporal tables create function in Teradata
There are three types of temporal tables.
- valid time temporal tables
- transaction time temporal tables
- bi-temporal tables that are a combination of both.
These tables are created with the PERIOD data type that introduces a begin and end timestamp.
The only difference is that the ValidTime can be a date or a timestamp. The Transaction Time has to be a timestamp.
The timestamp changes on update both before and after updates as in the case of Bi-Temporal tables. The non-sequenced ValidTime gives the data as the way things were while the current valid time gives the way the data is.
We will now look at how joins work internally in Teradata.
Teradata requires that for two rows to be joined together, both rows are physically on the same AMP.
One or both tables are redistributed or the smaller table is duplicated across all AMPs
Based on the column for the join, the matching rows from different AMPs are brought together on the same AMP. A volatile table is used for this purposes. On all joins, the matching rows must be on the same AMP and hashing is used to bring the rows together.
If two tables were to be joined thousand times a day and given that there will be the same primary index on the PK/FK join condition, there will be no data movement because the rows will already be hashed to the same AMP. The EXPLAIN operator shows this as Row Hash Match join. In fact, it is preferable to have a where clause with the primary index where this is not the case. The volatile tables are usually deleted on logoff.
There are three types of temporary tables:
1)Derived Table
This exists only within a query and is materialized by the SELECT statement inside a query.
The space is used on the users spool space and it is deleted at the end of the query.
Derived tables are usually specified within a parenthesis in the SQL or with a WITH clause. Columns can be aliased and can default to normal columns. Most derived tables are used join to other tables.
2)The Volatile tables are created by the user and materialized with an INSERT/SELECT
The space is used on the users spool space and the table and data are deleted when the user logs off.
The ON COMMIT DELETE ROWS can be explicitly specified so that the volatile table is deleted at the end of the query.
The HELP command shows the volatile tables.
The volatile tables can have primary index such that when used for joins between tables, there is no movement of data.
Statistics can also be collected on volatile table so as to improve performance. For example, we can have this for all non-unique primary indexes (NuPI), unique primary indexes of small tables i.e with less than thousand rows per AMP, columns that appear frequently in WHERE search conditions and non-indexed columns used in joins, and partitioning column of a PPI table.
3)Then there are global temporary tables
Here the table definition is specified by the user and the table definition is permanent. It is materialized with an INSERT/SELECT.
The space is used on the user's TEMP space. When the user logs off the session, the data is deleted, but the table definition stays. This can be shared by many users who can then populate the same table but with their own copy.

Cluster computing

Saturday, January 18, 2014

No comments:

Post a Comment