Cluster computing

Tuesday, January 14, 2014

In this post we talk about Teradata - it's architecture and SQL as read from the book by Tom Coffing and William Coffing. Teradata is about parallel processing. With each query, a single step is performed in parallel by each AMP. A Teradata system consists of a series of AMPs that work in parallel. Teradata systems usually involve several AMPs for greater ROI. They usually start small and grow.
Nexus is the tool used to load the logical model and display the tables and views visually. It has a Super Join Builder that guides joins and builds the SQL on the joins specified. Nexus works with Teradata and Hadoop.
Data Models are inexpensive to be tested and altered. ERWin from CA is a data modeling tool. It allows you to model your tables, views and the physical model. When the data model is shared with the business users, there is more insight and its use is facilitated with the tool building the SQL.
Teradata's advantages include using Massively Parallel Processing (MPP) to analyze very large data.
In a table of trillion rows, Teradata can find a single record in under a second with just one AMP. Teradata has sophisticated query traffic control with rules, times, delays, green lights to query and red lights to wait. Longer duration and volume queries are not placed in the same category as critical queries.
Teradata gives users their own views of the query processing with Viewpoint.
The Teradata environment enables a variety of queries from large massive joins, sub second tactical queries, partitioned tables for range queries, columnar designs for specific columns combined with secondary indexes, joins and hash indexes etc.
The rows of data are spread across the AMPs so that each user query can be executed on similar number of rows by each AMP. In parallel processing we distribute the work for each AMP. Communication is done over BYNET. The Parsing Engine (PE) takes the Users' SQL and builds a plan for each AMP.
All Teradata tables spread their rows equally across the AMPs. Teradata systems can add AMPs for linear scalability. Teradata can scale to arbitrary size.
AMPs and PEs are called virtual processors. Each SMP node has both processors and typically many of both. SMP stands for symmetric multiprocessing. Each node is attached via a network to a disk farm. Each AMP in the node is assigned its own virtual disk.
When nodes are connected to the BYNETs, they become part of one large MPP system. Each node addition brings in additional PEs, AMPs, and disks.
Nodes, disks and BYNETs are usually kept inside what is called a Cabinet. Gateway and Channel drive software runs as processes One controls the communication with the LAN and the other connects the mainframe. The Access module processors (AMPs) and PEs live in memory and are controlled by a Parallel Database Extension (PDE) usually running on a Linux operating system. While BYNETs enable communication between nodes, there's also a boardless BYNET within the nodes. PEs and AMPs can communicate over this boardless BYNET. Each PE can manage upto 120 individual sessions. When a user logs into a Teradata, a PE will log them and manage the session. The PE is also responsible to check the SQL syntax, creates the EXPLAIN, check the security and builds a plan. The PE uses the COLLECTED STATISTICS to build the best plan. The PE delivers the final answer set to the user.

Cluster computing

Tuesday, January 14, 2014

No comments:

Post a Comment