Cluster computing

Monday, December 16, 2013

In data warehouses, exploration and data mining are used to analyze masses of historical data and to discover previously unknown patterns of business activity. The data warehouse contains cleansed, integrated and organized data.
A data warehouse can be designed in a hybrid form to create what is known as a living sample database, which is especially useful when the data has grown very large. The data in this database can be true archival data or lightly summarized data. Since it is a subset of the original data warehouse, it needs to be refreshed periodically. It is not a general-purpose database; it is useful only for analysis and for finding trends. Queries that are too expensive to run over the full warehouse can be run on this database instead.
The selection of data for this database is usually random, but in some cases a "judgement sample" is taken, in which the records must meet certain criteria.
Working against the smaller sample translates to improved productivity for the DSS analyst, since turnaround times are shorter.
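The two ways of selecting data for a living sample can be sketched in a few lines of Python. This is only an illustration under assumed names: the `warehouse` list, its `region` and `amount` fields, and the helper functions `random_sample` and `judgement_sample` are all hypothetical and do not come from any particular product.

```python
import random

def random_sample(records, fraction, seed=None):
    """Pick roughly `fraction` of the records uniformly at random."""
    rng = random.Random(seed)
    # Keep at least one record, but never ask for more than exist.
    k = min(len(records), max(1, round(len(records) * fraction)))
    return rng.sample(records, k)

def judgement_sample(records, predicate):
    """Pick only the records that meet the given criteria."""
    return [r for r in records if predicate(r)]

# Hypothetical stand-in for the full warehouse: 100 small records.
warehouse = [
    {"id": i, "region": "EU" if i % 2 else "US", "amount": i * 10}
    for i in range(1, 101)
]

# Random living sample: about 10% of the warehouse.
living_sample = random_sample(warehouse, fraction=0.1, seed=42)

# Judgement sample: only large EU records (criteria are made up).
eu_large = judgement_sample(
    warehouse, lambda r: r["region"] == "EU" and r["amount"] >= 500
)
```

Because the sample is a copy of part of the warehouse, the selection would have to be re-run whenever the sample is refreshed.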
A second major issue for the data warehouse is partitioning. Partitioning of data refers to the breakup of data into separate physical units that can be handled independently.
Proper partitioning helps the warehouse with the following operations:
Loading data
Accessing data
Archiving data
Deleting data
Monitoring data
Storing data
Independently managed partitions of data are portable across different processing environments. When data is monolithic, it becomes harder to restructure, index, sequentially scan, reorganize, recover and monitor.
Flexible access to data is a key design goal, and partitions help with that goal. Data can be partitioned in many different ways, such as by date, by line of business, by geography, by organizational unit, or by any combination of these. The choice of partitioning criteria is up to the developer.
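As a rough sketch of partitioning on a combination of criteria, the Python below splits records by year and by geography and then archives the older partitions without touching the rest. The `sales` records, their field names, and the functions `partition_key` and `partition` are assumptions made for this example only.

```python
from collections import defaultdict
from datetime import date

def partition_key(record):
    """Hypothetical composite key: (year of sale, region)."""
    return (record["sale_date"].year, record["region"])

def partition(records, key_fn):
    """Break the records into separate units, one list per key."""
    partitions = defaultdict(list)
    for r in records:
        partitions[key_fn(r)].append(r)
    return dict(partitions)

# Made-up sample data.
sales = [
    {"id": 1, "sale_date": date(2012, 3, 1), "region": "EU", "amount": 120},
    {"id": 2, "sale_date": date(2012, 7, 9), "region": "US", "amount": 80},
    {"id": 3, "sale_date": date(2013, 1, 15), "region": "EU", "amount": 200},
    {"id": 4, "sale_date": date(2013, 11, 2), "region": "EU", "amount": 50},
]

parts = partition(sales, partition_key)

# Archive every partition older than 2013; the others are left as they are.
old_keys = [key for key in parts if key[0] < 2013]
archived = {key: parts.pop(key) for key in old_keys}
```

Each key maps to a unit that can be loaded, archived or deleted on its own, which is the property the list of operations above depends on.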
Partitioning can be done at the system level or at the application level. If the partitioning is at the system level, the DBMS requires that there is a single definition of the data. If the partitioning is at the application level, the data can be moved around between environments. One test for a good partitioning of data is to see whether an index can be added without major restructuring or hampering of operations.
