Cluster computing

Thursday, July 18, 2013

This post talks about some business intelligence practices :

OLTP and OLAP processing have different systems due to several differences:

The first difference is that an online transaction processing is one in which the response time is critical and touches existing data. The Online Analytical Processing on the other hand is based on data accumulated over time such as for decision support.

The second difference is that the latter require a schema where they have a star or multi-level star or snowflake like design.

The third difference is that the latter require very different data structures such as bitmaps and conjunctive filters while the former uses B+ trees.

The fourth difference is that the databases in the latter required fast load of large data periodically.

The fifth difference is that the joins on the data were very costly to perform and the views need to be captured and persisted till the next update.

OLAP systems could be implemented with a CIF architecture or a Star-Kimball model:

The CIF architecture uses Entity Relationships and database normalization rules to build a normalized data model in its data warehouse. In the Star-Kimball model, a dimensional model is used where the transactions are split into facts or dimensions and arranged in a star like schema and if they are hierarchical in a multi-level star like schema.

The advantage of the dimensional model is that it is easy to understand and use. The joins are simpler and the users don’t need to know the source of the data or the data structure but can work off materialized views although the operating systems have to handle the complex transformations to maintain the dimensions.

The advantage with the normalization model is that it holds a lot of information and is flexible to changing business needs.

A day to day task may involve the task of preventing orphaned data. We prevent this by adding constraints and checks to the data. For example in a table that maintains employees and their managers, the column to denote the manager points back to another employee in the table. For the top of the organization, there is usually no manager. We represent this by making him his own manager. This is useful now because none of the employees can be orphaned when all the data in the manager column is checked for consistency. In general, we use constraints, checks, primary and foreign keys, surrogates and natural keys to enforce integrity of the data.

Cluster computing

Thursday, July 18, 2013

No comments:

Post a Comment