This post talks about some business intelligence practices :
OLTP and OLAP processing have different systems due to several differences:
The first difference is that an
online transaction processing is one in which the response time is critical and
touches existing data. The Online
Analytical Processing on the other hand is based on data accumulated over time
such as for decision support.
The second difference is that the
latter require a schema where they have a star or multi-level star or snowflake
like design.
The third difference is that the
latter require very different data structures such as bitmaps and conjunctive
filters while the former uses B+ trees.
The fourth difference is that the
databases in the latter required fast load of large data periodically.
The fifth difference is that the
joins on the data were very costly to perform and the views need to be captured
and persisted till the next update.
OLAP systems could be implemented with a CIF architecture or a Star-Kimball model:
The CIF architecture uses Entity
Relationships and database normalization rules to build a normalized data model
in its data warehouse. In the Star-Kimball model, a dimensional model is used
where the transactions are split into facts or dimensions and arranged in a
star like schema and if they are hierarchical in a multi-level star like
schema.
The advantage of the dimensional
model is that it is easy to understand and use. The joins are simpler and the
users don’t need to know the source of the data or the data structure but can
work off materialized views although the operating systems have to handle the
complex transformations to maintain the dimensions.
The advantage with the
normalization model is that it holds a lot of information and is flexible to
changing business needs.
A day to day task may involve the task of preventing orphaned data. We prevent this by adding constraints and checks to the data. For
example in a table that maintains employees and their managers, the column to
denote the manager points back to another employee in the table. For the top of
the organization, there is usually no manager. We represent this by making him
his own manager. This is useful now
because none of the employees can be orphaned when all the data in the manager
column is checked for consistency. In general, we use constraints, checks,
primary and foreign keys, surrogates and natural keys to enforce integrity of
the data.
No comments:
Post a Comment