Cluster computing

Monday, December 30, 2013

This post covers the differences between the relational and the star join (multidimensional) models as the basis for data warehouse design. The first difference is in flexibility and performance. The relational model is highly flexible but is not optimized for performance. The multidimensional model targets a specific user community and hence performs better for that community. When new requirements are added to the star join model, however, its performance optimizations become moot. The relational model carries no particular optimization for performance one way or the other, but its data is stored at the lowest level of granularity. If performance is required, it is better to pull data out of the relational model and then optimize access to it for performance.
The relational model is shaped by a pure data model, while the multidimensional model is shaped by processing requirements, and this leads to several consequences. The first consequence is in terms of serviceability. As discussed, the relational model is more flexible and works best when the data is accessed indirectly. The multidimensional model is good for direct access to data. The second consequence is in terms of level of abstraction and origins. The relational model organizes data at a higher level of abstraction and can support different usages, while the multidimensional model is shaped around particular processes and so suits only some workflows.
The relational model supports shaping and reshaping data into many different models. For example, merging is easy because data is stored at the most granular, normalized level. With this granularity, it is easy to meet future unknown needs, because the atomic data can be combined into more meaningful representations once the need becomes known. Another advantage of the relational model is its ability to change gracefully. The relational model is designed to be used in an indirect fashion, which means there is a lag between changes to the data and the direct uses of the data.
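As a minimal sketch of combining atomic data into different representations, the Python snippet below rolls the same granular rows up in two ways. The sales rows, the field names, and the `roll_up` helper are all made up for illustration and are not taken from any particular warehouse.

```python
from collections import defaultdict

# Hypothetical atomic rows: (date, store, product, amount).
FIELDS = ("date", "store", "product", "amount")
ATOMIC_SALES = [
    ("2013-12-01", "north", "widget", 10.0),
    ("2013-12-01", "south", "widget", 5.0),
    ("2013-12-02", "north", "gadget", 7.5),
    ("2013-12-02", "north", "widget", 2.5),
]

def roll_up(rows, by):
    """Sum the amount column, grouped by the named fields."""
    positions = [FIELDS.index(name) for name in by]
    amount_at = FIELDS.index("amount")
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[amount_at]
    return dict(totals)

# Two different "shapes" derived from the same granular data.
by_store = roll_up(ATOMIC_SALES, ["store"])
by_product_day = roll_up(ATOMIC_SALES, ["product", "date"])
```

Because nothing was summarized away when the rows were stored, a grouping that nobody anticipated can be added later just by calling `roll_up` with a different list of fields.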
The multidimensional model is aligned with what is referred to as the independent data mart approach. A data mart is a data structure dedicated to serving the analytical needs of one group of people. With independent data marts, it is easier and less expensive to build systems as the organization's data grows. A dependent data mart, on the other hand, is built from data coming from the warehouse. The dependent data mart does not depend on legacy or operational data for its source. It depends only on the data warehouse for its source of data. The dependent data mart requires forethought and investment.
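A dependent data mart can be pictured as a summary computed only from warehouse rows. The sketch below assumes a toy in-memory warehouse; the rows, the column names, and the `build_dependent_mart` function are hypothetical.

```python
# Hypothetical warehouse rows held at the lowest level of granularity.
WAREHOUSE = [
    {"region": "east", "product": "widget", "amount": 100},
    {"region": "east", "product": "gadget", "amount": 40},
    {"region": "west", "product": "widget", "amount": 60},
]

def build_dependent_mart(warehouse, dimension):
    """Summarize warehouse rows along one dimension for one group of users."""
    mart = {}
    for row in warehouse:
        key = row[dimension]
        mart[key] = mart.get(key, 0) + row["amount"]
    return mart

# Each mart reads only from the warehouse, never from operational systems.
region_mart = build_dependent_mart(WAREHOUSE, "region")
product_mart = build_dependent_mart(WAREHOUSE, "product")
warehouse_total = sum(row["amount"] for row in WAREHOUSE)
```

Since both marts are derived from the same rows, their grand totals agree with each other and with the warehouse, which is the kind of consistency that separately loaded marts cannot guarantee.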
However, as independent data marts crop up, they also tend to have redundancies between the systems. With the progression of time, this data may become unintegrated, uncorrelated and, even worse, unsynchronized. The redundancy in data grows exponentially. By then the momentum for the independent data marts is fully established, and this affects the long-term perspective and maintenance. Consider the warehouse, on the other hand. The data would be more reusable, the number of interface programs would be limited, and the data could be reconciled. Further, the architectural problems of the independent data marts would not have arisen.
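One rough way to see why the warehouse limits the number of interface programs is to count them. The sketch below assumes, purely for illustration, that every independent data mart needs its own interface program to every source system, while with a warehouse each source feeds the warehouse once and each mart reads from the warehouse once. Real systems will differ, and `interface_programs` is a hypothetical helper.

```python
def interface_programs(num_sources, num_marts, use_warehouse):
    """Count interface programs under the simplifying assumptions above."""
    if use_warehouse:
        # One load per source into the warehouse, one feed per mart out of it.
        return num_sources + num_marts
    # Every independent mart pulls from every source on its own.
    return num_sources * num_marts

independent_count = interface_programs(5, 4, use_warehouse=False)
warehouse_count = interface_programs(5, 4, use_warehouse=True)
```

Under these assumptions the independent approach grows with the product of sources and marts, while the warehouse approach grows only with their sum.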
Hence the recommendation is against independent data marts and in favor of a data warehouse, despite the challenges encountered.
