Cluster computing

Wednesday, December 25, 2013

We mentioned the types of distributed data warehouses in an earlier post. We will look into them now. First, a data warehouse may be affected by many different development efforts, and these call for the data architect to manage the changes. For example, the development efforts could be based on different lines of products that are unintegrated, or the same warehouse could be distributed by geography, such as north, south, east or west. There could also be different levels of data within the same data warehouse, such as lightly summarized data, detailed data and OLAP, that are built by different groups. Even the detailed, non-distributed part of the same data warehouse could be built by different groups.
Unintegrated lines of business pose little or no conflict; however, that case is rare for data warehouses. The second case, multiple data warehouses across locations, is more common. Different groups want to isolate their development efforts, often at the risk of repeating work across groups. The third case, building multiple levels of data simultaneously, is much easier to manage than either of the two earlier cases. However, because the levels differ, they have different uses and expectations. The fourth case requires the most attention from the data warehouse architect. This is because the non-distributed warehouse is being built like slices of a pie, and the slices all need to come together. The architect is responsible for the consistency and growth of the data warehouse.
Another approach, different from the above, is to develop completely independent warehouses. They are integrated by a common minimal corporate warehouse. The corporate warehouse also requires metadata just like the others, but it is simply not tied to the others since there is no business integration.
Now we will look at building the warehouse at multiple levels. Here the main tool for the architect is the set of definitions for interconnectivity between the levels. The primary data model is driven by the group that is building the current level of detail, because that group gets to use the data warehouse data model. Interconnectivity addresses compatibility of access at the call level. The current level of detail must also be sufficient to generate the lightly summarized data. Coordination proceeds through agreements, either informal or formal, that are time-sequenced so that no team is left waiting for data that has not yet been made available.
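To make the requirement on the current level of detail concrete, here is a minimal sketch in Python. The records, the field names (region, month, amount) and the helpers can_derive and summarize are all hypothetical and chosen only for illustration; a real warehouse would do this inside its own platform.

```python
from collections import defaultdict

# Hypothetical detail records owned by the group building the current level of detail.
detail_rows = [
    {"region": "north", "month": "2013-11", "amount": 120.0},
    {"region": "north", "month": "2013-11", "amount": 80.0},
    {"region": "south", "month": "2013-11", "amount": 50.0},
    {"region": "north", "month": "2013-12", "amount": 30.0},
]


def can_derive(detail_fields, summary_keys, measure):
    """Return True when the detail level carries every field the summary needs."""
    return set(summary_keys) | {measure} <= set(detail_fields)


def summarize(rows, keys, measure):
    """Roll detail rows up to one total per combination of key values."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


detail_fields = detail_rows[0].keys()

# The summary group can only be served if the detail group captures these fields.
if can_derive(detail_fields, ["region", "month"], "amount"):
    monthly_by_region = summarize(detail_rows, ["region", "month"], "amount")
```

A check like can_derive is the kind of thing the two groups could agree on before either starts building, so that the summary level never asks for a field the detail level does not hold.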
In the case where multiple groups are building the current level of detail, if the data is mutually exclusive between the groups, then little or no coordination is needed. Otherwise, the overlap creates redundancy and cost. Further, this redundancy introduces inconsistencies and the so-called spider web into the DSS environment. Therefore, a common data model needs to be established. With the common data model separated from the local data models, there is no redundancy across the local groups.
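One way to picture the split between a common and a local data model is the sketch below. It assumes each group can list the entities it intends to build; the group names, the entity names and the split_models helper are hypothetical.

```python
def split_models(group_entities):
    """Split entities into a common model (claimed by two or more groups)
    and one local model per group (everything only that group claims)."""
    owners = {}
    for group, entities in group_entities.items():
        for entity in entities:
            owners.setdefault(entity, set()).add(group)
    common = {entity for entity, groups in owners.items() if len(groups) > 1}
    local = {group: set(entities) - common
             for group, entities in group_entities.items()}
    return common, local


# Hypothetical entity lists for two groups building the current level of detail.
groups = {
    "group_a": {"customer", "order", "shipment"},
    "group_b": {"customer", "invoice"},
}

common, local = split_models(groups)
```

Here customer lands in the common model because both groups claim it, while the local models keep only what is exclusive to each group. If common comes back empty, the data is mutually exclusive and the groups need little coordination.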
The same technology can be used for all the data models. Another strategy is to use different platforms for the different types of data found at the detailed level. In that case there are data transfers between platforms, and the boundary between technologies may be crossed many times. In all these cases, metadata sits on top of all the data.
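As a rough illustration of metadata sitting on top of data held on different platforms, the sketch below keeps a small catalog and counts how often a transfer path crosses a technology boundary. The dataset names, platform names and the boundary_crossings helper are assumptions made up for this example.

```python
# Hypothetical metadata catalog: one entry per dataset, regardless of platform.
catalog = {
    "sales_detail": {"platform": "platform_x", "level": "detail"},
    "sales_monthly": {"platform": "platform_y", "level": "lightly summarized"},
    "sales_cube": {"platform": "platform_y", "level": "OLAP"},
}


def boundary_crossings(path, catalog):
    """Count how many consecutive hops in a transfer path change platform."""
    platforms = [catalog[name]["platform"] for name in path]
    return sum(1 for a, b in zip(platforms, platforms[1:]) if a != b)


# Data moving from detail to summary to OLAP in this made-up layout.
crossings = boundary_crossings(
    ["sales_detail", "sales_monthly", "sales_cube"], catalog
)
```

Because the catalog records the platform for every dataset, the architect can see where the transfers happen without inspecting the platforms themselves.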
