Cluster computing

Tuesday, December 17, 2013

In today's blog post, we continue on our discussion with data warehouse. We mentioned that the warehouse is built iteratively This has been proved correct by the industry track record, because the end user unable to give all the requirements up front, management doesn't make a commitment until there are some results, and visible results must be seen quickly.
We will now see the role of data model in the iterative development. The data warehouse serves as a roadmap for each of the development efforts. The data model plays a critical factor in these efforts. First the data model tells what needs to be done.
Second the data model tells how one development effort is to be integrated with the other.
The development efforts are like puzzle pieces and the data model makes sure the efforts fall in place.
If the data model were absent, many of the efforts would not fit well with others and there would be overlap and redundancies.
The output of a data model is a set of tables in fact a lot of small tables with little data. If a program has to go for several tables it has to interconnect with them.
A better approach would be to physically merge some of the tables so that minimal I/O is consumed.
Merging tables is one approach. Another approach is to create an array of data. An array of data lets the sequences reside in different locations and it would be easier to directly access the resource. For example creating an array by month is easy. It might be interesting to note that sometime redundant data is inserted. For example, if a base parts table has a non-redundant field call description then several parts table have to access the base the parts table for that field. This is expensive. Instead if the description field is redundantly placed in many tables, its access is independent and less expensive. Note however, this is for the access to the tables.
Likewise if there are other fields in the table that is not as frequently or more frequently used, then they can be stored in separate tables such that the most frequent ones are by themselves.
Another technique is to use a derived data with the physical database design. For example, if we calculate the annual taxes at the year-end every year, we can store the calculated data. Then all subsequent access can use the calculated field.
Another technique is to use a creative index or a creative profile. This type of creative index is created as data is passed from operational to warehouse environment. Since the data is accessed unit by unit, the index creation is easy, inexpensive and based on users' interests.
Finally, a referential integrity can be used to establish relationships between databases. Each solution can be tested with a few use cases such as name, dept .

Cluster computing

Tuesday, December 17, 2013

No comments:

Post a Comment