Having discussed building the data warehouse, we now turn to the migration to this environment and its ease of use. We mentioned earlier that this is a step-by-step activity. Most successful implementations have been built one iteration at a time. The size and speed of each iteration are important, and the results must be delivered quickly.
But migration is different from iterative development. Many migrations have failed, and hence the book describes a generic migration plan that has worked in many cases.
This migration plan begins with a corporate data model. This data model includes major subjects, their definitions, relationships, and groupings.
Note that this model excludes derived data and DSS data because they are subject to change and are temporal.
After the corporate data model and the mid-level models are in place, the next activity is defining the system of record. The system of record is defined in terms of the corporation's existing systems. It represents the best data in terms of completeness, timeliness, and accuracy.
Next, the challenges in bringing the data into the data warehouse are identified.
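As a rough illustration of choosing a system of record, the sketch below scores candidate source systems on the three criteria named above and picks the best one. The system names, the 0-to-1 scores, and the equal weighting are all assumptions made for this example; the book does not prescribe a scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCandidate:
    """A hypothetical existing system that could supply a data element."""
    system: str
    completeness: float  # assumed scale: 0.0 (worst) to 1.0 (best)
    timeliness: float
    accuracy: float


def pick_system_of_record(candidates):
    """Return the candidate with the highest combined score.

    Equal weights are an assumption; a real project would agree on
    its own weighting with the data owners.
    """
    if not candidates:
        raise ValueError("at least one candidate system is required")
    return max(
        candidates,
        key=lambda c: c.completeness + c.timeliness + c.accuracy,
    )


candidates = [
    SourceCandidate("billing", completeness=0.9, timeliness=0.7, accuracy=0.95),
    SourceCandidate("crm", completeness=0.8, timeliness=0.9, accuracy=0.7),
]
best = pick_system_of_record(candidates)
print(best.system)
```

In practice the choice is made per data element or subject area, so this selection would be repeated for each one.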
These could include the following:
If the data arrives in large volumes, the arrival rate has to be accounted for in order to handle the influx.
Data that is voluminous may also need to be summarized, aggregated, or partitioned.
Data may have to be extracted using the interfaces. Special transformations that include integration, alteration of time basis, condensation, and scanning may need to be done.
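The handling of voluminous data described above can be sketched as follows: records are partitioned by month and then summarized per partition. The record layout (a `date` and an `amount` field) and the choice of a monthly partition key are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(records):
    """Group records into partitions keyed by (year, month)."""
    partitions = defaultdict(list)
    for rec in records:
        d = rec["date"]
        partitions[(d.year, d.month)].append(rec)
    return dict(partitions)


def summarize(partitions):
    """Condense each partition into a record count and a total amount."""
    return {
        key: {"count": len(recs), "total": sum(r["amount"] for r in recs)}
        for key, recs in partitions.items()
    }


# Hypothetical detail records as they might arrive from a source system.
records = [
    {"date": date(2013, 1, 5), "amount": 100},
    {"date": date(2013, 1, 20), "amount": 50},
    {"date": date(2013, 2, 2), "amount": 75},
]
monthly = summarize(partition_by_month(records))
print(monthly)
```

Partitioning first keeps each unit of work small, and the summary is what would be carried forward when the detail itself is too large to keep.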
If the data modeling activity has been done properly, the design of the data warehouse is fairly simple. When the design is complete, the warehouse is organized by subject area.
Once the interfaces have been built, the first subject area is populated. The data is captured and transported into the warehouse environment. Sometimes only a fraction of the data is required.
End users work by discovering the interfaces and then using them to populate the data.
Often the first population of the data is incorrect, but it reveals many of the corrections needed to populate it correctly going forward.
The population and feedback processes continue afterwards.
By adding a lag time, the data warehouse developer serves the DSS needs of the company. This lag time is usually 24 hours.
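One way to picture the lag time is as a filter on what is eligible for loading: only records that have been untouched for at least 24 hours are moved into the warehouse. This is a minimal sketch; the `updated_at` field and the `ready_for_load` helper are hypothetical names introduced for this example.

```python
from datetime import datetime, timedelta

# The 24-hour figure comes from the text; treating it as a constant is an assumption.
LAG = timedelta(hours=24)


def ready_for_load(records, now, lag=LAG):
    """Return the records whose last update is at least `lag` old."""
    cutoff = now - lag
    return [r for r in records if r["updated_at"] <= cutoff]


now = datetime(2013, 6, 2, 12, 0)
records = [
    {"id": 1, "updated_at": datetime(2013, 6, 1, 9, 0)},  # 27 hours old
    {"id": 2, "updated_at": datetime(2013, 6, 2, 8, 0)},  # 4 hours old
]
eligible = ready_for_load(records, now)
print([r["id"] for r in eligible])
```

Shrinking the `lag` argument toward zero corresponds to the more rapid population discussed next.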
Sometimes there is a call for rapid population of the warehouse, which is referred to as active data warehousing since it supports a small amount of online access processing.