A data warehouse has special needs, so DBMSs have evolved to include data warehouse-specific features. Instead of transaction processing, a data warehouse calls for load-and-access processing. Data is integrated, transformed and loaded into the warehouse from the operational environments and the ODS. Data in the warehouse is not updated. Instead, it is stored as a series of snapshot records: when there is a change, a new snapshot record is added rather than an existing record being updated.
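The snapshot idea can be sketched in a few lines of Python. This is only an illustration under assumed names: `Snapshot`, `SnapshotStore`, `record_snapshot` and `value_as_of` are hypothetical and do not come from any particular DBMS, and the account data is made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    key: str        # business key, e.g. an account number (hypothetical)
    as_of: date     # moment the snapshot describes
    balance: float  # value captured at that moment


class SnapshotStore:
    """Append-only store: a change adds a snapshot, nothing is updated in place."""

    def __init__(self):
        self._rows = []

    def record_snapshot(self, key, as_of, balance):
        # A change is recorded as a new row; earlier rows stay untouched.
        self._rows.append(Snapshot(key, as_of, balance))

    def history(self, key):
        # All snapshots for one key, oldest first.
        return sorted((r for r in self._rows if r.key == key), key=lambda r: r.as_of)

    def value_as_of(self, key, when):
        # Latest snapshot taken on or before `when`, or None if there is none.
        candidates = [r for r in self.history(key) if r.as_of <= when]
        return candidates[-1].balance if candidates else None


store = SnapshotStore()
store.record_snapshot("ACC-1", date(2023, 1, 1), 100.0)
store.record_snapshot("ACC-1", date(2023, 2, 1), 250.0)
```

Because earlier rows are never overwritten, a question such as "what was the balance in mid-January?" can still be answered after the February change has been loaded.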
A data warehouse requires different storage media, often used in a dual environment: one processing environment is the DASD (direct access storage device) environment, where online interactive processing is done, and the other is the tape or mass store, where the data is accumulated.
The DASD environment could use a different vendor from the mass store environment. It is also possible for the DASD environment to be split across more than one vendor. As long as these splits are deliberate rather than political or historical, they should not pose issues.
The role of metadata is also very different in the data warehouse environment than in the operational environment. For example, in the operational environment the audience for the metadata is the IT professional, who is usually computer-savvy, whereas in the data warehouse the metadata is used by the DSS analyst, who needs as much help as possible; hence the need for metadata services.
In the data warehouse, the metadata tracks the significant transformations applied to the data as it passes from the operational environment to the data warehouse environment.
Moreover, data in a data warehouse exists for a lengthy span of time, and the warehouse changes its structure during that time. Keeping track of these changes in structure is a natural task for the metadata. In the operational environment, on the other hand, the metadata keeps track of one and only one correct definition of the structure of the data.
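One way to picture metadata that keeps several structure definitions over time is the small sketch below. The `StructureHistory` class, its methods, and the column names are all hypothetical, chosen only to illustrate the contrast with the single current definition kept in the operational environment.

```python
from datetime import date


class StructureHistory:
    """Keeps every version of a table's structure with its effective date."""

    def __init__(self):
        self._versions = []  # (effective_from, columns) pairs, kept sorted by date

    def add_version(self, effective_from, columns):
        self._versions.append((effective_from, list(columns)))
        self._versions.sort(key=lambda v: v[0])

    def structure_on(self, when):
        # Walk the versions in date order and keep the last one already in effect.
        current = None
        for effective_from, columns in self._versions:
            if effective_from <= when:
                current = columns
        return current


meta = StructureHistory()
meta.add_version(date(2020, 1, 1), ["customer_id", "balance"])
meta.add_version(date(2022, 6, 1), ["customer_id", "balance", "currency"])
```

An analyst reading a 2021 snapshot would look up `meta.structure_on(date(2021, 1, 1))` and get the two-column layout that was in effect when that data was loaded, not the current one.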
Also, operational systems focus on current data, such as the current balance or the current inventory, but trends cannot be seen in current data. They become apparent only when data has accumulated over time. Over the span of time in which this data is accumulated, a new dimension called data context becomes important. Data context is all the context information, such as changes in the data source, that may explain discrepancies in trends that the accumulated data alone cannot explain.
Three levels of context information must be maintained: simple, complex and external.
Simple contextual information relates to the basic structure of the data itself and includes structure, encoding, naming conventions, and metrics. Complex contextual information includes such things as product definitions, marketing territories, pricing and packaging. External contextual information includes such things as inflation, financial trends, taxation and economic growth. Complex and external contextual information can vary a lot and is harder to keep in a structure.
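As a rough sketch, the three levels could be recorded as dated notes kept alongside the accumulated data, so that an analyst can look up what changed during a period when a trend looks odd. The `ContextLevel` and `ContextNote` types, the `notes_between` helper, and the sample notes are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ContextLevel(Enum):
    SIMPLE = "simple"      # structure, encoding, naming conventions, metrics
    COMPLEX = "complex"    # product definitions, territories, pricing, packaging
    EXTERNAL = "external"  # inflation, financial trends, taxation, growth


@dataclass(frozen=True)
class ContextNote:
    level: ContextLevel
    effective: date
    description: str  # free text, since complex and external context resists structure


def notes_between(notes, start, end):
    """Return the context notes whose effective date falls within [start, end]."""
    return [n for n in notes if start <= n.effective <= end]


# Made-up example notes, one per level.
notes = [
    ContextNote(ContextLevel.SIMPLE, date(2021, 3, 1), "amount unit changed from cents to dollars"),
    ContextNote(ContextLevel.COMPLEX, date(2021, 7, 1), "marketing territories redrawn"),
    ContextNote(ContextLevel.EXTERNAL, date(2022, 1, 1), "sales tax rate changed"),
]
in_2021 = notes_between(notes, date(2021, 1, 1), date(2021, 12, 31))
```

Keeping the description as free text reflects the point above: simple context fits neatly into fields, while complex and external context varies too much to force into a fixed structure.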
In the past, collecting contextual information has been difficult because the audience was different, the interest was passive, the resources had vanished, and even when it was collected, it was restricted to simple contextual information.