Data Warehouses are built for a wide variety of systems.However each datacenter maintains a structure of data called a snapshot. This has four components - a key, a timestamp, a primary data that relates only to the key, secondary data captured as part of the snapshot process with no relationship to the primary.
The key identifies the record and the primary data.
The unit of time refers to the moment where the event being described by the snapshot has occurred.
The primary data is the non-key data
The secondary data is circumstantial data and is typically used with DSS processing
A relationship between the primary and the secondary can be inferred because they are adjacent. This is called an artifact.
We now discuss metadata.Metadata is very useful to describe the data in the warehouse. It acts like an index and it stores the structure of the data as known to the programmer and DSS analyst, a hint for the source data, the data model, and a history of extracts.
Reference Data is another component commonly used with data warehouses. This keeps track of the referential integrity. Reference data tends to be ignored with data accumulation however it should be made time variant just like any other part of the warehouse.
There are two different designs for reference data.
The first approach requires taking a snapshot of the reference data every six months. This approach is quite simple but its not sufficient because it might miss the activity between snapshots.
The second approach keeps track of all the activities against the reference table.
The actual implementation might be something in between the first and the second approach.
There is another important factor pervasive in the data warehouse. This is the length of time a change of data in the operational environment takes to be reflected in the data warehouse and is referred to as the cyclicity of data.
The data warehouse contains historical information. The changes in the operational system should take at least 24 hours to show up in the warehouse. This introduced delay is referred to as the wrinkle of time.
A wrinkle of time smaller than 24 hours is expensive to do but the goal is to not do the operational changes in the warehouse and vice versa.
The key identifies the record and the primary data.
The unit of time refers to the moment where the event being described by the snapshot has occurred.
The primary data is the non-key data
The secondary data is circumstantial data and is typically used with DSS processing
A relationship between the primary and the secondary can be inferred because they are adjacent. This is called an artifact.
We now discuss metadata.Metadata is very useful to describe the data in the warehouse. It acts like an index and it stores the structure of the data as known to the programmer and DSS analyst, a hint for the source data, the data model, and a history of extracts.
Reference Data is another component commonly used with data warehouses. This keeps track of the referential integrity. Reference data tends to be ignored with data accumulation however it should be made time variant just like any other part of the warehouse.
There are two different designs for reference data.
The first approach requires taking a snapshot of the reference data every six months. This approach is quite simple but its not sufficient because it might miss the activity between snapshots.
The second approach keeps track of all the activities against the reference table.
The actual implementation might be something in between the first and the second approach.
There is another important factor pervasive in the data warehouse. This is the length of time a change of data in the operational environment takes to be reflected in the data warehouse and is referred to as the cyclicity of data.
The data warehouse contains historical information. The changes in the operational system should take at least 24 hours to show up in the warehouse. This introduced delay is referred to as the wrinkle of time.
A wrinkle of time smaller than 24 hours is expensive to do but the goal is to not do the operational changes in the warehouse and vice versa.
No comments:
Post a Comment