Cluster computing

Friday, December 20, 2013

Managing volume in a data warehouse
We discussed profile records in a previous post and they come in handy to manage volume. The detailed records are reduced to a single profile record in the warehouse and as such are useful to reduce volume. This comes at a loss of detail but it isn't a bad thing as long as the DSS analysts gets what he wants from the summary. What goes into a profile record may need to be iteratively worked out with DSS analysts.
Another approach used together with the iterative approach is to create another level of historical data. An alternated detail is not designed to be used frequently. This additional level may not be used frequently but it comes in useful to save the architect whenever additional fields are demanded.
Multiple profile records could also be created from the same detail depending on the subject. The metadata records written for profile records are similar to those written for single activity snapshots.These records are generated usually on the operational server and require sorting and merging of data.
In general, data flows forward from operational systems to warehouse. In rare cases, its possible to flow the data backwards.
When data from a data warehouse is required by the operational systems, it is directly queried by the applications. Alternatively, the results of the query could be cached and the application can make do with the results of the query. Such direct queries are typically are responded in time that may take say 24 hours and further the requested data should be small and not in the order of MB or GB. There must be compatibility between the operational and warehouse systems for direct query and results may need to be transformed.
Indirect access of the data warehouse, on the other hand, is entertained. For example in an airline reservation system where an airline employee talks to a travel agent and has a window of 2-3 minutes to complete the order, the flight date and flight status need to be looked up from historical information. This flight status file is created periodically by reading historical data. It then becomes easier for the employee to read the current bookings and flight status table to facilitate the transaction.
Another example for the indirect use of data warehouse in operational environment is credit scoring in banking. When a customer applies for a loan, there's quite a bit of background check involved and this requires extensive historical data such as account history, payment history, job history, salary history, asset management history, etc. To satisfy most customers in short time, a program can periodically run and produce a file for pre-qualifications that has customer identification, approved credit limit, and specially approved limit. When the customer applies for and gets a loan, if the loan is within the approved limit, it is granted.

Cluster computing

Friday, December 20, 2013

No comments:

Post a Comment