Friday, October 12, 2018

If the cache is distributed, the performance analysis may need to find if any of the distributed hashing leads to nodes that are overloaded. These nodes can further be expanded by the addition of new nodes. Performance analysis in a distributed framework is slightly more involved than on a single server because there is a level of indirection. Such study must not only make sure that the objects are cached satisfactorily at the local level but also that they participate in the global statistics.
Statistics cannot be at the object level alone. We need running counters of hits and misses across objects. These may be aggregated from all the nodes in a distributed hash table. Some like to view this independent of the network between the nodes. For example, they take a global view regardless of the distribution of the actual objects. As long as we can cumulate the hits and misses on per object level and across objects in a global view, the networking does not matter at this level. Although, nodes are expected to have uniform distribution of objects, they may get improperly balanced at which point the network level statistics become helpful.
The performance measurements of a cache can be done in a test lab by simulating the workload from a production system. This requires just the signature of the workload in terms of the object accesses and everything else can be isolated from the production system. The capture of the workload is not at all hard because we only want the distribution, duration, type and size of accesses. The content itself does not matter as much as it does to applications and users. If we know the kind of object accesses done by the workloads, we know what the cache is subjected to in production. Then we can artificially generate as many objects and their access as necessary and it would not matter to the test because it would not change the duress on the cache. We can also maintain different t-shirt size artificial workloads to study the impact on the cache. Some people raise the concern that a system in theory may not work as well in practice but in this case, when we keep all the parameters the same, there is very little that can deviate the results from the theory. The difference between the lab and the production can be tightened so thin that we can assert that the production will meet the need of the workloads after they have been studied in the lab. Many organizations take this approach even for off-the shelf software because they don’t want to exercises anything in production. Moreover, access to the production system is very restricted because it involves many perspectives and not just performance. Compliance, regulations, auditing and other such concerns require that the production system is hardened beyond development and test access.  Imagine if you were gain to access to all the documents of the users using the production cache as a development or test representative of the organization. Even the pipelines feeding into the production are maintained with multiple stages of vetting so that we minimize the pollution to the production where it becomes necessary to revert to a previous version. Productions systems are also the assets of a company that represent the cumulative efforts of the entire organization if not the company itself. It becomes necessary to guard it more than others. Moreover, there is only one production system as opposed to multiple test and development environments.
#codingexercise
We were discussing subset sum yesterday. If we treat every element of the array as candidate for subset evaluation as above, we can find all this elements that satisfy the subset property. Since each iteration has overlaps with the previous iteration we can maintain a dp table of whether an element has a subset sum or product property. If the current element  has subset each of which has a value in the integer dp table corresponding to the number of ways of forming the subset, we can aggregate it for the current element.
The memoization merely helps to not redo the same calculations again.

No comments:

Post a Comment