Saturday, October 12, 2019

We were discussing the cache usage for stream access:

When the segments in the stream support skip-level access, say in jumps of 2, 4, or 8 adjacent nodes, the cache can traverse the segments via skip levels and prefetch those that will be read. Skip-level access on streams means we can read a sequential stream almost as fast as random access.

The cache may use indexes on locations to compensate for the stream store not persisting record locations. Such an index is merely a translation of a segment's sequential number in the stream store into the leaps of contiguous segments we need to make, and the best way to make them for that particular segment. A segment's sequential number from the start of the stream may be internal to the stream store. However, if the stream store exposes that number and maps it to the segment whenever the segment is cached, then translating a location into skip-level hops is straightforward. For example, reaching segment number 63 from the start takes as many jumps of 8 as fit below the target, then jumps of 4 from wherever the previous step left off, then jumps of 2, maximizing each in that order so that the overall hop count is least. The benefit of this computation is that it resolves ranges from numbers alone rather than from a range index such as a B-Tree.
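
As a minimal sketch of this computation, assuming skip pointers at strides 8, 4, and 2 plus unit steps (SkipLevels and HopsTo are illustrative names, not a stream store API), the hop counts fall out of integer division alone:

using System.Collections.Generic;

static class SkipLevels
{
    // Greedily take as many jumps of the largest stride as fit, then carry
    // the remainder down to the next smaller stride; because each stride
    // divides the next larger one, this minimizes the total number of hops.
    public static Dictionary<int, int> HopsTo(int target)
    {
        var hops = new Dictionary<int, int>();
        int remaining = target;
        foreach (int stride in new[] { 8, 4, 2, 1 })
        {
            hops[stride] = remaining / stride;
            remaining %= stride;
        }
        return hops;
    }
}

For segment 63 this yields seven jumps of 8, one of 4, one of 2, and one unit step: ten hops in all, computed from the number alone.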

Without the location available from the stream store, some persistence is needed to look up the sequential number for a given segment, and building that lookup usually involves an iteration over all the segments in the store. A hash of the segment may be sufficient as the key for these lookups.
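
A minimal sketch of such a lookup, assuming a hypothetical Segment type whose content hash serves as the key; one pass over the store populates a dictionary:

using System;
using System.Collections.Generic;
using System.Security.Cryptography;

class Segment
{
    public byte[] Data = Array.Empty<byte>();
    public string ContentHash()
    {
        using (var sha = SHA256.Create())
            return Convert.ToBase64String(sha.ComputeHash(Data));
    }
}

class SegmentIndex
{
    private readonly Dictionary<string, int> numberByHash = new Dictionary<string, int>();

    // One iteration over the store's segments populates the lookup; after
    // that, a segment's sequential number is found without rescanning.
    public void Build(IEnumerable<Segment> segmentsInOrder)
    {
        int sequence = 0;
        foreach (var segment in segmentsInOrder)
            numberByHash[segment.ContentHash()] = sequence++;
    }

    public bool TryGetNumber(Segment segment, out int number) =>
        numberByHash.TryGetValue(segment.ContentHash(), out number);
}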

A hierarchical representation of stream segments could be maintained with other data structures, but those tend to centralize all operations. The purpose of skip-level access is faster traversal over the same sequential layout, so that no other data structures are necessary.

Another approach is to maintain the segment numbers on both sides. For example, the cache may have clients that read from the start of the stream up to a targeted segment, and the stream store may perform repeated scans as it serializes the clients' accesses to the stream. The cache has the opportunity to bypass the stream store, and alleviate its workload, by serving the segments that are most popular across accesses. As each client presents a target segment number, and the stream store presents segment numbers from the same or different streams, the cache can choose segment numbers based on relevance via skip-level access and priority via its eviction policy. The cache thereby becomes an intelligent agent that does away with redundant scans of streams by the store.
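
A minimal sketch of that serving path, with an assumed fixed capacity and a simple least-popular eviction rule standing in for whatever policy the cache actually uses:

using System.Collections.Generic;
using System.Linq;

class PopularityCache
{
    private readonly int capacity;
    private readonly Dictionary<int, byte[]> segments = new Dictionary<int, byte[]>();
    private readonly Dictionary<int, long> hits = new Dictionary<int, long>();

    public PopularityCache(int capacity) { this.capacity = capacity; }

    // A hit serves the segment without a scan by the stream store;
    // a miss sends the caller back to the store, which then calls Put.
    public bool TryGet(int segmentNumber, out byte[] segment)
    {
        hits[segmentNumber] = hits.GetValueOrDefault(segmentNumber) + 1;
        return segments.TryGetValue(segmentNumber, out segment);
    }

    public void Put(int segmentNumber, byte[] segment)
    {
        if (segments.Count >= capacity && !segments.ContainsKey(segmentNumber))
        {
            int coldest = segments.Keys.OrderBy(k => hits.GetValueOrDefault(k)).First();
            segments.Remove(coldest); // evict the least popular segment
        }
        segments[segmentNumber] = segment;
    }
}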

The overlapping interval of segment numbers is decided by the cache, which uses it with skip-level access to fetch the segments. Efficiently representing such a range of segments is easy with the same logic demonstrated for a single targeted segment number: the algorithm is repeated for the begin and the end of the targeted range, and the interval is represented in terms of the skip-level hops between the start and the end.
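
Continuing the earlier sketch, a range is then just the two hop vectors:

// Represent the assumed range [48, 63] by running HopsTo at both ends.
var beginHops = SkipLevels.HopsTo(48); // six jumps of 8
var endHops = SkipLevels.HopsTo(63);   // seven jumps of 8, then one each of 4, 2, 1
// The two hop vectors describe the interval without a B-Tree range index.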



Friday, October 11, 2019

Using a cache for stream access:

The cache for streams is a write-through cache because the writes are at the end of the stream. These writes are hardly a concern for the cache, and their performance is absorbed by Bookkeeper. Batched writes from Bookkeeper are no different from a periodic backup schedule.
Read access patterns benefit from some organization of the segments.
When the segments in the stream support skip-level access, say in jumps of 2, 4, or 8 adjacent nodes, the cache can traverse the segments via skip levels and prefetch those that will be read. Skip-level access on streams means we can read a sequential stream almost as fast as random access.
The cache may use indexes on locations to compensate for the stream store not persisting record locations. Such an index is merely a translation of a segment's sequential number in the stream store into the leaps of contiguous segments we need to make, and the best way to make them for that particular segment. A segment's sequential number from the start of the stream may be internal to the stream store. However, if the stream store exposes that number and maps it to the segment whenever the segment is cached, then translating a location into skip-level hops is straightforward. For example, reaching segment number 63 from the start takes as many jumps of 8 as fit below the target, then jumps of 4 from wherever the previous step left off, then jumps of 2, maximizing each in that order so that the overall hop count is least. The benefit of this computation is that it resolves ranges from numbers alone rather than from a range index such as a B-Tree.
Without the location available from the stream store, some persistence is needed to look up the sequential number for a given segment, and building that lookup usually involves an iteration over all the segments in the store. A hash of the segment may be sufficient as the key for these lookups.
A hierarchical representation of stream segments could be maintained with other data structures, but those tend to centralize all operations. The purpose of skip-level access is faster traversal over the same sequential layout, so that no other data structures are necessary.

Thursday, October 10, 2019

This is a continuation of the previous post to enumerate funny software engineering practices



90) Build a product that scales out to high workload but with more faults than before.

91) Build a product that is oblivious to market needs as the release cycle grows to years

92) Build a product that trades off compute in favor of storage but leaves the onus of moving data on users

93) Build a product that trades off storage for compute but never gets exercised in routine activities

94) Build a product that requires a proprietary protocol, format, or content and has the ecosystem scratching its head over integration

95) Build a product that does not work well with others because it does not provide a bridge or a connector

96) Build a product that is eager to monopolize the market rather than leaving space for better solutions to thrive without sacrificing the competence of the product.

97) Build a product that is measured by its revenue rather than its mindshare.

98) Build a product without embracing developers with attractive software development kits or community editions

99) Build a product without enticing developers with easy-to-install lite editions for their development work

100) Build a product that does not allow a forum for exchanging ideas about the product, only to find that the home-grown ideas are not appealing enough.

101) Build a product that leaves out the details of the use cases customers would find most relevant to their needs, and find out afterwards how much cost it would have saved.

102) Build a product that does not explain what it cannot do, but keep finding that users try it anyway.

103) Build a product with little or no support for administrators to configure safeguards against inappropriate usage.

104) Build a product that produces more power brokers than users.

105) Build a product that reimagines functionality, tantalizing new users at the cost of habituated old users.

106) Build a product that styles the user interface so drastically it looks designer rather than commercial.

107) Build a product with translated text in local languages that reads funny.

108) Build a product that does not take right-to-left reading in certain regional languages into consideration.

109) Build a product that improves the quality but cuts down the features at crunch time

110) Build a product that starts to sound discrepant in the ads as releases slip.

111) Build a product that does not uninstall cleanly

112) Build a product that leaves artifacts even when the customer does not want it

#codingexercise
// Links the children of the 1-based i-th node when a sorted list is laid
// out as a binary heap (left child at 2i, right child at 2i + 1).
void MinHeapify(List<Node> sorted, int i)
{
    sorted[i - 1].left = (2 * i <= sorted.Count) ? sorted[2 * i - 1] : null;
    sorted[i - 1].right = (2 * i + 1 <= sorted.Count) ? sorted[2 * i] : null;
}

Wednesday, October 9, 2019

This is a continuation of the previous post to enumerate funny software engineering practices

90) Build a product that scales out to high workload but with more faults than before.
91) Build a product that is oblivious to market needs as the release cycle grows to years
92) Build a product that trades off compute in favor of storage but leaves the onus of moving data on users
93) Build a product that trades off storage for compute but never gets exercised in routine activities
94) Build a product that requires a proprietary protocol, format, or content and has the ecosystem scratching its head over integration
95) Build a product that does not work well with others because it does not provide a bridge or a connector
96) Build a product that is eager to monopolize the market rather than leaving space for better solutions to thrive without sacrificing the competence of the product.
97) Build a product that is measured by its revenue rather than its mindshare.
98) Build a product without embracing developers with attractive software development kits or community editions
99) Build a product without enticing developers with easy-to-install lite editions for their development work
100) Build a product that does not allow a forum for exchanging ideas about the product, only to find that the home-grown ideas are not appealing enough.
101) Build a product that leaves out the details of the use cases customers would find most relevant to their needs, and find out afterwards how much cost it would have saved.
102) Build a product that does not explain what it cannot do, but keep finding that users try it anyway.
103) Build a product with little or no support for administrators to configure safeguards against inappropriate usage.
104) Build a product that produces more power brokers than users.
105) Build a product that reimagines functionality, tantalizing new users at the cost of habituated old users.
106) Build a product that styles the user interface so drastically it looks designer rather than commercial.
107) Build a product with translated text in local languages that reads funny.
108) Build a product that does not take right-to-left reading in certain regional languages into consideration.
109) Build a product that improves the quality but cuts down the features at crunch time
110) Build a product that starts to sound discrepant in the ads as releases slip.

Tuesday, October 8, 2019

This post tries to discover the optimum access patterns for stream storage. The equivalent for web-accessible storage has proven most efficient with batched, consistently uniform, periodic writes, such as those for object storage. Streams are read from the beginning to the end, and this has generally not been a problem because the workers doing the reads can scale out with low overhead. The writes are always at the end. Temporary storage with the help of Bookkeeper, and coordination with Zookeeper, help keep the writes append-only.

However, statistics collected by Bookkeeper and Zookeeper can immensely improve the discoverability of the access pattern. These statistics are based on a table of streams and their accessor counts with attributes. As the workload varies, it may show that some streams are read-heavy, such as those used for analysis, while others are write-heavy, such as those carrying IoT traffic. The statistics may also show that some segments within a stream are more popular than others. Whatever the distribution, the idea that the heavily used segments can benefit from a cache is not disputed.
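
A minimal sketch of one such table entry, with assumed field names, to make the shape concrete:

using System.Collections.Generic;

class StreamAccessStats
{
    public string StreamName = "";
    public long ReadCount;   // high values suggest a read-heavy stream, e.g. for analysis
    public long WriteCount;  // high values suggest write-heavy traffic, e.g. from IoT
    public Dictionary<int, long> ReadsBySegment = new Dictionary<int, long>(); // per-segment popularity
}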

The cache for streams is a write-through cache because the writes are at the end of the stream. These writes are hardly a concern for the cache, and their performance is absorbed by Bookkeeper. Batched writes from Bookkeeper are no different from a periodic backup schedule.

The cache layer for the stream store benefits from being close to the stream store. The segments can be keyed by a naming convention or by a hash.
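
A minimal sketch of the naming-convention option, assuming a hypothetical stream-name-plus-sequential-number scheme; a content hash would be the alternative key:

static class SegmentNaming
{
    // Zero-padding keeps keys sortable alongside the stream name.
    public static string CacheKey(string streamName, int segmentNumber) =>
        string.Format("{0}/segment-{1:D8}", streamName, segmentNumber);
}

For example, SegmentNaming.CacheKey("iot-telemetry", 63) yields "iot-telemetry/segment-00000063".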

Reference to a case study for event streams:
https://1drv.ms/w/s!Ashlm-Nw-wnWu2hMRA7zp__mivoo


Monday, October 7, 2019

This is a continuation of the previous post to enumerate funny software engineering practices:


Build a product where users are quick to assume something but the product does another thing

Build a product where the defects are hidden in the form of caveats and workarounds

Build a product that scales out to high workload but with more faults than before.

Build a product that is oblivious to market needs as the release cycle grows to years

Build a product that trades off compute in favor of storage but leaves the onus of moving data on users

Build a product that trades off storage for compute but never gets exercised in routine activities

Build a product that requires a proprietary protocol, format, or content and has the ecosystem scratching its head over integration

Build a product that does not work well with others because it does not provide a bridge or a connector

Build a product that is eager to monopolize the market rather than leaving space for better solutions to thrive without sacrificing the competence of the product.

Build a product that is measured by its revenue rather than its mindshare.

Build a product without embracing developers with attractive software development kits or community editions

Build a product without enticing developers with easy-to-install lite editions for their development work

Build a product that does not allow a forum for exchanging ideas about the product, only to find that the home-grown ideas are not appealing enough.

Build a product that leaves out the details of the use cases customers would find most relevant to their needs, and find out afterwards how much cost it would have saved.

Build a product that does not explain what it cannot do, but keep finding that users try it anyway.

Build a product with little or no support for administrators to configure safeguards against inappropriate usage.

Sunday, October 6, 2019

The Kubernetes event generation, filtering, and propagation considerations for component owners and destination architects:
1) A Kubernetes label is what makes a regular K8s (short for Kubernetes) event a Selected event. This label is not something the Selected team can do without: it is their criterion, expressed in their ConfigMap as a "matchOn" entry with a name. They cannot create alternate criteria such as matchOn level = critical, or events with remedies. This is a label they need the event generators to decide on and add. Therefore, we must work with some of their limitations here. At the same time, event promotion rules are a good idea, and having a set of criteria for the rules is helpful to anyone. If a criterion is specific to a component or to Selected, it is best to put it in the corresponding ConfigMap.
2) Point 1) does not mean we have to generate only Selected events. In fact, we should generate K8s events regardless, because with or without the criteria for Selected, this only improves the diagnostics of the component, and it is something the component owners know best.
3) The advice for component owners has been to generate more K8s events. However, Kubernetes already raises K8s events for almost all resources. Therefore, there should not be any overlap between the events raised by Kubernetes by default and those from the component.
4) In the past, we have discussed a layered approach of publishing K8s events and promoting K8s events to Selected events, which in turn get notified as Special events. There has been speculation about how Selected picks these events, or should do so in the future. I suggest that picking and promoting events based on matchOn and rules is something any layer can do. Therefore, it is better to separate event generation from event propagation. These operations are not the dedicated responsibility of any single component or layer but something each can perform if it so chooses.
5) Event generation has also differed across components, but anything to do with the operational aspect is welcome. The difference has ranged from justifying emitting events directly to authoring the resources with states so that Kubernetes automatically emits the events.
6) In some cases, the native K8s events do not have enough information in their message. They could be enhanced with identifiers and names that assist SRS and other consumers by letting them find everything in the event itself.
7) The most important aspects for any specific component have been 5) and 6) above, because such a component handles requests for all service instances and service bindings, which are dynamic during the lifetime of the product and the source of most troubleshooting tickets.
8) Since the service catalog does not capture the context in which these instances and bindings are created, we have an opportunity to improve on that from this component.
9) Lastly, I want to add that the handlers of Special events are concerned with having a few events rather than a flood of them. Unlike in other products that lack infrastructure like Kubernetes, events here can be numerous. Therefore, this restriction on the number of events must be removed from both the Special event handlers and the user interface; otherwise customers will not benefit as much from these events.