Monday, October 14, 2019

This is a continuation of the earlier posts to enumerate funny software engineering practices:


110) Build a product that starts to sound discrepant in its ads as releases slip.
111) Build a product that does not uninstall cleanly.
112) Build a product that leaves artifacts behind even when the customer does not want them.
113) Build a product that makes it annoying for customers to switch out components.
114) Build a product that improves security with a trade-off on convenience.
115) Build a product that provides inadequate segregation of user activities or secrets, letting others see the actions taken.
116) Build a product that behaves well in development but fails in production because of deficiencies in system resources.
117) Build a product that is considered golden only to find that the customer has been using an incorrect version.
118) Build a product that takes a sweeping approach to solving problems by updating versions while not sufficiently resolving special cases.
119) Build a product that does not have sufficient levers to resolve small customer cases.
120) Build a product that requires a lot of effort to explain workarounds when a feature could be added instead.
121) Build a product where the cost of servicing overwhelms the amortized cost of purchase.
122) Build a product where customers are enticed into a subscription model or cloud tenancy only to discover that the total cost for developmental activities is significantly high.
123) Build a product where customers have to deal with two problems simultaneously: on-premise as well as cloud encumbrances.
124) Build a product that requires subscriptions to services and uses the legal language in the contract to extort payments.
125) Build a product that requires processing power that is only available on desktops.
126) Build a product that is not elastic in storage, compute and networking, only to be surprised by peak crowd traffic.
127) Build a product that has little or no configuration options for sources of input, prompting dedicated instances for different use cases.
128) Build a product that reveals hidden bugs after the warranty period.
129) Build a product where the maker funds white-hat discovery of vulnerabilities that prompt the customer to upgrade.
130) Build a product with changing goals for v1 while the market tide would have lifted any v1 for the immediate business need, and with a compelling v2 to establish the business.

Sunday, October 13, 2019

The benefits of Async stream readers in cache for stream storage:
The cache essentially projects portions of the stream so that the entire stream does not have to be scanned from beginning to end repeatedly. This mode of operation is different from finding popular stream segments across stream readers.
The cache may use a queue to hold contiguous segments from a stream.
The choice of queue is extremely important for peak throughput. The use of a lock-free data structure, as opposed to an ArrayBlockingQueue, can do away with lock contention. The queues work satisfactorily until they are full. Garbage-free async stream readers have the best response time.
The benefits here are similar to LogAppenders, except that those are on the write path while these are on the read path.
Asynchronous reads together with lock-free access and skip-level access boost performance. Each segment in the cache may be significantly large, and a writer that transfers a segment over the network may take on the order of hundreds of milliseconds. Instead, a continuous background import of an adjustable window of stream segments tremendously reduces the load on the stream store while responding to seek requests on segments.
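A minimal sketch of this idea, assuming a hypothetical AsyncSegmentCache class and SegmentSource interface (neither comes from an actual stream store API): a background task imports a window of segments into a lock-free ConcurrentLinkedQueue, so readers poll without lock contention.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: an async reader that prefetches contiguous stream
// segments into a lock-free queue, so readers never block on a queue lock.
public class AsyncSegmentCache {
    // ConcurrentLinkedQueue is lock-free (CAS-based), unlike ArrayBlockingQueue.
    private final ConcurrentLinkedQueue<byte[]> window = new ConcurrentLinkedQueue<>();
    private final int windowSize;

    public AsyncSegmentCache(int windowSize) { this.windowSize = windowSize; }

    // Background import: fill an adjustable window of segments asynchronously.
    public CompletableFuture<Void> prefetch(SegmentSource source) {
        return CompletableFuture.runAsync(() -> {
            while (window.size() < windowSize) {
                byte[] segment = source.nextSegment();   // assumed source API
                if (segment == null) break;              // no more segments
                window.offer(segment);                   // non-blocking enqueue
            }
        });
    }

    // Readers poll without contending on a lock; null means "not cached yet".
    public byte[] poll() { return window.poll(); }

    public interface SegmentSource { byte[] nextSegment(); }
}
```

The window size bounds memory; a garbage-free variant would reuse segment buffers instead of allocating a new array per segment.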

Please refer to the discussion on caching context for stream managers with:

https://1drv.ms/w/s!Ashlm-Nw-wnWvBXkU9jz_Z2EXWLp

and an implementation where it is applicable with: https://github.com/ravibeta/JavaSamples/tree/master/logging

Saturday, October 12, 2019

We were discussing the cache usage for stream access:

When the segments in the stream have skip-level access by, say, 2, 4, or 8 adjacent nodes, the cache can access the segments via skip levels and prefetch those that will be read. Skip-level access on streams means that we can perform as fast as random access over sequential streams.

The cache may use indexes on locations to make up for the stream store not storing record locations. This index is merely a translation of the sequential segment number from the stream store into the leaps of contiguous segments we need to make, and the best way to do so for that particular segment. A segment's sequential number from the start may be internal to the stream store. However, if that number is available from the stream store and mapped to the segment whenever it is cached, then translating the location into skip-level accesses is straightforward. For example, the number 63 from the start requires as many multiples of 8 as fit below the target, then multiples of 4 starting from the position left by the previous step, then multiples of 2, maximized in that order so that the overall count is least. This computation brings in ranges based on numbers alone rather than range indexes based on, say, a B-Tree.
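The greedy translation described above can be sketched as follows; SkipLevelPlan and its plan method are illustrative names, not part of any stream store API. For 63, the plan is 7 hops of 8 (to position 56), then 1 hop of 4 (to 60), 1 hop of 2 (to 62), and 1 hop of 1 (to 63).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SkipLevelPlan {
    // Greedy translation of a sequential segment number into skip-level hops:
    // take as many hops of 8 as fit below the target, then 4, then 2, then 1.
    // For power-of-two levels, greedy yields the least overall hop count.
    public static Map<Integer, Integer> plan(int target) {
        Map<Integer, Integer> hops = new LinkedHashMap<>();
        int position = 0;
        for (int level : new int[] {8, 4, 2, 1}) {
            int count = (target - position) / level;  // hops at this level
            hops.put(level, count);
            position += count * level;                // advance to the new position
        }
        return hops;
    }
}
```

So plan(63) yields {8=7, 4=1, 2=1, 1=1}, i.e. ten hops in total.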

Without the location available from the stream store, some persistence is needed for the lookup of the segment number for the corresponding segment, and this usually involves an iteration over all the segments from the store. A hash of the segment may be sufficient for these lookups.

The hierarchical representation of stream segments may be facilitated with other data structures, but they tend to centralize all operations. The purpose of skip-level access is faster access over the same sequential layout so that no other data structures are necessary.

Another approach is to maintain the segment numbers on both sides. For example, the cache may have clients that read from the start of the stream up to a targeted segment. The stream store may perform repeated scans as it serializes the clients' accesses to the stream. The cache has the opportunity to bypass the stream store and alleviate its workload by providing the segments that are most popular across accesses. As each client presents a target segment number and the stream store presents segment numbers from the same or a different stream, the cache can come up with segment numbers based on relevance via skip-level access and priority via its eviction policy. The cache thereby becomes an intelligent agent that does away with redundant scans of streams by the store.
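An eviction policy of the kind mentioned above can be sketched with Java's LinkedHashMap in access order; SegmentLru is a hypothetical name, and a production cache would likely also weigh popularity counts, not just recency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the cache keeps the segments most popular across accesses and
// evicts the least-recently-used one, letting hot segments bypass the store.
public class SegmentLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public SegmentLru(int capacity) {
        super(16, 0.75f, true);  // access-order: a get() moves the entry to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // evict the coldest segment beyond capacity
    }
}
```

For example, with capacity 2, touching segment 1 before inserting segment 3 causes segment 2, the coldest entry, to be evicted.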

The overlapping interval range between segment numbers is decided by the cache, which uses it with skip-level access to fetch the segments. Efficient representation of such a range of segments follows the same logic as demonstrated for a targeted segment number: the algorithm is repeated for the begin and end of the targeted range, and the interval is represented in terms of the skip-level accesses between the start and the end.



Friday, October 11, 2019

Using cache for stream access:

The cache for streams is a write-through cache because the writes are at the end of the stream. These writes are hardly a concern for the cache, and their performance is mitigated by Bookkeeper. Batched writes from Bookkeeper are no different from a periodic backup schedule.
Read access patterns benefit from some organization of segments.
When the segments in the stream have skip-level access by, say, 2, 4, or 8 adjacent nodes, the cache can access the segments via skip levels and prefetch those that will be read. Skip-level access on streams means that we can perform as fast as random access over sequential streams.
The cache may use indexes on locations to make up for the stream store not storing record locations. This index is merely a translation of the sequential segment number from the stream store into the leaps of contiguous segments we need to make, and the best way to do so for that particular segment. A segment's sequential number from the start may be internal to the stream store. However, if that number is available from the stream store and mapped to the segment whenever it is cached, then translating the location into skip-level accesses is straightforward. For example, the number 63 from the start requires as many multiples of 8 as fit below the target, then multiples of 4 starting from the position left by the previous step, then multiples of 2, maximized in that order so that the overall count is least. This computation brings in ranges based on numbers alone rather than range indexes based on, say, a B-Tree.
Without the location available from the stream store, some persistence is needed for the lookup of the segment number for the corresponding segment, and this usually involves an iteration over all the segments from the store. A hash of the segment may be sufficient for these lookups.
The hierarchical representation of stream segments may be facilitated with other data structures, but they tend to centralize all operations. The purpose of skip-level access is faster access over the same sequential layout so that no other data structures are necessary.

Thursday, October 10, 2019

This is a continuation of the previous post to enumerate funny software engineering practices:



90) Build a product that scales out to high workload but with more faults than before.

91) Build a product that is oblivious to market needs as the release cycle grows to years.

92) Build a product that trades off compute in favor of storage but leaves the onus of moving data on users.

93) Build a product that trades off storage for compute but never gets exercised in routine activities.

94) Build a product that requires a proprietary protocol, format or content and has the ecosystem scratching their heads over integration.

95) Build a product that does not work well with others because it does not provide a bridge or a connector.

96) Build a product that is eager to monopolize the market rather than leaving space for better solutions to thrive without sacrificing the competence of the product.

97) Build a product that is measured by its revenue rather than its mindshare.

98) Build a product without embracing developers with attractive software development kits or community editions.

99) Build a product without enticing developers with easy-to-install lite editions for their developmental work.

100) Build a product that does not allow a forum for ideas to be exchanged about the project, only to find that the home-grown ideas are not appealing enough.

101) Build a product that leaves out the details of the use cases which customers would find most relevant to their needs, finding out afterwards how much cost it would have saved.

102) Build a product that does not explain what it cannot do, but keeps finding that customers try it anyway.

103) Build a product with little or no support for administrators to configure against inappropriate usage.

104) Build a product that produces more power brokers than users.

105) Build a product that reimagines functionality, tantalizing new users at the cost of habitual old users.

106) Build a product that styles the user interface so drastically it looks designer rather than commercial.

107) Build a product with translated text in local languages that reads funny.

108) Build a product that doesn't take into consideration right-to-left reading in certain regional languages.

109) Build a product that improves the quality but cuts down the features at crunch time.

110) Build a product that starts to sound discrepant in its ads as releases slip.

111) Build a product that does not uninstall cleanly.

112) Build a product that leaves artifacts behind even when the customer does not want them.

#codingexercise
void minHeapify(List<Node> sorted, int i)
{
    // Link the 1-based i-th node of a sorted list to its heap children at
    // positions 2i and 2i+1; a sorted list already satisfies the min-heap order.
    sorted.get(i - 1).left  = (2 * i <= sorted.size()) ? sorted.get(2 * i - 1) : null;
    sorted.get(i - 1).right = (2 * i + 1 <= sorted.size()) ? sorted.get(2 * i) : null;
}

Wednesday, October 9, 2019

This is a continuation of the previous post to enumerate funny software engineering practices:

90) Build a product that scales out to high workload but with more faults than before.
91) Build a product that is oblivious to market needs as the release cycle grows to years.
92) Build a product that trades off compute in favor of storage but leaves the onus of moving data on users.
93) Build a product that trades off storage for compute but never gets exercised in routine activities.
94) Build a product that requires a proprietary protocol, format or content and has the ecosystem scratching their heads over integration.
95) Build a product that does not work well with others because it does not provide a bridge or a connector.
96) Build a product that is eager to monopolize the market rather than leaving space for better solutions to thrive without sacrificing the competence of the product.
97) Build a product that is measured by its revenue rather than its mindshare.
98) Build a product without embracing developers with attractive software development kits or community editions.
99) Build a product without enticing developers with easy-to-install lite editions for their developmental work.
100) Build a product that does not allow a forum for ideas to be exchanged about the project, only to find that the home-grown ideas are not appealing enough.
101) Build a product that leaves out the details of the use cases which customers would find most relevant to their needs, finding out afterwards how much cost it would have saved.
102) Build a product that does not explain what it cannot do, but keeps finding that customers try it anyway.
103) Build a product with little or no support for administrators to configure against inappropriate usage.
104) Build a product that produces more power brokers than users.
105) Build a product that reimagines functionality, tantalizing new users at the cost of habitual old users.
106) Build a product that styles the user interface so drastically it looks designer rather than commercial.
107) Build a product with translated text in local languages that reads funny.
108) Build a product that doesn't take into consideration right-to-left reading in certain regional languages.
109) Build a product that improves the quality but cuts down the features at crunch time.
110) Build a product that starts to sound discrepant in its ads as releases slip.

Tuesday, October 8, 2019

This post tries to discover the optimum access patterns for stream storage. The equivalent for web-accessible storage has proven most efficient with batched, consistently uniform, periodic writes, such as those for object storage. Streams are read from beginning to end, and this has generally not been a problem because the workers doing the reads can scale out with low overhead. The writes are always at the end. Temporary storage with the help of Bookkeeper and coordination with Zookeeper help keep the writes append-only.

However, statistics collected by Bookkeeper and Zookeeper can immensely improve the discoverability of the access pattern. These statistics are based on a table of streams and their accessor counts with attributes. As the workload varies, it may show that some streams are read-heavy, such as for analysis, and others are write-heavy, such as for IoT traffic. The statistics may also show that some segments within a stream are more popular than others. Whatever the distribution, the idea that heavily used segments can benefit from a cache is not disputed.
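The kind of statistics table described above might be sketched as follows; AccessStats and its read-heavy threshold are illustrative, not an actual Bookkeeper or Zookeeper API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch: per-stream/per-segment read and write counters of the kind
// that could flag read-heavy streams as cache candidates.
public class AccessStats {
    private final ConcurrentHashMap<String, LongAdder> reads = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, LongAdder> writes = new ConcurrentHashMap<>();

    public void recordRead(String segment) {
        reads.computeIfAbsent(segment, s -> new LongAdder()).increment();
    }

    public void recordWrite(String segment) {
        writes.computeIfAbsent(segment, s -> new LongAdder()).increment();
    }

    // A segment is read-heavy when reads dominate writes by the given factor.
    public boolean isReadHeavy(String segment, int factor) {
        long r = reads.getOrDefault(segment, new LongAdder()).sum();
        long w = writes.getOrDefault(segment, new LongAdder()).sum();
        return r > factor * w;
    }
}
```

LongAdder keeps the counters cheap under concurrent accessors, which matters if every read and write updates the table.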

The cache for streams is a write-through cache because the writes are at the end of the stream. These writes are hardly a concern for the cache, and their performance is mitigated by Bookkeeper. Batched writes from Bookkeeper are no different from a periodic backup schedule.

The cache layer for the stream store benefits from being close to the stream store. The segments can have a naming convention or a hash.
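A naming convention combined with a hash might look like this sketch; SegmentKey is a hypothetical helper that derives a fixed-length cache key from the stream name and segment number.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: a deterministic cache key from a naming convention
// ("stream/segmentNumber"), hashed so that keys are fixed-length.
public class SegmentKey {
    public static String of(String stream, long segmentNumber) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((stream + "/" + segmentNumber)
                    .getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();  // 64 hex characters for SHA-256
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is always available
        }
    }
}
```

The same stream and segment number always map to the same key, so the cache can locate a segment without consulting the store.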

Reference to a case study for event stream :
https://1drv.ms/w/s!Ashlm-Nw-wnWu2hMRA7zp__mivoo