Sunday, October 27, 2019

The benefits of async stream readers in a cache for stream storage:
The cache essentially projects portions of the stream so that the entire stream does not have to be scanned from beginning to end repeatedly. This mode of operation is different from finding popular stream segments across stream readers.
The cache may use a queue to hold contiguous segments from a stream.
The choice of queue is extremely important for peak throughput. The use of a lock-free data structure as opposed to an ArrayBlockingQueue can do away with lock contention. The queues work satisfactorily until they are full. Garbage-free async stream readers have the best response time.
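As a minimal sketch of this choice (the Segment class and method names here are illustrative, not from any particular stream store), the lock-free ConcurrentLinkedQueue from java.util.concurrent can stand in for an ArrayBlockingQueue so that producers and consumers never contend on a single lock:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical segment holder; a real stream segment would carry bytes and offsets.
class Segment {
    final long id;
    Segment(long id) { this.id = id; }
}

public class SegmentQueue {
    // ConcurrentLinkedQueue is lock-free (CAS-based), so producers and
    // consumers never block each other the way they can on the single
    // lock inside ArrayBlockingQueue.
    private final Queue<Segment> segments = new ConcurrentLinkedQueue<>();

    void offer(Segment s) { segments.offer(s); }

    Segment poll() { return segments.poll(); } // returns null when empty

    public static void main(String[] args) {
        SegmentQueue q = new SegmentQueue();
        q.offer(new Segment(1));
        q.offer(new Segment(2));
        System.out.println(q.poll().id); // 1 (FIFO order is preserved)
    }
}
```

Note that ConcurrentLinkedQueue is unbounded, so the "until they are full" caveat applies only to bounded queues; bounded lock-free queues (for example, from the JCTools library) are an option when backpressure is needed.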
The benefits here are similar to LogAppenders, except that those are on the write path while these are on the read path.
Asynchronous reads together with lock-free access and skip-level access boost performance. Each segment in the cache may be significant in size, and a writer that transfers a segment over the network may take on the order of hundreds of milliseconds. Instead, a continuous background import of an adjustable window of stream segments tremendously reduces the load on the stream store while responding to seek requests on segments.
Another approach is to have a continuous reader read the stream from beginning to end in a loop and have the cache decide which of the segments it needs to read. This approach requires the cache to discard segments that are not necessary for the clients. The cache can fork as many readers as necessary to replenish the window of segments it needs. The cache can decide which windows of segments it needs based on overlapping intervals of segments requested by clients. These segments are no longer bound to clients; they are managed exclusively by the cache.
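A sketch of how the cache might derive its windows from overlapping client requests, assuming each request names a closed range of segment numbers (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Merge the segment ranges requested by clients into the windows the
// cache should keep warm. Adjacent or overlapping ranges collapse into one.
public class WindowPlanner {
    // Each request is a [firstSegment, lastSegment] pair.
    static List<long[]> mergeWindows(List<long[]> requests) {
        List<long[]> merged = new ArrayList<>();
        List<long[]> sorted = new ArrayList<>(requests);
        sorted.sort((a, b) -> Long.compare(a[0], b[0]));
        for (long[] r : sorted) {
            if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1] + 1) {
                long[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], r[1]); // extend the current window
            } else {
                merged.add(new long[] { r[0], r[1] });
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<long[]> requests = Arrays.asList(
            new long[] {0, 4}, new long[] {3, 9}, new long[] {20, 25});
        for (long[] w : mergeWindows(requests)) {
            System.out.println(w[0] + "-" + w[1]); // prints 0-9 then 20-25
        }
    }
}
```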
The skip-level access continues to serve the readers in this case, although it doesn't need to read the streams directly. The readers present the segments, and the cache then uses its statistics and algorithm to prefetch the segments that will serve the clients best. These cache entries can be evicted by any algorithm that suits the cache. Storage is not a concern for the cache because we have a distributed ring cache with consistency checkpoints.
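As one example of an eviction algorithm that suits the cache (the choice is left open above; this class is only an illustration), an LRU policy comes almost for free from java.util.LinkedHashMap:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU eviction for cached segments. Keys are segment numbers, values the
// cached segment bytes.
public class SegmentLru extends LinkedHashMap<Long, byte[]> {
    private final int maxSegments;

    SegmentLru(int maxSegments) {
        super(16, 0.75f, true); // accessOrder=true yields LRU iteration order
        this.maxSegments = maxSegments;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxSegments; // evict the least-recently-used segment
    }
}
```

Constructing `new SegmentLru(2)`, inserting segments 1 and 2, touching 1, and then inserting 3 evicts segment 2, the least recently used.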
The maximum number of segments in a window is directly dependent on the cache. The segment size may even be a constant in which case the number of segments depends on how many streams and how many windows per stream the cache wants to accommodate. This window size is not the same as a continuously sliding window used to analyze the stream which is usually one per stream.
When there is only one reader to replenish the segments for the cache, the clients are freed from performing the same task over and over again on the actual stream. The segments that the clients need will most likely be found in the cache. If a segment is not found, the cache can fetch it from the stream on demand; otherwise, the scheduled reading from the stream by the cache will replenish it. The cache itself alleviates the load on the stream store.
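The on-demand fallback can be sketched as a read-through cache; `storeFetch` below is a stand-in for the stream store's segment read and is purely hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Read-through sketch: serve a segment from the cache, falling back to the
// store on a miss.
public class ReadThroughCache {
    private final Map<Long, byte[]> segments = new ConcurrentHashMap<>();
    private final LongFunction<byte[]> storeFetch;

    ReadThroughCache(LongFunction<byte[]> storeFetch) {
        this.storeFetch = storeFetch;
    }

    byte[] get(long segment) {
        // computeIfAbsent goes to the store only on a miss; the scheduled
        // background reader can populate entries ahead of demand out of band.
        return segments.computeIfAbsent(segment, storeFetch::apply);
    }
}
```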
The cache can also fork multiple readers at fixed duration offsets from the start of the first. Multiple readers at fixed periodic intervals from the start of the first will continue to read the stream segments and reduce the latency in replenishing the segments that the cache needs. When the cache advertises that it needs segment number n, the readers that have not yet read n will read it and make it available. With multiple readers, the chance that one of them is closest to segment n is higher.
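The "closest reader" selection might look like the following sketch, where each reader's position is the segment number it has reached (names are illustrative):

```java
// Pick the reader positioned nearest before the needed segment; readers
// that have already passed it are skipped.
public class ReaderPool {
    static int nearestReader(long[] readerPositions, long neededSegment) {
        int best = -1;
        long bestGap = Long.MAX_VALUE;
        for (int i = 0; i < readerPositions.length; i++) {
            long gap = neededSegment - readerPositions[i];
            if (gap >= 0 && gap < bestGap) { // this reader has not yet read n
                bestGap = gap;
                best = i;
            }
        }
        return best; // -1 when every reader is already past the segment
    }
}
```

With positions {0, 100, 200} and needed segment 150, reader 1 at position 100 is chosen; when every reader is past the segment, a dedicated fetch is needed instead.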
There are ways in which the cache can become smart about loading segments using these readers. For example, instead of waiting for the nearest reader to the segment to read it into the cache, the cache can have a dedicated reader fetch the segment or a set of segments via skip-level access.
There are also techniques to make reads and writes go together. Typically reads and writes are not on the same stream, and the cache behaves as a write-through cache, so the clients can interact directly with the cache rather than the store. When a segment is not found in the cache, that case is also handled efficiently with skip-level access.
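The write-through behavior can be sketched as writing to the store before updating the cache, so a read served from the cache never returns data the store has not seen (`storeWrite` is a hypothetical stand-in for the stream store's append):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Write-through sketch: the store is written first, then the cache.
public class WriteThroughCache {
    private final Map<Long, byte[]> segments = new ConcurrentHashMap<>();
    private final BiConsumer<Long, byte[]> storeWrite;

    WriteThroughCache(BiConsumer<Long, byte[]> storeWrite) {
        this.storeWrite = storeWrite;
    }

    void put(long segment, byte[] data) {
        storeWrite.accept(segment, data); // write-through: store first
        segments.put(segment, data);      // then cache, for local reads
    }

    byte[] get(long segment) {
        return segments.get(segment);
    }
}
```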

When readers read a stream from beginning to end, they become easy to scale. This is a simple strategy and works well for parallelizing computations. However, it introduces latency, as streams can be quite long.

A cache serves to provide the stream segments so that clients don't have to go all the way to the store. It could employ a group of readers to replenish the segments that are in most demand. There is always the option for clients to reach the store if the cache does not have what they need. Generally, a cache will bring in the segment on behalf of the client if it does not have it.

The techniques for providing stream segments do not matter to the client, and the cache can use any algorithm. The cache also provides the benefit of alleviating load on the stream store without any additional constraints. In fact, the cache will use the same stream reader as the client, with the only difference that there will be fewer stream readers on the stream store than before.
We have not compared this cache layer with a message queue server, but there are interesting problems common to both. For example, we have a multiple-consumer, single-producer pattern in the periodic reads from the stream storage. A message queue server, or broker, enables this kind of publisher-subscriber pattern with retries and a dead-letter queue. In addition, it journals the messages for later review. Messaging protocols are taken up a notch in performance by a message queue broker and its reliance on sockets on steroids. This leaves the interaction between the caches and the storage to be handled elegantly with a well-known messaging framework.

The message broker inherently comes with a scheduler to perform repeated tasks across publishers. Hence it is easy for the message queue server to act as an orchestrator between the cache and the storage, leaving the cache to focus exclusively on the caching strategy suitable to the workloads. Journaling of messages also helps with diagnosis and replay, and there is probably no better store for these messages than the object storage itself. Since the broker operates in cluster mode, it can scale to as many caches as available. Moreover, journaling is not necessarily available with all messaging protocols, which counts as one of the advantages of using a message broker. Aside from the queues dedicated to handling the backup of objects from cache to storage, the message broker is also uniquely positioned to provide differentiated treatment to the queues. This introduction of quality-of-service levels expands the ability of the solution to meet varying and extreme workloads. The message queue server is not only a nice-to-have feature but also a necessity and a convenience when we have a distributed cache working with the stream storage.
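To make the retry and dead-letter behavior concrete, here is a toy in-memory sketch; a real deployment would use an actual broker, and nothing here mirrors any specific messaging product's API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Toy broker: retry delivery a bounded number of times, then park the
// message in the dead-letter queue for later diagnosis and replay.
public class TinyBroker {
    final Deque<String> deadLetters = new ArrayDeque<>();

    void deliver(String msg, Predicate<String> consumer, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (consumer.test(msg)) {
                return; // consumer accepted the message
            }
        }
        deadLetters.add(msg); // every attempt failed
    }
}
```

Messages a consumer never accepts end up in `deadLetters`, from where they can be journaled, reviewed, and replayed.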
The number of workers for the cache or the store does not matter; they can scale.
Please refer to the discussion on caching context for stream managers
https://1drv.ms/w/s!Ashlm-Nw-wnWvBXkU9jz_Z2EXWLp

and sample implementation with: https://github.com/ravibeta/JavaSamples/tree/master/logging

Saturday, October 26, 2019

This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:

195) Build a product that does not keep a history of all actions, assets and activities, only to find significant effort in rebuilding it.

196) Build a product that does not maintain a registry of all users and find that there is no blacklist or whitelist capability.

197) Build a product that assumes secure communication via tunnels and find that the tunnels need not be continuous through a proxy.

198) Build a product that does not gather statistics on its usage and find that the pain grows among users.

199) Build a product that has no telemetry and have no insight into how customers are doing with the product.

200) Build a product that has built-in dial-home capabilities and find that privacy gurus view it with suspicion.

201) Build a product that ships with little or no test tools and find that customers come knocking.

202) Build a product that leverages APIs for all its workflows and find that the product becomes part of more and more workflows.

203) Build a product with enough bread crumbs and sitemap to let the user navigate the pages and see that the usability goes up and so does the customer appeal.

204) Build a product that ships on mobile devices and find the audience skyrocket

205) Build a product with rich mobile enhanced experience across a variety of devices and find that the customers are able to use the product even before going to sleep

206) Build a product with mobile integration of payment methods such as personal wallet and the chore of paying becomes smooth.

207) Build a product that can use digital cards and refill them at will and the product becomes usable at all participating stores increasing adoption via partner networks

208) Build a product with partner sponsored incentives to customers and see the appeal grow as partner network endears itself to the customer more than it would have individually

209) Build a product with little or no ecosystem and the product fails to strike popularity in conferences, exhibitions and symposiums

210) Build a product with little user education and find that the sales booths at an exhibition are not visited.
211) Build a product with very little wording on consequences and see the amazement when users click a button.
212) Build a product that does not warn on consequences and find that the user has just deleted his data.
213) Build a product that does not capture the notes the user wants to leave and find that the user does not remember why he took some actions.
214) Build a product that lets users erase their activities and find no record to come to their aid.
215) Build a product that works with some users and not with others and find that they have to scratch their heads to find out why.

Friday, October 25, 2019

This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:

190) Build a product with an ambitious charter and find the release timelines reducing the plan.
191) Build a product that does not preserve data between restarts, retries or disaster recovery
192) Build a product that does not retain user settings only to have the user apply them again
193) Build a product by putting together functionality via dependencies and find that the massive surface area attracts all kinds of vulnerabilities.
194) Build a product that creates sdks, command-line clients and other artifacts and find that the versioning for each becomes a maintenance issue.



Thursday, October 24, 2019

Discussion on cache for streams continued:

Wednesday, October 23, 2019

Using caches for stream readers:

Tuesday, October 22, 2019

This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:

180) Build a product that tries to be ahead of the game in emerging trends and find that a significant legacy from partners still requires older technology integration.
181) Build a product that tries to squeeze different channels into the same stream
182) Build a product that tries to distribute a stream to separate channels when the user is expecting one
183) Build a product that tries to promote a partner application from the same maker to the detriment of competitors.
184) Build a product that tries to emulate open source and invite the world to build with it during its early stages, only to find it shutting doors in its growth stage with bars for acceptance.
185) Build a product that does not choose partners to foster growth and find that it cannot sustain innovation by itself
186) Build a product that gives users a tonne of information to go through when they want a summary
187) Build a product to implement a use case and leave it for solution integrators to find bugs
188) Build a product to satisfy some customers only to irk others
189) Build a product with over generalization when some customization would have served neatly

Monday, October 21, 2019


This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:

170) Build a product that does not allow users to explore broad or deep without administrator privilege and find users dissatisfied.
171) Build a product that does not allow authoring policies and find that the usage of the product is almost a chaos.
172) Build a product that does not allow differentiation to users either with policies authored by the administrator or out of the box and find that 10% of the users may matter more than the remaining 90%.
173) Build a product that does not let frequent users form their own customizations or dashboards and find that the product is pushed behind those that can.
174) Build a product that does not allow hands off operations with alerts and notifications and find that the product is not up for renewal.
175) Build a product that does not show captivating charts and graphs and the users migrate to applications that do.
176) Build a product that does not allow users to track their activities especially on shared resources and have the administrator be flooded with calls for lost or missing resources.
177) Build a product that does not allow operations to be distributed and have the planners complain about capabilities.
178) Build a product that does not protect data at rest or in transit and have all the watchdogs show up at the door.
179) Build a product that fails to meet the industry compliance and government regulations and find that the product cannot be sold in certain regions.