Monday, April 13, 2020

Data traffic generators:
Storage products require a few tools to test how they behave under load and duress. These tools need varying types of load to be generated for reads and writes. Standard random string generators can be used to create such data to store in files, blobs or streams, given a specific size of content to be generated. The tool has to decide what kind of payload, or events, to generate and employ different algorithms to come up with such load.
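As a minimal sketch of the payload side (the class and method names here are hypothetical, not taken from the linked sample), an event of a requested size can be drawn from a fixed alphabet:

```java
import java.util.Random;

// Hypothetical helper: produces a random alphanumeric payload of the requested size.
public class RandomPayload {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
    private static final Random RANDOM = new Random();

    // Returns a string of exactly 'size' characters drawn uniformly from the alphabet.
    public static String generate(int size) {
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            sb.append(ALPHABET[RANDOM.nextInt(ALPHABET.length)]);
        }
        return sb.toString();
    }
}
```

A call such as RandomPayload.generate(4096) then yields a 4 KB event ready to be written to a file, blob or stream.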
The algorithms for shaping the load can be enumerated as:
1) Constant size data traffic: The reads and writes are of uniform size, and they are generated in burst mode where a number of packets follow in quick succession, filling the pipeline between the source and destination.
2) Hybrid size data traffic: Constant size events are still generated, but there is more than one constant size generator, each for a different size, and the events from the different generators are serialized to fill the data pipeline between the source and the destination. The generator sizes can be predetermined using a t-shirt size classification.
3) Constant size with latency: A delay is introduced between events so that the data does not arrive at predictable times. The delays need not all be uniform and can be of random duration. While 1) allows spatial distribution of data, 3) allows temporal distribution of data.
4) Hybrid size with latency: A delay is introduced between events from different generators as they fill the pipeline, causing both the size and the delay to vary randomly and simulating the real-world case for data traffic. While 2) allows spatial distribution of data, 4) allows temporal distribution of data. A sketch of two of these strategies appears after this list.
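The following sketch shows how strategies 1) and 3) might sit behind a common interface; the interface and class names are hypothetical, and the RandomPayload helper from the earlier sketch is reused:

```java
import java.util.Random;

// Hypothetical strategy interface: each load-shaping algorithm yields the next payload and the
// delay to wait before emitting it (zero for burst mode).
interface EventGenerator {
    String nextEvent();
    long nextDelayMillis();
}

// 1) Constant size data traffic: uniform payloads emitted back to back.
class ConstantSizeGenerator implements EventGenerator {
    private final int size;
    ConstantSizeGenerator(int size) { this.size = size; }
    public String nextEvent() { return RandomPayload.generate(size); }
    public long nextDelayMillis() { return 0; }
}

// 3) Constant size with latency: uniform payloads separated by random delays.
class ConstantSizeWithLatencyGenerator implements EventGenerator {
    private final int size;
    private final long maxDelayMillis;
    private final Random random = new Random();
    ConstantSizeWithLatencyGenerator(int size, long maxDelayMillis) {
        this.size = size;
        this.maxDelayMillis = maxDelayMillis;
    }
    public String nextEvent() { return RandomPayload.generate(size); }
    public long nextDelayMillis() { return (long) (random.nextDouble() * maxDelayMillis); }
}
```

The hybrid variants 2) and 4) would hold a list of such generators for the different sizes and serialize their output into the same pipeline.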
The distribution of size or delay can use a normal distribution, which makes the middle values of the range occur somewhat more frequently than the outliers, and a comfortable range can be picked over which both the size and the delay vary. Each event generator implements its strategy, and the generators can be switched independently by the writer so that different loads are generated. The tool may run forever, meaning it does not need to stop unless interrupted.
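The normal distribution over a comfortable range and the run-until-interrupted loop could be sketched along these lines (again with hypothetical names; the console write merely stands in for a real target):

```java
import java.util.Random;

// Hypothetical sampler: draws a value from a normal distribution centered on the middle of a
// [min, max] range, so mid-range sizes or delays occur more often than the outliers.
class NormalRangeSampler {
    private final Random random = new Random();
    private final long min;
    private final long max;

    NormalRangeSampler(long min, long max) {
        this.min = min;
        this.max = max;
    }

    long sample() {
        double mean = (min + max) / 2.0;
        double stdDev = (max - min) / 6.0;                      // ~99.7% of draws land in range
        double value = mean + stdDev * random.nextGaussian();
        return Math.max(min, Math.min(max, Math.round(value))); // clamp the rare outliers
    }
}

// Sketch of a writer that runs until interrupted; the sampled size stands in for a generated event.
class WriterLoop {
    public static void main(String[] args) throws InterruptedException {
        NormalRangeSampler sizes = new NormalRangeSampler(512, 8192); // bytes
        NormalRangeSampler delays = new NormalRangeSampler(10, 500);  // milliseconds
        while (!Thread.currentThread().isInterrupted()) {
            long size = sizes.sample();
            System.out.println("event of size " + size);             // stand-in for a real write
            Thread.sleep(delays.sample());
        }
    }
}
```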
The marketplace already has quite a few tools for this kind of load generation; they are referred to as packet generators, such as T-Rex for data traffic, or as torture tools for driving certain file system protocols. Most of these tools, however, do not have an independent or offloaded load generator and are tied to the tool and purpose they are applied to, limiting their usage or portability to other applications.
One of the best advantages of separating event generation into its own library is that it can be used in conjunction with a log appender so that the target can vary at runtime. The target can be the console if the data merely needs to appear on the screen without any persistence, or it can be a file, blob or stream. The appender also allows events to be written simultaneously to different targets, leading to a directory of different sized files or a bucket full of different sized objects, and so on. This allows other tools to work in tandem with the event generators as upstream and downstream systems. For example, duplicity may take the generated events as input for a subsequent data transfer from a source to a destination.
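A simple appender abstraction along these lines (hypothetical names, not the API of any particular logging library) would let the same events fan out to several targets at once:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical appender abstraction: the generator hands every event to one or more targets,
// so the same event stream can land on the console, in a file, or elsewhere at runtime.
interface Appender {
    void append(String event) throws IOException;
}

class ConsoleAppender implements Appender {
    public void append(String event) { System.out.println(event); }
}

class FileAppender implements Appender {
    private final PrintWriter writer;
    FileAppender(String path) throws IOException {
        this.writer = new PrintWriter(new FileWriter(path, true)); // open in append mode
    }
    public void append(String event) {
        writer.println(event);
        writer.flush();
    }
}

// Fan-out: a single event is written to every configured target in turn.
class MultiTargetAppender implements Appender {
    private final List<Appender> targets;
    MultiTargetAppender(Appender... targets) { this.targets = Arrays.asList(targets); }
    public void append(String event) throws IOException {
        for (Appender target : targets) {
            target.append(event);
        }
    }
}
```

Wiring a ConsoleAppender and a FileAppender (with a path of your choosing) into a MultiTargetAppender would then mirror every generated event to both targets.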
Sample code for an event generator is included here: https://github.com/ravibeta/JavaSamples/tree/master/EventGenerator
