Cluster computing

Sunday, April 21, 2019

The serialization and deserialization of sequences

Sequences can be stored as literals requiring no serialization and deserialization. However associated data structures such as B-Tree and Radix Tree can be serialized and deserialized. We look at some of the usages here.

Range of sequences are efficiently represented in B-Trees. This data is generally not exported and kept local to each node. It is easy to send the data across on the wire as full representation of each sequence with additional metadata. The size of the sequence on the wire does not matter when the clients can take as much latency as necessary to transmit the sequence.

When large sets of sequences need to be transferred, they can be compressed in archives and sent across the wire. This too does not have to treat each sequence separately. However, when the archives start exceeding a threshold in size, they are no longer efficient to scale to the number of sequences. In such cases, efficient packing and unpacking become necessary.

Serializing and deserializing of data does have to be per sequence but packing and unpacking of data does not. There are techniques beyond compression algorithms for shortening sequences with the help of representations that can significantly improve the space used. If we group the sequences, then it is easy for more efficient techniques for storage which work by removing redundant elements and encoding them in a way that they can be deconstructed later.

Cluster computing

Sunday, April 21, 2019

No comments:

Post a Comment