Saturday, June 2, 2018

Recently I came across an interesting calculation problem and I want to share it in this blog post.
Let us say we are given a series of byte range sequences in the form of <offset, length> pair which have been used up. We now want to calculate the used bytes size.
How do we do this ?
There may be overlap in the byte range sequences.
One approach is that we calculate positive summations and negative summations.
For example, if we have
offset 5, length 10
offset 10, length 10
then the used space size is 10 + 10 - overlap  = 20 - 5 = 15
The idea of using summations is that we can leverage the summation model as a neat aggregation that works in both batch and stream processing.
All we need to do is make an entry for (offset 10, length -5)
As long as these entries are made in the system, the query processing becomes much simpler.
Let us now consider another approach, where we try to find the used space size without making negative references.
We calculate it on the fly. For example, we take overlapping intervals and translate them to non-overlapping contiguous intervals.
One example of this is to adjust the above as follows:
offset 5, length 5
offset 10, length 10
In this case we still keep the summation as a useful aggregation in the query but now we have to edit the original entries in the table.
Let us take a third approach and shift the emphasis on querying without modifying the entries.
Here we sort the byte ranges by offsets and as we encounter subsequent entries with overlaps, we massage them into non-overlapping ranges to cumulate the used space
For example,
LIst<Interval> merge(List<Interval> slices)
{
Var s = new Stack<Interval> ();
Slices.sort(); // by starting time
If (slices.count == 0) return new List<Interval>{Interval(0,0)};
If (slices.count == 1) return new List<Interval>{slices[0]};
S.push(slices[0]);
For(int i =1; i < slices.count; i++)
{
  var top = slices.top();
  If (top.end()< slices[i].start)
      S.push(slices[i]);
  Else{
     Top.end = slices[i].end;
      S.push(top);
}
Return s.toList():
}
offset 5, length 10
offset 10, length 10
implies offset 5, length 15

offset 5, length 10
offset 20, length 5
implies same





No comments:

Post a Comment