Tuesday, July 31, 2018

Let us take a break to discuss web application routes. In a Java application we can use the @Path("/resource") annotation from JAX-RS (Spring offers @RequestMapping as its equivalent) to define the routes. These paths become exceedingly easy to use since they are defined only once. The http/https traffic coming to the server may need to be routed to the service via nginx or other web server configuration. Setting up traffic to the paths is therefore easy. However, setting up access control requires a filter to be invoked. This filter takes the username and the role and raises an exception if the looked-up role does not match the expected role. Typically the same criteria apply to all of the paths, so applying the filter to the base class from which all other services derive may be sufficient for securing access to all the endpoints. There are also annotations we can apply to individual paths to authorize a specific role for just that path.
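As a minimal sketch of this arrangement, consider the following JAX-RS resource and request filter. The class names and the "service-user" role are placeholders, and a real deployment would register the filter with its JAX-RS runtime (each class would normally live in its own file):

import java.io.IOException;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.SecurityContext;
import javax.ws.rs.ext.Provider;

@Path("/resource")
public class ResourceService {
    @GET
    public String get() {
        return "ok";
    }
}

// Registered globally, so every path gets the same access check.
@Provider
public class RoleCheckFilter implements ContainerRequestFilter {
    @Override
    public void filter(ContainerRequestContext ctx) throws IOException {
        SecurityContext sc = ctx.getSecurityContext();
        // Reject the request if the looked-up role does not match the expected one.
        if (sc == null || !sc.isUserInRole("service-user")) {
            ctx.abortWith(Response.status(Response.Status.FORBIDDEN).build());
        }
    }
}

Per-path role annotations such as @RolesAllowed can refine this further when the runtime supports them.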
#codingexercise
Check if a number has the same count of set and unset bits.
boolean hasEqualSetAndUnsetBits(int padded)
{
    int set = 0;
    int len = 32; // or a suitable fixed length
    while (padded != 0)
    {
        if ((padded & 0x1) == 1) {
             set++;
        }
        padded = padded >>> 1; // unsigned shift so negative inputs also terminate
    }
    return (set == (len - set)); // with a fixed length, unset = len - set
}
#codingexercise
Find the number of subsets with sum divisible by m in an array of positive integers.
Solution: We maintain a dp table of size sum x n where n is the number of elements in the array.
We initialize the dp entry for sum 0 to 1, since the empty subset always achieves it. Then, for each element taken by itself, we increment the dp entry at the sum equal to that element. For every possible sum value from 1 to sum, if the dp value for the current element was incremented at this sum, we carry that count forward to the next element at the same sum, and we also increment the dp value for the next element at the sum obtained by adding the current element, checking first that this larger sum stays within the bounds of the table.
When we fill the dp table this way, we just have to accumulate all the dp entries whose sum value is divisible by m.
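A compact way to implement this count is to track subset sums only by their remainder modulo m; this is a sketch of the same idea with a dp array of size m rather than the sum x n table described above (countSubsetsDivisibleByM is just an illustrative name):

long countSubsetsDivisibleByM(int[] a, int m)
{
    long[] dp = new long[m]; // dp[r] = number of subsets seen so far with sum % m == r
    dp[0] = 1;               // the empty subset has sum 0
    for (int x : a)
    {
        long[] next = dp.clone();
        for (int r = 0; r < m; r++)
        {
            next[(r + x % m) % m] += dp[r]; // extend every counted subset with x
        }
        dp = next;
    }
    return dp[0] - 1; // exclude the empty subset
}

For {1, 2, 3} and m = 3 this returns 3, counting {3}, {1, 2} and {1, 2, 3}.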

Monday, July 30, 2018

Have you ever wondered whether we need to upgrade the format of our images from .jpg to something newer? Images typically store RGB values per pixel. The file itself may carry a small amount of metadata that gives some information about the image. That's all we have to go with. Image processing algorithms try to use the data directly and work with it. This is very different from most metadata-based query processing systems. Since the processing is based directly on the data, there is very little information to go on beyond the RGB values.
Instead, if there were an additional dimension to each pixel, such that we store not only the color value but also depth information and an irrefutable record of the pixel's provenance, then we would have a lot more to go by. Images taken first-hand from devices might carry one such value; images that are touched and saved again would carry a different one. Recently, newer algorithms have appeared that can detect whether an image has been tampered with, but they could be significantly improved with additional data per pixel. Images, like files, may come with some metadata such as timestamps and versions, but they could do a lot more, such as publisher information, a CRC check, a timestamp, a universally unique identifier, and a device-to-image mapping on the file. These would provide much more information to the image processing algorithms.
Storage has never been a concern for images. Although JPG formats are savvy about storage, devices and cameras have been pushing for higher and higher resolution. The only thing that has not improved on a per-pixel basis is the RGB coding associated with the pixel. If this could instead be a vector with other dimensions, an image could speak more eloquently than ever before. While we may not be able to change erstwhile file formats, we can certainly introduce newer ones. This means we continue to work with existing image formats without any disruption while leveraging the benefits of a newer format. I also want to separate the encoding from the data captured; the file format is essentially a manifestation of the data captured. In this regard, JBIG and JBIG2 are file formats that encode bi-level images in newer ways, but providing additional data is independent of the encoding. If we capture more information per pixel, we can certainly improve not just the encoding and the processing but also the information that we can revisit later.
The class of operations on images usually works with pixel blocks; in the case of binary images, these are 3 x 3 blocks. Since we are looking to record additional information per pixel, we can also progressively connect information between pixel groups at the time of storage. Processing offloaded to image-capturing devices would result in massive information capture at little or no cost to capture time. Thus the entire field of image processing could benefit from more information in the images.
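For context, a neighborhood operation over 3 x 3 blocks of a binary image looks like the sketch below; the erosion-style rule is just an illustrative choice, not a claim about any particular algorithm:

boolean[][] apply3x3(boolean[][] img)
{
    int h = img.length, w = img[0].length;
    boolean[][] out = new boolean[h][w];
    for (int y = 1; y < h - 1; y++)
    {
        for (int x = 1; x < w - 1; x++)
        {
            boolean all = true;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    all &= img[y + dy][x + dx];
            out[y][x] = all; // keep a pixel only when its whole 3 x 3 block is set (erosion)
        }
    }
    return out;
}

Any per-pixel metadata recorded at capture time would be available to exactly this kind of loop.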
#codingexercise
Check if a number has the same count of set and unset bits.
boolean hasEqualSetAndUnsetBits(int n)
{
    int set = 0;
    int unset = 0;
    while (n != 0)
    {
        if ((n & 0x1) == 1) {
             set++;
        } else {
             unset++;
        }
        n = n >>> 1; // unsigned shift so negative inputs also terminate
    }
    return (set == unset);
}
If we were to start with fixed-length binary representations, then we would only need to count one of them.

Sunday, July 29, 2018

Find the nearest next prime to a given integer:
for example: int n = 45, result = 47;
int n = 8, result = 11
int GetNearestNextPrime(int n)
{
    int result = -1;
    for (int i = n + 1; i < int.MaxValue; i++)
    {
        if (isPrime(i)) {
            result = i;
            break;
        }
    }
    return result;
}

bool isPrime(int n)
{
    var primes = new List<int>();
    for (int i = 2; i <= n; i++)
    {
        bool composite = false;
        for (int k = 0; k < primes.Count; k++)
        {
            if (i % primes[k] == 0) {
                composite = true;
                break;
            }
        }
        if (composite == false) {
            primes.Add(i);
        }
    }
    return primes.Contains(n);
}

Another way is to use the sieve of Eratosthenes to filter out the primes. This is a very simple way to find all the prime numbers up to n:
for i ranging from 2 to the square root of n:
     for j ranging over the multiples of i up to n:
          mark the number as not prime
Those that remain unmarked are the primes.
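A short Java rendering of the same sieve:

boolean[] sieveOfEratosthenes(int n)
{
    boolean[] composite = new boolean[n + 1];
    for (int i = 2; (long) i * i <= n; i++)
    {
        if (!composite[i])
        {
            for (int j = i * i; j <= n; j += i)
            {
                composite[j] = true; // j has i as a proper factor, so it is not prime
            }
        }
    }
    return composite; // for k >= 2, composite[k] == false means k is prime
}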
The primes list built this way can also feed other exercises; the snippet below, for instance, prints and counts pairs of distinct primes whose sum is itself a prime no greater than n:
int count = 0;
for (int i = 0; i < primes.Count; i++)
    for (int j = 0; j <= i; j++)
        if (primes[i] != primes[j] && primes[i] + primes[j] <= n && primes.Contains(primes[i] + primes[j]))
        {
            Console.WriteLine("{0},{1}", primes[j], primes[i]);
            count++;
        }
Console.Write("Count={0}", count);


Saturday, July 28, 2018

One of the benefits of using object storage is the ability to use namespaces, buckets and objects in a way that suits the requirement at hand. For example, if we want horizontal scalability, objects give us a way to achieve it. Since each object can have a path specifier, all matching can be done based on path. There are only forward references from namespace to buckets to objects. Relationships between objects need not be found by traversal; these are mere storage entities. There is no restriction on keeping metadata per object or on dedicating an object specifically to metadata. The latter is much more efficient for programs that want to maintain their own metadata for their objects, so they are not limited in what they want to achieve with their objects. Even a simple adjacency matrix is sufficient to describe a graph overlaid on these objects. This means object storage can form the base of different query processing engines.
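As a toy illustration of the dedicated metadata object, the sketch below keeps an adjacency list over object paths and serializes it into one well-known object; putObject and the path names are hypothetical stand-ins for a real object store SDK call:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ObjectGraphMetadata {
    // Adjacency list keyed by object path, describing a graph overlaid on the objects.
    private final Map<String, List<String>> edges = new HashMap<>();

    public void addEdge(String fromPath, String toPath) {
        edges.computeIfAbsent(fromPath, k -> new ArrayList<>()).add(toPath);
    }

    // Serialize the whole graph into a single dedicated metadata object.
    public void save() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            sb.append(e.getKey()).append(" -> ")
              .append(String.join(",", e.getValue())).append('\n');
        }
        putObject("mybucket/.metadata/graph", sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical placeholder for the object store's put call.
    private void putObject(String path, byte[] data) { /* SDK call goes here */ }
}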

#codingexercise
We were performing inversion count on an array of unsorted positive integers.
Given an array such as {1, 20, 6, 4, 5}, the number of inversions is 5. We could also count non-inversions by simply flipping the comparison.
One way to do this is to count by pairwise traversal, which takes O(N^2) time:
int GetCountNonInversions(List<int> A)
{
    int count = 0;
    for (int i = 0; i < A.Count; i++)
        for (int j = i + 1; j < A.Count; j++)
            if (A[i] < A[j])
                count++;
    return count;
}
The same applies to merge-sort and AVL tree based methods:
Node Insert(ref Node root, Node z, ref int count)
{
    assert(z);
    z.height = -1; // placeholder, recomputed below
    if (root == null)
        root = z;
    else if (z.data < root.data)
        root.left = Insert(ref root.left, z, ref count);
    else {
        root.right = Insert(ref root.right, z, ref count);
        // root and every node in its left subtree arrived earlier and are smaller,
        // so each forms a non-inversion with z. Size() is assumed to return the
        // number of nodes in a subtree.
        count = count + Size(root.left) + 1;
    }
    root.height = 1 + max(height(root.left), height(root.right));
    return Balance(root);
}


Node Balance(Node root)
{
    FixHeight(root); // 1 more than left or right
    if (Bfactor(root) == 2) // Bfactor is h(right) - h(left)
    {
        if (Bfactor(root.right) < 0)
            root.right = RightRotate(root.right); // returns root.right.left
        return LeftRotate(root);
    }
    if (Bfactor(root) == -2)
    {
        if (Bfactor(root.left) > 0)
            root.left = LeftRotate(root.left);
        return RightRotate(root);
    }
    return root;
}


Balance is based on rotation after fixing height. 

Friday, July 27, 2018

#codingexercise
Performing inversion count on an array of unsorted positive integers.
Given an array such as {1, 20, 6, 4, 5 } the number of inversions is 5.
One way to do this is to count by pairwise traversal, which takes O(N^2) time:
int GetCountInversions(List<int> A)
{
    int count = 0;
    for (int i = 0; i < A.Count; i++)
        for (int j = i + 1; j < A.Count; j++)
            if (A[i] > A[j])
                count++;
    return count;
}

Another way is to use merge sort, where the merge of two sorted sub-arrays discovers the inversion count: the total is the count from each individual sub-array plus the count from the merge itself. The merge's contribution comes almost for free, because when an element of the right sub-array is placed ahead of position i in the sorted left sub-array, it contributes (mid - i) inversions, one for each left element still remaining.

Another way to do this would be to use a self-balancing AVL tree that keeps track of inversions at every node. The count is initialized to zero, and it is updated as the elements are inserted into the AVL tree. Insertion updates the result naturally because it traverses from root to leaf: whenever the new element descends to the left, the count increases by 1 plus the number of nodes in the right subtree of the current node, since those are exactly the earlier, larger elements that form inversions with it.
Node Insert(ref Node root, Node z, ref int count)
{
    assert(z);
    z.height = -1; // placeholder, recomputed below
    if (root == null)
        root = z;
    else if (z.data < root.data) {
        root.left = Insert(ref root.left, z, ref count);
        // root and every node in its right subtree arrived earlier and are larger,
        // so each forms an inversion with z. Size() is assumed to return the
        // number of nodes in a subtree.
        count = count + Size(root.right) + 1;
    } else {
        root.right = Insert(ref root.right, z, ref count);
    }
    root.height = 1 + max(height(root.left), height(root.right));
    return Balance(root);
}


Node Balance(Node root)
{
    FixHeight(root); // 1 more than left or right
    if (Bfactor(root) == 2) // Bfactor is h(right) - h(left)
    {
        if (Bfactor(root.right) < 0)
            root.right = RightRotate(root.right); // returns root.right.left
        return LeftRotate(root);
    }
    if (Bfactor(root) == -2)
    {
        if (Bfactor(root.left) > 0)
            root.left = LeftRotate(root.left);
        return RightRotate(root);
    }
    return root;
}



Balance is based on rotation after fixing height. 

Thursday, July 26, 2018

We were discussing the use of object storage as a time-series data store. The notion of buckets in a time-series database translates well to object storage. As one bucket gets filled, data can start filling another. With the help of cold, warm and hot labels, it is easy to maintain the progression of data. This data can then serve all the search queries over time series, just like events in a time-series database.
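A minimal sketch of that progression, with hypothetical names throughout (neither the Bucket type nor the capacity threshold comes from any particular product):

import java.util.ArrayDeque;
import java.util.Deque;

class Bucket {
    enum Label { HOT, WARM, COLD }
    Label label = Label.HOT;
    long events = 0;
}

class TimeSeriesBuckets {
    static final long CAPACITY = 1_000_000; // assumed per-bucket event limit
    private final Deque<Bucket> filled = new ArrayDeque<>();
    private Bucket current = new Bucket();

    void append(byte[] event) {
        if (current.events >= CAPACITY) {
            // The just-filled bucket cools to warm; older buckets cool to cold.
            for (Bucket b : filled) b.label = Bucket.Label.COLD;
            current.label = Bucket.Label.WARM;
            filled.addFirst(current);
            current = new Bucket();
        }
        current.events++; // a real store would also persist the event bytes here
    }
}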

#codingexercise
Performing inversion count utilizing merge sort:
int GetCountInversionsByMergeSort(ref List<int> A, ref List<int> B, int left, int right)
{
    int count = 0;
    if (right > left) {
        int mid = (right + left) / 2;
        count = GetCountInversionsByMergeSort(ref A, ref B, left, mid);
        count += GetCountInversionsByMergeSort(ref A, ref B, mid + 1, right);
        count += GetCountByMerge(ref A, ref B, left, mid + 1, right);
    }
    return count;
}

int GetCountByMerge(ref List<int> A, ref List<int> B, int left, int mid, int right)
{
    int count = 0;
    int i = left;
    int j = mid;
    int k = left; // B is filled from the start of the range

    while ((i <= mid - 1) && (j <= right)) {
        if (A[i] <= A[j]) {
            B[k] = A[i];
            k++;
            i++;
        } else {
            B[k] = A[j];
            k++;
            j++;
            count = count + (mid - i); // every element left in A[i..mid-1] exceeds A[j]
        }
    }

    while (i <= mid - 1) {
        B[k] = A[i];
        k++;
        i++;
    }

    while (j <= right) {
        B[k] = A[j];
        j++;
        k++;
    }

    for (int m = left; m <= right; m++)
        A[m] = B[m];

    return count;
}