Wednesday, July 6, 2016

#codingexercise
Remove duplicate characters from a list
List<char> removeDuplicates(List<char> input)
{
    // LINQ Distinct preserves first occurrences; return a new list
    return input.Distinct().ToList();
}
void removeDuplicatesInPlace(ref List<char> input)
{
    // nextindex and shift_chars_left_and_end_with_null are sketched below
    for (int i = 0; i < input.Count; i++)
    {
        // remove every later occurrence of input[i]
        int rep = nextindex(input, i);
        while (rep != -1)
        {
            shift_chars_left_and_end_with_null(ref input, rep);
            rep = nextindex(input, i);
        }
    }
}
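The two helpers used above are not defined in the original post; a minimal sketch of what they might look like (names taken from the call sites, behavior assumed):

int nextindex(List<char> input, int i)
{
    // index of the next occurrence of input[i] after position i, or -1
    for (int j = i + 1; j < input.Count; j++)
        if (input[j] == input[i]) return j;
    return -1;
}

void shift_chars_left_and_end_with_null(ref List<char> input, int rep)
{
    // drop the character at rep by shifting the tail left one position
    input.RemoveAt(rep);
}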
void removeDupsSorted(ref List<char> str)
{
    if (str.Count == 0) return;
    int result = 1;
    int index = 1;
    // duplicates are adjacent in sorted input
    while (index < str.Count && str[index] != '\0')
    {
        if (str[index] != str[index - 1])
        {
            str[result] = str[index];
            result++;
        }
        index++;
    }
    if (result < str.Count)
        str[result] = '\0';
}

We were discussing deduplication with data differencing.
Let us introduce the operator d for deduplication.
If the original file is represented by alpha, deduplication yields
beta = d(alpha)
After some time, when alpha has undergone edits to become alpha',
deduplication yields
beta' = d(alpha')
Let the differencing operator be delta.
Differencing now proceeds with delta(beta' - beta), i.e., it operates on the two deduplicated forms.
It can be shown that the delta operation is lossless. This follows from deduplication itself being lossless: the original file alpha can be reconstructed as alpha = d-inverse(beta), where d-inverse is the reconstitution function.
We don't attempt to generate all possible sequences by manipulating every byte of a full-length segment, because that would be on the order of 256^2048 possibilities. It is not the size of such a lookup table that is the deterrent, since that table could be made universal, but the fact that most occurrences of real data fall within a fairly small subset, with large variations between consecutive edits over time. Instead we rely on previously encountered patterns that repeat.
Coming back to the discussion on lossless operation: the differencing operation can detect all the differences between the original files using their deduplicated transformations, because the segments into which the files are divided are each read for comparison without omission.
Furthermore, the differencing proceeds segment by segment, just as it would if the original file were read in byte ranges.
Now let us consider that we can generate the differencing codes by separating the original dataset into two equal-sized groups x and y, for which we compute two sets of differencing coefficients k and r. The purpose of separating the data fragments into two new groups is that it helps generate the differences both globally and locally, so that even if we consider some of the data fragments as missing and skip over them, the result is still sufficient to reconstruct the target. In a way this works like error correction codes, in that the transformations ensure that some of the original data can be tolerated as missing or deliberately skipped. Again, 'data' here refers to deduplicated segments and not the actual file content. The idea is that we can aid differencing by tolerating up to m skipped fragments, and use that allowance to skip differencing of data fragments that we guess to be low-entropy.
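To make the segment-by-segment differencing concrete, here is a minimal sketch that compares two deduplicated representations as lists of segment fingerprints and reports which segment positions differ. The fixed segment size and the use of SHA-256 as the fingerprint are assumptions for illustration, not the exact scheme discussed above.

using System;
using System.Collections.Generic;
using System.Security.Cryptography;

static class SegmentDiff
{
    const int SegmentSize = 2048; // assumed fixed-length segment

    // Deduplicate: fingerprint each fixed-length segment (the operator d above).
    public static List<string> Dedup(byte[] data)
    {
        var fingerprints = new List<string>();
        using (var sha = SHA256.Create())
        {
            for (int offset = 0; offset < data.Length; offset += SegmentSize)
            {
                int len = Math.Min(SegmentSize, data.Length - offset);
                var hash = sha.ComputeHash(data, offset, len);
                fingerprints.Add(Convert.ToBase64String(hash));
            }
        }
        return fingerprints;
    }

    // Differencing (the operator delta above): positions whose fingerprints differ.
    public static List<int> Diff(List<string> beta, List<string> betaPrime)
    {
        var changed = new List<int>();
        int n = Math.Max(beta.Count, betaPrime.Count);
        for (int i = 0; i < n; i++)
        {
            bool same = i < beta.Count && i < betaPrime.Count && beta[i] == betaPrime[i];
            if (!same) changed.Add(i);
        }
        return changed;
    }
}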

#codingexercise
// extension method; must live inside a static class
static List<int> Distinct(this List<int> nums)
{
    var h = new HashSet<int>();
    for (int i = 0; i < nums.Count; i++)
    {
        if (h.Contains(nums[i]) == false)
            h.Add(nums[i]);
    }
    return h.ToList();
}

Tuesday, July 5, 2016

#codingexercise
Find the longest repeated substring in a string.
We build a suffix tree to keep track of substrings encountered so far
// Ukkonen-style extension; the global state (text, root, activeNode, activeEdge,
// activeLength) is assumed to be declared elsewhere
int remainingSuffixCount = 0;
int leafEnd = -1;
Node lastNewNode = null;

void extendSuffixTree(int pos)
{
    leafEnd = pos;
    remainingSuffixCount++;
    lastNewNode = null;
    while (remainingSuffixCount > 0)
    {
        if (activeLength == 0)
            activeEdge = pos;
        if (activeNode.children[text[activeEdge]] == null)
        {
            // no outgoing edge starts with the current character: add a leaf
            activeNode.children[text[activeEdge]] = newNode(pos, ref leafEnd);
            if (lastNewNode != null)
            {
                lastNewNode.suffixLink = activeNode;
                lastNewNode = null;
            }
        }
        else
        {
            var next = activeNode.children[text[activeEdge]];
            if (walkDown(next)) continue;
            if (text[next.start + activeLength] == text[pos])
            {
                // the character is already on the edge: just advance
                if (lastNewNode != null && activeNode != root)
                {
                    lastNewNode.suffixLink = activeNode;
                    lastNewNode = null;
                }
                activeLength++;
                break;
            }
            // split the edge and add a new leaf for the current character
            var splitEnd = next.start + activeLength - 1;
            var split = newNode(next.start, splitEnd);
            activeNode.children[text[activeEdge]] = split;
            split.children[text[pos]] = newNode(pos, ref leafEnd);
            next.start += activeLength;
            split.children[text[next.start]] = next;
            if (lastNewNode != null)
                lastNewNode.suffixLink = split;
            lastNewNode = split;
        }
        remainingSuffixCount--;
        if (activeNode == root && activeLength > 0)
        {
            activeLength--;
            activeEdge = pos - remainingSuffixCount + 1;
        }
        else if (activeNode != root)
        {
            activeNode = activeNode.suffixLink;
        }
    }
}
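The walkDown helper referenced above is not shown in the original; a minimal sketch based on the usual Ukkonen formulation (edgeLength, returning the span of the edge into a node, is assumed):

bool walkDown(Node current)
{
    // if the active length extends beyond this edge, hop down to the child node
    int len = edgeLength(current);
    if (activeLength >= len)
    {
        activeEdge += len;
        activeLength -= len;
        activeNode = current;
        return true;
    }
    return false;
}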

Build a binary search tree from postorder traversal
node treebuilder(List<int> post, ref int index, int key, int min, int max, int size)
{
    if (index < 0)
        return null;
    node root = null;
    if (key > min && key < max)
    {
        root = newNode(key);
        index = index - 1;
        if (index >= 0)
        {
            // in postorder the root comes last, so build the right subtree first
            root.right = treebuilder(post, ref index, post[index], key, max, size);
            root.left = treebuilder(post, ref index, post[index], min, key, size);
        }
    }
    return root;
}
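A hedged usage sketch for the function above, with the usual sentinel bounds and starting index (the wrapper name is an assumption):

node constructTree(List<int> post)
{
    if (post == null || post.Count == 0) return null;
    int index = post.Count - 1;
    return treebuilder(post, ref index, post[index], int.MinValue, int.MaxValue, post.Count);
}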
courtesy : geeksforgeeks

Given a set of integers, negative as well as non negative, rearrange them such that negative and non negative integers are at alternate positions.
void rearrange(ref List<int> nums)
{
    // get_next_neg, get_next_pos and shift_right are sketched below
    int neg = -1;
    int pos = -1;
    for (int i = 0; i < nums.Count; i++)
    {
        if (i % 2 == 0)
        {
            // even positions get the next negative number
            neg = get_next_neg(nums, i);
            if (neg == -1) break;
            int temp = nums[neg];
            shift_right(nums, i, neg);
            nums[i] = temp;
        }
        else
        {
            // odd positions get the next non-negative number
            pos = get_next_pos(nums, i);
            if (pos == -1) break;
            int temp = nums[pos];
            shift_right(nums, i, pos);
            nums[i] = temp;
        }
    }
}
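The helpers used above are not shown in the original; a minimal sketch of what they might look like:

int get_next_neg(List<int> nums, int i)
{
    for (int j = i; j < nums.Count; j++)
        if (nums[j] < 0) return j;
    return -1;
}

int get_next_pos(List<int> nums, int i)
{
    for (int j = i; j < nums.Count; j++)
        if (nums[j] >= 0) return j;
    return -1;
}

void shift_right(List<int> nums, int i, int j)
{
    // move elements in [i, j-1] one position to the right, freeing slot i
    for (int k = j; k > i; k--)
        nums[k] = nums[k - 1];
}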


int binary_search(List<int> nums, int start, int end, int val)
{
    if (start > end) return -1;
    int mid = (start + end) / 2;
    if (nums[mid] == val) return mid;
    if (start == end) return -1;
    if (nums[mid] < val)
        return binary_search(nums, mid + 1, end, val);
    else
        return binary_search(nums, start, mid, val);
}
Check whether any permutation of the input string is a palindrome.
bool isPalindrome(string input)
{
    var h = new Dictionary<char, int>();
    for (int i = 0; i < input.Length; i++)
    {
        if (h.ContainsKey(input[i]))
            h[input[i]] += 1;
        else
            h.Add(input[i], 1);
    }
    // some permutation is a palindrome iff at most one character has an odd count
    int odd = 0;
    foreach (int count in h.Values)
        if (count % 2 == 1) odd++;
    return odd <= 1;
}



Monday, July 4, 2016

Data differencing with deduplication

Data differencing is the technique of producing the difference between two sets of data  - a source and a target. The resulting difference data can be used together with the source data to reconstruct the target data.  Data compression can be considered a special case of data differencing where the source data is empty and the resulting compressed file corresponds to a “difference from nothing”. Compression usually results in a single file and can be uncompressed to get back a target.

Deduplication is about finding segments of data that appear repeatedly and packing them once, with aliases used to reconstruct the target. It differs from compression in that it does not look at short repeated substrings but at large sections of files that are identical; it finds these unique chunks of data and stores references to them instead of the data itself. Therefore deduplication works with larger sections of files, or with whole files. Certain backup systems attempt this optimization by using file references when files have not changed, but neither backup nor compression reduces the storage footprint of data as effectively as deduplication.
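A minimal sketch of this chunk-and-reference idea, using a content-hash keyed store for unique chunks and a list of references per file (the ChunkStore name and the SHA-256 fingerprint are assumptions for illustration):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

class ChunkStore
{
    readonly Dictionary<string, byte[]> chunks = new Dictionary<string, byte[]>();

    // Store a chunk once and return its reference (alias).
    public string Put(byte[] chunk)
    {
        using (var sha = SHA256.Create())
        {
            string key = Convert.ToBase64String(sha.ComputeHash(chunk));
            if (!chunks.ContainsKey(key)) chunks[key] = chunk;
            return key;
        }
    }

    // Rehydration: reconstitute the original bytes from the list of references.
    public byte[] Rehydrate(List<string> refs)
    {
        return refs.SelectMany(r => chunks[r]).ToArray();
    }
}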

Deduplication is generally performed on all kinds of data, including backups and compressed files, as an after-effect by certain deduplication and archival appliances. However, deduplication can be intrinsic to data differencing, or it can even be done prior to differencing.

Consider the other way around, where deduplication is done on a snapshot of a file. In this case, the entire file is available after deduplication. Subsequent snapshots will increase storage even with deduplication. This is where differencing helps.

Deduplication is widely understood to be done in the form of fixed-length segments that can be used to traverse data extents, giving greater articulation of redundancies than either backup or compression alone.

One concern about using deduplication directly on primary storage is that there is a significant cost involved in the full reconstitution of files – a process also known as rehydration.  However, backups and data differencing are generally not applied to data that needs to be packed and unpacked frequently.

Moreover, differencing applied to deduplicated data can be scaled to achieve faster and more efficient storage, because we treat changes not in terms of files but in terms of segments, or even collections of segments.

This scaling could possibly be done in the form of scaling equations that use a set of coefficients to determine the groups.

Remove spaces from a string
void removeSpaces(StringBuilder str)
{
    int count = 0;
    for (int i = 0; i < str.Length; i++)
    {
        if (str[i] != ' ')
        {
            str[count] = str[i];
            count++;
        }
    }
    str.Length = count;  // truncate (the original wrote a '\0' terminator)
}

Find the longest repeated substring in text
Use a traversal of the suffix trie built above
void doTraversal(node root, int h, ref int max, ref int start)
{
    if (root == null) return;
    if (root.isInternalNode())
    {
        for (int i = 0; i < max_char; i++)
        {
            if (root.children[i] != null)
                doTraversal(root.children[i], h + edge(root.children[i]), ref max, ref start);
        }
    }
    else if (root.isLeaf && max < h - edge(root))
    {
        // the deepest internal path above a leaf marks the longest repeated substring
        max = h - edge(root);
        start = root.index;
    }
}

We extend the suffix tree as we encounter characters (pseudocode outline of the extension step):
void extendTrie(char n, ref node root)
{
    while (there are remaining suffixes to add)
    {
        if (there is no outgoing edge for the current character)
            add a new leaf;
        else
            extend the existing edge (splitting it if needed);
        reduce the count of remaining suffixes to add;
    }
}

Sunday, July 3, 2016

#codingexercise
Find an element in a sorted and rotated array
int getindex(List<int> a, int start, int end, int val)
{
    if (end < start) return -1;
    if (a[start] <= a[end])
        return binary_chop(a, start, end, val);
    if (end == start + 1)
    {
        if (a[start] == val) return start;
        if (a[end] == val) return end;
        return -1;
    }
    int mid = (start + end) / 2;
    int left = getindex(a, start, mid, val);
    if (left != -1) return left;
    int right = getindex(a, mid + 1, end, val);
    if (right != -1) return right;
    return -1;
}
Find the minimum element in a sorted rotated array
int getmin(List<int> a, int start, int end)
{
    if (end < start) return a[0];
    if (end == start) return a[start];
    int mid = (start + end) / 2;
    if (mid < end && a[mid + 1] < a[mid]) return a[mid + 1];
    if (mid > start && a[mid] < a[mid - 1]) return a[mid];
    // if the right half ends above the middle, the rotation point is on the left
    if (a[end] > a[mid]) return getmin(a, start, mid);
    return getmin(a, mid + 1, end);
}
// alternative getindex using getmin
int findindex(List<int> a, int val)
{
    int min = getmin(a, 0, a.Count - 1);
    int mindex = a.IndexOf(min);
    // both halves around the rotation point are sorted
    int left = binary_chop(a, 0, mindex - 1, val);
    if (left != -1) return left;
    int right = binary_chop(a, mindex, a.Count - 1, val);
    if (right != -1) return right;
    return -1;
}
int binary_chop(List<int> a, int start, int end, int val)
{
    if (start > end) return -1;
    if (val < a[start] || val > a[end]) return -1;
    if (a[start] == val) return start;
    if (a[end] == val) return end;
    int mid = (start + end) / 2;
    if (a[mid] == val) return mid;
    if (val < a[mid])
        return binary_chop(a, start, mid - 1, val);
    return binary_chop(a, mid + 1, end, val);
}

Saturday, July 2, 2016

#codingexercise
Remove spaces from a string
void removeSpaces(StringBuilder str)
{
    int count = 0;
    for (int i = 0; i < str.Length; i++)
    {
        if (str[i] != ' ')
        {
            str[count] = str[i];
            count++;
        }
    }
    str.Length = count;  // truncate (the original wrote a '\0' terminator)
}
Given a string that can be formed from the letters arranged as follows:
A B C D E
F G H  I  J
K L M N O
P Q  R  S T
U V  W X Y
Z
and with only the following positions

int getPath(string s)
{
    int len = 0;
    int curx = 0;
    int cury = 0;
    for (int i = 0; i < s.Length; i++)
    {
        // row and column of the target letter in the 5-wide grid
        int nextx = (s[i] - 'A') / 5;
        int nexty = (s[i] - 'A') % 5;
        while (curx > nextx) { curx--; len++; }
        while (curx < nextx) { curx++; len++; }
        while (cury < nexty) { cury++; len++; }
        while (cury > nexty) { cury--; len++; }
    }
    return len;
}
Get the longest common prefix of the strings in a trie
string walkTrie(ref TrieNode root, int k)
{
    TrieNode cur = root;
    int index = 0;
    string prefix = "";
    // walk down while more than k children branch off and we have not reached a leaf
    while (countChildren(cur, ref index) > k && isleaf(cur) == false)
    {
        cur = cur.children[index];
        prefix += cur.data;
    }
    return prefix;
}
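The countChildren helper is not defined in the original; a minimal sketch under the assumption that it returns the number of non-null children and reports a child position through index:

int countChildren(TrieNode node, ref int index)
{
    int count = 0;
    for (int i = 0; i < node.children.Length; i++)
    {
        if (node.children[i] != null)
        {
            count++;
            index = i;  // remember the (last seen) child position
        }
    }
    return count;
}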

Convert a binary tree into a binary search tree

void ConvertBTtoBST(ref Node root)
{
    if (root == null) return;
    // collect the values, sort them, and write them back in inorder
    var inorder = new List<int>();
    InorderTraversal(root, ref inorder);
    inorder.Sort();
    InorderToTree(inorder, ref root);
}
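The traversal helpers used above are not shown in the original; a minimal sketch matching the call sites (FillInorder is an assumed name):

void InorderTraversal(Node node, ref List<int> values)
{
    if (node == null) return;
    InorderTraversal(node.left, ref values);
    values.Add(node.data);
    InorderTraversal(node.right, ref values);
}

void InorderToTree(List<int> values, ref Node root)
{
    int pos = 0;
    FillInorder(values, root, ref pos);
}

void FillInorder(List<int> values, Node node, ref int pos)
{
    if (node == null) return;
    FillInorder(values, node.left, ref pos);
    node.data = values[pos++];  // overwrite values in inorder with the sorted values
    FillInorder(values, node.right, ref pos);
}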

Friday, July 1, 2016

Given the stock price of a company for consecutive days, find the span of each day's stock price. The span of a day is the number of consecutive days ending with that day (the day itself included) on which the price is less than or equal to that day's price.

void GetSpan(List<int> price, int n, ref List<int> span)
{
    var st = new Stack<int>(); // stores indexes of days
    st.Push(0);
    span[0] = 1;
    for (int i = 1; i < n; i++)
    {
        // pop days with a price less than or equal to today's
        while (st.Count > 0 && price[st.Peek()] <= price[i])
            st.Pop();
        span[i] = (st.Count == 0) ? (i + 1) : (i - st.Peek());
        st.Push(i);
    }
}
#Data day presentation topic
Backup services in a private cloud

A private cloud has several valuable assets in the form of virtual machine instances. A backup of these VMs helps with restore to a point in time, disaster recovery, and other such benefits. Backups are different from snapshots because they may include storage volumes and application data. Both backups and snapshots involve large files, to the tune of several hundred gigabytes.

Many cloud providers offer their own backup products. However, a private cloud may be made up of more than one cloud provider. Moreover, the private cloud may shift weight among its cloud vendors. It doesn't seem flexible to use one or the other of these cloud-specific products. A combination of these products could work, but each would introduce a new variable for the clients. Some commercial backup products claim to provide a central solution across heterogeneous clouds, but they may impose their own restrictions on backup activities. Consequently, a do-it-yourself backup product that can do all that we want in a private cloud becomes the only satisfactory solution.
As is conventional with offerings from the private cloud, a service seems better than a product. In addition, if it is written in a microservice model, it becomes much more manageable and testable, given that it is a dedicated service.
Moreover, clients can now call it via the command line, a user interface, or direct API calls. Each client offers two kinds of backup service: backup via scheduled jobs, and an infrequent on-demand push-button service. These differentiate the workloads coming from a customer and make it easier for the customer to realize their backups. A frequent schedule relieves the customer of taking repeated actions and marks out the workloads that can be siphoned off to automation.

#codingexercise
The GetSpan also has a simple compute-only version:
void GetSpan(List<int> price, int n, ref List<int> span)
{
    for (int i = 0; i < n; i++)
    {
        // count consecutive earlier days with a price less than or equal to today's
        span[i] = 1;
        for (int j = i - 1; j >= 0 && price[j] <= price[i]; j--)
            span[i]++;
    }
}