Cluster computing

Friday, October 10, 2014

# coding exercise
We implement yet another way to print the area of the largest rectangle in a histogram of unit-width bars. In addition to the simple approaches described in the previous posts, we look at a recursive approach, because the problem has overlapping subproblems, and at iterative approaches that use data structures.
One approach is to use divide and conquer.
We can exploit the overlapping subproblems in terms of the areas computed between a range of indices. Further, if we keep track of these indices and computations, we don't have to redo them. This technique is called memoization. Here we don't compute the max area for a range whose value we have already computed and stored. So we keep track of the ranges and the max area computed for each, in increasing order of start and end. Each time we update a range, we update the start or the end, the corresponding max value, or both. How we choose the range of indices is up to us. The goal is to exhaust all the indexes in the histogram so that we don't leave out any portion. One idea, as it has appeared on the GeeksforGeeks website, is to find the minimum of the bar heights in the histogram and then divide and conquer the entire range around it.
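Given this minimum, we find the max area as the maximum of the following:
1) maximum area to the left of the minimum bar
2) maximum area to the right of the minimum bar
3) number of bars in the range multiplied by the minimum value
Note that the rectangle spanning the whole range at the minimum height is not necessarily the largest. The same rule has to be applied recursively to the ranges on either side of the minimum, all the way down to a single bar, because one bar that shoots out of the chart can beat everything else.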
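The three-way maximum above can be sketched in Python as follows. This is a minimal illustration rather than the GeeksforGeeks implementation: the name max_area is made up for this example, and the minimum is found with a plain linear scan, so the worst case (for example, sorted heights) is quadratic and recurses once per bar.

```python
def max_area(heights, start=0, end=None):
    """Largest rectangle in heights[start..end] (inclusive), unit-width bars."""
    if end is None:
        end = len(heights) - 1
    if start > end:
        return 0  # empty range contributes nothing
    # index of the shortest bar in the range (linear scan, an assumption of this sketch)
    min_idx = min(range(start, end + 1), key=lambda i: heights[i])
    # rectangle that spans the whole range at the minimum height
    spanning = heights[min_idx] * (end - start + 1)
    # best rectangles that lie entirely to the left or right of the shortest bar
    left = max_area(heights, start, min_idx - 1)
    right = max_area(heights, min_idx + 1, end)
    return max(left, right, spanning)


print(max_area([2, 1, 5, 6, 2, 3]))
```

Any rectangle either includes the shortest bar, in which case it cannot be taller than that bar and the full-span rectangle is at least as large, or it lies entirely on one side of it, which is why the three cases are exhaustive.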
Another way of choosing the indexes is to extend the range progressively to the right as we traverse the indexes from start to end. In this method, every bar in the range we have already covered has been exhausted as a contributor to the final answer.
Yet another approach is to divide and conquer the indexes and combine the results, so that we calculate the max area of a range from three candidates (a, b, c) as
max(a, b, c).
Specifically, given two adjacent ranges, the area is the maximum of
1) minimum common height times the number of bars in the combined range
2) maximum area of one range
3) maximum area of the other range.
If we choose the latter approach, we keep track of the computed areas in the data structure discussed under memoization.
First, without memoization, the solution is
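// Note: the combine step only considers the rectangle spanning the whole range,
// so a rectangle that straddles mid without covering the full range
// (e.g. heights 1,3,3,1) is missed.
static int MaxArea(int[] h, int start, int end, ref int min)
{
    if (start == end)
    {
        min = h[start];
        return min * 1;
    }
    if (start < end)
    {
        int mid = (start + end) / 2;
        int minLeft = 0;
        int minRight = 0;
        int left = MaxArea(h, start, mid, ref minLeft);
        int right = MaxArea(h, mid + 1, end, ref minRight);
        // shortest bar across both halves
        min = Math.Min(minLeft, minRight);
        int spanArea = min * (end - start + 1);
        return Math.Max(Math.Max(left, right), spanArea);
    }
    return 0;
}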

Thursday, October 9, 2014

I came across a method implemented in a library that gives a programmatic way to establish SSH sessions (Python pexpect library - pxssh.py). This library had a method to compute the Levenshtein distance as follows:

# shown here as a standalone function
def levenshtein_distance(a, b):
    '''This calculates the Levenshtein distance between a and b.
    '''
    n, m = len(a), len(b)
    if n > m:
        # keep a as the shorter string so each row has only n+1 entries
        a, b = b, a
        n, m = m, n
    current = list(range(n + 1))
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[n]

As we may already know, the Levenshtein distance is a metric for measuring the difference between two string sequences. The distance is counted in single-character edits (insertions, deletions and substitutions). The method swaps its arguments if necessary so that a is the shorter string, which keeps each row at n+1 entries, and then fills one row per character of the longer string b. For each position, add is the value directly above in the previous row plus one, delete is the value to the left in the current row plus one, and change is the diagonal value from the previous row, plus one only when the two characters differ. A match therefore costs zero and a mismatch costs one, and each entry is the minimum of the three candidates.
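To see the row-by-row behaviour, here is a small sketch that follows the same recurrence but keeps every row instead of only the last two. The name levenshtein_rows is made up for this illustration and is not part of pexpect.

```python
def levenshtein_rows(a, b):
    """Return every DP row; the last entry of the last row is the distance."""
    if len(a) > len(b):
        a, b = b, a  # keep a as the shorter string
    n, m = len(a), len(b)
    current = list(range(n + 1))
    rows = [current]
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add = previous[j] + 1          # value above, plus one
            delete = current[j - 1] + 1    # value to the left, plus one
            # diagonal value, plus one only on a mismatch
            change = previous[j - 1] + (1 if a[j - 1] != b[i - 1] else 0)
            current[j] = min(add, delete, change)
        rows.append(current)
    return rows


rows = levenshtein_rows("cat", "cart")
for row in rows:
    print(row)
distance = rows[-1][-1]
```

For "cat" and "cart" the final entry is 1, since a single inserted "r" turns one string into the other.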

Wednesday, October 8, 2014

#coding exercise
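Print the area of the largest rectangle in a histogram of unit-width bars.
// requires using System.Linq; for Max()
public static int MaxArea(List<int> h)
{
    if (h == null || h.Count <= 0) return 0;
    var areas = new List<int>();
    for (int i = 0; i < h.Count; i++)
    {
        // start with the bar itself
        areas.Add(h[i]);
        // extend to the right while the bars are at least as tall
        for (int k = i + 1; k < h.Count; k++)
        {
            if (h[k] >= h[i])
                areas[i] += h[i];
            else
                break;
        }
        // extend to the left as well, otherwise a case like 3,2 is undercounted
        for (int k = i - 1; k >= 0; k--)
        {
            if (h[k] >= h[i])
                areas[i] += h[i];
            else
                break;
        }
    }
    return areas.Max();
}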
public class hc
{
    // height
    public int h { get; set; }
    // frequency: number of consecutive bars this height can span
    public int c { get; set; }
}
public static int MaxAreaByStack(List<int> h)
{
    if (h == null || h.Count <= 0) return 0;
    int maxArea = 0;
    // stack of heights in increasing order, each with the width it covers
    var hts = new Stack<hc>();
    for (int i = 0; i <= h.Count; i++)
    {
        // a trailing height of 0 flushes whatever is left on the stack
        int height = (i < h.Count) ? h[i] : 0;
        int width = 0;
        // pop every bar that is at least as tall as the current one
        while (hts.Count > 0 && hts.Peek().h >= height)
        {
            var last = hts.Pop();
            width += last.c;
            if (last.h * width > maxArea)
                maxArea = last.h * width;
        }
        // the current height also covers the width of the bars it replaced
        hts.Push(new hc { h = height, c = width + 1 });
    }
    return maxArea;
}

Tuesday, October 7, 2014

A review of configuration requirements, tips and tricks by Carlos Galeona for scale-out NAS.
The target of the configuration is a OneFS cluster whose file system ranges in size from 18 TB to 15.5 PB and that is easy to manage (no RAID groups, LUNs, etc.) and easy to grow. The minimum system configuration is a cluster with three nodes, two InfiniBand switches, the same version of OneFS, and licenses for the different modules. Common tasks involve upgrade, deployment, verification, etc. The practice is a cfengine deployment for each host, with cfagent run from cron on each node. A management API exists for OneFS 7.1.1. A traditional backup is an example use case: an NFS file system is mounted to a backup server or client and then a native backup is done. A large-scale file system could mean multiple backups and parallel tree traversals, and the latter can take a long time. The 7.1.1 API allows the creation of a changelist that addresses the latter.

Sample program to create a web API based administration session, create an NFS export and create a snapshot.

# import relevant libraries
import json
from datetime import datetime, timedelta
# Python 3 location of urlencode (urllib.urlencode in Python 2)
from urllib.parse import urlencode
import httplib2

# http.request(url) is used instead of an authenticated urlopen
http = httplib2.Http()

# setup
api_user = 'Your api username'
api_pass = 'Your api password'
form_headers = {'Content-type': 'application/x-www-form-urlencoded'}
# credentials are registered once and reused for every request below
http.add_credentials(api_user, api_pass)

# create session
url = 'https://<cluster-ip-or-host-name>:8080/session/1/session'
params = urlencode({
    'username': api_user,
    'password': api_pass,
    'services': ['NFS', 'Snapshot']
}, doseq=True)  # doseq=True encodes each list item as its own key=value pair
response, content = http.request(url, 'POST', params, headers=form_headers)

# validate
url = url + '?isisessid'
response, content = http.request(url)
data = json.loads(content)
if data["username"] != api_user:
    raise RuntimeError('bad session')

# create nfs export
url = 'https://<cluster-ip-or-host-name>:8080/platform/1/protocols/nfs/exports'
params = urlencode({
    'description': 'sample mount',
    'paths': ['/path', '/path1']  # under /ifs
}, doseq=True)
response, content = http.request(url, 'POST', params, headers=form_headers)
data = json.loads(content)
if not data["id"]:
    raise RuntimeError('bad export')

# validate
export_id = data['id']
url = 'https://<cluster-ip-or-host-name>:8080/platform/1/protocols/nfs/exports/' + str(export_id) + '?describe'
response, content = http.request(url)
data = json.loads(content)
if not data["id"]:
    raise RuntimeError('bad export')

# create a snapshot
url = 'https://<cluster-ip-or-host-name>:8080/platform/1/snapshot/snapshots'
params = urlencode({
    'name': 'sample snapshot',
    'path': '/ifs'
})
response, content = http.request(url, 'POST', params, headers=form_headers)
data = json.loads(content)
if not data["id"]:
    raise RuntimeError('bad snapshot')

# validate
snapshot_id = data['id']
url = 'https://<cluster-ip-or-host-name>:8080/platform/1/snapshot/snapshots/' + str(snapshot_id) + '?describe'
response, content = http.request(url)
data = json.loads(content)
if not data["created"]:
    raise RuntimeError('bad snapshot')

# assumption: 'created' is a Unix timestamp in seconds;
# adjust the parsing if the API returns another format
created_date = datetime.fromtimestamp(int(data['created']))
if datetime.now() - created_date > timedelta(hours=2):
    raise RuntimeError('old snapshot')

#coding exercise
Given a non-negative number represented as an array of digits, add one to the number.

void plusOne(ref List<int> digits, int position)
{
    if (position >= digits.Count) return;
    if (position < 0) return;
    if (digits[position] + 1 > 9)
    {
        digits[position] = 0;
        if (position - 1 < 0)
        {
            // the carry runs past the most significant digit, so grow the number
            digits.Insert(0, 1);
        }
        else
            plusOne(ref digits, position - 1);
    }
    else
        digits[position] += 1;
}

# coding exercise
Validate BST

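bool isValidBST(Node root)
{
    return isValidBST(root, long.MinValue, long.MaxValue);
}
// Checking only the immediate children is not enough: every node must satisfy
// min <= data < max, where the bounds are inherited from its ancestors.
bool isValidBST(Node root, long min, long max)
{
    if (root == null) return true;
    if (root.data < min || root.data >= max) return false;
    return isValidBST(root.left, min, root.data) && isValidBST(root.right, root.data, max);
}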

Given a singly linked list L0->L1->L2->...->Ln-1->Ln
Return it interleaved as L0->Ln->L1->Ln-1...
List<Node> Interleave(List<Node> items)
{
    if (items == null || items.Count <= 2) return items;
    int n = items.Count;
    int mid = (n % 2 == 0) ? n / 2 : n / 2 + 1;
    // split off the second half; its nodes are inserted back from the end
    var rem = items.GetRange(mid, n - mid);
    items.RemoveRange(mid, n - mid);
    int i = 1;
    int count = rem.Count;
    for (int k = 0; k < count; k++)
    {
        Node last = rem[rem.Count - 1];
        rem.RemoveAt(rem.Count - 1);
        items.Insert(i, last);
        i = i + 2;
    }
    return items;
}
Node* interleave(Node* root, int n)
{
    if (root == nullptr || n <= 2) return root;
    Node* start = root;
    // stop once fewer than two nodes follow start
    while (start != nullptr && start->next != nullptr && start->next->next != nullptr)
    {
        // walk to the last node, remembering its predecessor
        Node* prev = start;
        Node* last = start->next;
        while (last->next != nullptr)
        {
            prev = last;
            last = last->next;
        }
        // detach the last node and splice it in right after start
        prev->next = nullptr;
        Node* next = start->next;
        last->next = next;
        start->next = last;
        // continue from the node that originally followed start
        start = next;
    }
    return root;
}

Monday, October 6, 2014

#coding exercise
Given a binary tree, check if it is symmetric around the center
bool isSymmetric(Node root)
{
    if (root == null) return false;
    List<List<Node>> items = GetZigZag(root); // level wise items (already implemented earlier)
    for (int i = 0; i < items.Count; i++)
        if (isPalindrome(items[i]) == false)
            return false;
    return true;
}

bool isPalindrome(List<Node> items)
{
    if (items == null) return false;
    int start = 0;
    int end = items.Count - 1;
    while (start < end)
    {
        // compare the values; distinct nodes are never reference-equal
        if (items[start].data != items[end].data)
            return false;
        start++;
        end--;
    }
    return true;
}

Given a pair of rectangles aligned along the axis in the positive quadrant and given by
class Rectangle
{
Point tl; // TopLeft;
Point br; // BottomRight;
}
class Point
{
int x;
int y;
}
Implement the following methods
static bool IsIntersect( Rectangle r1, Rectangle r2);
static Rectangle Intersecting(Rectangle r1, Rectangle r2);

static bool IsIntersect( Rectangle r1, Rectangle r2)
{
bool isdisjoint = false;
// are they disjoint along x axis ?
if (r1.tl.x < r2.tl.x)
   isdisjoint = r1.br.x < r2.tl.x;
else
isdisjoint = r2.br.x < r1.tl.x;
if (isdisjoint == true) return false;

// are they disjoint along y-axis ?
if (r1.br.y < r2.br.y)
   isdisjoint = r1.tl.y < r2.br.y;
else
isdisjoint = r2.tl.y < r1.br.y;
return isdisjoint == false;
}

static Rectangle Intersecting(Rectangle r1, Rectangle r2)
{
    if (!IsIntersect(r1, r2)) return null;
    Rectangle r = new Rectangle();
    // Point is a class, so the corners must be allocated before use
    r.tl = new Point();
    r.br = new Point();
    Rectangle left = (r1.tl.x <= r2.tl.x) ? r1 : r2;
    Rectangle right = (r1.tl.x <= r2.tl.x) ? r2 : r1;
    Rectangle bottom = (r1.tl.y <= r2.tl.y) ? r1 : r2;
    Rectangle top = (r1.tl.y <= r2.tl.y) ? r2 : r1;
    r.tl.x = right.tl.x;
    r.br.x = right.br.x <= left.br.x ? right.br.x : left.br.x;
    r.tl.y = bottom.tl.y;
    r.br.y = bottom.br.y <= top.br.y ? top.br.y : bottom.br.y;
    return r;
}

Sunday, October 5, 2014

Ceilometer & Splunk Integration via Python SDK:

Ceilometer is OpenStack’s telemetry project. Its source code is available at https://github.com/openstack/ceilometer. It collects all kinds of measurements and metering information so that no two agents need to be written to collect the same data. Its primary consumer is the billing system, for which it acts as a unique and central point of contact to acquire all of the information collected across all OpenStack core components. The project strives to be efficient in terms of CPU and network costs. It supports both push notifications from the existing services and a pull model that polls the infrastructure. Deployers can configure the type of data collected. Publishers are available as a Python implementation. A REST API is also available to expose the data collected by the metering system.

As an example of the events from Ceilometer, we review the PAAS event format. There are a number of PAAS services that have metering payloads. Instead of having each service define its own, the Ceilometer provides a minimum data set as described here.

Splunk is an IT tool that enables searching for a needle in a haystack of information. Specifically, it forwards information collected from various sources (via its modular inputs) and indexes it as time-series events, which can then be searched and analyzed using a wide variety of search operators and charting tools. Splunk also comes with its own SDK, including one that is available in Python.

Splunk SDK supports a variety of common operations with Splunk through its programmable APIs. In particular it supports a UDP modular input that can suit Ceilometer.

The host and the port bind the publisher to the subscriber. In this case, we can configure the host and UDP port for Ceilometer to publish to Splunk.
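To make the host and port pairing concrete, here is a generic Python sketch of a UDP publisher and listener on the loopback interface. It only illustrates how the two ends agree on an address; the function names and the key=value payload are made up for this example, and it does not reproduce Ceilometer's own wire format or Splunk's UDP input.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(2.0)  # do not block forever if nothing arrives
    return sock


def publish(host, port, payload):
    """Send one datagram to the configured host and port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(payload.encode("utf-8"), (host, port))


listener = make_listener()
host, port = listener.getsockname()  # the address the publisher must be given
publish(host, port, "counter_name=cpu_util counter_volume=0.5")
data, _ = listener.recvfrom(4096)
received = data.decode("utf-8")
listener.close()
print(received)
```

UDP is connectionless, so the publisher needs nothing beyond the listener's host and port, and it gets no delivery confirmation.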

#!/usr/bin/env python
#
# Copyright 2013 Ravi Rajamani.
import sys
from splunklib.modularinput import *
class CeilometerIntegrationScript(Script):
    def get_scheme(self):
        scheme = Scheme("Ceilometer Telemetry")
        scheme.description = "Streams events from Ceilometer."
        scheme.use_external_validation = True
        scheme.use_single_instance = True
        scheme.validation = ""
        ceilometer_argument = Argument("data_from_ceilometer")
        ceilometer_argument.data_type = Argument.data_type_string
        ceilometer_argument.description = "Telemetry data from Ceilometer to be produced by this input."
        ceilometer_argument.required_on_create = True
        # register the argument; without this the scheme exposes no parameters
        scheme.add_argument(ceilometer_argument)
        return scheme
    def validate_input(self, validation_definition):
        data = str(validation_definition.parameters["data_from_ceilometer"])
        if not data:
            raise ValueError("Ceilometer data could not be read.")
    def stream_events(self, inputs, ew):
        """This function handles all the action: splunk calls this modular input
        without arguments, streams XML describing the inputs to stdin, and waits
        for XML on stdout describing events.
        :param inputs: an InputDefinition object
        :param ew: an EventWriter object
        """
        for input_name, input_item in inputs.inputs.items():
            # Create an Event object, and set its data fields
            event = Event()
            event.stanza = input_name
            event.data = "number=\"%s\"" % str(input_item["data_from_ceilometer"])
            # Tell the EventWriter to write this event
            ew.write_event(event)
if __name__ == "__main__":
    sys.exit(CeilometerIntegrationScript().run(sys.argv))