Cluster computing

Tuesday, May 6, 2014

In this post we cover some coding questions based on string operations, bitwise manipulation, recursion, permutations and combinations.
Suppose we were to write down all the combinations of the indices of numbers in a given list of numbers that add up to a particular sum:
The algorithm would look something like this assuming we have a sorted list of numbers and we declare a type IndexedNumber that stores the value and the index corresponding to each entry in the input list
public static void combine(ref List<IndexedNumber> numbers, ref List<IndexedNumber> candidate>, ref List<List<IndexedNumber>> sequences, int level, int start, int n)
{
for (int i = start; i < numbers.Count; i++)
{
if(candidate.Contains(numbers[i]) == false)
{
candidate[level] = numbers[i];
if (candidate.Sum() == n)
sequences.Add(new List<IndexedNumber>(candidate));
if (i < numbers.Count - 1)
combine (ref numbers, ref candidate, ref sequences, level + 1, start + 1, n);
candidate[level] = new IndexedNumber() { Number = 0, Index = -1};
}
}
}
Let's now look at a bit manipulation problem.
We define the problem this way: Write a function to convert an integer x into an integer y and find the number of bits that will need to be changed. For example, given 12 and 16, it should return 3
because 12 is 0xC or 1100 and 16 is 0x10 or 10000 and we have to change 3 bits
int ConvertXtoY(int x, int y)
{
int a = x;
int b = y;
int count = 0;
while ( a || b)
{
if ( (a & 0x1) ^ (b& 0x1)) count ++;
a >>= 1;
b >>= 1;
}
return count;
}
String operation question:
find the longest repeating substring:
char* getMatch(const char* str)
{
char* match = NULL;
int len = 0;
char* first = str;
while (first)
{
char* second = str++;

// case 2
while (second)
{
    char* lMatch = NULL; //local match
    int llen = 0; // local match len
    char* orig = first;
    char* local = second;

    //case 0
    while ( local && *local == *orig)
    {
      if (llen == 0) lMatch = orig;
      llen++;
      local++;
      orig++;
    }

    if (llen > len ) { match = lMatch; len = llen;}

    //case 1
    second++;
}
first++;
}

return match;
}

Monday, May 5, 2014

Today we discuss some more of the .Net APIs again from the Framework class library. Here the we first take a brief look at System.Timers The System.Timer namespace provides a component that allows us to raise an event on a specified interval. This is a server based timer that allows us to specify a recurring interval at which the Elapsed event is raised. Then this event can be used to provide regular processing. Where these server timers differ from windows timers is that these timers can move among threads to handle the Elapsed event. The arguments to the Elapsed event provide data for handling it.
When we look at the implementation of this, it uses the underlying System.Threading.Timer class and GetSystemTimeAsFileTime method for the time arithmetic. On each update of the enabled, it updates the existing underlying timer. If a zero value was provided, it disposes the existing timer and sets the enabled to value. By updating the timer, it changes the value of the interval.
Let us now look at the System.Threading.Timer.
The System.Threading.Timer takes as initialization parameters : callback, state, dueTime and period. For initialization it makes use of an object called the TimerHolder. On each update of the timer it changes the dueTime and period. The TimerHolder in turn takes a TimerQueueTimer with the same initialization parameters for its initialization. The TimerQueue maintains a list of active timers in the AppDomain. It uses a single native timer supplied by the VM to actively manage all the timers.
The data structure used to maintain all the active timers is a doubly linked list because it requires fast insert and delete while traversal can be linear. The FireNextTimers method on this queue traverses each timer to see whether the elapsed is greater than the dueTime and deletes it if it's not periodic. If its the first time, it will be fired on this thread otherwise it will be fired on the thread pool. If the timer hasn't fired yet, it just updates the next time the native timer fires. By updating a timer, not only is its dueTime, period and startTicks updated but also its prev and next in the list.
Given the ease of programmability in C# with the Framework APIs, writing or even rewriting an entire application in C# would be interesting.
We will continue to review the rest of the Framework APIs but I want to take a brief break to solve a puzzle.
How do we decode Kanjii characters ? Let's treat Kanji as Unicode. We know that Unicode gives a codepoint to each of the characters represented in the standard. UTF-16 encoding uses the two 8bit bytes for the regular ISO characters and upto two 16 bit units to handle each additional characters. Essentially a text represented in UTF-16 encoding could have variable number of bytes for each character. We therefore walk the text a character at a time by identifying the byte boundaries much the same way as we carefully walk t the headers of a packet. In code this could translate as an iterator over the bytes that appends to a text a character deciphered by reading one or more bytes at a time. Each character could be represented as a codepoint in the usual notation of U+hex or U+four_digit_hex or U+five_digit_hex or more depending on the blocks and planes used as defined in the Unicode standard.
We will also briefly review another problem in which given an inorder and a preorder traversal of a tree, we will construct the tree and return the root.
So given an inorder traversal of 1,2,3,4,5,6,7
and a preorder traversal of 4,2,1,3,6,5,7.
This has already been treated earlier but the idea is reinforced with the following principle.
Given a node in the preorder list we find the root and the size of the left subtree by simply counting the number of nodes that appear before it in the inorder. So take 4 and the number of nodes before it is three in inorder. Therefore the fourth node from 4 in the pre-order is the right child. To determine the left child, take the immediately adjacent node in the pre-order.
In the above example, if we take a tree with the nodes 3 and 5 missing,
we have the inorder as 1,2,4,6,7
and preorder as 4,2,1,6,7
Here the left child of 6 is null and the right child of 2 is null.
We determine the missing left child by noticing that the item before in the inroder is the parent or null
We determine the missing right child by noticing that the item after in the inorder is the parent or null.
Always review and test all code.
We now look at some more .NET APIs
System.Threading also has other primitives typically available from the operating system.
These include locking and thread c intro primitive
While we are on the topic of semaphores and threads, let us look at a producer consumer problem. A producer consumer can use a semaphore for buffering a count of jobs whereas they can use a mutex if there is only one job to mutually exclude.

Today we discuss some more of the .Net APIs from the framework class library. Specifically we discuss the task parallel library. We first look at data and task parallelism and then we discuss some of the potential pitfalls we need to watch out for.
Data parallelism refers to scenarios in which the same operation is performed concurrently on say elements in a source collection. The collection is partitioned so that multiple threads can work on it. The for loop can be partitioned with a lambda expression.
Parallel.ForEach( sourceCollection, item => Process(item));
There are options available to stop or break loop execution, monitor the state of the loop, maintain thread local state, finalize thread-local objects, control the degree of concurrency and so on.
Task Parallelism is based on the concept of a task which represents an asynchronous operation and resembles a thread or thread pool without the corresponding OS primitives but as an abstraction
This enables more efficient and more scalable use of system resources.
It also enables more programmatic control than is possible in a thread or work item.
Parallel.Invoke() method enables creating and running tasks implicitly. As an example:
Parallel.Invoke(() => DoSomeWork(), () => DoSomeOtherWork());
The number of task instances for the above statement does not have to match the number of parameters to Invoke method.
For greater control over the tasks, we can use Task.Wait and Task.Run method.
The potential pitfalls in Data and Task Parallelism are:
we cannot assume that parallel is always faster. If the parallel loops have few iterations and fast user delegates, then it is unlikely to speed up much.
We should not write to shared memory locations because they increase the race conditions and contention.
We should avoid doing over-parallelization. Each partitioning incurs a cost and there is cost to synchronize the worker threads.
We should avoid calls to non-thread safe methods because it can lead to data corruption which may go undetected.
We should also limit the calls to thread safe methods because there can be significant synchronization involved.
Delegates used with Parallel.Invoke should not have waiting because it affects the execution of others and can result in deadlocks when the tasks are inlined to the currently executing thread.
We should also not assume that iterations in the loop specified always execute in parallel.
We should avoid executing parallel tasks on UI threads.

Sunday, May 4, 2014

Revisit of some TSQL
Create Table example:
CREATE TABLE Demographics(
id INT IDENTITY (1,1),
name VARCHAR(15) NOT NULL,
country VARCHAR(2) DEFAULT 'US'
CONSTRAINT country_not_null NOT NULL
CONSTRAINT country_check
CHECK (country IN ('CA', 'US')),
indexed_name VARCHAR(15),
CONSTRAINT demographics_pk
PRIMARY KEY (id)
CONSTRAINT demographics_fk01
FOREIGN KEY (name, country)
REFERENCES parent_example (name, country) );

ALTER TABLE Demographics ADD
age INT,
CONSTRAINT age CHECK (age > 0 AND age < 150);

Regular expressions can be specified with one or more of
%, _, [...] and [^...]
Common formatting syntax use CASE, COALESCE and ISNULL()

Hierarchical queries can be written with common table expressions:
To select direct reports for a manager, we write
WITH directreports
( manager_id, employee_id, employee_name, level)
AS
(SELECT e.manager_id, e.employee_id, e.employee_name, 0 as level FROM dbo.Employee e
WHERE e.manager_id is NULL
UNION ALL
SELECT d.manager_id, d.employee_id, d.employee_name, level + 1 FROM directreports d
ON e.manager_id = d.employee_id)
SELECT manager_id, employee_id, name from directreports

SELECT d.name as department_name
COUNT(*) as employee_count
From employee e inner join department d
ON e.department_id = d.id
GROUP by d.name WITH ROLLUP
-- generates summary row

WITH CUBE to generate summaries for all possible combinations and a grand total
GROUPING(group_by_col_name) returns 1 or 0 when the associated columns in the result set contain a null.

For string functions use CHARINDEX or SUBSTRING or LENGTH or +
DATETIME functions involve GETUTCDATE(), DATEADD(datepart, interval, date) and DATEDIFF(datepart, startdate, enddate) eg. DATEADD(day, 1, GETDATE());

use CONVERT for datetime

Next BCL coverage of .Net

FileSystem Interface:
Classes:
File : methods, Create, Delete, Open, Read, Write, GetTime
Directory: methods: Create, Delete, Exists, EnumerateFiles, GetDirectories, GetFiles, GetTime
FileInfo
DirectoryInfo
FileStream
Path - static methods.
FileWatcher

System.Messaging has the following classes and methods:
MessageQueue class has different methods as follows: 1) Send 2) Receive 3) Peek 4) , GetPublicQueuesByMachine, GetPublicQueuesByLabel and GetPrivateQueuesByMachine.

MessageQueueCriteria, MessageQueueEnumerator, MessageQueuePermission and MessageQueueTransaction are other classes

We write a few more methods.
Node * Tree_Successor(Node* root)
{
Assert (root != null);
if (root->right)
{return Tree_Minimum(root->right);}
Node* x = root;
// find the lowest ancestor of x whose left child is also an ancestor of x
Node* y = root->parent;
while ( y && x == y->right)
{
x = y;
    y = y->parent;
   }
return y;
}
Node* Tree_Minimum(Node* root)
{
Node *x = root;
while (x && x->left)
   x = x->left;
return x;
}

Node* Tree_Predecessor(Node* root)
{
Assert (root != null);
if (root->left)
{return Tree_Maximum(root->left);}
Node* x= root;
Node* y =   root->parent;
while (y && x == y->left)
{
x = y;
y = y -> parent;
}
return y;
}

File - System organization
There are primarily two design considerations:
1) exposure of file system to user via definitions of file, attributes operations
2) overlay of logical filesystem on top of physical file system via data structures and algorithms
The file system stack consists of
application programs
logical file system
file-organization module
basic file system
I/O control
devices
a typical open file table will have
index, file_name, permissions, access dates, and pointer to disk block.
A file is represented by an inode (short for index node) in Unix. The in ode contains the user and group identifiers of the file, last file modification and access, count of number of hard links, and the type of the file.In addition, the inode contains 15 pointers to the disk blocks containing the data contents of the file. Inodes also contain pointers that point to direct blocks. There can be pointers to single indirect and double indirect as well as triple indirect blocks. File structures are inherited by the child process after a fork so several processes may share the same offset location for a file.

Saturday, May 3, 2014

Some notes on the book on C++.
A user defined allocator can have chunk of memory to allocate or deallocate from. The class called Pool has methods for alloc and free. Its destructor deletes all the chunks . The method for grow allocates a new chunk and organizes it as a linked list of elements.
AutoPointers support the resource acquisition is initialization technique. An auto-pointer is initialized by a pointer and can be dereferenced in the way that a pointer can. The object pointed to will be implicitly deleted at the end of the auto_ptr scope.
auto_ptr<Shape> p (new Rectangle(p1, p2));
Copy Constructor and assignment operator can be kept private for the most part.
class X {
//
X (SomeType);
X( const X&);
X& operator= (const X&);
~X();
};

class X {
// members with implicit this pointer
X* operator& ();
X operator&(X);
X operator ++ (int);
X operator&(X,X);
X operator / ();
};

Template specializations : the specialization ( argument list match ) proceeds one after the other
The order is as follows:
template <class T> class Vector; // general
template <class T> class Vector<T*>; // specialized for any pointer
template <> class Vector<void*>;

Function templates
template<class T> void sort(vector<T> & v);

Sequences and containers
list<int>::iterator p = find(li.begin(), li.end(), 42);

Standard containers :
<vector> single dimensional array : methods size(), operator [], at(), front(), back()
<list> doubly linked list iterator, reverse_iterator methods: insert(), erase(), clear()
<deque> doubly ended queue : methods : empty(), front(), back(), push(), pop()
<queue>
<stack> : methods push_back(), pop_back() on Stack
<map> associative array of T
<set> set of T
<bitset> array of booleans
size(), capacity() and reserve() the latter two are for vectors
push_back(), pop_back(), push_front() and pop_front()
priority_queue : methods top()

Hashing
template<class T> size_t Hash<T>::operator()(const T& key) const
{
size_t res = 0;
size_t len = sizeof(T);
const char * p = reinterpret_cast<const char*> (&key);
while ( len--) res = (res << 1) ^ *p++;
return res;
}

We now look at the Streams and strings.
Streams include fstream and stringstream derived from iostream. iostream derives from i stream and ostream. Stream methods include open and close
string methods include compare, insert, append, concatenation (+) find, replace, and substr(). The size and capacity are similar to vector

This weekend I'm going to cover some basics in computer programming. They will be from the Algorithms and Data Structures book.

Merge-Sort(A,p,r)
if (p < r)
then q <- (p + r) / 2
   Merge-Sort(A, p, q)
   Merge-Sort(A, q+1, r)
   Merge(A, p, q, r)

Quick-Sort(A,p,r)
if p < r
then q <- Partition(A, p, r)
Quick-sort(A, p, q - 1)
            Quick-sort(A, q+1, r)

Partition (A, p, r)
x <- A[r]
i <- p - 1
for j <- p to r - 1
do if A[j] <= x
then i = i + 1
exchange A[i] = A[j]
exchange A[i + 1] <-> A[r]
return i + 1

Bubble-Sort(A)
for i = 1 to length[A]
    do for j = length[A] down to i + 1
          do if A[j] < A[j - 1]
               then exchange A[j] -> A[j - 1]

Tree-Insert(T, z)
y <- nil
x <- root[T]
while x != NIL
do y = x
if key[z] < key[x]
then x <- left[x]
else x <- right[x]
p[z] = y
if ( y == NIL)
then root[T] <- z
else if key[z] < key[y]
then left[y] <- z
else right[y] <- z

Tree-Delete(T,z)
// Case 1 z has no children
// Case 2 z has one child => splice out
// Case 3 z has two children => substitute z
with the successor that has no left child
if left[z] = NIL or right[z] = NIL
then y = z
else y = Tree-Successor(z)
if left[y] <> NIL
then x <- left[y]
else x <- right [y]
if x <> NIL
then p[x] <- p[y]
if p[y] = NIL
then root[T] = x
else if y = left[p[y]]
then left[p[y]] = x
else right[p[y]] = x
if y <> z
then key[z] <- key[y]
copy y's satellite data into z
return y