Monday, October 24, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet.
Framenet is a lexical database maintained by Berkeley.
It has the following features:
It has annotations about how words are  used in actual text.
It can be used like a dictionary.
It has over 170000 annotated sentences describing word usage
It provides a unique training data set
It is both human readable and machine readable.
The data is freely available to download.
The annotations show how words have unique combinations.
FrameNet defines the idea that words have a meaning based on the presence of a semantic frame.  This is a type of event, relation or entity and the participants in it. The example cited is that of cooking. The concept of cooking typically involves a person doing the cooking (cook) and the food that is cooked (food) and something to hold (container) along with a source of heat (heating instrument) This is represented as a frame called 'Apply_heat' where the frame elements are the cook, food, container and heating_instrument. The frame units then naturally fit as superimposed on the words that evoke them.
The frame units help with both semantic and syntactic relationships. As such they are better representations than terms themselves while allowing the models based on those terms to work as before, if not improved with the frame elements.


#codingexercise
Insert a node into a sorted circular linked list
Void insertCircular(ref node head, node target)
{
If (head == null) {head = target; target.next = target; return;}
Node prev = null;
Node curr = head;
while(curr.data < target.data && curr.next != head){
    Prev = curr;
    Curr = curr.next;
    }
 If (prev == null){
    If (curr.data < target.data){
         InsertAt(cur, target);
    }else{
         InsertAtStart(head, target);
     }
 }else{
   if (curr.data < target.data) {
        InsertAt(cur, target);
   }else{
        InsertAt(prev, target);
   }
 }
}


void InsertAt(ref node head, ref target)
{
// Insert After head
var next = head.next;
target.next = next;
head.next = target;

}


Sunday, October 23, 2016

#codingexercise
Print all nodes at a distance k from a given node
void printKDistanceDown(Node root, int k)
{
if (root == null || k < 0) return;
if (k == 0)
   Console.Write(root.data);
printKDistanceDown(root.left, k-1);
printKDistanceDown(root.right, k-1);
}

// returns distance of root from target
int PrintKDistanceNode(Node root, Node target, int k)
{
if (root == null) return -1;
if (root == target){printKDistanceDown(root, k); return 0;}
int dl = printKDistanceNode(root.left, target, k);
if (dl != -1)
{
   if (dl + 1 == k)
       Console.Write(root.data);
   else
       printKDistanceDown(root.right, k -dl-2);
   return 1 + dl;
}
int dr = printKDistanceDown(root.right, target, k);
if (dr != -1)
{
      if (dr +1 ==k)
         Console.Write(root.data);
      else
          printKDistanceDown(root.left, k-dr-2);
      return 1+dr;
}
return -1;
}

2. Given a string, recursively remove adjacent duplicate characters from string. The output string should not have any duplicates.
For eg: abcdaadhhhhhzppzl
becomes abchl
 static String RemoveDuplicate(StringBuilder str, int n)
        {
            for (int m = 0; m < int.MaxValue; m++)
            {
                bool modified = false;
                int k = 0;
                int len = str.ToString().Length;
                if (len == 0) return str.ToString();
                for (int i = 0; i < len; i++)
                {
                    if (k == 0) {
                        str[k] = str[i];
                        k++;
                        continue;
                    }
                    if (str[k] == '\0' || str[i] != str[k-1])
                    {
                        str[k] = str[i];
                        k++;
                    }
                    else
                    {
                        k--;
                        str[k] = '\0';
                        modified = true;
                    }
                }
                if (!modified)
                    break;
                str[k] = '\0';
                k++;
                for (int i = k; i < len; i++)
                    str[i] = '\0';
                Console.WriteLine("val =" + str.ToString());
            }
            return str.ToString();
        }

Saturday, October 22, 2016

#codingexercise
Flatten out a linked list with child and next pointers
void Flatten(Node head)
{
if (head == null) return;
Node temp;
Node tail = head;
while (tail.next)
     tail = tail.next;
Node cur = head;
while ( cur != tail)
{
if (cur.child && cur.child.visited == false)
{
tail.next = cur.child; // this goes to the tail
tmp  = cur.child;  // tail is updated
tmp.visited = true;
while (tmp.next)
  tmp = tmp.next;
  tmp.visited = true;
tail = tmp;
}
cur = cur.next;
}
}
#probability question
A fair die is thrown k times. What is the probability of sum of k throws to be equal to a number n?
void FairThrow(int k, int n, ref int current, ref int sum, ref int count, ref int total)
{
 if (current == k) {
      total += 1;
      if (sum == n) { count +=1;}
      return;
}
 for (int i = 1; i <=6; i++)
 {
         sum+= i;
         current += 1;
         FairThrow( k, n, current, sum, count);
         sum -= i;
         current -= 1;
 }
}
Probability = count/total;

Friday, October 21, 2016

Today we continue to discuss the conditional random fields we mentioned earlier.We have seen some generalized discussion about what random fields are, their benefits, their representations, the objective functions and their optimizations including techniques such as CG method which ties in things we have previously studied for the purposes of text analysis. We wanted to introduce sense embedding features into the word-context analysis. We can represent the conditional probability of a sense given its context using their corresponding vectors. Then we choose the parameters numbering 1..d for each of the context, semantic pairs. We would like to set the parameters such that their conditional log likelihood is maximized. We could do this with conjugate gradient method.
We saw linear chain CRFs. Now we could generalized this and the relax the requirements for previous state dependency.  Instead we define CRFs with general graphical structure. Such structures are found to be very helpful for relational learning because they relax the iid assumption between entities.
So far we have seen that models CRF included have been applied with training and testing data to be independent. However CRFs can also be used for within-network classification - one where we can model probabilistic dependencies between training and testing data. The generalization from linear chain CRFs to general CRFs is about moving from a linear chain factor graph to a more general factor-graph.
The general definition of the conditional random field is presented this way:  Let G be a factor graph over Y. Then p(y|x) is a conditional random field if for any fixed x, the distribution p(y|x) factorizes according to G. Therefore each conditional distribution is a CRF even though  it may be trivial. If F is a set of factors in G where each factor takes the exponential form, then each conditional distribution can be written in the form of exponential of weighted feature functions normalized by a partition function similar to what we have seen before except that they apply to each factor in the set F.
Models usually depend on parameter tying. For example, in the linear chain case, the same weights are used for the factors at each time step. This maintains the parameters between the transition. The same applied to a general model and not just a chain model.  To do the same, the factors of G are partitioned into C = C1, C2, .... Cp where each Cp is a clique template whose parameters are tied. A clique template is a set of factors which has a corresponding set of sufficient statistics and parameters. Then the weighted feature functions in the general definition can be replaced by a combination of the sufficient statistics and their parameters.
#codingexercise
Find the position of an element in a sorted array of infinite numbers:
int GetPosition(List<int> nums, int val)
{
int low = 0;
int high = 1;
int cur = nums[0];
while (cur < val)
{
low = high;
high = 2 * high;
val = nums[high];
}
return binarysearch(nums, low, high, val);
}

Void binarysearch(list<int> nums, int low, int high, int val)
{
Int mid = (low + high)/2;
If (nums[mid] == val) return mid;
If (low == high) return -1;
If (nums[mid] < val)
    Return binarysearch(nums,mid+1,high,val);
Else
     Return binarysearch(nums,low,mid,val);
}

Thursday, October 20, 2016

Today we discuss a specific automation issue. This pertains to authentication:

We explain a difference between a JWT digest token and an HMAC hash:

We construct a JWT token this way:
    def data_to_sign(endpoint, environment, owner):
        banner = '{"alg":"SHA256withRSA","kid":"' + environment + '"}'
        banner = base64.b64encode(banner)
        start = int(datetime.datetime.now().strftime("%s"))*1000
        end = datetime.datetime.now() + datetime.timedelta(seconds=300)
        end = int(end.strftime("%s"))*1000
        expiration = '{"iss":"' + environment + '","nbf":' + str(start)  + ',"sub":"' + owner  + '","exp":' + str(end) + ',"aud":"' + endpoint + '"}'
        expiration = base64.b64encode(expiration)
        data_to_sign = banner + "." + expiration
        return data_to_sign

    def signature(endpoint, environment):
        data = data_to_sign(endpoint, environment)
        unpacked = data.split('.')
        unpacked[0] = base64.b64decode(unpacked[0])
        unpacked[1] = base64.b64decode(unpacked[1])
        unpacked = unpacked[0]+"."+unpacked[1]
        with open('./pem/file.txt', 'w') as f:
             f.write(unpacked)
        cmd = 'openssl  dgst -sha256 -sign ./pem/privateKey.key  -out ./pem/signature.sign ./pem/file.txt'
        cmd_execute(cmd.split())
        digest = ''
        with open('./pem/signature.sign', 'rb') as f:
             digest = base64.b64encode(f.read())
        cmd = 'openssl  dgst -sha256 -verify ./pem/privateKey.pub  -signature ./pem/signature.sign ./pem/file.txt'
        output = self.cmd_execute(cmd.split())
        # output says 'Verified'
        auth = data_to_sign + "." + digest
        return auth

       PyJwt encapsulates something similar with the following syntax:
       import jwt

       jwt.encode({'some': 'payload'}, 'secret', algorithm='HS256')

       The auth goes in the Authentication header field of the https request

We construct an HMAC token this way:
        def auth(KeyId, timestamp, token, SomeSecret):
                HMAC_KEY =  SomeSecret.decode('hex')
                CanonicalizedResource = 'timestamp=' + timestamp + '&' + 'token=' + str(token)
                StringToSign = CanonicalizedResource
                Signature = base64.b64encode(
               hmac.new(
                    HMAC_KEY,
                    StringToSign,
                    hashlib.sha256
                    ).digest()
                )
                auth = KeyId + ":" + Signature
                return auth

#codingexercise
Find the longest consecutive subsequence in a sequence

Use a hash table and add all elements in the sequence.

For every element check if it is a start of subsequence, previous one does not occur.

Count the length from each subsequence.

Return the max of the counts.

int GetLMS(List<int> digits)
{
var h = new Hashtable<int>();
for (int i = 0; i < digits.count; i++)
   if h.Contains(digits[i])
      h[digits[i]]++;
   else
      h[digits[i]] = 1;
int max = 0;
for (int i = 0; i < digits.count; i++)
{
  if (h.Contains(digits[i]-1) == false)
  {
      int count = 0;
      for (int j = digits[i]; h.contains(j); j++)
            count++;
      if (count > max)
          max = count;      
   }
}
return max;
}

Trivia question:

The question is why does net ads join fail with  ads_sasl_spnego_gensec_bind ? Hint: is this a kerberos failure.

Wednesday, October 19, 2016

Local only Automation scripts

We discussed my some automation ideas yesterday by enabling platform specific api servers .
This is not complete without mentioning that APIs don't take the emphasis away from host specific actions and should remain as instance specific as possible. As an example use case, if we want to make the settings of a virtual machine to depend on the attributes of the virtual machine, we need not fetch it from an external data source but instead rely locally on the filesystem of the virtual machine such as when we read version or release. In windows, even Powershell is required to be local only and remote execution is not turned on by default. While this is done mainly for security and automation is internal such that security is not a primary concern, it is still good practice to observe local only execution.
We cite this self reliance example as a way to minimize code which is the hallmark of any good automation. In addition, it improves self sufficiency, reduces maintenance and the complicated requirements for cross platform operations.

#codingexercise
Given an initial and final arrangement of digits of a number, count the nunber of shifts. Assume all different digits

Int getShifts(list<int> initial, list<int> final)
{
Var common = final.GetLCS(initial);
Var diff = final.except(common);
Int sum = 0;
Foreach(var digit in diff)
{
Int i = initial.indexOf(digit);
Int j = final.indexOf(digit);
sum += math.abs(i-j);
}
Return sum;
}

LCS-LENGTH(X, Y)
m = X.length
n = Y.length
initialize b[m,n] and c[m,n] as new tables
for i = 1 to m
     c[i, 0] = 0
for j = 0 to n
     c[0,j] = 0
for i = 1 to m
      for j = 1 to n
            if xi == yi
               c[i,j] = c[i -1, j-1] + 1
               b[i,j] = up-left
            elseif c[i-1,j]  >= c[i, j-1]
               c[i,j] = c[i-1, j]
               b[i,j] = up
            else c[i,j] = c[i,j-1]
               b[i,j] = left
return c and b

and we print the LCS as
PRINT-LCS(b, X, i, j)
      if i == 0 or j == 0
        return
      if b[i,j] == up-left
         PRINT-LCS(b, X, i-1, j-1)
         print xi
      elseif b[i,j] == up
        PRINT-LCS(b, X, i-1, j)
      else PRINT-LCS(b,X, i,j-1)


Tuesday, October 18, 2016

APIs on Windows:


If you are aware of just how much automation you use, you will want to see more on the horizon and the impact that you can make. If not, you are welcome to skip this for now.

On the Linux world Automation relies on shell scripts often invoked with SSH. On the Windows side, the Powershell is in the works on adding SSH support. And there is very little cross platform support but organizations have inventory and core functionalities deployed on both platforms  Therefore, more and more automation now relies on microservice APIs for programmatic and shell based access (think curl commands) to features  that are not limited to the current host.   In this regard, there are several APIs available for external services but very few for the operating system. Most of the tasks on the operating systems are done via local or remote shell scripts.

Not any more, APIs serve as  a conduit to invoking scripts locally and remote and for programmatic access anywhere. A large part of any organizations  offerings are automations that are built on microservice APIs. Today we expand that to include Runtime APIs such as those available in .Net world for access and control of the entire Windows ecosystem.

Why Windows ?

Virtually every component of Windows is available for programmatic access via APIs that can be called natively in .Net languages or Powershell scripts. However, functionality to execute it has always been limited because there is usually limited support for cross platform access. For example, running an SSH server for executing client called scripts on a windows box has not been easily available on Windows.

Not any more, we now have availability to execute automation in any .Net language on the windows world with access to such things as Active Directory, DNS, Firewall, etc. We are now able to execute  scripts against centrally managed organizational infrastructure.

What does it involve ?


It involves writing .Net code automation that can run on windows server and can be called by APIs directly from clients. This is slightly different from installing an SSH server on a Windows box to be able to listen to commands over the wire. Although that is also possible to increase the impact and reach of automation, the native API servers running on Windows Boxes can make use of SDKs from all parts of Windows ecosystem as well as execute scripts and command-lets locally on the windows box.

Why do we need our own API server when there are test automation infrastructure ?
True, there's Chef, SaltStack and many other automation infrastructures that claim to give seamless experience over both platforms - Windows and Posix family however they rely on the scripts and automations being authored and executed. Its these we want to add and increase directly in more maintainable, unit-tested, and source control governed automation assets. And yes if there is some functionality out of the box, we will prefer it but often automation is about integration and steps targeted on specific platforms. This is about Windows as invoked from Posix platforms.


 As an example : https://github.com/ravibeta/csharpexamples/tree/master/WindowsExports

#codingexercise
Take n=43592169 and k=5
1st swap: 43952169
2nd swap: 49352169
3rd swap: 94352169
4th swap: 94532169
5th swap: 95432169 :- final number.
Given a number with the final and initial arrangement of its digits, we find the ones that are out of sequence from the original and sum the number of shifts each has made to get the total number of swaps done between initial and final states.