Thursday, October 20, 2016

Today we discuss a specific automation issue pertaining to authentication.

We explain the difference between a JWT token signed with an RSA key and an HMAC hash:

We construct a JWT token this way:
    import base64
    import datetime

    def data_to_sign(endpoint, environment, owner):
        # JOSE header; SHA256withRSA is the Java name for what JOSE calls RS256
        header = '{"alg":"SHA256withRSA","kid":"' + environment + '"}'
        header = base64.b64encode(header)
        # nbf and exp as epoch milliseconds, five minutes apart
        start = int(datetime.datetime.now().strftime("%s")) * 1000
        end = datetime.datetime.now() + datetime.timedelta(seconds=300)
        end = int(end.strftime("%s")) * 1000
        claims = '{"iss":"' + environment + '","nbf":' + str(start) + ',"sub":"' + owner + '","exp":' + str(end) + ',"aud":"' + endpoint + '"}'
        claims = base64.b64encode(claims)
        return header + "." + claims

    def signature(endpoint, environment, owner):
        data = data_to_sign(endpoint, environment, owner)
        unpacked = data.split('.')
        unpacked[0] = base64.b64decode(unpacked[0])
        unpacked[1] = base64.b64decode(unpacked[1])
        unpacked = unpacked[0] + "." + unpacked[1]
        with open('./pem/file.txt', 'w') as f:
            f.write(unpacked)
        # sign the payload with the RSA private key
        cmd = 'openssl dgst -sha256 -sign ./pem/privateKey.key -out ./pem/signature.sign ./pem/file.txt'
        cmd_execute(cmd.split())
        with open('./pem/signature.sign', 'rb') as f:
            digest = base64.b64encode(f.read())
        # sanity check: verify the signature against the public key
        cmd = 'openssl dgst -sha256 -verify ./pem/privateKey.pub -signature ./pem/signature.sign ./pem/file.txt'
        output = cmd_execute(cmd.split())
        # output says 'Verified OK'
        return data + "." + digest

       PyJWT encapsulates something similar with the following syntax:
       import jwt

       jwt.encode({'some': 'payload'}, 'secret', algorithm='HS256')

       The resulting token goes in the Authorization header field of the HTTPS request.
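
       For reference, here is a minimal sketch of the same flow using PyJWT with RS256 (the JOSE name for SHA256withRSA), assuming the cryptography backend is installed and the private key is in PEM form. Note that PyJWT writes nbf and exp as epoch seconds, whereas the snippet above used milliseconds:

       import datetime
       import jwt  # PyJWT

       def pyjwt_signature(endpoint, environment, owner, key_path='./pem/privateKey.key'):
           now = datetime.datetime.utcnow()
           claims = {'iss': environment, 'nbf': now, 'sub': owner,
                     'exp': now + datetime.timedelta(seconds=300), 'aud': endpoint}
           with open(key_path) as f:
               private_key = f.read()
           # PyJWT builds the header, signs, and base64url-encodes in one call
           return jwt.encode(claims, private_key, algorithm='RS256',
                             headers={'kid': environment})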

We construct an HMAC token this way:
        import base64
        import hashlib
        import hmac

        def auth(KeyId, timestamp, token, SomeSecret):
            HMAC_KEY = SomeSecret.decode('hex')  # hex-encoded shared secret (Python 2)
            CanonicalizedResource = 'timestamp=' + timestamp + '&' + 'token=' + str(token)
            StringToSign = CanonicalizedResource
            Signature = base64.b64encode(
                hmac.new(
                    HMAC_KEY,
                    StringToSign,
                    hashlib.sha256
                ).digest()
            )
            return KeyId + ":" + Signature
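
        For example, with placeholder values (the key id, timestamp, token, and hex-encoded secret below are all hypothetical):

        print(auth('myKeyId', '1476921600', 'someToken', 'deadbeef'))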

#codingexercise
Find the longest consecutive subsequence in a sequence.

Add all the elements of the sequence to a hash table.

For every element, check whether it starts a consecutive run, i.e. the previous value does not occur in the table.

From each such start, count the length of the consecutive run.

Return the maximum of the counts.

int GetLMS(List<int> digits)
{
    var h = new HashSet<int>(digits);  // membership lookups are all we need
    int max = 0;
    for (int i = 0; i < digits.Count; i++)
    {
        // only count from the start of a consecutive run
        if (h.Contains(digits[i] - 1) == false)
        {
            int count = 0;
            for (int j = digits[i]; h.Contains(j); j++)
                count++;
            if (count > max)
                max = count;
        }
    }
    return max;
}

Trivia question:

The question is: why does net ads join fail with ads_sasl_spnego_gensec_bind? Hint: is this a Kerberos failure?

Wednesday, October 19, 2016

Local-only automation scripts

We discussed some automation ideas yesterday, enabled by platform-specific API servers.
This is not complete without mentioning that APIs do not take the emphasis away from host-specific actions, which should remain as instance-specific as possible. As an example use case, if we want the settings of a virtual machine to depend on the attributes of that machine, we need not fetch them from an external data source; instead we can rely locally on the filesystem of the virtual machine, such as when we read the version or release. On Windows, even PowerShell is local-only by default and remote execution is not turned on. While this is done mainly for security, and our automation is internal enough that security is not the primary concern, it is still good practice to observe local-only execution.
We cite this self-reliance example as a way to minimize code, which is the hallmark of any good automation. In addition, it improves self-sufficiency, reduces maintenance, and avoids complicated requirements for cross-platform operations. A minimal sketch of reading the release locally follows.
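
As a sketch of the local-only approach (assuming a Linux guest; the path and parsing here are illustrative):

import platform

def local_release_info(path='/etc/os-release'):
    # read version/release details from the local filesystem
    # rather than fetching them from an external data source
    info = {'platform': platform.system()}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and '=' in line:
                    key, _, value = line.partition('=')
                    info[key] = value.strip('"')
    except IOError:
        pass  # file absent (e.g. not Linux); keep the platform fallback
    return info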

#codingexercise
Given an initial and a final arrangement of the digits of a number, count the number of shifts. Assume all digits are distinct.

int GetShifts(List<int> initial, List<int> final)
{
    var common = final.GetLCS(initial);  // longest common subsequence; see LCS-LENGTH below
    var diff = final.Except(common);
    int sum = 0;
    foreach (var digit in diff)
    {
        int i = initial.IndexOf(digit);
        int j = final.IndexOf(digit);
        sum += Math.Abs(i - j);
    }
    return sum;
}

LCS-LENGTH(X, Y)
m = X.length
n = Y.length
initialize b[m,n] and c[m,n] as new tables
for i = 1 to m
     c[i, 0] = 0
for j = 0 to n
     c[0,j] = 0
for i = 1 to m
      for j = 1 to n
            if xi == yj
               c[i,j] = c[i -1, j-1] + 1
               b[i,j] = up-left
            elseif c[i-1,j]  >= c[i, j-1]
               c[i,j] = c[i-1, j]
               b[i,j] = up
            else c[i,j] = c[i,j-1]
               b[i,j] = left
return c and b

and we print the LCS as
PRINT-LCS(b, X, i, j)
      if i == 0 or j == 0
        return
      if b[i,j] == up-left
         PRINT-LCS(b, X, i-1, j-1)
         print xi
      elseif b[i,j] == up
        PRINT-LCS(b, X, i-1, j)
      else PRINT-LCS(b,X, i,j-1)
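
For reference, a compact Python rendering of the same computation (a sketch; it keeps only the c table and backtracks through it instead of storing b):

def lcs(x, y):
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # walk back from c[m][n] to recover one LCS
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return list(reversed(out))

# e.g. len(lcs("ABCBDAB", "BDCABA")) == 4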


Tuesday, October 18, 2016

APIs on Windows:


If you are aware of just how much automation you use, you will want to see more on the horizon and the impact that you can make. If not, you are welcome to skip this for now.

In the Linux world, automation relies on shell scripts, often invoked with SSH. On the Windows side, PowerShell is still in the process of adding SSH support. There is very little cross-platform support, yet organizations have inventory and core functionality deployed on both platforms. Therefore, more and more automation now relies on microservice APIs for programmatic and shell-based access (think curl commands) to features that are not limited to the current host. In this regard, there are several APIs available for external services but very few for the operating system itself. Most tasks on the operating system are done via local or remote shell scripts.

Not any more: APIs serve as a conduit for invoking scripts locally and remotely, and for programmatic access from anywhere. A large part of any organization's offerings are automations built on microservice APIs. Today we expand that to include runtime APIs, such as those available in the .NET world, for access and control of the entire Windows ecosystem.

Why Windows ?

Virtually every component of Windows is available for programmatic access via APIs that can be called natively from .NET languages or PowerShell scripts. However, the ability to execute them has always been limited because there is usually little support for cross-platform access. For example, running an SSH server to execute client-initiated scripts on a Windows box has not been easily available.

Not any more: we can now execute automation in any .NET language on the Windows side, with access to such things as Active Directory, DNS, Firewall, etc. We are now able to execute scripts against centrally managed organizational infrastructure.

What does it involve ?


It involves writing .NET automation code that runs on a Windows server and can be called through APIs directly from clients. This is slightly different from installing an SSH server on a Windows box to listen to commands over the wire. Although that is also possible, and it increases the impact and reach of automation, native API servers running on Windows boxes can make use of SDKs from all parts of the Windows ecosystem, as well as execute scripts and cmdlets locally on the box.

Why do we need our own API server when there is test automation infrastructure?
True, there are Chef, SaltStack, and many other automation infrastructures that claim to give a seamless experience over both platforms, Windows and the POSIX family. However, they rely on the scripts and automations being authored and executed by someone. It is these assets we want to add to and improve directly, as more maintainable, unit-tested, and source-control-governed automation. And yes, if some functionality is available out of the box, we will prefer it, but often automation is about integration and steps targeted at specific platforms. This is about Windows as invoked from POSIX platforms.


 As an example : https://github.com/ravibeta/csharpexamples/tree/master/WindowsExports

#codingexercise
Take n=43592169 and k=5
1st swap: 43952169
2nd swap: 49352169
3rd swap: 94352169
4th swap: 94532169
5th swap: 95432169 :- final number.
Given a number's initial and final arrangements of digits, we find the digits that are out of sequence relative to the original and sum the number of shifts each has made, to get the total number of swaps between the initial and final states.

Monday, October 17, 2016

#codingexercise
Detect the number of three-node cycles in a graph.
We use depth-first search to traverse the graph and record the cycles found via back edges.

DFS(V, E, paths)
  for each vertex v in V
       v.color = white
       v.d = nil
  time = 0
  for each vertex v in V:
       if v.color == white:
           path = new Path();
           DFS-VISIT(V, E, v, path, paths)

DFS-VISIT(V, E, u, path, ref paths)
  time = time + 1
  u.d = time
  u.color = gray
  path.Add(u);
  for each vertex v adjacent to u
       if v.color == white
          DFS-VISIT(V, E, v, path, paths)
       else if v.color == gray   // back edge: v is an ancestor of u, so a cycle closes here
          paths.Add(path.From(v));   // the segment of the current path from v to u
  path.Remove(u);
  u.color = black
  time = time + 1
  u.f = time


filter(paths):
    var d = new Dictionary<string, int>();
    foreach path in paths:
        foreach vertex v in path:
            // a three-node cycle returns to v after exactly three hops
            if v.next and v.next.next and v.next.next.next == v:
                pathidentifier = stringify(v, 3)   // key built from the three vertices
                if d.Contains(pathidentifier) == false:
                    d.Add(pathidentifier, 1);
    return d.Count;
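
A simpler, self-contained way to count three-node cycles in an undirected graph is to intersect neighbour sets; here is a sketch, assuming integer vertex labels (any orderable labels work) and an adjacency-set representation:

def count_triangles(adj):
    # adj: dict mapping vertex -> set of neighbours (undirected)
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # each common neighbour w with v < w closes the triangle
                # u-v-w exactly once because of the ordering constraint
                count += sum(1 for w in adj[u] & adj[v] if v < w)
    return count

# example: a 4-clique contains 4 triangles
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(count_triangles(adj))  # 4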


Find the greatest number we can get by rearranging the digits of a number n using k adjacent swaps only, as shown below:
Take n=43592169 and k=5
1st swap: 43952169
2nd swap: 49352169
3rd swap: 94352169
4th swap: 94532169
5th swap: 95432169 :- final number.

int GetMaxWithKShifts(int n, int k)
{
    var digits = n.ToDigits();
    int rem = k;
    for (int i = 0; i < digits.Length && rem > 0; i++)
    {
        // position of the largest digit at or after i
        int index = GetMaxDigitIndexNext(digits, i);
        if (index == i || digits[i] == digits[index]) continue;
        if (index - i <= rem)
        {
            Shift(digits, i, index);  // bubble digits[index] left to position i
            rem -= index - i;
        }
    }
    return digits.ToInt();
}

Alternatively, the maximum number may be obtained recursively by considering each profitable swap. The results of this and the approach above could vary based on which rule is followed (adjacent shifts versus arbitrary swaps):
void GetMaxWithKSwaps(int k, ref char[] digits, ref char[] candidate)
{
    if (k == 0)
        return;
    for (int i = 0; i < digits.Length - 1; i++)
    {
        for (int j = i + 1; j < digits.Length; j++)
        {
            if (digits[i] < digits[j])
            {
                Swap(digits, i, j);
                if (Compare(digits, candidate) > 0)
                    candidate = (char[])digits.Clone();
                GetMaxWithKSwaps(k - 1, ref digits, ref candidate);
                Swap(digits, i, j);  // backtrack
            }
        }
    }
}

Or we could enumerate all arrangements:

void Permute(int k, string digits, StringBuilder candidate, bool[] used)
{
    if (candidate.Length == digits.Length)
    {
        if (candidate.ToString().CompareTo(digits) > 0 && CountSwaps(digits, candidate) == k)
            Console.Write(candidate.ToString());
        return;
    }
    for (int i = 0; i < digits.Length; i++)
    {
        if (used[i]) continue;
        used[i] = true;
        candidate.Append(digits[i]);
        Permute(k, digits, candidate, used);
        candidate.Length -= 1;  // backtrack
        used[i] = false;
    }
}

Also, given an original sequence and a final sequence, it is possible to say how many swaps are required.

For example, 43592169 and 95432169 are five swaps apart, as traced above, because:

  • There are four positions out of order, and each swap can fix at most two of them. With arbitrary swaps over distinct digits, the minimum count follows from the cycle decomposition of the rearrangement, as the sketch below shows.
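
A minimal sketch of the cycle-decomposition count (assuming the two arrangements are permutations of each other with all digits distinct; each cycle of length L costs L-1 swaps):

def min_swaps(initial, final):
    pos = {d: i for i, d in enumerate(final)}  # position of each digit in the final arrangement
    perm = [pos[d] for d in initial]           # where each element must travel
    seen = [False] * len(perm)
    swaps = 0
    for i in range(len(perm)):
        if seen[i]:
            continue
        # walk the cycle containing i
        j, length = i, 0
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        swaps += length - 1
    return swaps

print(min_swaps('4359', '9543'))  # 3: one 4-cycle costs three swaps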

Sunday, October 16, 2016

We have seen some generalized discussion about what random fields are, their benefits, their representations, the objective functions, and their optimizations, including techniques such as the CG method, which tie in things we have previously studied for the purposes of text analysis. We will now see how to apply it. At this point we are merely forming a hypothesis. We recall that the PMI matrix helped find word relationships with word vectors. We said that the semantic network embedded within the word space can be used to find candidates for positive and negative sampling. We said that the log-exponential form of the PMI matrix defines the undirected graphical model. We said we could replace this model with feature functions, where features are defined in terms of both transitions from the previous state to the current state and observations for the current state. Then we said we could generalize this with parameter vectors and minimize the objective function with the Conjugate Gradient method.
As we move from an undirected graphical model to a linear-chain CRF, let us look at reworking not just the PMI matrix but also the semantic embedding. First we modify the recommendations for creating the semantic embedding in the word space, discussed earlier from the paper by Johansson and Nieto Piña, by using a linear-chain CRF and its optimization instead of the steepest gradient descent that they perform. We do this by introducing a set of weights that equal 1 for a given state and zero otherwise; in other words, we define feature functions. The reason we choose this problem for applying the CRF is that it poses a direct minimization problem, which we attempt to solve with a CRF instead. There we were using sense vectors from a choice of hypernyms and hyponyms, and we applied a distance function to compute similarities between senses. The sense vectors were similar to the word vectors except that they drew their features from the thesaurus. Now we use the same features as states for an underlying sequence model and say that the words are based on some sequence of states, where the previous state determines the current state and the current state has independent observations. We could do this merely by associating each word with its state. By drawing a sequence model, although less faithful than the generative directed models, we are able to form the CRF directly. This just serves as an example to illustrate how to apply the sequence model directly. The joint distribution that we used to form the sense vectors in the first place is taken in log form, so log p(y, x) becomes lambda-y plus the sum over j of lambda-y,j times x-j, where the first term is the bias weight and the remaining terms are the feature weights. The two together, raised exponentially and normalized, become the conditional distribution. Therefore it is possible to rewrite the joint distribution in the form of bias and feature weights, after which we already know how to proceed: define feature functions that are non-zero for a single class label, represent an objective function, and apply the Conjugate Gradient method.

The steps being considered for introducing a semantic CRF are as follows:
1) Generate the sense embedding network in the same word space.
2) Generate a PMI word-context matrix that has both positive and negative samples. Both samples should include semantic content such as hyponyms and hypernyms.
3) Establish a conditional distribution from the top PMIs and the top senses.
4) Maximize the conditional log-likelihood of this distribution.


Saturday, October 15, 2016

We were discussing parameter vectors with conditional random fields. A random field is a particular distribution in a family of distributions. It is discriminative, which means it models the conditional distribution directly, avoiding dependencies among the observations. This makes it less prone to violations of the independence assumptions between features. A CRF is written in the form of feature functions. Each feature function has a feature for the state transition and a feature for the state observation. Feature functions are not limited to indicator functions, where a set of weights takes the value one for a class instance and zero otherwise. We could exploit other, richer features of the input and replace the weights with parameter vectors. The parameter vectors are estimated with the same sum-and-log notation we have seen, for what is called penalized maximum likelihood. To do this, we take the log-likelihood, which is the conditional probability of the prediction summed over each of the inputs. Since we presented the CRF as an improvement over plain conditional probabilities, we can write the log-likelihood in terms of the state-transition and state-observation real-valued feature functions and the parameter vector. To avoid overfitting when the weight vector's norm grows too large, we penalize large weights; this is called regularization. A regularization parameter determines the strength of the penalty. The choice of this parameter can ideally be found by iterating over a range of values. In practice, however, typically only a few are tried, varying the regularization parameter by factors of 10. There are other choices of regularization, but this kind of tuning is well known in statistics and is therefore applied directly.
The conditional log-likelihood thus formed can be maximized by taking partial derivatives. In this case, the gradient has three components. The first term is the expected value of the feature function under the observed (empirical) distribution; an expected value is the sum of all possible values of a variable, each multiplied by the probability of its occurrence, and here we use the product of feature weights and bias weights directly for the average. The second term comes from the derivative of the log of the normalization constant, which sums over all possible state sequences; it is the expectation of the feature function under the model distribution, not the observed one. The optimum is where the gradient is zero and the two expectations are equal; this pleasing interpretation is a standard result about maximum likelihood. The third term is merely the derivative of the regularizer on the parameter vector. A number of optimization techniques can be chosen to perform this optimization. Because the objective function we started with is a logarithm of exponentials, it is concave, which facilitates these techniques. The simplest would be steepest ascent, but it would take too many iterations. The difficulty is that there are millions of parameters, so computation steps are expensive. As a result, current techniques involve approximations for interim steps, such as in quasi-Newton methods. The Conjugate Gradient method also performs approximations and is suited to these kinds of problems.
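In symbols, following Sutton and McCallum's notation (a reference rendering of the regularized log-likelihood and its gradient, not a new derivation):

\ell(\theta) = \sum_i \log p(y^{(i)} \mid x^{(i)}; \theta) - \sum_k \frac{\theta_k^2}{2\sigma^2}

\frac{\partial \ell}{\partial \theta_k} = \sum_i \sum_t f_k\big(y_t^{(i)}, y_{t-1}^{(i)}, x_t^{(i)}\big) - \sum_i \sum_t \sum_{y, y'} f_k(y, y', x_t^{(i)})\, p(y, y' \mid x^{(i)}; \theta) - \frac{\theta_k}{\sigma^2}

The first term is the empirical expectation, the second is the model expectation, and the third is the derivative of the regularizer.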
So far we have seen a fairly generalized discussion about what random fields are, their benefits, their representations, the objective functions, and their optimizations, including techniques such as the CG method, which ties in things we have previously studied for the purposes of text analysis.

Friday, October 14, 2016

Today we continue discussing the conditional random field. A random field is a particular distribution in a family of distributions. We were discussing discriminative and generative modeling. By modeling the conditional distribution directly, we avoid the dependencies among the observations. This makes discriminative modeling less prone to violations of the independence assumptions between features. Conditional random fields are discriminative. A CRF is written in the form of feature functions. Each feature function has a feature for the state transition and a feature for the state observation. In an undirected graphical model, we had an exponential form over vectors. Instead we now use a set of weights that are shared across all the classes. When they take a non-zero value for a single class and zero otherwise, they become feature functions. We have now expressed the exponentials in the form of feature functions.
In the semantic network, this applies directly. We model the conditional probability p(c|w; theta) as exp(vector_for_c . vector_for_w) divided by the sum of all such exponentials, where c is the semantic class and w is the word encountered in the text or vocabulary. This is an undirected graphical model, and we can refine it with feature functions in just the same way. When we express it in terms of feature functions, we are using them to articulate the current word's identity. But we are not restricted to indicator functions. We could exploit other, richer features of the input, such as prefixes and suffixes, the context surrounding the word, and even named entities. This leads to the general definition of linear-chain CRFs, where the set of weights is replaced by parameter vectors.
In a CRF, we allow the score of the transition (i, j) to depend on the current observation vector by adding a feature that chains the transition and the observations; this is commonly used in text applications. Note that in iterations involving time steps, the observation argument is written as a vector x_t which contains all the components of the global observations x that are needed for computing features at time t. If the CRF uses the next word as a feature, then the vector for the last word would have included the identity of the next word.
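
Written out, the linear-chain CRF takes the form (again per Sutton and McCallum):

p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\Big\{ \sum_k \theta_k f_k(y_t, y_{t-1}, x_t) \Big\}

where Z(x) sums the same product over all possible label sequences y.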

The parameter vectors are estimated the same way as for any conditional distribution: we take the sequence of inputs and the sequence of predictions, and estimate using penalized maximum likelihood, which is to say we sum the log of the conditional probability of each prediction given its input.

The independence of each sequence is something we can relax later.

Courtesy: Sutton and McCallum
#codingexercise
int ResetKthBitFromLastInBinaryRepresentation(int n, int k)
{
    string binary;
    ToBinary(n, ref binary);  // helper assumed from earlier exercises
    Assert(binary.Length >= k);
    var bits = binary.ToCharArray();
    int pos = binary.Length - k;  // the k-th bit from the last
    if (bits[pos] == '1')
        bits[pos] = '0';  // reset clears the bit
    int num;
    ToInt(new string(bits), ref num, 0);  // parse the binary string back to an int
    return num;
}