Saturday, November 1, 2014

To continue the coding exercise, there are several other ways to find a number in the matrix we described. The approach we took performed a binary chop on each row of the matrix. A binary chop on a single row takes O(log N) time, and repeating it over N rows gives a total complexity of O(N log N). This approach makes no use of the sorted order of the numbers within the columns, and its convergence is linear in the number of rows.
We could just as well go binary chopping down the columns, since the elements are sorted there too, and iterate across the rows. That, however, offers no advantage over the approach we already have. It would be pure chance for one to work better than the other unless we knew which of the two dimensions - rows or columns - is smaller.
Another technique, which does not rely on binary chopping the sorted rows but instead uses a divide and conquer approach, is to partition the matrix. One example of this is to split the matrix into two or more sub-matrices and then recursively apply the search to each of the sub-matrices. We can easily eliminate a sub-matrix if the number we are looking for is greater than its bottom-right entry, which by the description of the matrix is its largest element.
If we choose, for example, a point at the center of the matrix, say mid = (row/2, col/2), we get four sub-matrices of different sizes and can exclude regions of the matrix pretty quickly. By excluding sub-matrices we attempt to converge on the number in the bigger matrix logarithmically. This is better, but it involves recursive (and possibly parallel) computation and consequently more resources. Note, however, the termination condition. It is not enough to say that the bottom-right number is greater than the search number and the top-left number is smaller; we cannot simply check the number along the diagonal. For example, in the matrix
[ 53 78
  87 92 ]
looking for the number 62, we see that going diagonally is not sufficient - we would have to exhaust the rows one after another anyway. While it is easy to exclude a sub-matrix by looking at its top-left and bottom-right entries, the same cannot be said for when the number is bound within them. In such a case there is the additional cost of going linearly from row to row, which may take O(N) time anyway.
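The partition idea above can be sketched as follows. This is my own minimal version, not the post's exact method: it compares the search number against the center element and eliminates the quadrant that cannot contain it (the top-left if the center is too small, the bottom-right if it is too large), recursing into the rest.

```python
def exists_partition(m, x, top=0, left=0, bottom=None, right=None):
    """Search a matrix sorted along rows and columns by divide and conquer."""
    if bottom is None:
        bottom, right = len(m) - 1, len(m[0]) - 1
    if top > bottom or left > right:
        return False
    mid_r, mid_c = (top + bottom) // 2, (left + right) // 2
    pivot = m[mid_r][mid_c]
    if pivot == x:
        return True
    if pivot < x:
        # x cannot be in the top-left quadrant: all its entries are <= pivot
        return (exists_partition(m, x, top, mid_c + 1, mid_r, right) or
                exists_partition(m, x, mid_r + 1, left, bottom, right))
    else:
        # x cannot be in the bottom-right quadrant: all its entries are >= pivot
        return (exists_partition(m, x, top, left, mid_r - 1, right) or
                exists_partition(m, x, mid_r, left, bottom, mid_c - 1))
```

Each recursive call strictly shrinks the region, and empty regions terminate immediately, so the recursion is guaranteed to bottom out.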
This leads us to a potentially faster solution: enumerate the elements of the matrix in sorted order until we reach the search number. In the example above, we traverse in row order or column order, whichever gives us the next higher number, and compare it with the previous position. For example,
if we took a matrix
23 34 45 46 68 89
27 35 47 69 97 110
32 46 48 65 98 112
then we start from the top left and progress to the next higher number, snaking our way up the progression. In the above example, we progress down from the top left, comparing along the row as well as the column, and, keeping track of the starting point, return to the number adjacent to it. For example, we progress down from 23 to 27 to 32 and come back to 34 after we have exhausted the column. Similarly, starting from 34, we note the next number 45 and traverse down the column reading 35, 46 until we encounter a number higher than 45. Notice that the front of this propagation is a diagonal, in this case 46, 47 and 46. If we keep track of the elements coming off the front, we can easily enumerate all the entries in the matrix in sorted order. Each row contributes one number to the front, and we advance the front on a row as we include its number in our sorted list. We therefore burn through the matrix along the front, and this is another way to solve the problem in O(N) time. Note that if all the numbers on the front are higher than the desired number, we stop right then instead of burning through the rest of the matrix. Thus there is some optimization possible as well.
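One way to realize the front idea is with a min-heap holding one candidate per row. This is my own sketch, not the post's exact method (the heap adds a log factor over the purely linear scan described): we repeatedly pop the smallest element on the front and advance that row, stopping early once the smallest candidate exceeds the target.

```python
import heapq

def exists_by_front(matrix, number):
    """Enumerate entries in sorted order via a per-row front kept in a heap."""
    rows, cols = len(matrix), len(matrix[0])
    # the initial front: the first element of every row
    front = [(matrix[r][0], r, 0) for r in range(rows)]
    heapq.heapify(front)
    while front:
        value, r, c = heapq.heappop(front)
        if value == number:
            return True
        if value > number:        # every remaining entry is larger: stop early
            return False
        if c + 1 < cols:          # advance the front on this row
            heapq.heappush(front, (matrix[r][c + 1], r, c + 1))
    return False
```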
Lastly, I want to add that there are perhaps modifications possible where, instead of keeping track of the front, we simply update the start and the end for each row based on the findings from the preceding row. Thus we optimize our original solution not to repeat the entirety of each row but to take triangularly smaller portions as we go down the rows one after another.

bool Exists(int[,] matrix, int number)
{
  // assuming parameter validation
  // we go row wise and do a binary chop;
  // end is carried across rows so each row searches a smaller portion
  int end = matrix.GetLength(1) - 1;
  for (int row = 0; row < matrix.GetLength(0); row++)
  {
       int start = 0;
       if (BinaryChop(matrix, row, number, start, ref end))
           return true;
  }
  return false;
}


bool BinaryChop(int[,] matrix, int row, int number, int start, ref int end)
{
  while (start <= end)
  {
       int mid = checked((start + end) / 2);
       if (matrix[row, mid] == number) return true;
       if (matrix[row, mid] < number)
             start = mid + 1;
       else
             end = mid - 1;
  }
  // end now points just left of the first element greater than number;
  // since columns increase downward, the caller can reuse it for the next row
  return false;
}
In today's post we will continue our discussion on matrix factorization. We reviewed the method of steepest descent earlier. If we look carefully at the iteration step, we were computing each step of steepest descent with two matrix-vector products - one to compute the current residual and another to compute the step length. We can eliminate one of them because we can write the recurrence in terms of the residual rather than in terms of the steps. And although we still have to compute the starting residual, each subsequent iteration depends only on the previous value. The disadvantage of this translation, however, is that we lose the feedback from the value of the current step.
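Concretely, the translation above can be written out (using A for the matrix and r for the residual, consistent with the later posts in this series):

```latex
r_{(i)} = b - A x_{(i)}, \qquad
\alpha_{(i)} = \frac{r_{(i)}^{T}\, r_{(i)}}{r_{(i)}^{T} A\, r_{(i)}}, \qquad
x_{(i+1)} = x_{(i)} + \alpha_{(i)} r_{(i)}
```

Premultiplying the update for x by -A and adding b gives a recurrence in the residual alone:

```latex
r_{(i+1)} = r_{(i)} - \alpha_{(i)} A\, r_{(i)}
```

so each iteration needs only the single matrix-vector product A r.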
At this stage it's probably helpful to review eigenvectors. An eigenvector of a matrix B is a non-zero vector that does not rotate when B is applied to it (except perhaps to point in precisely the opposite direction). Since there is scaling but no skewing, the effect of B on the eigenvector is just some scalar constant applied to the vector. This scalar constant, denoted by lambda, is called the eigenvalue of B.
As we will see, this is relevant to our iterations. In each iteration we apply B to a vector over and over again, which either grows or shrinks the vector as we proceed along the search line. In particular, with an eigenvalue of -0.5, repeated application converges the vector to zero, which we found useful in our zigzag path.
Moreover, if B is symmetric, then there exists a set of n linearly independent eigenvectors of B, denoted v1, v2, ..., vn. This set is not unique, and each eigenvector has a corresponding eigenvalue.
Note that the vector applied to B need not be an eigenvector. We can take a set of n eigenvectors as a basis, and any n-dimensional vector can then be expressed as a linear combination of these eigenvectors. If B is unsymmetric, then finding a set of n independent eigenvectors can be difficult. Such matrices are known as defective, and we won't speculate why they are called so. The eigenvalues of a positive-definite matrix are all positive.
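These facts can be checked numerically on a small symmetric matrix. The 2x2 matrix below is my own illustrative choice (its eigenvalues work out to 7 and 2); the eigenvector formula assumes a nonzero off-diagonal entry.

```python
import math

# Illustrative 2x2 symmetric positive-definite matrix (values chosen for this sketch).
B = [[3.0, 2.0],
     [2.0, 6.0]]

# Characteristic polynomial: lam^2 - trace*lam + det = 0
tr = B[0][0] + B[1][1]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2   # both positive: B is positive-definite

def eigvec(lam):
    # Solve (B - lam*I) v = 0 for a 2x2 with B[0][1] != 0, normalized.
    vx, vy = B[0][1], lam - B[0][0]
    n = math.hypot(vx, vy)
    return (vx / n, vy / n)

def matvec(M, v):
    return (M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1])

v1, v2 = eigvec(lam1), eigvec(lam2)

# B v = lam v: the eigenvector is only scaled, never rotated.
Bv1 = matvec(B, v1)
assert abs(Bv1[0] - lam1 * v1[0]) < 1e-9 and abs(Bv1[1] - lam1 * v1[1]) < 1e-9

# Symmetric B: the eigenvectors are orthogonal and so form a basis.
assert abs(v1[0]*v2[0] + v1[1]*v2[1]) < 1e-9
```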

#codingexercise
Given a matrix of numbers increasing from left to right and top to bottom, find whether a given number is present
// there are many ways, i just want to find something easy
bool Exists(int[,] matrix, int number)
{
  // assuming parameter validation
  // we go row wise and do a binary chop
  for (int row = 0; row < matrix.GetLength(0); row++)
  {
       if (BinaryChop(matrix, row, number)) return true;
  }
  return false;
}


bool BinaryChop(int[,] matrix, int row, int number)
{
  int start = 0;
  int end = matrix.GetLength(1) - 1;

  while (start <= end)
  {
       int mid = checked((start + end) / 2);
       if (matrix[row, mid] == number) return true;
       if (matrix[row, mid] < number)
             start = mid + 1;
       else
             end = mid - 1;
  }
  return false;
}

Friday, October 31, 2014

In my previous post, I mentioned an example of LDAP login. There are quite a few open source OpenID and OAuth packages as well. I was looking at one in particular called passport-oauth, which is written in Node.js. As we know, OAuth 2.0 is used for authentication and authorization, with scopes for resource-access granularity. This package eases authorization by making it provider-specific instead of using the OAuth framework directly. Its interface is cleaner in that it takes a provider and an OAuth strategy and returns a token.
var passport = require('passport')
  , OAuthStrategy = require('passport-oauth').OAuthStrategy;

passport.use('provider', new OAuthStrategy({
    requestTokenURL: 'https://www.provider.com/oauth/request_token',
    accessTokenURL: 'https://www.provider.com/oauth/access_token',
    userAuthorizationURL: 'https://www.provider.com/oauth/authorize',
    consumerKey: '123-456-789',
    consumerSecret: 'shhh-its-a-secret',
    callbackURL: 'https://www.example.com/auth/provider/callback'
  },
  function(token, tokenSecret, profile, done) {
    User.findOrCreate(..., function(err, user) {
      done(err, user);
    });
  }
));

Note that the provider need not be a third-party referral site; it can be a corporate OTP provider. The service implementation could unify both LDAP and OTP so that the auth is in one place.
This will simply be
def login(user, password):  # LDAP
def login(user, token):  # OTP
Now let's look at the OTP provider. It grants a token as well: it takes a request and generates a secret, and the secret along with a PIN is used as the credential to be validated. The site's request and response are important here for the service implementation to decide what to look up.
The request initiated by the service provider to the identity provider is a SAML message that contains both the type of request and the callback,
as in https://openam.example.com:8443/opensso/SSORedirect/metaAlias/internal/externalidpv2?SAMLRequest=&relaystate=<callbackURL>
The SAMLRequest is a base64-encoded SAML message, usually a <samlp:AuthnRequest>. The assertion returned by the identity provider typically includes both a <saml:AuthnStatement>, which describes the act of authentication at the identity provider, and a <saml:AttributeStatement>, which asserts multivalued attributes associated with this principal (aka subject).
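For the HTTP-Redirect binding used in the URL above, the SAML spec calls for the XML to be DEFLATE-compressed, base64-encoded, then URL-encoded. A sketch of that packing (the AuthnRequest XML below is a simplified placeholder, not a real request):

```python
import base64
import urllib.parse
import zlib

# Hypothetical, minimal AuthnRequest for illustration only.
authn_request = ('<samlp:AuthnRequest '
                 'xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
                 'ID="_abc123" Version="2.0"/>')

# Raw DEFLATE (strip the 2-byte zlib header and 4-byte checksum),
# as the redirect binding requires, then base64.
deflated = zlib.compress(authn_request.encode())[2:-4]
saml_request = base64.b64encode(deflated).decode()

query = urllib.parse.urlencode({
    'SAMLRequest': saml_request,
    'RelayState': 'https://www.example.com/callback',
})
redirect_url = ('https://openam.example.com:8443/opensso/SSORedirect/'
                'metaAlias/internal/externalidpv2?' + query)

# The receiver reverses the steps (wbits=-15 selects raw DEFLATE).
decoded = zlib.decompress(base64.b64decode(saml_request), -15).decode()
assert decoded == authn_request
```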
The same could perhaps be done via OAuth using password grant
curl -i -k -X POST -u user:password \
 -d "grant_type=password&username=whoisthis&password=OTPCode&scope=what" \
 https://openam.example.com:8443/openam/oauth2/access_token
{
    "expires_in": 599,
    "token_type": "Bearer",
    "refresh_token": "f6dcf133-f00b-4943-a8d4-ee939fc1bf29",
    "access_token": "f9063e26-3a29-41ec-86de-1d0d68aa85e9"
}

$ curl https://openam.example.com:8443/openam/oauth2/tokeninfo\
?access_token=f9063e26-3a29-41ec-86de-1d0d68aa85e9
{
    "mail": "demo@example.com",
    "scope": [
        "mail",
        "cn"
    ],
    "cn": "demo",
    "realm": "/",
    "token_type": "Bearer",
    "expires_in": 577,
    "access_token": "f9063e26-3a29-41ec-86de-1d0d68aa85e9"

}


Thursday, October 30, 2014

We continue our discussion on conjugate gradient methods. First we will discuss the steepest descent method, because in our description earlier we hinted at excluding certain descent directions but were not clear about the choice of direction and the step length. When a matrix is positive-definite, the function f(x) is a paraboloid. We want to minimize this function. In the method of steepest descent, we start at an arbitrary point x0 and slide down to the bottom of the paraboloid. We take a series of steps by choosing a direction and a step length. We choose the direction in which f decreases most quickly, which is the direction opposite f'(x), the gradient. When we take this step there will be an error, which is the distance from the solution. Applying the matrix to this error gives a vector, the residual, which points in the direction of steepest descent.
Next we choose the step length, and this is where we make an interesting observation. We want to choose the step length that minimizes f along the search line. We are restricted to choosing a point on the intersection of a vertical plane and the paraboloid. We choose a point based on new = old + step-length-alpha scaled residual. We can pick a residual, but we don't know what step length to take along it. At the bottom of the paraboloid the gradient is zero and the function cannot be minimized any more, so the step length is zero and the iteration terminates. The step length minimizes the function when the directional derivative along the search line is equal to zero. The directional derivative can be expressed as the dot product of the gradient transpose and the residual; setting it to zero means the two vectors are orthogonal. As we progress down the search line, the rate of increase of f is the projection of the gradient onto the search line, and this projection is zero at the minimum. We can conveniently pick the next direction as the orthogonal at the minimum along the search line. Thus we trace a zigzag path during the iteration, which appears because each gradient is orthogonal to the previous gradient.
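The iteration above can be sketched on a small system. The 2x2 matrix and right-hand side below are illustrative choices of mine (the solution works out to (2, -2)); the assertion inside the loop checks the orthogonality of successive residuals that produces the zigzag.

```python
# Steepest descent on a 2x2 symmetric positive-definite system A x = b.
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

x = [-2.0, -2.0]                       # arbitrary starting point
Ax = matvec(A, x)
r = [b[i] - Ax[i] for i in range(2)]   # residual: direction of steepest descent
for _ in range(50):
    Ar = matvec(A, r)
    alpha = dot(r, r) / dot(r, Ar)     # step length zeroing the directional derivative
    x = [x[i] + alpha * r[i] for i in range(2)]
    new_r = [r[i] - alpha * Ar[i] for i in range(2)]
    # successive residuals (gradients) are orthogonal: the zigzag path
    assert abs(dot(r, new_r)) < 1e-6
    r = new_r
    if dot(r, r) < 1e-20:
        break

# x has converged to the solution of A x = b
residual = [b[i] - matvec(A, x)[i] for i in range(2)]
assert abs(residual[0]) < 1e-6 and abs(residual[1]) < 1e-6
```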

#codingexercise:

LDAP is the triangle in which we want to put all devices and applications. How do we authenticate a user using LDAP?

import io
import ldap   # python-ldap
import ldif

def load_user(id):
    ld = ldap.initialize('ldap://localhost:1389')
    name = id
    ld.simple_bind_s()
    basedn = "ou=people,dc=example,dc=com"
    filter = "(|(cn=*" + name + "*)(sn=*" + name + "*))"
    results = ld.search_s(basedn, ldap.SCOPE_SUBTREE, filter)
    # render the matching entries as LDIF text in an in-memory buffer
    with io.StringIO() as buf:
        ldif_writer = ldif.LDIFWriter(buf)
        for dn, entry in results:
            ldif_writer.unparse(dn, entry)
        output = buf.getvalue()
    ld.unbind_s()
    uname = ""
    uid = ""
    if id in output:
        for line in output.splitlines(True):
            if "dn:" in line:
                kvp = line.replace("dn: ", "").split(',')
                uid = str([i.replace("uid=", "") for i in kvp if 'uid=' in i])
            if "cn:" in line:
                uname = line.replace("cn: ", "").strip()
            if uname and uid:
                return Users(uname, uid, True)   # Users as defined for the login manager
    return Users('Anonymous', 'unknown', False)

Wednesday, October 29, 2014

In today's post we discuss conjugate gradient methods. This is a very popular method for solving systems of the form Ax = b, where x is an unknown vector, b is a known vector, and A is a known square matrix. Iterative methods like CG are suited for use with sparse matrices. If A is dense, it is better to factor it and solve the equation by back substitution, because factorization and iteration have roughly the same cost for dense matrices. For sparse matrices, on the other hand, factorization is much more expensive, and so is the back-solving step, which involves trying different values of b.
Equations of the form Ax = b are linear, so we can solve for x as the point of intersection of n hyperplanes, each of dimension n-1. The equation can also be written in a quadratic form, which we can view as a surface whose minimum point is the solution to Ax = b. When the quadratic form is expressed in contours, they are concentric ellipsoidal curves, each of constant f(x). The gradient of the quadratic form points in the direction of the steepest increase of f(x) and is orthogonal to the contour lines. Gradients come in useful when they are set to zero, in which case the function is minimized.
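Written out, the quadratic form referred to above is (following Shewchuk's notation):

```latex
f(x) = \tfrac{1}{2}\, x^{T} A x - b^{T} x + c, \qquad
f'(x) = \tfrac{1}{2} A^{T} x + \tfrac{1}{2} A x - b
```

When A is symmetric this reduces to f'(x) = Ax - b, so setting the gradient to zero recovers Ax = b: the minimum of the quadratic form is the solution of the system.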
If we wonder why we transform the simpler-looking equation Ax = b into the tougher-looking quadratic form, it's because the methods we are looking at were developed and are intuitively understood in terms of minimization problems, not in terms of intersecting hyperplanes.
It also helps to think of these problems in terms of eigenvectors and eigenvalues, because they serve as analysis tools. The steepest descent and CG methods don't calculate eigenvectors as part of their algorithms. We might recall from previous posts that an eigenvector v is a nonzero vector that does not rotate when B is applied to it, except perhaps to point in precisely the opposite direction. v may change length or reverse its direction, but it won't turn sideways. In other words, there is some scalar constant lambda such that Bv = lambda v, and this lambda is the eigenvalue.

Courtesy: Shewchuk 1994 An introduction to conjugate gradient method without the agonizing pain.

Tuesday, October 28, 2014

In today's post I will return to the matrix factorization and optimization techniques. We were discussing Newton's method. This method finds roots by iteration, using the Taylor series and converging near the roots. Keeping the Taylor series to the first order, the method finds where the tangent line to the curve (x, f(x)) at xk intersects the x-axis.
Assuming that the iterates xk+1 converge towards x, we can define the error at the k-th step, which, when expanded, shows that the method converges quadratically. We will next discuss line search and trust region, which we use to compute the incremental step. In line search, we pick a direction and then decide how far to move along that direction.
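The tangent-line step described above can be sketched in one dimension (the example function below is my own, not from the post): each iteration moves to the x-intercept of the tangent at the current point.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's iteration: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:     # converged: the tangent step is negligible
            break
    return x

# Find sqrt(2) as the positive root of f(x) = x^2 - 2.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
assert abs(root - 2 ** 0.5) < 1e-10
```

Near the root the number of correct digits roughly doubles each iteration, which is the quadratic convergence mentioned above.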
Each iteration of the line search is computed this way:
Xk+1 = Xk + alpha-k Pk
Where the incremental step depends on both Xk and alpha-k. Generally we choose the alpha-k that makes the descent reduce the function. The search is done in two phases: a bracketing phase, where a range is found to bound the step length, and an extrapolation phase, where a good step length is picked within that range. The iteration terminates when there is said to be sufficient reduction of the function.
The trust region is more robust. It recognizes that the approximation model chosen to compute the next iteration is only applicable locally. Therefore there is a trust region in which the model applies. The line search could be an approximation model too. As the model is revised for each iteration, the trusted region should reduce as the function converges. This is the termination condition for the iteration.
The trust region methods are considered newer than the line search methods. The difficulty with the line search methods is guaranteeing a sufficient reduction in f at each step. The condition we maintain is that the reduction in f should be proportional to both the step length alpha-k and the directional derivative. If we plot the reduction in f, denoted by phi, against the step length, the line given by this proportionality cuts through the curve and separates the graph into three regions: the first and last portions, where the actual reduction lies below the line and is therefore acceptable, and the middle portion, which isn't. The sufficient decrease condition is not enough by itself to ensure that the algorithm makes reasonable progress, because it is satisfied for all sufficiently small step lengths where there is a steep descent.
To rule out unacceptably short steps, a second requirement is introduced, called the curvature condition: the slope of phi at the chosen step length must be at least a constant times the initial slope. In other words, if the slope is still strongly negative, we have an indication that we can reduce f significantly by moving further along the chosen direction, so we don't accept step lengths whose reduction is small or whose slope is still steeply negative. This rules out steps that make too little progress. Together, the sufficient decrease and curvature conditions are known collectively as the Wolfe conditions.
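A minimal backtracking line search that enforces the sufficient-decrease (Armijo) condition discussed above can be sketched as follows. The constants c1 and the shrink factor are conventional choices, and the example function is my own; a full Wolfe search would also check the curvature condition.

```python
def backtracking_line_search(f, grad_f, x, p, alpha=1.0, c1=1e-4, shrink=0.5):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c1*alpha*(grad_f(x) . p)."""
    fx = f(x)
    slope = sum(g * d for g, d in zip(grad_f(x), p))   # directional derivative
    assert slope < 0, "p must be a descent direction"
    while f([xi + alpha * di for xi, di in zip(x, p)]) > fx + c1 * alpha * slope:
        alpha *= shrink          # step too long: not enough reduction
    return alpha

# Minimize f(x, y) = x^2 + 4y^2 from (2, 1) along the negative gradient.
f = lambda v: v[0] ** 2 + 4 * v[1] ** 2
grad = lambda v: [2 * v[0], 8 * v[1]]
x0 = [2.0, 1.0]
p = [-g for g in grad(x0)]
a = backtracking_line_search(f, grad, x0, p)
x1 = [xi + a * di for xi, di in zip(x0, p)]
assert f(x1) < f(x0)
```

Because the Armijo line intersects phi, the loop always terminates for small enough alpha, which is exactly why the curvature condition is needed to forbid those overly short steps.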

For sparse matrices, conjugate gradient methods are better suited. we will discuss this next.

Monday, October 27, 2014

In today's post I want to discuss a topic different from the previous few posts and perhaps one time only.
This is about supporting Windows-file-system-like ACLs using row-level security in a database.
Let us say resources are saved in a table where we want to specify an access control list for this resource so that it can be validated against policies for different access. We will leave out the case where the resources can be hierarchical.
In this case, we want to add a column to the resource that describes this control. It could be a user name or it could be a label or a foreign key to a set of markings that represent the sensitivity or the permissions of the data.
Keeping track of usernames, for example, is helpful in the case where all accesses are directly isolated per user. In other words, users don't share ownership of the folders, and their accesses essentially manage the lifetime of their own folders.
We can also create a label for each resource if we know that the labels are distinct and symbolic.  In most cases where the operations are predefined but their actors are not, we can use different labels. For example, we can say whether the resource is read-only or read-write.
However, users and their access are both dynamic, and they can be hierarchical. In this case, hierarchical means that entries can inherit permissions top-down or gain incremental privileges bottom-up. In addition, a user may grant or revoke permission, take ownership, etc.
In such a case, it's better to view the labels as an output resulting from a schema of categories and markings.
The categories and markings are predefined in a system. A category represents a domain of markings. It can be a classification, which is hierarchical, or a compartment, whose values are distinct; it can be one-to-one or zero-to-many in how many values can be applied; and it may have a comparison rule such as any or all. The markings are the compartment or classification values with which we would like to divide up the resources. Resources can belong to one or more of these, and each marking has a name.
A particular combination of markings could translate to a unique label for the resource. Usually this table is populated on demand by say a stored procedure. If there is no corresponding row to the combination of markings, then a new row is added and a unique ID is returned as the label.
We join the label and marking in a LabelMarking table to facilitate the translation of the labels.
Having described row level security, let us now look at populating it for file system ACLs.
Here we begin by listing markings such as full control, modify, read-write, read, and read and execute. A combination of these markings makes up a permission (we select those that are granted).
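The label/marking translation described above can be sketched with an in-memory SQLite database. Table and column names are my own illustrative choices, and the lookup is done in application code where the post suggests a stored procedure.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Marking (MarkingId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE Label   (LabelId   INTEGER PRIMARY KEY);
CREATE TABLE LabelMarking (
    LabelId   INTEGER REFERENCES Label(LabelId),
    MarkingId INTEGER REFERENCES Marking(MarkingId),
    PRIMARY KEY (LabelId, MarkingId)
);
""")
db.executemany("INSERT INTO Marking(Name) VALUES (?)",
               [("full control",), ("modify",), ("read",), ("read and execute",)])

def get_or_create_label(markings):
    """Return the unique label for a combination of markings,
    creating a new row on demand."""
    ids = sorted(r[0] for r in db.execute(
        "SELECT MarkingId FROM Marking WHERE Name IN (%s)" %
        ",".join("?" * len(markings)), markings))
    # Look for an existing label with exactly this marking set.
    for (label_id,) in db.execute("SELECT LabelId FROM Label"):
        existing = sorted(r[0] for r in db.execute(
            "SELECT MarkingId FROM LabelMarking WHERE LabelId = ?", (label_id,)))
        if existing == ids:
            return label_id
    cur = db.execute("INSERT INTO Label DEFAULT VALUES")
    label_id = cur.lastrowid
    db.executemany("INSERT INTO LabelMarking VALUES (?, ?)",
                   [(label_id, m) for m in ids])
    return label_id

read_only = get_or_create_label(["read", "read and execute"])
# the same combination, in any order, translates to the same label
assert get_or_create_label(["read and execute", "read"]) == read_only
assert get_or_create_label(["modify"]) != read_only
```

A resource row then carries only the label column, and the LabelMarking join recovers its effective permissions at check time.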