Cluster computing: November 2014

Saturday, November 29, 2014

Today we talk about timestamps in http header values. Caching mechanisms for example are based on header fields that carry absolute timestamp values. We review the possible errors in using this field from an old WRL research report.
First of this is date skew. When compared with the NTP server with an accuracy up to milliseconds, the date headers often show that the clocks from the originating servers are not synchronized.
Second is impossible last modified value. The last-modified response header gives the timestamp at which the resource was most recently modified. A cache can validate a stored response by asking the server using the if-modified-since request header. This value therefore should not be any newer than the response's Date value but many often are not.
Third is the apparent coherency failures in using Last-Modified. The use of this field implies that the server responding to the URL is caching and that the contents of the URL has not changed. When this is not the case, we have a coherency failure. Many of the servers seem to demonstrate this anomaly.
Fourth is the effect of the limited timestamp resolution. The Last modified header has a resolution of 1 second. Therefore even a correctly implemented server that changes its content more than once in a second may have inaccurate timestamps.
Fifth is the effect of apparent premature expirations. An origin server may assign an expiration time after which the cache has to revalidate a stored response. Servers usually assign expiration times heuristically, guessing how long it might be before the resource is changed. By comparing the maximum age value, one can tell whether a cached response expires before the response value changes.
Sixth is the over optimistic expirations. When we compared a conservative expiration time effect above, we could also complement a similar caveat with generous expiration times that may cause incorrect caching.
To summarize, server operators may need to keep their clocks adjusted and Caches have to be consistent and correct in their use of timestamps.

#codingexercise
print time in RFC 1123 format
from email.utils import formatdate
print formatdate(timeval=None, localtime=False, usegmt=True)
Sun, 30 Nov 2014 16:43:49 GMT

generate random alphanumeric
import random
print ''.join([random.choice(string.digits + string.ascii_uppercase) for i in range (0,12)])
D8KCQ5WT6TLN

How do you package for debian?

cd apisecurity-1.0
dh_make --native
echo "data/* usr/share/apisecurity/" > debian/apisecurity.install
# edit rules, menu, copyright, changelog

gpg --list-secret-keys

dpkg-buildpackage -us -uc

debsign -k 1FB5E9A2 apisecurity_1.0_amd64.changes

dupload apisecurity_1.0_amd64.changes

#codingexercise
Int[] GetDistinctRange(int [] A)
{
if (A == null) return null;
return A.DistinctRange();
}

After the previous post on TOTP tokens just now, let us get back to our discussion on CG method.
We were discussing the non-linear conjugate gradient method. CG could be used to find the minimum point of a quadratic form. It could also be used to minimize any continuous function for which the gradient can be computed. If we choose the residual as a negative gradient, we overcome the limitation that we cannot use a recursive formula for the residual. With the residual, we can use Gram Schmidt conjugation to find the search directions. This is the same as in the linear search. We find the step-length which is different from the linear case. This is difficult because we cannot do a line search. We have to approximate it by finding the minimum point of a quadratic form. We use the linear expression for the step length but we minimize the given function at the next step. Next to find the constants, we could use any method that finds the zero of the transpose of gradient at next step applied to the current search direction. Nonlinear CG can be preconditioned by choosing a pre-conditioner M that approximates the second derivative and has the property that M-inverse applied to residual is easy to compute. In the linear search, with the pre-conditioner, we could transform the quadratic form so that it is similar to a sphere. In a non-linear CG pre-conditioner performs this transformation for a region near the current position. We could use the diagonal as the pre-conditioner if the second derivative is too expensive to compute. However, this only works if we are close to the current position. Moreover, a pre-conditioner must be positive definite. Non-positive diagonal elements cannot be allowed. A conservative solution is to not pre-condition when the second derivative is not positive-definite.

#codingexercise
how do you minify and obfuscate the code with python ?
perhaps use pyminifier --obfuscate --pyz=/tmp/myprogram.pyz /tmp/program.py
or use BitBoost to fight reverse-engineering
Now with the previous code for TOTP tokens, we can have a script that can be distributed for each API caller to generate TOTP tokens for themselves.

#codingexercise
Convert hex to string:
'AD829C89031FD281'.decode('hex')

'\xad\x82\x9c\x89\x03\x1f\xd2\x81

Friday, November 28, 2014

In this post, I continue to talk about the Ruby on Rails MVC. We will look at a few examples:
Here's the view to list all shares:

<h1>Listing shares</h1>

<%= link_to 'New share', new_share_path %>

<table>

  <tr>

    <th>Id</th>

    <th>Path</th>

    <th>Rowguid</th>

    <th>Action</th>

  </tr>

  <% @shares.each do |share| %>

    <tr>

      <td><%= share['id'] %></td>

      <td><%= share['path'] %></td>

      <td><%= share['rowguid'] %></td>

      <td><%= link_to 'Destroy', share_path(share),

                    method: :delete, data: { confirm: 'Are you sure?' }

      %></td>

    </tr>

  <% end %>

</table>

and here is a way to add new share:

<%= form_for :share, url: shares_path do |f| %>

<p>

    <%= f.label :sharepath %><br>

    <%= f.text_area :sharepath %>

  </p>

<p>

    <%= f.submit %>

  </p>

<% end %>
The form submits the data to the controller as parameters.

The data forwarded from the controller to the view is usually a model instance or an object. If the object is a dictionary the elements are accessed by their keys. For model instances, they are accessed by members.
#codingexercise
Int GetDistinctMode(int [] A)
{
if (A == null) return 0;
return A.DistinctMode();
}

class OneTimePasswordAlgorithm:
       doubleDigits = [ 0, 2, 4, 6, 8, 1, 3, 5, 7, 9 ]
       def calcChecksum(self, num, digits):
           doubleDigit = True
           total = 0
           for i in range(len(digits)-1) :
               digits -= 1
               digit = num % 10
               num /= 10
               if doubleDigit:
                   digit = self.doubleDigits[digit]
               total += digit
               doubleDigit = not doubleDigit
           result = total % 10
           if result > 0:
               result = 10 - result
           return result

DIGITS_POWER = [1,10,100,1000,10000,100000,1000000,10000000,100000000 ]

def generateTOTP(self, key, time, returnDigits, crypto):

codeDigits = int(returnDigits)

result = ''

#Using the counter

#First 8 bytes are for the movingFactor

#Compliant with base RFC 4226 (HOTP)

while (len(time) < 16 ):

time = "0" + time

#Get the HEX in a Byte[]

msg = bytearray(time.decode("hex"))

k = bytearray(key.decode("hex"))

import hmac

hash = bytearray(hmac.new(k,msg,crypto).hexdigest())

#put selected bytes into result int

offset = int(int(hash[len(hash) - 1]) & 0xf)

binary = ((hash[offset] & 0x7f) << 24) | ((hash[offset + 1] & 0xff) << 16) | ((hash[offset + 2] & 0xff) << 8) | (hash[offset + 3] & 0xff)

otp = int(binary % self.DIGITS_POWER[codeDigits])

result = str(otp);

while (len(result) < codeDigits):

result = "0" + result

return result

>>> otp = OneTimePasswordAlgorithm()

>>> import hashlib

>>> otp.generateTOTP('90AB891C', '123465539', '6', hashlib.sha256)

'857652'

>>>

Thursday, November 27, 2014

In this post I cover a simple web application written in Ruby on Rails which provides an MVC framework. The rails framework lays out the views, models, and controllers by the name we define.
In the example below, I use it for webAPI based application and I don't use a database. I generate the controller with 'rails generate shares index' and this lays out the model view controller files in the appropriate folders to be looked by the naming convention.

The controller itself looks like the following:

require_relative 'webapi'
# defines a resource for a file share

class SharesController < ApplicationController

def new
render json: Share.create(params[:path])
end

def destroy
render json: Share.delete(params[:path], params[:id])
end

def edit
prmhash = Hash.new { |hash, key| hash[key] = Array.new }
prmhash["location"] << params[:location]
params[:path].to_s.split('/').each( |x| prmhash["path"] << x)
render json: Share.modify(prmhash, params[:id])
end

def index
results = Share.list()
$exports = JSON.parse(results['exports'])
# the view renders this automatically
end

end

As we see from the above that the views are passed the model or an object to render. This is the case with the list all items view. Optionally plaintext, or json or xml can also be rendered onto html.
The Controller takes parameters passed in from the views with a form. The form can be elaborated on the views in terms of the controls they use, their markup and the associated workflow. Links can also be added on the views to enable forward and backward navigation between the items.

#codingexercise
Int GetDistinctMedian(int [] A)
{
if (A == null) return 0;
return A.DistinctMedian();
}

Wednesday, November 26, 2014

#codingexercise
Int GetDistinctSum(int [] A)
{
if (A == null) return 0;
return A.DistinctSum();
}

We were reviewing the non-linear conjugate gradient method. We set the residual to the negation of the gradient. We compute the Gram Schmidt conjugation of the residuals to find the search directions. We find the step length by ensuring that the gradient is orthogonal to the search direction. We solve for the component gradient of next step transposed applied to current search direction to be zero. We set the Gram Schmidt constants by one of two techniques : Fletcher Reeves (FR) formula and Polak Ribiere (PR) formula. FR was used in linear CG method and is easy to compute.PR is similar but uses the difference of the next residual from the current instead of just using the next residual. FR converges if the starting point is close to the desired minimum. PR may or may not converge but when it does it is quicker than FR. To force the convergence in the PR case, we ignore negative values of the constant and choose zero instead. This is equivalent to restarting the CG search where we forget the previous search directions and start it in the direction of the steepest descent.
We see that in the non-linear CG case, there are fewer convergence guarantees. If the function of the next step is chosen such that it deviates from the quadratic form, the search directions tend to lose conjugacy. Moreover, the global minimum may not be found and even a local minimum may not be found if the function has no lower bound.
A useful technique to improve convergence is to restart CG after every n iterations. This is particularly true when n is small. This follows from the fact that CG can only generate n conjugate vectors in n-dimensional space.
In the non-linear CG, we could also use a fast algorithm to find the zero of the gradient transposed search vector. If the gradient is a polynomial in step length, two iterative methods for zero-finding are the Newton Raphson method and the Secant method. Newton Raphson (NR) method also requires that the second-derivative of the function on initial step and step-length in the direction of the search vector be computable. NR method relies on Taylor series approximation in terms of the function, its first derivative and second derivative. By setting this approximation to zero and finding the steplength, we get a truncated Taylor series which approximates the function with a parabola. Using these, NR method constructs a quadratic approximation of the function and a new point is chosen at the base of the parabola. This procedure is iterated until convergence is reached. To perform an exact line search of the non-quadratic form, repeated steps may need to be taken along the line until the gradient transposed search direction is zero and therefore the CG iteration may include several Newton-Raphson iterations. This is inexpensive or expensive depending on the second derivative. To perform an exact line search without computing the second derivative, the secant method approximates the function by evaluating the first derivative at distinct points where the step length is zero and the step-length is an arbitrary small non-zero number. Like the NR method, the secant method also approximates the function with a parabola but instead of using the first and second derivative at a point, it finds the first derivative at two different points. Both the Newton-Raphson and the Secant methods iterations are done when the answer is reasonably close to the exact solution.
#codingexercise
Int GetDistinctMin (int [] A)
{
if (A == null) return 0;
return A.DistinctMin ();
}

Tuesday, November 25, 2014

We continue our discussion on Conjugate Gradient method. We review the non-linear conjugate gradient method. CG can be used not only to find the minimum point of a quadratic form but to minimize any continuous function where the gradient can be computed. CG for the non-linear forms have the following challenges:
1) the recursive formula for residual cannot be calculated.
2) the step size alpha is difficult to calculate although we still use a linear expression using alpha to compute the next step but a function of it and we try to minimize it.
3) and the Gram-Schmidt constants have many choices
We set the residual to be the negative of the gradient. The search directions are computed by the Gram-Schmidt conjugation of the residuals as with linear CG.
To overcome 1 we set the residual to be the negative of the gradient.
To overcome 2 we minimize the function
To overcome 3 we use one of two different techniques to choose the constants

Int GetDistinctMax (int [] A)

{

if (A == null) return 0;

return A.DistinctMax ();

}

Monday, November 24, 2014

In today's post we continue our discussion on Conjugate Gradients method. We looked at the effectiveness of a preconditioner in solving the linear matrix equation. CG can also be used to solve equations where the matrix A in the linear equation is not symmetric, not positive-definite, and even not square. In such a case we minimize the sum of squares of the errors because there may not be a solution. To find this minimum, we set the derivative of the linear expression to zero. We do this by applying A-transpose on both sides of the linear equation . When the matrix A is not a square, there are possibly more linearly independent equations than variables which may not have a solution. This is called an over-constrained equation. But it is always possible to find a value of x that minimizes sum of squares. A-transpose A is symmetric and positive. In such cases, Steepest Descent and CG can be used to solve the linear equation. The only nuisance is that the condition number of A-transpose A is the square of A and the convergence is much slower. Also, the linear equation may be underconstrained when there are less equations than variables, then using the transpose doesn't help and CG cannot be applied. That said, when we use this technique, we apply the A-transpose A, we never form it explicitly because it is less sparse than A. Instead it is formed by finding the product Ad, and then the transpose is applied to the product.
#codingexercise
Int GetDistinct (int [] A)
{
if (A == null) return 0;
return A.Distinct ();
}

Sunday, November 23, 2014

Sample python flask server to issue tokens based on OTT mentioned in earlier post:

# coding: utf-8

from datetime import datetime, timedelta

from flask import Flask

from flask import session, request

from flask import render_template, redirect, jsonify

from flask_sqlalchemy import SQLAlchemy

from werkzeug.security import gen_salt

from flask_oauthlib.provider import OAuth2Provider

from tokenprovider import TokenProvider

from json import dumps

import random

import string

app = Flask(__name__, template_folder='templates')

app.debug = True

app.secret_key = ''.join([random.choice(string.digits +

                   string.ascii_uppercase) for x in range(0, 32)])

app.config.update({

    'SQLALCHEMY_DATABASE_URI': 'mysql://root:thepassword@127.0.0.1/token',

})

db = SQLAlchemy(app)

oauth = OAuth2Provider(app)

class User(db.Model):

    id = db.Column(db.Integer, primary_key=True)

    username = db.Column(db.String(40), unique=True)

    salt = db.Column(db.String(4))

class Token(db.Model):

    id = db.Column(db.Integer, primary_key=True)

    user_id = db.Column(

        db.Integer, db.ForeignKey('user.id')

)

    user = db.relationship('User')

    # currently only bearer is supported

    token_type = db.Column(db.String(40))

    access_token = db.Column(db.String(6))

@app.route('/', methods=('GET', 'POST'))

def home():

    if request.method == 'POST':

        username = request.args.get('username','')

        user = User.query.filter_by(username=username).first()

        if not user:

            user = User(username=username,

             salt=''.join([random.choice(string.digits + string.ascii_uppercase)

            for x in range(0, 4)]

))

            db.session.add(user)

            db.session.commit()

        session['id'] = user.id

        print str(session)

        return redirect('/')

    user = None

    if 'id' in session:

         uid = session['id']

         user= User.query.get(uid)

    return jsonify({'msg': 'welcome,'+ str(user), 'success':'success'}), 200

@ott.tokengetter

def check_token(access_token=None):

    if access_token:

        return Token.query.filter_by(access_token=access_token).first()

    return None

@app.route('/session/', methods=('GET'))

def session():

     sessions = list()

     [sessions.append(i) for i in db.session]

     return jsonify ({'sessions': json.dumps(sessions)}), 200

@ott.tokensetter

def save_token(token, request, *args, **kwargs):

    id = User.query.filter_by(username=request.args.get('username','')).first().id

    toks = Token.query.filter_by(

       user_id = id

    ).all()

    # make sure that every client has only one token connected to a user

    db.session.delete(toks)

    tp = TokenProvider()

    tok = Token(user=request.args.get('username',''),

                token_type='bearer', access_token=tp.set_token())

    tok.save()

    tok.user_id = request.user.id

    db.session.add(tok)

    db.session.commit()

    return tok

if __name__ == '__main__':

    db.create_all()

    app.run(debug=True,port=8787)

In today's post we continue our discussion on how an OTP provider can replace the tokens used by an API for checking authorization. In this case, the we will call it an OTT provider. An OTT provider converts all users/ digests/sessions/cookies to track users into tokens. A TOTP 6 digit number as usually provided by OTP providers. These tokens not only replace the credentials but also serve to maintain sessions. With the API only accepting tokens, there is no need to track anything else. In a way there is no repository required fro the API implementations to see if there is a token associated with a user or a client. The token itself has both pieces - 'what you know' and 'what you have' and the API implementation can validate the token the same way as when the caller did after requesting it.
There are only three players:
A token provider that issues and validates token
An API client that requests tokens from the token provider and validates it
An API server that accepts only tokens and validates it from the same token provider.
Tamper proofing the request or encrypting the transport layer is not part of this discussion but considered necessary.
Although the OTT is enclosed with the request and sent over the wire, it is temporary and gives the api what is needed.
OTP is covered in RFC 2289 which states that there are two entities in the operation of the OneTimePassword system. The generator must produce the appropriate one-time password from the user's secret pass phrase, and from the information provided in the challenge from the server. The server must send a challenge that includes the appropriate generation parameters to the generator, must verify the one time password received, must store the last one time password it received, and must store the corresponding one-time password sequence number. The server must also facilitate the changing of the users' secret passphrase in a secure manner.
Int GetDistinctMean (int [] A)
{

if (A == null) return 0;

return A.GetDistinctMean ();

}

Saturday, November 22, 2014

Int GetDistinctMedian (int [] A)

{

if (A == null) return 0;

return A.GetDistinctMedian ();

}
Here's what a token provider server would implement:

Class TokenProvider ()
Def login(self, request):
Token = OneTimeToken ()
Session.Add (token)
Return token
Def logout (self, request, token):
token= Token.lookup (request)
If token:
Token.remove (token)
Session.remove (token)
Def refresh_token (token):
Return Token.refresh (token)
Def validateToken (token):
Return OTT.validateToken
OTT defined at https://github.com/ravibeta/pythonexamples/tokenprovider.py
and https://github.com/ravibeta/PythonExamples/blob/master/tokenserver.py