Cluster computing

Monday, December 1, 2014

Today we discuss another WRL report. This is titled Long Address Traces from RISC machines: Generation and analysis by Borg, Kessler, Lazana and Wall. This talks about cache design when processor speed was picking up. If the processor speed was shown to increase by a factor of ten with the memory system remaining the same, the overall execution speed improved merely by a factor of two. The difference was mainly attributed to servicing cache misses.
To analyze caches, they used simulation to trace memory addresses of running systems. The accuracy of the results depends both on the simulation model and the trace data. Some of the limitations included complexity, inaccuracy, lack of system references, short length, inflexibility or applicability only to CISC machines. They built a system to overcome these limitations where the trace generation is based on link time code modification. This makes generation of new trace easier and the system was flexible to allow control of what is traced and when it is traced. For large fast systems, very long trace is required. This system posed no limitations on the length of the trace.
The traces generated by this system was used to analyze the behavior of multi-level cache hierarchies. The link time code modification was possible because the programs were compiled to an intermediate language called Mahler. An option to the linker enables branches to trace code to be inserted into the program wherever a referenced address is to be recorded. Instruction addresses are computed and written into a trace buffer. This addresses are those where the instructions would have been located were it not for the trace branches. The trace buffer is shared among the kernel and all running processes but only those that have been linked actually use the buffer. The buffer is mapped to the high end of every user's virtual address space and the user trace code can write directly to the trace buffer referencing its virtual address. The trace code is written in an assembly language proprietary to the WRL system. Since the kernel execution is uninterruptible unlike user execution, additional code to ensure correct synchronization among the users of the trace buffer is required. A user trace entry could therefore be split with an arbitrary amount of trace data generated by kernel.
Trace extraction and analysis are time-consuming and cannot be performed together with tracing. Hence tracing is interrupted for those activities. The challenge is to ensure that the trace is seamless. A partial trace is implemented where the trace data is analyzed immediately by a high priority analysis process and the trace is not required to be saved.
We will also review another paper for combining branch predictors by McFarling from Rajeev's website after this one.
#codingexercise
Int GetDistinctRangeMin(int [] A)
{
if (A == null) return 0;
return A.DistinctRangeMin();
}

Int GetDistinctRangeMax(int [] A)

{

if (A == null) return 0;
Return A.DistinctRangeMax ();

}

Saturday, November 29, 2014

Today we talk about timestamps in http header values. Caching mechanisms for example are based on header fields that carry absolute timestamp values. We review the possible errors in using this field from an old WRL research report.
First of this is date skew. When compared with the NTP server with an accuracy up to milliseconds, the date headers often show that the clocks from the originating servers are not synchronized.
Second is impossible last modified value. The last-modified response header gives the timestamp at which the resource was most recently modified. A cache can validate a stored response by asking the server using the if-modified-since request header. This value therefore should not be any newer than the response's Date value but many often are not.
Third is the apparent coherency failures in using Last-Modified. The use of this field implies that the server responding to the URL is caching and that the contents of the URL has not changed. When this is not the case, we have a coherency failure. Many of the servers seem to demonstrate this anomaly.
Fourth is the effect of the limited timestamp resolution. The Last modified header has a resolution of 1 second. Therefore even a correctly implemented server that changes its content more than once in a second may have inaccurate timestamps.
Fifth is the effect of apparent premature expirations. An origin server may assign an expiration time after which the cache has to revalidate a stored response. Servers usually assign expiration times heuristically, guessing how long it might be before the resource is changed. By comparing the maximum age value, one can tell whether a cached response expires before the response value changes.
Sixth is the over optimistic expirations. When we compared a conservative expiration time effect above, we could also complement a similar caveat with generous expiration times that may cause incorrect caching.
To summarize, server operators may need to keep their clocks adjusted and Caches have to be consistent and correct in their use of timestamps.

#codingexercise
print time in RFC 1123 format
from email.utils import formatdate
print formatdate(timeval=None, localtime=False, usegmt=True)
Sun, 30 Nov 2014 16:43:49 GMT

generate random alphanumeric
import random
print ''.join([random.choice(string.digits + string.ascii_uppercase) for i in range (0,12)])
D8KCQ5WT6TLN

How do you package for debian?

cd apisecurity-1.0
dh_make --native
echo "data/* usr/share/apisecurity/" > debian/apisecurity.install
# edit rules, menu, copyright, changelog

gpg --list-secret-keys

dpkg-buildpackage -us -uc

debsign -k 1FB5E9A2 apisecurity_1.0_amd64.changes

dupload apisecurity_1.0_amd64.changes

#codingexercise
Int[] GetDistinctRange(int [] A)
{
if (A == null) return null;
return A.DistinctRange();
}

After the previous post on TOTP tokens just now, let us get back to our discussion on CG method.
We were discussing the non-linear conjugate gradient method. CG could be used to find the minimum point of a quadratic form. It could also be used to minimize any continuous function for which the gradient can be computed. If we choose the residual as a negative gradient, we overcome the limitation that we cannot use a recursive formula for the residual. With the residual, we can use Gram Schmidt conjugation to find the search directions. This is the same as in the linear search. We find the step-length which is different from the linear case. This is difficult because we cannot do a line search. We have to approximate it by finding the minimum point of a quadratic form. We use the linear expression for the step length but we minimize the given function at the next step. Next to find the constants, we could use any method that finds the zero of the transpose of gradient at next step applied to the current search direction. Nonlinear CG can be preconditioned by choosing a pre-conditioner M that approximates the second derivative and has the property that M-inverse applied to residual is easy to compute. In the linear search, with the pre-conditioner, we could transform the quadratic form so that it is similar to a sphere. In a non-linear CG pre-conditioner performs this transformation for a region near the current position. We could use the diagonal as the pre-conditioner if the second derivative is too expensive to compute. However, this only works if we are close to the current position. Moreover, a pre-conditioner must be positive definite. Non-positive diagonal elements cannot be allowed. A conservative solution is to not pre-condition when the second derivative is not positive-definite.

#codingexercise
how do you minify and obfuscate the code with python ?
perhaps use pyminifier --obfuscate --pyz=/tmp/myprogram.pyz /tmp/program.py
or use BitBoost to fight reverse-engineering
Now with the previous code for TOTP tokens, we can have a script that can be distributed for each API caller to generate TOTP tokens for themselves.

#codingexercise
Convert hex to string:
'AD829C89031FD281'.decode('hex')

'\xad\x82\x9c\x89\x03\x1f\xd2\x81

Friday, November 28, 2014

In this post, I continue to talk about the Ruby on Rails MVC. We will look at a few examples:
Here's the view to list all shares:

<h1>Listing shares</h1>

<%= link_to 'New share', new_share_path %>

<table>

  <tr>

    <th>Id</th>

    <th>Path</th>

    <th>Rowguid</th>

    <th>Action</th>

  </tr>

  <% @shares.each do |share| %>

    <tr>

      <td><%= share['id'] %></td>

      <td><%= share['path'] %></td>

      <td><%= share['rowguid'] %></td>

      <td><%= link_to 'Destroy', share_path(share),

                    method: :delete, data: { confirm: 'Are you sure?' }

      %></td>

    </tr>

  <% end %>

</table>

and here is a way to add new share:

<%= form_for :share, url: shares_path do |f| %>

<p>

    <%= f.label :sharepath %><br>

    <%= f.text_area :sharepath %>

  </p>

<p>

    <%= f.submit %>

  </p>

<% end %>
The form submits the data to the controller as parameters.

The data forwarded from the controller to the view is usually a model instance or an object. If the object is a dictionary the elements are accessed by their keys. For model instances, they are accessed by members.
#codingexercise
Int GetDistinctMode(int [] A)
{
if (A == null) return 0;
return A.DistinctMode();
}

class OneTimePasswordAlgorithm:
       doubleDigits = [ 0, 2, 4, 6, 8, 1, 3, 5, 7, 9 ]
       def calcChecksum(self, num, digits):
           doubleDigit = True
           total = 0
           for i in range(len(digits)-1) :
               digits -= 1
               digit = num % 10
               num /= 10
               if doubleDigit:
                   digit = self.doubleDigits[digit]
               total += digit
               doubleDigit = not doubleDigit
           result = total % 10
           if result > 0:
               result = 10 - result
           return result

DIGITS_POWER = [1,10,100,1000,10000,100000,1000000,10000000,100000000 ]

def generateTOTP(self, key, time, returnDigits, crypto):

codeDigits = int(returnDigits)

result = ''

#Using the counter

#First 8 bytes are for the movingFactor

#Compliant with base RFC 4226 (HOTP)

while (len(time) < 16 ):

time = "0" + time

#Get the HEX in a Byte[]

msg = bytearray(time.decode("hex"))

k = bytearray(key.decode("hex"))

import hmac

hash = bytearray(hmac.new(k,msg,crypto).hexdigest())

#put selected bytes into result int

offset = int(int(hash[len(hash) - 1]) & 0xf)

binary = ((hash[offset] & 0x7f) << 24) | ((hash[offset + 1] & 0xff) << 16) | ((hash[offset + 2] & 0xff) << 8) | (hash[offset + 3] & 0xff)

otp = int(binary % self.DIGITS_POWER[codeDigits])

result = str(otp);

while (len(result) < codeDigits):

result = "0" + result

return result

>>> otp = OneTimePasswordAlgorithm()

>>> import hashlib

>>> otp.generateTOTP('90AB891C', '123465539', '6', hashlib.sha256)

'857652'

>>>

Thursday, November 27, 2014

In this post I cover a simple web application written in Ruby on Rails which provides an MVC framework. The rails framework lays out the views, models, and controllers by the name we define.
In the example below, I use it for webAPI based application and I don't use a database. I generate the controller with 'rails generate shares index' and this lays out the model view controller files in the appropriate folders to be looked by the naming convention.

The controller itself looks like the following:

require_relative 'webapi'
# defines a resource for a file share

class SharesController < ApplicationController

def new
render json: Share.create(params[:path])
end

def destroy
render json: Share.delete(params[:path], params[:id])
end

def edit
prmhash = Hash.new { |hash, key| hash[key] = Array.new }
prmhash["location"] << params[:location]
params[:path].to_s.split('/').each( |x| prmhash["path"] << x)
render json: Share.modify(prmhash, params[:id])
end

def index
results = Share.list()
$exports = JSON.parse(results['exports'])
# the view renders this automatically
end

end

As we see from the above that the views are passed the model or an object to render. This is the case with the list all items view. Optionally plaintext, or json or xml can also be rendered onto html.
The Controller takes parameters passed in from the views with a form. The form can be elaborated on the views in terms of the controls they use, their markup and the associated workflow. Links can also be added on the views to enable forward and backward navigation between the items.

#codingexercise
Int GetDistinctMedian(int [] A)
{
if (A == null) return 0;
return A.DistinctMedian();
}

Wednesday, November 26, 2014

#codingexercise
Int GetDistinctSum(int [] A)
{
if (A == null) return 0;
return A.DistinctSum();
}

We were reviewing the non-linear conjugate gradient method. We set the residual to the negation of the gradient. We compute the Gram Schmidt conjugation of the residuals to find the search directions. We find the step length by ensuring that the gradient is orthogonal to the search direction. We solve for the component gradient of next step transposed applied to current search direction to be zero. We set the Gram Schmidt constants by one of two techniques : Fletcher Reeves (FR) formula and Polak Ribiere (PR) formula. FR was used in linear CG method and is easy to compute.PR is similar but uses the difference of the next residual from the current instead of just using the next residual. FR converges if the starting point is close to the desired minimum. PR may or may not converge but when it does it is quicker than FR. To force the convergence in the PR case, we ignore negative values of the constant and choose zero instead. This is equivalent to restarting the CG search where we forget the previous search directions and start it in the direction of the steepest descent.
We see that in the non-linear CG case, there are fewer convergence guarantees. If the function of the next step is chosen such that it deviates from the quadratic form, the search directions tend to lose conjugacy. Moreover, the global minimum may not be found and even a local minimum may not be found if the function has no lower bound.
A useful technique to improve convergence is to restart CG after every n iterations. This is particularly true when n is small. This follows from the fact that CG can only generate n conjugate vectors in n-dimensional space.
In the non-linear CG, we could also use a fast algorithm to find the zero of the gradient transposed search vector. If the gradient is a polynomial in step length, two iterative methods for zero-finding are the Newton Raphson method and the Secant method. Newton Raphson (NR) method also requires that the second-derivative of the function on initial step and step-length in the direction of the search vector be computable. NR method relies on Taylor series approximation in terms of the function, its first derivative and second derivative. By setting this approximation to zero and finding the steplength, we get a truncated Taylor series which approximates the function with a parabola. Using these, NR method constructs a quadratic approximation of the function and a new point is chosen at the base of the parabola. This procedure is iterated until convergence is reached. To perform an exact line search of the non-quadratic form, repeated steps may need to be taken along the line until the gradient transposed search direction is zero and therefore the CG iteration may include several Newton-Raphson iterations. This is inexpensive or expensive depending on the second derivative. To perform an exact line search without computing the second derivative, the secant method approximates the function by evaluating the first derivative at distinct points where the step length is zero and the step-length is an arbitrary small non-zero number. Like the NR method, the secant method also approximates the function with a parabola but instead of using the first and second derivative at a point, it finds the first derivative at two different points. Both the Newton-Raphson and the Secant methods iterations are done when the answer is reasonably close to the exact solution.
#codingexercise
Int GetDistinctMin (int [] A)
{
if (A == null) return 0;
return A.DistinctMin ();
}