Cluster computing

Wednesday, December 3, 2014

We continue our discussion on the WRL system to generate and analyze trace data. We looked at the formatting of the trace entries. We now look at some special considerations for the kernel. The kernel can also be traced because it is relinked. This is true for nearly all kernel code except for the part written in Assembly language because the linker does not modify it. On the WRL system, this part is responsible for determining the cause of traps and interrupts, saving and restoring co-processor state and register contents, manipulating the clock, translation look aside buffer, and managing return from the interrupts. It also includes idle loop. In these cases, the trace routines are inserted by hand.
The trace can grow pretty quickly. The trace buffers are nearly half the size of the memory on the system and yet it might represent only two seconds of untraced execution. This trace data has to be extracted so that tracing can continue without too much disruption or affecting accuracy.
The interruption may entail extracting the partial trace and writing it or analyzing it so that it does not have to be saved. The latter is preferred because the former doesn't scale with long traces. When the trace buffer is nearly full, the operating system turns off tracing and runs a very high priority analysis process.. Execution of this analysis is controlled by a read system call on a special file. The read returns the virtual address space of the beginning of the trace buffer and the number of available entries only when the buffer is full or nearly full.The data can be analyzed in any manner and the tracing is stopped for this duration.When the data has been processed, the read is once again executed, tracing is turned back on and traced user programs can once again execute.
Storing the trace data helps make the simulation results to be reproducible and thereby help with study of different cache organizations in each simulation. On the fly analysis does not help with that. This is possible for a trace data corresponding to a single trace. For multiple processes or for kernel traces, it will not be identical from run to run. There are a couple of potential solutions to this problem. It might be possible to simulate more than one cache organization at a time but this does not allow the comparison of new and old results. Another possibility is to do a sufficient number of controlled runs to determine the variance in the results for each cache organization so that any statistically significant difference between any two cache can be determined.
#codingexercise
decimal GetDistinctRangeVariance(int [] A)
{
if (A == null) return 0;
Return A.DistinctRangeVariance ();
}
https://github.com/ravibeta/PythonExamples/blob/master/totp.py

#codingexercise

decimal GetDistinctRangeClusterCenter(int [] A)

{

if (A == null) return 0;

Return A.DistinctRangeClusterCenter()
}

Tuesday, December 2, 2014

Today we will continue our discussion on the WRL system to generate and analyze trace data. The trace code itself was written in proprietary assembly language using registers to avoid memory references. 8 of these registers are used for trace generation. The kernel uses a separate register set from user process Only the trace index and the trace pointer values needed to be copied from user to kernel mode. In the switch from kernel to user they are always calculated. All processes have the shared trace buffer mapped to the same virtual address space. The linker inserts trace branches as part of the automatic generation of address references. In addition, it is possible to add custom trace routines by user or kernel. Only the kernel does it for this study. The addresses inserted in the trace buffer by the kernel are physical addresses, whereas those inserted by user addresses are virtual addresses. It is essential for the analysis code to be able to tell one from the other.Similarly, it is important to tell sequences of virtual addresses as belonging to a particular process when more than one is being traced. This is done by making a change mode entry in the trace buffer on every transition from kernel to user or back.The entry indicates the direction of the switch and the process id.
Tracing is done only when a trace flag is set and stored in a trace register.All writes to the trace buffer check for this value. When a write occurs, at the first interrupt following a write to a location in /dev/kmem, tracing is turned on. The set of user programs that are traced are determined by the ones that are specially linked. The kernel is traced only when it is trace-linked. When tracing is off, 4 to 6 additional instructions are executed at every trace point. The user code does not incur any cost but the kernel does. Even when the kernel is not linked for tracing, it is slower because it has to execute instructions to keep the trace buffer consistent between every switch
The trace buffer entries are formatted such that they can be divided into two types based on whether they fill one or two 32 bit words in the buffer. Data reference entries are one word long. Data load/store entries take up a single 32-bit word. The Load/Store is differentiated by the first two high order bits of the word. The remaining bits are used for address which is guaranteed to be less than 1GB. Any entry with a non zero high order bits is a data entry. On the other hand, an instruction entry represents the beginning of a basic block and has the first word denoting the type and count. Count is the number of instructions in the basic block starting at the address in the second word.
#codingexercise
Int GetDistinctRangeMode(int [] A)
{
if (A == null) return 0;
Return A.DistinctRangeMode ();
}
Int GetDistinctRangeMedian(int [] A)

{

if (A == null) return 0;

Return A.DistinctRangeMedian ();

}

decimal GetDistinctRangeStdDev(int [] A)

{

if (A == null) return 0;

Return A.DistinctRangeStdDev ();

}

Monday, December 1, 2014

Today we discuss another WRL report. This is titled Long Address Traces from RISC machines: Generation and analysis by Borg, Kessler, Lazana and Wall. This talks about cache design when processor speed was picking up. If the processor speed was shown to increase by a factor of ten with the memory system remaining the same, the overall execution speed improved merely by a factor of two. The difference was mainly attributed to servicing cache misses.
To analyze caches, they used simulation to trace memory addresses of running systems. The accuracy of the results depends both on the simulation model and the trace data. Some of the limitations included complexity, inaccuracy, lack of system references, short length, inflexibility or applicability only to CISC machines. They built a system to overcome these limitations where the trace generation is based on link time code modification. This makes generation of new trace easier and the system was flexible to allow control of what is traced and when it is traced. For large fast systems, very long trace is required. This system posed no limitations on the length of the trace.
The traces generated by this system was used to analyze the behavior of multi-level cache hierarchies. The link time code modification was possible because the programs were compiled to an intermediate language called Mahler. An option to the linker enables branches to trace code to be inserted into the program wherever a referenced address is to be recorded. Instruction addresses are computed and written into a trace buffer. This addresses are those where the instructions would have been located were it not for the trace branches. The trace buffer is shared among the kernel and all running processes but only those that have been linked actually use the buffer. The buffer is mapped to the high end of every user's virtual address space and the user trace code can write directly to the trace buffer referencing its virtual address. The trace code is written in an assembly language proprietary to the WRL system. Since the kernel execution is uninterruptible unlike user execution, additional code to ensure correct synchronization among the users of the trace buffer is required. A user trace entry could therefore be split with an arbitrary amount of trace data generated by kernel.
Trace extraction and analysis are time-consuming and cannot be performed together with tracing. Hence tracing is interrupted for those activities. The challenge is to ensure that the trace is seamless. A partial trace is implemented where the trace data is analyzed immediately by a high priority analysis process and the trace is not required to be saved.
We will also review another paper for combining branch predictors by McFarling from Rajeev's website after this one.
#codingexercise
Int GetDistinctRangeMin(int [] A)
{
if (A == null) return 0;
return A.DistinctRangeMin();
}

Int GetDistinctRangeMax(int [] A)

{

if (A == null) return 0;
Return A.DistinctRangeMax ();

}

Saturday, November 29, 2014

Today we talk about timestamps in http header values. Caching mechanisms for example are based on header fields that carry absolute timestamp values. We review the possible errors in using this field from an old WRL research report.
First of this is date skew. When compared with the NTP server with an accuracy up to milliseconds, the date headers often show that the clocks from the originating servers are not synchronized.
Second is impossible last modified value. The last-modified response header gives the timestamp at which the resource was most recently modified. A cache can validate a stored response by asking the server using the if-modified-since request header. This value therefore should not be any newer than the response's Date value but many often are not.
Third is the apparent coherency failures in using Last-Modified. The use of this field implies that the server responding to the URL is caching and that the contents of the URL has not changed. When this is not the case, we have a coherency failure. Many of the servers seem to demonstrate this anomaly.
Fourth is the effect of the limited timestamp resolution. The Last modified header has a resolution of 1 second. Therefore even a correctly implemented server that changes its content more than once in a second may have inaccurate timestamps.
Fifth is the effect of apparent premature expirations. An origin server may assign an expiration time after which the cache has to revalidate a stored response. Servers usually assign expiration times heuristically, guessing how long it might be before the resource is changed. By comparing the maximum age value, one can tell whether a cached response expires before the response value changes.
Sixth is the over optimistic expirations. When we compared a conservative expiration time effect above, we could also complement a similar caveat with generous expiration times that may cause incorrect caching.
To summarize, server operators may need to keep their clocks adjusted and Caches have to be consistent and correct in their use of timestamps.

#codingexercise
print time in RFC 1123 format
from email.utils import formatdate
print formatdate(timeval=None, localtime=False, usegmt=True)
Sun, 30 Nov 2014 16:43:49 GMT

generate random alphanumeric
import random
print ''.join([random.choice(string.digits + string.ascii_uppercase) for i in range (0,12)])
D8KCQ5WT6TLN

How do you package for debian?

cd apisecurity-1.0
dh_make --native
echo "data/* usr/share/apisecurity/" > debian/apisecurity.install
# edit rules, menu, copyright, changelog

gpg --list-secret-keys

dpkg-buildpackage -us -uc

debsign -k 1FB5E9A2 apisecurity_1.0_amd64.changes

dupload apisecurity_1.0_amd64.changes

#codingexercise
Int[] GetDistinctRange(int [] A)
{
if (A == null) return null;
return A.DistinctRange();
}

After the previous post on TOTP tokens just now, let us get back to our discussion on CG method.
We were discussing the non-linear conjugate gradient method. CG could be used to find the minimum point of a quadratic form. It could also be used to minimize any continuous function for which the gradient can be computed. If we choose the residual as a negative gradient, we overcome the limitation that we cannot use a recursive formula for the residual. With the residual, we can use Gram Schmidt conjugation to find the search directions. This is the same as in the linear search. We find the step-length which is different from the linear case. This is difficult because we cannot do a line search. We have to approximate it by finding the minimum point of a quadratic form. We use the linear expression for the step length but we minimize the given function at the next step. Next to find the constants, we could use any method that finds the zero of the transpose of gradient at next step applied to the current search direction. Nonlinear CG can be preconditioned by choosing a pre-conditioner M that approximates the second derivative and has the property that M-inverse applied to residual is easy to compute. In the linear search, with the pre-conditioner, we could transform the quadratic form so that it is similar to a sphere. In a non-linear CG pre-conditioner performs this transformation for a region near the current position. We could use the diagonal as the pre-conditioner if the second derivative is too expensive to compute. However, this only works if we are close to the current position. Moreover, a pre-conditioner must be positive definite. Non-positive diagonal elements cannot be allowed. A conservative solution is to not pre-condition when the second derivative is not positive-definite.

#codingexercise
how do you minify and obfuscate the code with python ?
perhaps use pyminifier --obfuscate --pyz=/tmp/myprogram.pyz /tmp/program.py
or use BitBoost to fight reverse-engineering
Now with the previous code for TOTP tokens, we can have a script that can be distributed for each API caller to generate TOTP tokens for themselves.

#codingexercise
Convert hex to string:
'AD829C89031FD281'.decode('hex')

'\xad\x82\x9c\x89\x03\x1f\xd2\x81

Friday, November 28, 2014

In this post, I continue to talk about the Ruby on Rails MVC. We will look at a few examples:
Here's the view to list all shares:

<h1>Listing shares</h1>

<%= link_to 'New share', new_share_path %>

<table>

  <tr>

    <th>Id</th>

    <th>Path</th>

    <th>Rowguid</th>

    <th>Action</th>

  </tr>

  <% @shares.each do |share| %>

    <tr>

      <td><%= share['id'] %></td>

      <td><%= share['path'] %></td>

      <td><%= share['rowguid'] %></td>

      <td><%= link_to 'Destroy', share_path(share),

                    method: :delete, data: { confirm: 'Are you sure?' }

      %></td>

    </tr>

  <% end %>

</table>

and here is a way to add new share:

<%= form_for :share, url: shares_path do |f| %>

<p>

    <%= f.label :sharepath %><br>

    <%= f.text_area :sharepath %>

  </p>

<p>

    <%= f.submit %>

  </p>

<% end %>
The form submits the data to the controller as parameters.

The data forwarded from the controller to the view is usually a model instance or an object. If the object is a dictionary the elements are accessed by their keys. For model instances, they are accessed by members.
#codingexercise
Int GetDistinctMode(int [] A)
{
if (A == null) return 0;
return A.DistinctMode();
}

class OneTimePasswordAlgorithm:
       doubleDigits = [ 0, 2, 4, 6, 8, 1, 3, 5, 7, 9 ]
       def calcChecksum(self, num, digits):
           doubleDigit = True
           total = 0
           for i in range(len(digits)-1) :
               digits -= 1
               digit = num % 10
               num /= 10
               if doubleDigit:
                   digit = self.doubleDigits[digit]
               total += digit
               doubleDigit = not doubleDigit
           result = total % 10
           if result > 0:
               result = 10 - result
           return result

DIGITS_POWER = [1,10,100,1000,10000,100000,1000000,10000000,100000000 ]

def generateTOTP(self, key, time, returnDigits, crypto):

codeDigits = int(returnDigits)

result = ''

#Using the counter

#First 8 bytes are for the movingFactor

#Compliant with base RFC 4226 (HOTP)

while (len(time) < 16 ):

time = "0" + time

#Get the HEX in a Byte[]

msg = bytearray(time.decode("hex"))

k = bytearray(key.decode("hex"))

import hmac

hash = bytearray(hmac.new(k,msg,crypto).hexdigest())

#put selected bytes into result int

offset = int(int(hash[len(hash) - 1]) & 0xf)

binary = ((hash[offset] & 0x7f) << 24) | ((hash[offset + 1] & 0xff) << 16) | ((hash[offset + 2] & 0xff) << 8) | (hash[offset + 3] & 0xff)

otp = int(binary % self.DIGITS_POWER[codeDigits])

result = str(otp);

while (len(result) < codeDigits):

result = "0" + result

return result

>>> otp = OneTimePasswordAlgorithm()

>>> import hashlib

>>> otp.generateTOTP('90AB891C', '123465539', '6', hashlib.sha256)

'857652'

>>>

Thursday, November 27, 2014

In this post I cover a simple web application written in Ruby on Rails which provides an MVC framework. The rails framework lays out the views, models, and controllers by the name we define.
In the example below, I use it for webAPI based application and I don't use a database. I generate the controller with 'rails generate shares index' and this lays out the model view controller files in the appropriate folders to be looked by the naming convention.

The controller itself looks like the following:

require_relative 'webapi'
# defines a resource for a file share

class SharesController < ApplicationController

def new
render json: Share.create(params[:path])
end

def destroy
render json: Share.delete(params[:path], params[:id])
end

def edit
prmhash = Hash.new { |hash, key| hash[key] = Array.new }
prmhash["location"] << params[:location]
params[:path].to_s.split('/').each( |x| prmhash["path"] << x)
render json: Share.modify(prmhash, params[:id])
end

def index
results = Share.list()
$exports = JSON.parse(results['exports'])
# the view renders this automatically
end

end

As we see from the above that the views are passed the model or an object to render. This is the case with the list all items view. Optionally plaintext, or json or xml can also be rendered onto html.
The Controller takes parameters passed in from the views with a form. The form can be elaborated on the views in terms of the controls they use, their markup and the associated workflow. Links can also be added on the views to enable forward and backward navigation between the items.

#codingexercise
Int GetDistinctMedian(int [] A)
{
if (A == null) return 0;
return A.DistinctMedian();
}