Cluster computing

Friday, November 21, 2014

In today's post I continue discussing the method of Conjugate Gradients. We talked about preconditioning which is a technique for improving the condition number of a matrix. We tried to do this by first approximating A with M. M is a symmetric positive definite matrix which when inverted and applied to A helps indirectly solve the linear matrix equation we started with. However applying the inverse of M to A doesn't always result in either a symmetric or a definite matrix. Hence we brought in transformation where we substitute M with the E.E-Transposed. Such an E can be obtained by say Cholesky Factorization. In this case the substitution helps with making the component symmetric and positive definite. This process of using the CG is called the Transformed Preconditioned Conjugate Gradient Method. This method has an undesirable characteristic that E must be computed. Instead if we do a few careful variable substitutions, it can eliminate E. If we choose the new residual as E-inverse applied to the original residual and if we choose the new search vector as E-transpose applied to the original search direction, then we can derive what is called the untransformed preconditioned conjugate gradient method. The derivation works out like this: we start by finding the initial residual based on its definition and a starting position x0. we also set the initial search direction again based on its definition and using the initial residual. We write down the step length in terms of the above and we write down the next position and the next residual in terms of this step length and their iterative definition. This lets us calculate the factor to use with the iterative defintion for the search vector. The matrix E does not appear in this method.
Since it works for the CG, we can use the same technique to derive a pre-conditioned Steepest Descent Method that does not use E.
Let us take a quick look how.
In the steepest descent method, we use the iterative definition involving the residual and the step-length. We start by finding the intial residual and a starting postion. We write down the step-length in terms of the residual and then we calculate the next position.
The effectiveness of a preconditioner M is determined by the condition number of Minverse A and occassionally by its clustering of eigenvalues. The problem is to find a preconditioner. that approximates A well enough such as to make up for the cost of calculating the product M-inverse current residual once per iteration. The simplest pre-conditioner is a diagonal matrix. whose diagonal entries are identical to those of A. The process of applying this pre-conditioner, known as diagonal pre-conditioning or Jacobi pre-conditioning, is equivalent to scaling the quadratic form along the coordinate axes.
Another trivial pre-conditioner is to take a matrix M as A which gives us the quadratic form to be perfectly spherical. and so the solution takes only one iteration. The catch is that we aren't solving the linear equation we begin with so this isn't helpful.
There are many other pre-conditioners that have been studied. Some have been much more sophisticated.

#codingexercise
int GetCountOfLinesInSourceCode(string text)
{
var lines = text.split('\r\n');
int count = 0;
bool skip = false;
foreach ( var line in lines )
{
if (String.IsWhiteSpace(line)) continue;
if (line.startswith(single_line_comment)) continue;
if (line.startswith(multi_line_comment_begin)) { skip = true; continue;}
if (line.startswith(multi_line_comment_end)) {skip=false; continue;}
if (!skip) count++;
}
return count;
}

Today we continue to discuss some more of the authentication and authorization frameworks. In particular, we look at information rights management process such as in Oracle IRM. The process involved is as follows:
An author creates a document or an email on her desktop or notebook.
The document is sealed by the author when she publishes it. The document is then sent to the receiver
The receiver click on the document. The document is encrypted and cannot be opened without an installed IRM application. If the application is available, it can then be decrypted and shown to the receiver.
The components involved in this process include:
- an identity manager to centrally provision IRM users and entitlements.
- a virtual directory to synchronize IRM users and groups
- a Single sign on for desktop single sign on.
All user interfaces are consistent through out the users use of the applications and built on the same platform framework. The users have a seamless and familiar interface for access management. The identity management is provided by the Oracle Identity Manager. The identity management architecture is built on a data tier that supports an operational db, an ID store, and an audit DB. The Oracle web logic server is the layer above the data tier. This layer is responsible for all of domain template management, configurations, and installing, upgrading and patching. The weblogic server maintains a unified, consolidated and consistent view over the ID store which includes membership providers such as Active Directory. The layer above the web logic server is the Fusion middleware services provides such functionality as
the ability to configure new metadata models
to provision new users or export user information to other systems
for users to administer their own profiles with fine grained entitlements
for policy management based on profiles
for creating and provisioning workflows that enables automation of IT tasks along with a Workflow visualizer and a designer.
a password management that includes self-service.
and audit and compliance management


decimal GetCountDuplicatesStandardDeviation (int [] A)

{


if (A == null) return 0;


return A.GetCountDuplicatesStandardDeviation();



}

Int GetDistinctMode (int [] A)
{



if (A == null) return 0;



return A.GetDistinctMode ();



}

Thursday, November 20, 2014

This is a writeup on how to collect and visualize data that indicates who talks to whom and represent it as a neat little graph alongside your favorite application with which you browse this communication. Let us take an example of an e-mail application that has mails sent to you. Inevitably you will be in all of the e-mails. That’s what pigeonholing might do to you. But even with this data, you may be seeing mails where you were included just to be in the loop. These from and to labeled people may give indications not only to who is chatty but also whose mail is more important than others especially when it comes from outside your organization. Although this is a complex domain and it has many subjective variables, we can begin with this use case.

Let us say we iterate through our list of mails and populate the following table:

- - - - - - - - - - - -

- From | To |

- - - - - - - - - - - -

- A B

- B C

- D E

This may start to look like a graph where we can quickly form nodes with participants such as A, B, C, D and E. We also form edges between them for each entry that we make in this table. Notice that the edges have weights and they are directed.

We can write this graph as an adjacency list or matrix. We could prefer Matrix but for our purposes we simply want to illustrate a visualization of the information we have learned. Notice though that although we have decided to layout a graph, it can be drawn in several different ways. Some can be messier than others. Perhaps a better way to draw them would be to have them spaced apart and try not to have the edges cross over too much. This will make it easier to see the graph.

Now let us try the same with this graph. We have a technique by which we can optimize this graph layout.

# The following program lays out a graph with little or no crossing lines.

# This is adapted from a sample in "Programming Collective Intelligence" by OReilly Media

from PIL import Image, ImageDraw

import math

import random

vertex = ['A','B','C','D','E']

links=[('A', 'B'),

('B', 'C'),

('C', 'D'),

('D', 'E'),

('E', 'A'),

('C', 'E'),

('A', 'D'),

('E', 'B')]

domain=[(10,370)]*(len(vertex)*2)

def randomoptimize(domain,costf):

best=999999999

bestr=None

for i in range(1000):

# Create a random solution

r=[random.randint(domain[i][0],domain[i][1]) for i in range(len(domain))]

# Get the cost

cost=costf(r)

# Compare it to the best one so far

if cost<best:

best=cost

bestr=r

return r

def annealingoptimize(domain,costf,T=10000.0,cool=0.95,step=1):

# Initialize the values randomly

vec=[float(random.randint(domain[i][0],domain[i][1]))

for i in range(len(domain))]

while T>0.1:

# Choose one of the indices

i=random.randint(0,len(domain)-1)

# Choose a direction to change it

dir=random.randint(-step,step)

# Create a new list with one of the values changed

vecb=vec[:]

vecb[i]+=dir

if vecb[i]<domain[i][0]: vecb[i]=domain[i][0]

elif vecb[i]>domain[i][1]: vecb[i]=domain[i][1]

# Calculate the current cost and the new cost

ea=costf(vec)

eb=costf(vecb)

p=pow(math.e,(-eb-ea)/T)

# Is it better, or does it make the probability

# cutoff?

if (eb<ea or random.random( )<p):

vec=vecb

# Decrease the temperature

T=T*cool

return vec

def crosscount(v):

# Convert the number list into a dictionary of person:(x,y)

loc=dict([(vertex[i],(v[i*2],v[i*2+1])) for i in range(0,len(vertex))])

total=0

# Loop through every pair of links

for i in range(len(links)):

for j in range(i+1,len(links)):

# Get the locations

(x1,y1),(x2,y2)=loc[links[i][0]],loc[links[i][1]]

(x3,y3),(x4,y4)=loc[links[j][0]],loc[links[j][1]]

den=(y4-y3)*(x2-x1)-(x4-x3)*(y2-y1)

# den==0 if the lines are parallel

if den==0: continue

# Otherwise ua and ub are the fraction of the

# line where they cross

ua=((x4-x3)*(y1-y3)-(y4-y3)*(x1-x3))/den

ub=((x2-x1)*(y1-y3)-(y2-y1)*(x1-x3))/den

# If the fraction is between 0 and 1 for both lines

# then they cross each other

if ua>0 and ua<1 and ub>0 and ub<1:

total+=1

for i in range(len(vertex)):

for j in range(i+1,len(vertex)):

# Get the locations of the two nodes

(x1,y1),(x2,y2)=loc[vertex[i]],loc[vertex[j]]

# Find the distance between them

dist=math.sqrt(math.pow(x1-x2,2)+math.pow(y1-y2,2))

# Penalize any nodes closer than 50 pixels

if dist<50:

total+=(1.0-(dist/50.0))

return total

def drawnetwork(loc):

#create the image

img = Image.new('RGB', (400,400),(255,255,255))

draw=ImageDraw.Draw(img)

#create the position dict

pos=dict([(vertex[i],(loc[i*2],loc[i*2+1])) for i in range(0, len(vertex))])

#Draw Links

for (a,b) in links:

draw.line((pos[a],pos[b]), fill=(255,0,0))

#Draw vertex

for (n,p) in pos.items():

draw.text(p,n,(0,0,0))

img.save('graph.jpg', 'JPEG')

img.show()

sol=randomoptimize(domain,crosscount)

crosscount(sol)

sol=annealingoptimize(domain,crosscount,step=50,cool=0.99)

crosscount(sol)

drawnetwork(sol)

Now this draws the graph as :

This is a neat little graph and shows that the participant A is more popular.

Wednesday, November 19, 2014

#codingexercise
Int GetCountDuplicatesMode (int [] A)
{
if (A == null) return 0;
return A.CountDuplicatesMode();
}

In today's post we continue to look at the PHP Implementation of a class based views which are not tied to model. A PHP controller that takes a single input for say a string to do CRUD operation may look something like this:


<?php

require  __DIR__ . '/crud.php';

#require_once 'crud.php'

class ExportsController extends AppController {

        public $helpers = array('Html', 'Form');

        public $exports = NULL;

        public function index() {

            $orm = new ORM();

            $result = json_decode($orm->list(), true); 

            $exports = $result['exports'];

            $ret = array();

            foreach($exports as $export) {

                   array_push($ret, $export['input']); 

            }

            $this->set('exports', $ret);

        }

        public function add($export = null) {        

             if(isset($this->request->data['input'])) {            

                $orm = new ORM();            

                $result = json_decode($orm->create($this->request->data['input']));            

                if (strpos($result, 'success') !== false){                

                           $this->Session->setFlash(__('Your export has been saved.'));

                           return $this->redirect(array('action' => 'index'));            

                }


               $this->Session->setFlash(__('Unable to add your export.'));

             }        





}

: similar for modify or delete

}

Notice we are not using ids of the items in the list and this is usually not recommended. The RESTAPI url comes out to be /exports/modify/9 for example. This is preferable way for REST based API. It doesn't mean that we cannot get the exports working the way we have it. It merely suggests that conforming to the ID based helps with keeping track of the collection when it changes.


Int GetCountDuplicatesMedian (int [] A)

{

if (A == null) return 0;

return A.CountDuplicatesMedian();

}

Tuesday, November 18, 2014

#codingexercise

Int GetCountDuplicatesAvg (int [] A)

{

if (A == null) return 0;

return A.CountDuplicatesAvg();

}

In this post, we continue to read about PHP MVC pattern. A controller handles the logic usually for a single model. In Cake, the controller names are always plural. Generally each controller extends the AppController.
Methods include the following:
<?php
class SharesController extends AppController
{
function view($id)
{
}
function create($path)
{
}
function search($query)
{
}
}
postConditions is used to format this data to use in the model.
ControllerVariables are a few special variables inside of the controller such as $name, $uses, $helpers, $layout, $autoRender, $beforeFilter , $components etc.
ControllerParameters are available via $this->params in the current controller.

Cake follows a naming convention for models, views and controllers.

The Models are usually named singular and scaffolding is available for database. The standard way of getting to the data using a model include
findAll
string $conditions
array $fields
string $order
int $limit
int $page
int $recursive

Monday, November 17, 2014

#codingexercise
Int GetCountDuplicatesSum (int [] A)
{
if (A == null) return 0;
return A.CountDuplicatesSum();
}

Today I want to discuss PHP programming language as quickly as possible. Its useful in the Model View Controller or MVC which is a widely used pattern for software applications.
The basic syntax involves
<?php
echo 'Hello World'
...
The types involved are boolean, integer, float, string, array and object. The last two are referred to as compound types.
eg,
$a_bool = TRUE;
$a_str = "foo";
$an_int = 12;
$array = [
"foo" => "bar",
"bar" => "foo"
];
$b = array_values($array)
?>
There are two special types resource and NULL.
The resource type is a special variable, holding a reference to an external resource and its type is got by get_resource_type(). A resource variable holds special handlers to opened files, database connections and images. Resources are automatically collected.

mixed, number and callback are also used. These are called pseudo -types
mixed indicates that a parameter may accept multiple types,
number indicates that a parameter can be either integer or float.
callback and callable are used to denote functions as parameters and usually passed by its name as string.

classes are defined simply as :
<? php
class MyClass{
public $property = "Hello World";
public function Call()
{
call_user_func(array($this, 'Callback'));
}

public function Callback()
{
echo $this->property;
}
}
$m = new MyClass;
$m->call();
>
keywords such as interface, implements and extends are available. Scope resolution is with :: and objects can be cloned with the _clone method.

PHP does not require explicit type definition in variable declaration. The addition operator is a good example.
$ foo = "8"
$foo += 2
$foo += 1.3
$foo = 5 + "10 some_string_ignored"
Typecasting and strict === are checked.
Variables can have local or static scope
Data from get or post requests are found as follows:
echo $_POST['username'];
echo $_REQUEST['username'];
we also get $_COOKIE['count']

To make an MVC application we use say CakePHP framework.

We clone it from git at git://github.com/cakephp/cakephp.git

and the MVC application has a folder structure like this:
/path_to_document_root
/app
/lib
/plugins
/vendors
.htaccess
index.php
README

we may have to give folder permissions with
<?php echo exec('whoami'); ?>
chown -R www-data app/tmp
A database connection is simply a configuration
public $default = array(
'datasource' => 'Database/Mysql',
'persistent' => false,
'host' => 'localhost',
'port' => '',
'login' => 'mylogin',
'password' => 'mypassword',
'database' => 'location-volume-share',
'schema' => '',
'prefix' => '',
'encoding' => 'utf8'
);
Controllers can be defined by
// File: /app/Controller/ShareController.php
class shareController extends AppController {
    public $helpers = array('Html', 'Form');
    public function index() {
         $this->set('posts', $this->Post->find('all'));
    }
    public function edit($id = null) {
        if (!$id) {
            throw new NotFoundException(__('Invalid post'));
        }
        $params = explode(':', $this->request->data, 3)
        $location = $params[0]
        $volume = $params[1]
        $share = $params[2]
        $this->set($id, $location, $volume, $share);
    }
}

Now we will continue our discussion on preconditioning in conjugate gradients. We saw that we could use transformation to make the iterations quicker. The effectiveness of a preconditioner is determined by the condition number of M inverse applied to A as discussed earlier and occassionally by clustering of its eigenvalues. The simplest preconditioner is a diagonal matrix whose diagonal entries are identical to those of A.

Sunday, November 16, 2014

#codingexercise

Int GetCountDuplicatesCount (int [] A)

{

if (A == null) return 0;

return A.CountDuplicatesCount();

}

#codingexercise

Int GetCountDuplicatesAvg (int [] A)

{

if (A == null) return 0;

return A.CountDuplicatesAvg();

}

We continue to discuss CG method. We were talking about starting and stopping the iterations. We next discuss preconditioning. Preconditioning is a technique for improving the condition number of a matrix. Let us say M is a symmetric, positive definite matrix that approximates A, but is easier to invert. Then applying M inverse to both sides of the linear equation we solve with Steepest Descent or CG method, we can solve it more quickly because the eigenvalues of Minverse applied to matrix A on the left hand side of the linear equation are better clustered than those of A. However, the problem is M-inverse applied to A doesn't always come out to be positive symmetric matrix. We could avoid this difficulty by taking a pair of matrices E and E-Transpose whose product is M because for every positive symmetric M there exists such. Now the M-inverse applied to A and Einverse applied to A applied to E-Transponse-inverse have the same eigenvalues. By transforming, the latter is now symmetric and positive definite This process of using CG to solve this system is called Transformed Preconditioned Conjugate Gradient Method.