Tuesday, February 3, 2015

Today we will discuss AWS security as presented in James Bromberger's slides from the AWS Summit. He describes not only the security considerations for users of AWS but also what AWS itself undertakes. AWS's primary concern is compliance, while the audience's concerns are account management, service isolation, visibility and auditing. AWS secures the physical assets: facilities, infrastructure, network, and virtualization. The customer secures the operating system, applications, security groups, the OS firewall, network configuration, and account management. AWS compliance information is available for everyone to see on their website, which showcases certifications and the industry organizations that grant them.

Customers, on the other hand, have primarily been interested in securing accounts, which in a way are the keys to the kingdom. Identity and Access Management (IAM) provides users and groups, unique security credentials, temporary security credentials, policies and permissions, roles, and multi-factor authentication (MFA). The recommendations for account security are: secure our master account with MFA, create an IAM group for our Admin team, create IAM users for our Admin staff as members of that group, and turn on MFA for those users as well. Enhanced password management with expiry, reuse checks, and forced change on next log-in is also available.
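To make the account recommendations concrete, here is a minimal sketch using the AWS SDK for PHP; the group and user names and the $access/$secret variables are placeholders for illustration, and error handling is omitted.

    use Aws\Iam\IamClient;

    // Sketch: create an Admins group and add an IAM user to it.
    $iam = IamClient::factory(array(
        'key'    => $access,
        'secret' => $secret,
    ));

    $iam->createGroup(array('GroupName' => 'Admins'));
    $iam->createUser(array('UserName' => 'alice'));
    $iam->addUserToGroup(array(
        'GroupName' => 'Admins',
        'UserName'  => 'alice',
    ));
    // enableMFADevice() would then be called for the user once a
    // virtual or hardware MFA device has been provisioned.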
Next we look at temporary credentials. These are used for running an application: we remove hardcoded credentials from scripts and config files, create an IAM role and assign it a restricted policy, and then launch the instance with that role. The AWS SDKs transparently obtain the temporary credentials.
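A sketch of what this looks like from application code, assuming the instance was launched with an IAM role; the bucket and key names are hypothetical.

    use Aws\S3\S3Client;

    // No 'key'/'secret' supplied: the SDK fetches temporary credentials for
    // the instance's IAM role from the metadata service and refreshes them.
    $s3 = S3Client::factory(array(
        'region' => 'us-east-1',
    ));

    $result = $s3->getObject(array(
        'Bucket' => 'my-app-bucket',          // hypothetical bucket
        'Key'    => 'config/settings.json',   // hypothetical key
    ));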
IAM policies should follow the principle of least privilege: resources are fully qualified and actions are granted incrementally. Policies can also carry conditions that restrict access even further, and this is good practice. AWS has a policy generator tool that can generate policies, but there's even a policy simulator tool that can be used to test them.
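As an illustration of qualifying resources and granting only the actions needed, here is a hedged sketch that attaches a read-only policy for a single bucket to the user created above; the bucket and policy names are made up.

    // Allow only listing and reading of one hypothetical bucket.
    $iam->putUserPolicy(array(
        'UserName'       => 'alice',
        'PolicyName'     => 'ReadOnlyReportsBucket',
        'PolicyDocument' => json_encode(array(
            'Version'   => '2012-10-17',
            'Statement' => array(array(
                'Effect'   => 'Allow',
                'Action'   => array('s3:ListBucket', 's3:GetObject'),
                'Resource' => array(
                    'arn:aws:s3:::reports-bucket',
                    'arn:aws:s3:::reports-bucket/*',
                ),
            )),
        )),
    ));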
Another set of credentials we use is the access key and secret key. These are used to sign requests and to make API calls. They are issued only once; to change them, a new pair is issued rather than the old one being recovered. SSL is required for all traffic because data is exchanged and we want it to be encrypted. Even database connections and the data transferred over them are to be encrypted.
In addition, AWS provides server-side encryption using 256-bit AES that is transparent to customers. Data keys are generated, used to encrypt the data, and then themselves encrypted and stored under a master key.
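Server-side encryption is requested per object; a minimal sketch reusing the S3 client from the temporary-credentials example above, with hypothetical names:

    // S3 encrypts the object with AES-256 on the server side; the data key
    // is itself encrypted and stored under a master key.
    $s3->putObject(array(
        'Bucket'               => 'my-app-bucket',
        'Key'                  => 'backups/2015-02-03.tar.gz',
        'SourceFile'           => $file,
        'ServerSideEncryption' => 'AES256',
    ));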

Sunday, February 1, 2015

Command line tools for object storage such as s3cmd and awscli provide almost all the functionality required to interact with objects and storage. However, using an SDK enables integration with different kinds of user interfaces. For example, if we want to get ACLs, we can use:
  // Returns the ACL grants for a bucket, or for a single object when a key is given.
  public function get($bucket, $key = null, $accessKey = null, $accessSecret = null)
  {
      if ($bucket == null) return array('Grants' => array());
      $client = $this->getInstance($accessKey, $accessSecret);
      if ($key != null) {
          // Object-level ACL
          $acls = $client->getObjectAcl(array(
              'Bucket' => $bucket,
              'Key'    => $key));
          if ($acls == null) $acls = array('Grants' => array());
          return $acls;
      } else {
          // Bucket-level ACL
          $acls = $client->getBucketAcl(array(
              'Bucket' => $bucket));
          if ($acls == null) $acls = array('Grants' => array());
          return $acls;
      }
  }
If we want to set the ACLs we could use:
  // Canned ACLs: 'private', 'public-read', 'public-read-write',
  // 'authenticated-read', 'bucket-owner-read', 'bucket-owner-full-control'
  public function set($bucket, $key, $acl, $accessKey = null, $accessSecret = null)
  {
      $client = $this->getInstance($accessKey, $accessSecret);
      // Apply a canned ACL to the object; no request body is needed.
      $result = $client->putObjectAcl(array(
          'ACL'    => $acl,
          'Bucket' => $bucket,
          'Key'    => $key,
      ));
      return $result;
  }

#codingexercise
Double GetAlternateEvenNumberRangeSumPower(Double[] A, int n, int m)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumPower(n, m);
}
Today we cover object storage vs block storage.
The rate of adoption of object storage is exciting both for an IT admin and for a user. Here are some use cases where it may prove more beneficial than, say, block storage:
  1. If you have static content of varying sizes and you cannot label the workload with any category except miscellaneous, then you can use object storage. It lets you add metadata to each piece of content, now treated as an object with some context around the data. It doesn't split files into raw blocks of data; the entire content is treated as one object. Whether it's a few photos, uploaded music videos, backup files or just other data from your PC, they can now be archived and retrieved at will.
  2. No matter how many objects there are or how large they grow, each object can be uniquely and efficiently retrieved by its ID.
  3. When the basket of miscellaneous objects grows to a few hundred terabytes or even a few petabytes, storage systems that rely on adding block storage cannot keep up. Object storage does not require you to mount drives, manage volumes or remap volumes. Besides, object stores keep multiple copies of the data, which improves availability and durability. Whether it's S3, Swift or Atmos, most vendors give this assurance.
  4. Object storage can work with NAS and commodity nodes, where scaling out is just the addition of new compute rather than new storage arrays.
  5. That brings up the point that if your data is heavy on reads and writes, such as a database, then block storage, as with a SAN, will be more helpful.
  6. You can sign a link to the object and share it with others to download with their web browser of choice.
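The signed link in item 6 can be generated as a pre-signed URL; a minimal sketch, assuming an S3Client $s3 constructed as in the upload example below, and an arbitrary ten-minute expiry:

    // Create a time-limited, signed download link for one object.
    $url = $s3->getObjectUrl($bucket, $key, '+10 minutes');
    echo $url; // anyone with this link can download the object until it expires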
If you have access keys to the object storage, you can upload objects this way:

        // Construct an S3 client with the endpoint and access keys.
        $s3 = S3Client::factory(array(
            'base_url' => $host,
            'key'      => $access,
            'secret'   => $secret,
            'region'   => $region,
            'ssl.certificate_authority' => $ssl_certificate_authority,
        ));

        // Upload a file as an object, attaching a little custom metadata.
        $result = $s3->putObject(array(
            'Bucket'     => $bucket,
            'Key'        => $key,
            'SourceFile' => $file,
            'Metadata'   => array(
                'source' => 'internet',
                'dated'  => 'today'
            )
        ));

        // Poll until the object is visible in the bucket.
        $s3->waitUntilObjectExists(array(
            'Bucket' => $bucket,
            'Key'    => $key,
        ));

This is written using the S3 API.
Most S3-compatible vendors of object storage maintain their own set of access keys, so you cannot use one vendor's access keys against another vendor's endpoint or storage.

Saturday, January 31, 2015

Today we continue our discussion on the detector distribution algorithm. If there are N intelligent disks, then they are clustered into k  categories as follows:
Step 1: Initialize the N disks as individual clusters of their own, with the N centers being the disks themselves; the current cluster count is N.
Step 2: Stop the execution once only k clusters remain.
Step 3: Merge the two clusters with the shortest distance into a new cluster and adjust its center.
Step 4: Decrement the cluster count and repeat from step 2.
The distance between clusters is based on how many file descriptors the disks share: the more descriptors two clusters have in common as a fraction of the total, the shorter the distance between them. Once the k categories are formed, a fraction of the detectors is randomly selected and stored in the top access control module; the rest are stored in the lower access control modules of the corresponding categories.
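A minimal sketch of this clustering loop in PHP, for consistency with the rest of this blog; the descriptor-based similarity function and the data layout are simplifying assumptions, not the paper's implementation.

    // $disks maps a disk id to the list of file descriptors stored on it.
    function similarity(array $a, array $b) {
        // Fraction of shared descriptors; shortest distance = highest similarity.
        $shared = count(array_intersect($a, $b));
        $total  = count(array_unique(array_merge($a, $b)));
        return $total > 0 ? $shared / $total : 0.0;
    }

    function clusterDisks(array $disks, $k) {
        // Step 1: each disk starts as its own cluster; its descriptors are the center.
        $clusters = array();
        foreach ($disks as $id => $descriptors) {
            $clusters[] = array('members' => array($id), 'center' => $descriptors);
        }
        // Steps 2-4: merge the two closest (most similar) clusters until k remain.
        while (count($clusters) > $k) {
            $bestI = 0; $bestJ = 1; $bestSim = -1.0;
            for ($i = 0; $i < count($clusters); $i++) {
                for ($j = $i + 1; $j < count($clusters); $j++) {
                    $sim = similarity($clusters[$i]['center'], $clusters[$j]['center']);
                    if ($sim > $bestSim) { $bestSim = $sim; $bestI = $i; $bestJ = $j; }
                }
            }
            // Merge cluster j into cluster i and adjust the center (union of descriptors).
            $clusters[$bestI]['members'] = array_merge($clusters[$bestI]['members'],
                                                       $clusters[$bestJ]['members']);
            $clusters[$bestI]['center']  = array_values(array_unique(array_merge(
                $clusters[$bestI]['center'], $clusters[$bestJ]['center'])));
            array_splice($clusters, $bestJ, 1);
        }
        return $clusters;
    }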
Let us take a closer look at the distribution of the access control function. It is not computed by just the metadata server but also by the intelligent disks, thereby avoiding a single point of failure and a performance bottleneck. However, the number-type detectors are generated only by the metadata server, which helps with the accuracy of the access request inspection. The detectors are distributed among the top and lower access control modules, and the two layers can inspect an access request in parallel. The lower layer stores a smaller number of detectors so that it doesn't get in the way of I/O. During detector generation, all substrings in the legal access requests are extracted once and converted to single integer values. The numerical values are then indexed with a B-Tree. This avoids converting the same substring from different sources (access requests) repeatedly and improves detector selection. The numerical interval found this way avoids mistakenly judging a legal access request as an abnormal one: anything that lies outside this numerical range is not a detector or a valid access request.
The move from comparing binary strings to comparing numerical values reduces the inspection overhead considerably. Further, with clustering, the detectors are distributed in a way that improves the accuracy of the inspection without adding overhead.

Friday, January 30, 2015

Today we continue our discussion on two-layered access control in storage area networks. We mentioned the matching rule used in mature detector selection and access request inspection. The substrings in the initial detector and the legal access request are compared bit by bit, and this accounts for most of the cost in terms of time and space. The substring in the detector can instead be extracted and converted to a single integer value. A matching threshold, say r, is defined for the length of the substring. This is then used in the two main processes of access request inspection: analyzing the legal access request, and selecting the integer value for the number-type detector. The extraction and conversion to a single integer value uses a formula that takes the location of the substring relative to the left end of the binary string, enumerates the possible choices for substrings of that length at earlier positions, and then, for the current segment, enumerates the choices obtained by setting the current bit and counting the permutations possible with the rest, accumulating this over the r bits of the substring. This counting of permutations yields a unique integer index. All these integer values lie in a range from 0 to the number of permutations possible with the remainder of the binary string beyond r; this one-dimensional, limited interval can be efficiently indexed with a B-Tree. The B-Tree is looked up for an entry that is not the same as any number-type detector.

We now review the detector distribution algorithm. It was designed keeping in mind the differences between the metadata server and the intelligent disks, such as their processing capacity and function, even though they may cooperate on a single access request. The negative selection algorithm was improved based on this relationship between the metadata server and the intelligent disks. The processing capacity of the metadata server is strong, so the overhead of access control causes only some loss of I/O performance; the processing capacity of an intelligent disk is poor, so the overhead of access control would cause a large loss of I/O performance. The strategy used is to divide a file into several segments stored on different intelligent disks, so that the lower access control module on any one of those disks can complete the access control function for that file. Using the file segmentation information, a shortest-distance algorithm is used to cluster the intelligent disks.
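To make the conversion step concrete, here is a simplified stand-in in PHP: it combines a substring's offset from the left with its r-bit pattern to produce a unique integer. This is an illustration only, not the paper's exact enumeration formula, and $r and $legalRequests are assumed to be defined elsewhere.

    // Map the r-bit substring at $offset to a unique integer.
    function substringToInt($binary, $offset, $r) {
        return $offset * (1 << $r) + bindec(substr($binary, $offset, $r));
    }

    // Index every legal value once; the paper uses a B-Tree, a sorted array
    // keyed by the integer value stands in for it here.
    $legalValues = array();
    foreach ($legalRequests as $request) {
        for ($i = 0; $i + $r <= strlen($request); $i++) {
            $legalValues[substringToInt($request, $i, $r)] = true;
        }
    }
    ksort($legalValues);
    // A candidate number-type detector is any value in the limited interval
    // that is not present in $legalValues.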
#codingexercise
Double GetAlternateEvenNumberRangeSumProductPower(Double[] A, int n)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumProductPower(n);
}
One more
#codingexercise
Double GetAlternateEvenNumberRangeSqrtSumProductPower(Double[] A, int n)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSqrtSumProductPower(n);
}

Now back to the discussion we were having on the detector distribution algorithm. The majority of the detectors reside in the metadata server and the remainder in the access control modules of the intelligent disks. A data segmentation strategy is used in the storage area network: one file is divided into several segments stored on different disks, and the access control module of any one of those disks can make the access control decision for the file. Since disks share the same file descriptors for common files, they can be clustered based on a similarity measure that works out as the fraction of common file descriptors over the total file descriptors. If there are num detectors in total, then assuming we cluster into k categories, we select num/k detectors and store them in the top layer; the rest are distributed to the lower access control modules of the corresponding categories.
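Continuing the PHP sketch from the clustering discussion, the num/k split might look like the following; the random selection and the per-category assignment are assumptions made for illustration.

    // Split the detector set: num/k detectors go to the top-layer module on the
    // metadata server, the remainder to the lower modules of the k categories.
    function distributeDetectors(array $detectors, $k) {
        $num = count($detectors);
        shuffle($detectors);                           // random selection for the top layer
        $topCount = (int) floor($num / $k);
        $top      = array_slice($detectors, 0, $topCount);
        $rest     = array_slice($detectors, $topCount);
        // array_chunk is a simple stand-in for assigning the rest per category.
        $lower    = array_chunk($rest, max(1, (int) ceil(count($rest) / $k)));
        return array('top' => $top, 'lower' => $lower);
    }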

Thursday, January 29, 2015

Today we continue to discuss the paper Two layered access control for Storage Area Network by Tao, DeJiao, and ShiGuang. In this paper they describe access control in the context of an artificial immune algorithm that can efficiently detect abnormal access. The two layers consist of a central metadata server, which forms the top layer, and the intelligent disks, numbering 1 to n, which form the lower layer. Detectors are used in both layers to receive, analyze and inspect the access request. The inspection is the new addition to the tasks of the metadata server or intelligent disk: it intercepts the access request prior to the execution of the command and the return of the data. The immune algorithm first performs negative selection to generate detectors for access request inspection. If a request matches a detector, it is considered abnormal and denied. The algorithm also decides the generation and distribution of the detectors that intercept the access requests.

We now look at the detector generation algorithm. It uses the notion of an antigen and a detector, both represented by binary strings; the latter represents a space vector. All non-repeating binary strings are generated as the initial detectors, and the initial detectors that did not match any of the legal access requests are selected as mature detectors. There is more than one detector generation algorithm, namely the enumeration generation algorithm, the linear generation algorithm and the greedy generation algorithm. In these algorithms the initial detectors are enumerated and the mature detectors are randomly selected, which carries a large time and space overhead. For selecting mature detectors and for inspecting access requests, matching is done based on a set of matching rules: r-contiguous matching rules, r-chunk matching rules and the Hamming distance matching rule. Matching involves comparing binary substrings of up to r bits between the detectors and the legal access requests; all substrings with more than r bits in a legal access request were traversed and there was no index for them. This study used the Hamming distance matching rule. The binary string matching could be improved: a number-type detector and a matching threshold of r bits are defined for the length of the substring, and the substring in the detector is converted to a single integer value. Access request inspection then involves analyzing the legal access request and selecting the integer value for the number-type detector.
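A hedged sketch of the negative-selection step with a Hamming-distance style matching rule; the threshold handling and data layout are illustrative, not taken from the paper.

    // Two equal-length binary strings match when they agree in at least $r positions.
    function matches($detector, $request, $r) {
        $same = 0;
        $len  = min(strlen($detector), strlen($request));
        for ($i = 0; $i < $len; $i++) {
            if ($detector[$i] === $request[$i]) $same++;
        }
        return $same >= $r;
    }

    // Negative selection: candidates that match any legal request are discarded;
    // the survivors become mature detectors used to flag abnormal requests.
    function negativeSelection(array $candidates, array $legalRequests, $r) {
        $mature = array();
        foreach ($candidates as $candidate) {
            $isSelf = false;
            foreach ($legalRequests as $request) {
                if (matches($candidate, $request, $r)) { $isSelf = true; break; }
            }
            if (!$isSelf) $mature[] = $candidate;
        }
        return $mature;
    }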
#codingexercise
Double GetAlternateEvenNumberRangePowerRtProductPower(Double[] A, int n, int m)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangePowerRtProductPower(n, m);
}

Wednesday, January 28, 2015

Today we continue our discussion on access control. We were reviewing RBAC models, specifically the IRBAC2000 model. We now review the MDRBAC model. We noted that it was an improvement over IRBAC2000 because it introduced the notion of time-restricted mappings and minimized the role mapping restrictions. Still, it suffered from the problem that each participant of the grid now had to implement this model.
We next look at the CBAC model. This is also a mutual operation model; it is based on the premise that the associated multi-domain relationship can be dynamically defined. The dynamic relationship is called an alliance, and information can be exchanged over an alliance relationship. The shortcoming of this model is that the authorization is not clear, because the dynamic relationship only helps with the role mapping.
These are some of the models that were presented in the paper.
We next look at two-layered access control for storage area networks as written by Tao et al. This is a slightly different topic from the access control models we have been discussing, but it is informative to take a look at access control in storage area networking: first, access control is very relevant there, and second, there is an immune algorithm in this case. However, it incurs a large space and time overhead, which has performance implications for large I/O. The structure of the two-layered access control was already given: the top layer is the layer that maintains the metadata and the lower layer maintains the disks. A distribution strategy for the two-layer access control is presented. The top layer generates all the detectors and preserves a majority of them, while the lower layer maintains a small number of detectors. Network access requests are inspected with the help of the top-layer access control module. The problem of protecting the storage area network has several parts, such as data and communication encryption, certification, and access control; to reject illegal requests and to pass valid requests are the two main functions. Numerical detectors are used, and their indices are found using a B-Tree. The detectors are used to inspect the access request: if a detector matches the access request, the control module will deny the request. The distribution of the detectors is the main concern here.
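A minimal sketch of how a request might flow through the two layers, reusing the matches() helper sketched in the January 29 post above; the sequential check and the split of detectors are simplifying assumptions (the paper notes the layers can inspect in parallel).

    // Check the small detector set on the disk first, then the larger set on
    // the metadata server, before the command executes or data is returned.
    function inspect($request, array $lowerDetectors, array $topDetectors, $r) {
        foreach (array($lowerDetectors, $topDetectors) as $detectors) {
            foreach ($detectors as $detector) {
                if (matches($detector, $request, $r)) {
                    return false;   // matched a detector: abnormal request, deny
                }
            }
        }
        return true;                // no detector matched: legal request, allow
    }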
#codingexercise
Double GetAlternateEvenNumberRangeCubeRtProductPower(Double[] A, int n)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeCubeRtProductPower(n);
}