Wednesday, August 8, 2018

We were discussing the cloud-first strategy for newer workloads as well as migrating older workloads to Object Storage.  We did not mention any facilitators of workload migrations, but there are many tools that help with the migration.  We use IO capture and playback tools to profile the workload, which we can do in a lab environment or, where permitted, in production. In addition, there are virtualizers that take a single instance of an application or service and enable it to be migrated without any concern for the underlying storage infrastructure.
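To make the idea concrete, here is a minimal sketch of what an IO capture-and-playback tool does, assuming a captured trace is a list of (op, offset, size) records; the trace values, file path, and record format here are illustrative, not those of any particular tool:

```python
import time

# Hypothetical trace captured from a production workload:
# each record is (operation, byte offset, IO size).
trace = [
    ("write", 0, 4096),
    ("read", 0, 4096),
    ("write", 8192, 4096),
]

def replay(trace, path):
    """Replay a captured IO pattern against a target file and time it."""
    stats = {"read": 0, "write": 0}
    start = time.monotonic()
    with open(path, "wb+") as f:
        for op, offset, size in trace:
            f.seek(offset)
            if op == "write":
                f.write(b"\0" * size)
            else:
                f.read(size)
            stats[op] += size
    elapsed = time.monotonic() - start
    return stats, elapsed

stats, elapsed = replay(trace, "/tmp/replay_target.bin")
print(stats)  # {'read': 4096, 'write': 8192}
```

Replaying the same trace against a candidate storage system (instead of a local file) is what lets the planner compare IO profiles before committing to a migration.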

It is these kinds of tools we make note of today. These tools provide what is termed "smart availability" by enabling dynamic movement of workloads between physical, virtual and cloud infrastructure. This is an automation of all the tasks required to migrate a workload.  Even the connection string can be retained when moving the workload, so long as the network name can be reassigned between servers. What this automation does not do is perform storage- and OS-level data replication, because the source and destination are something the users may want to specify themselves, and it is beyond what is needed for migrating the workloads. Containers and shared volumes come close to providing this kind of ease, but they do not automate all the tasks needed on the container to perform a seamless migration regardless of the compute. Also, this automation makes no distinction between Linux containers and Docker containers.  These tools are often used for high availability and for separating read-only data access so that it can be performed from the cloud.

It should be noted that application virtualization does not depend on the hypervisor layer. There are ways to use one, but it is not required.  In fact, the host can be just about any compute, on-premise or in the cloud, as long as the migration is seamless. There is generally a one-to-one requirement for the app to have a host.  Seamless one-application-to-many-hosts execution is excluded unless the application is running in serverless mode.  Even so, different functions may be executed one-on-one over a spun-up host. The host is not taken to be a cluster without some automation of which nodes execute the serverless functions.  An application that is virtualized this way is agnostic of the host. This is therefore an extension of server virtualization, but with the added benefit of fine-grained control.

We noted that workload patterns can change over time. There may be certain seasons where the peak load occurs annually.  Planning for the day-to-day load as well as the peak load therefore becomes important.  Workload profiling can be repeated year-round so that the average and the maximum are known for effective planning and estimation.
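As a sketch of how repeated profiling feeds planning, suppose we have one IOPS sample per month (the numbers and the 20% headroom factor below are invented for illustration):

```python
# Hypothetical monthly IOPS samples from repeated workload profiling runs.
monthly_iops = [1200, 1150, 1300, 1250, 1400, 2100,   # mid-year spike
                1350, 1280, 1220, 1500, 1900, 3200]   # annual peak

average = sum(monthly_iops) / len(monthly_iops)
peak = max(monthly_iops)

# Plan day-to-day capacity around the average, but provision (or burst)
# for the annual peak with some headroom.
headroom = 1.2
provisioned = peak * headroom
print(f"average={average:.0f} IOPS, peak={peak} IOPS, provision for {provisioned:.0f} IOPS")
```

The average drives everyday sizing and cost estimates, while the peak (plus headroom) drives the burst or seasonal capacity decision.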
Storage systems planners know their workload profiles. While deployers view applications, services and access control, storage planners see workload profiles and make their recommendations based exclusively on the IO, costs and performance.  In the object storage world, we have the luxury of comparison with file systems. In a file system, we have several layers, each contributing to the overall I/O of data. A bucket, on the other hand, is independent of the file system. As long as it is file-system enabled, users get the convenience of a file system as well as the object storage. Moreover, the user account accessing the bucket can also be set up. Only IT can help determine the correct strategy for the workload, because they can profile it.

Tuesday, August 7, 2018

We were discussing Object Storage and file systems, and the cloud-first strategy for newer workloads as well as migrating older workloads. However, not all workloads are suited for this cloud-first strategy. We determine suitability from a performance and cost perspective. Ideally this is determined in the production environment. There are tools that can perform workload IO capture and playback. With this IO pattern replayed on a new storage system, the suitability of the cloud-first strategy becomes clear.


One of the factors that escapes attention is capacity planning when migrating workloads. It is true that object storage is more durable than file systems, but where there is one copy of data in a file system, there may be three copies in an object storage.
For replicated data in object storage, the following formula holds:
                          X = 3Y + metadata
where X = object storage in GB, Y = data on the native file system in GB, and metadata = some usage that can be attributed to metadata.
In other words, planners may do well to include sufficient capacity in on-premise usage of Object Storage. Keeping multiple copies of files in sync across traditional storage systems was a challenge for workloads. This is not the case for object storage, where the data may be replicated on multiple nodes.
Similarly, while a file may be stored as an object, its metadata may be enhanced with custom attributes. This increases the size of the object, and since there are now more copies, the metadata overhead multiplies. Custom attributes also imply that while the size contribution from the data might remain the same, the overall metadata grows with each heavy attribute added.
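The formula above can be turned into a small capacity estimator. The per-copy metadata overhead below is an assumed figure a planner would measure, not a fixed constant:

```python
def object_storage_capacity_gb(native_gb, copies=3, metadata_per_copy_gb=0):
    """Estimate object storage capacity: X = copies * Y + metadata.

    copies=3 reflects typical replication; metadata_per_copy_gb is an
    assumed per-replica overhead for system and custom attributes.
    """
    return copies * native_gb + copies * metadata_per_copy_gb

# 500 GB on the native file system, with an assumed 2 GB of metadata per copy
x = object_storage_capacity_gb(500, copies=3, metadata_per_copy_gb=2)
print(x)  # prints 1506
```

Adding a heavy custom attribute raises metadata_per_copy_gb, and the cost shows up multiplied by the number of replicas.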

Monday, August 6, 2018

We were discussing Object Storage and file systems.
Object storage fits the cloud-first strategy. Therefore it comes with the benefits of cloud migration of workloads. However, not all workloads are suited for this cloud-first strategy. We determine suitability from a performance and cost perspective. Ideally this is determined in the production environment. There are tools that can perform workload IO capture and playback. With this IO pattern replayed on a new storage system, the suitability of the cloud-first strategy becomes clear.
However, IT may not always have the option to measure the production workload. Instead, a lab environment is created where a production-like workload may be synthetically generated.  This kind of workload generator or benchmarking tool is also helpful in determining the suitability of the cloud-first initiative.  The IO profile of a workload helps with the planning of the storage resources.
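A synthetic generator of this kind can be sketched in a few lines. The read fraction, IO sizes and address space here are assumed parameters that a profiler would derive from the real workload:

```python
import random

def generate_synthetic_workload(num_ops, read_fraction=0.7,
                                io_sizes=(4096, 8192, 65536),
                                address_space=1 << 30, seed=42):
    """Generate a production-like IO pattern for lab benchmarking."""
    rng = random.Random(seed)  # fixed seed makes runs reproducible
    ops = []
    for _ in range(num_ops):
        op = "read" if rng.random() < read_fraction else "write"
        size = rng.choice(io_sizes)
        offset = rng.randrange(0, address_space, 4096)  # 4 KB aligned
        ops.append((op, offset, size))
    return ops

workload = generate_synthetic_workload(1000)
reads = sum(1 for op, _, _ in workload if op == "read")
print(f"{reads} reads out of {len(workload)} ops")
```

Feeding this generated pattern into the replay step gives a lab-only approximation of the production profile.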
The performance expectations and the cost planning will help with the evaluation of the storage alternative.
One thing to note here is that workload patterns can change over time. There may be certain seasons where the peak load occurs annually.  Planning for the day-to-day load as well as the peak load therefore becomes important.  Workload profiling can be repeated year-round so that the average and the maximum are known for effective planning and estimation.
Storage is almost always available in tiers. It is important to recognize which tier the workload is most suited for. Public cloud providers publish guidelines for their tiers.  With the help of the workload profiling and the testing against the cloud configurations, the suitability of the storage system candidate can be evaluated.
Storage systems planners know their workload profiles. While deployers view applications, services and access control, storage planners see workload profiles and make their recommendations based exclusively on the IO, costs and performance.  In the object storage world, we have the luxury of comparison with file systems. In a file system, we have several layers, each contributing to the overall I/O of data. A file system can be local or remote and may not be distributing load.  A file system helps with connecting interoperable systems, such as Linux with a Windows file share and vice versa, with protocols such as CIFS. NFS helps with a Linux-to-Linux file share mount:
sudo mount -t nfs -o vers=3,sec=sys,proto=tcp ip.ip.ip.ip:/namespace/my_bucket/ /home/my/share
A file-system may also be exported as an object-storage.
On the other hand, a bucket is independent of the file system. As long as it is file-system enabled, users get the convenience of a file system as well as the object storage. Moreover, the user account accessing the bucket can also be set up. Only IT can help determine the correct strategy for the workload, because they can profile it.


Sunday, August 5, 2018

bool isDivisibleBy8(uint n)
{
// A number is divisible by 8 iff its last three digits are.
var digits = toDigits(n);
if (digits.Count < 3) return n % 8 == 0;
var last3 = digits.GetRange(digits.Count - 3, 3);
uint last = last3.toInt();
return last % 8 == 0;
}

bool isDivisibleBy7(uint n)
{
// Repeatedly truncate the last digit and subtract twice it;
// the result is divisible by 7 iff the original number is.
int m = (int)n;
while (m >= 10)
{
    int lastDigit = m % 10;
    m = m / 10 - 2 * lastDigit;
    if (m < 0) m = -m;
}
return m % 7 == 0;
}

bool isDivisibleBy6(uint n)
{
return isDivisibleBy2(n) && isDivisibleBy3(n);
}

Saturday, August 4, 2018

bool isDivisibleBy3(uint n)
{
// Divisible by 3 iff the sum of digits is divisible by 3.
uint sumOfDigits = 0;
while (n > 0)
{
    uint digit = n % 10;
    n = n / 10;
    sumOfDigits += digit;
}
return sumOfDigits % 3 == 0;
}
bool isDivisibleBy11(uint n)
{
// Divisible by 11 iff the alternating sum of digits is divisible by 11.
var digits = toDigits(n);
int sum = 0;
for (int i = 0; i < digits.Count; i++)
{
    if (i % 2 == 0) {
        sum += digits[i];
    } else {
        sum -= digits[i];
    }
}
return (sum % 11 == 0);
}


Friday, August 3, 2018

We were discussing Object Storage and file systems.
In a file system we have several layers, each contributing to the overall I/O of data. With object storage, which can be created using a file system, we can introduce a layer that traps calls to the file system, translates them to the object storage, and back to the file system, without any change in the semantics of the application program. The only difference is that the Operating System could recognize object storage as a first-class citizen, just like the file system.
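The trap-and-translate layer can be sketched minimally. Here an in-memory dict stands in for the object store's PUT/GET interface, and the class name and key are hypothetical; a real shim would intercept filesystem calls (for example via FUSE) and issue the equivalent object operations:

```python
class ObjectBackedFile:
    """Presents file-like read/write while storing data as an object."""

    def __init__(self, store, key):
        self.store = store  # dict acting as the object store
        self.key = key      # object key derived from the file path

    def write(self, data: bytes):
        # A whole-object PUT stands in for the file write.
        self.store[self.key] = data

    def read(self) -> bytes:
        # GET returns the object; the caller sees ordinary file bytes.
        return self.store.get(self.key, b"")

store = {}
f = ObjectBackedFile(store, "namespace/my_bucket/report.txt")
f.write(b"hello object storage")
print(f.read())  # b'hello object storage'
```

The application keeps its file semantics; only the shim knows the bytes landed in a bucket.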
During the discussion of file systems, we talked about NFS but not CIFS. This means we did not cover exporting a CIFS file system as an object store. Technically there is nothing differentiating one file system from another as long as the file system is exported as an Object Storage. The advantage of CIFS is that it allows file share connectivity between interoperable operating systems. A file share on Linux now becomes available to use from a Windows machine. Since a file system export works just as well with an object storage, we enable more clients to connect by allowing more forms of exports.
A bucket is independent of the file system since it is an object storage concept. A bucket can be used much like a file system share, especially if it is file-system enabled. This means users get the convenience of using a file system while they are using an Object Storage. Moreover, the user account accessing the bucket can also be set up. For example, a file-system-enabled bucket can be mounted over NFS:
sudo mount -t nfs -o vers=3,sec=sys,proto=tcp ip.ip.ip.ip:/namespace/my_bucket/ /home/my/share
#codingexercise
Determine if a given large number is divisible by 9

bool isDivisibleBy9(uint n)
{
// Divisible by 9 iff the sum of digits is divisible by 9.
uint sumOfDigits = 0;
while (n > 0)
{
    uint digit = n % 10;
    n = n / 10;
    sumOfDigits += digit;
}
return sumOfDigits % 9 == 0;
}

Thursday, August 2, 2018

#codingexercise
Get the count of subarrays with sum divisible by k
int GetCountDivByK(List<int> A, int k)
{
var mod = new int[k];
int sum = 0;
for (int i = 0; i < A.Count; i++)
{
    sum += A[i];
    mod[((sum % k) + k) % k] += 1;
}
int result = 0;
// Any two prefixes with the same remainder bound a divisible subarray.
for (int i = 0; i < k; i++)
    if (mod[i] > 1)
        result += (mod[i] * (mod[i] - 1)) / 2;
result += mod[0]; // subarrays that start at index 0
return result;
}


Find the number of zeros in the binary representation of a number:
int GetCountUnsetBits(int n)
{
    int unset = 0;
    while (n > 0)
    {
        if ((n & 0x1) == 0)
        {
            unset++;
        }
        n = n >> 1;
    }
    return unset;
}