Tuesday, August 7, 2018

We were discussing Object Storage and file systems and the cloud-first strategy for newer workloads as well as migrating older workloads. However, not all workloads are suited for a cloud-first strategy. We determine suitability from a performance and cost perspective, ideally in the production environment. There are tools that can capture a workload's IO and play it back. With this IO pattern replayed on a new storage system, the suitability of the cloud-first strategy becomes clear.
We noted that workload patterns can change over time. There may be certain seasons when the peak load occurs annually. Planning for the day-to-day load as well as the peak load therefore becomes important. Workload profiling can be repeated year round so that the average and the maximum are known for effective planning and estimation.
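As a minimal sketch of this planning step (all numbers below are hypothetical, not measured values), the day-to-day average and the seasonal peak can be derived from a year of profiled samples:

```python
# Sketch: derive planning numbers from a year of hypothetical monthly
# IOPS samples, so both the steady state and the seasonal peak are known.

monthly_iops = [1200, 1100, 1300, 1250, 1400, 1350,
                1500, 4800, 1450, 1300, 1200, 1250]  # August spike

average = sum(monthly_iops) / len(monthly_iops)
peak = max(monthly_iops)

# One possible policy: provision for the peak with headroom,
# budget for the average.
headroom = 1.2
print(f"average: {average:.0f} IOPS, provision for: {peak * headroom:.0f} IOPS")
```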
Storage system planners know their workload profiles. While deployers view applications, services, and access control, storage planners see workload profiles and make their recommendations based exclusively on IO, cost, and performance. In the object storage world, we have the luxury of comparison with file systems. In a file system, we have several layers, each contributing to the overall I/O of data. A bucket, on the other hand, is independent of the file system. As long as it is filesystem-enabled, users get the convenience of a file system as well as the object storage. Moreover, the user account accessing the bucket can also be set up. Only IT can determine the correct strategy for the workload because they can profile it.


One factor that escapes attention is capacity planning when migrating workloads. It is true that object storage is more durable than file systems, but where a file system keeps one copy of the data, an object storage may keep three.
The formula:
        X = 3Y + metadata
where X = object storage capacity in GB,
      Y = data on the native file system in GB, and
      metadata = usage attributable to metadata,
holds true for replicated data in the object storage.
In other words, planners may do well to include sufficient capacity in on-premise usage of object storage. Keeping multiple copies of files in sync across traditional storage systems was a challenge for workloads. This is not the case for object storage, where the data may be replicated on multiple nodes.
Similarly, while a file may be stored as an object, its metadata may be enhanced with custom attributes. This increases the size of the object, and since there are more copies now, the metadata overhead multiplies with replication. Custom attributes also imply that while the size contribution from the data remains the same, the overall metadata grows with each heavy attribute added.
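As a rough sketch of this capacity estimate, assuming triple replication and a per-object metadata overhead (the 1 KB figure below is purely illustrative, not a measured value):

```python
# Rough capacity estimate for migrating file data to replicated object
# storage, following X = 3Y + metadata from the discussion above.

def object_storage_capacity_gb(file_data_gb, object_count, replicas=3,
                               metadata_per_object_kb=1.0):
    """Return X in GB: replicated data plus an estimated metadata share."""
    metadata_gb = object_count * metadata_per_object_kb / (1024 * 1024)
    return replicas * file_data_gb + metadata_gb

# 100 GB of files stored as one million objects:
print(object_storage_capacity_gb(100, 1_000_000))
```

Heavy custom attributes would raise the per-object metadata figure, which is why the metadata term deserves its own line item in the estimate.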

Monday, August 6, 2018

We were discussing Object Storage and file systems.
Object storage fits the cloud-first strategy and therefore comes with the benefits of cloud migration of workloads. However, not all workloads are suited for this strategy. We determine suitability from a performance and cost perspective, ideally in the production environment. There are tools that can capture a workload's IO and play it back. With this IO pattern replayed on a new storage system, the suitability of the cloud-first strategy becomes clear.
However, IT may not always have the option to measure the production workload. Instead, a lab environment is created where a production-like workload may be synthetically generated. This kind of workload generator or benchmarking tool is also helpful in determining the suitability of the cloud-first initiative. The IO profile of a workload helps with the planning of the storage resources.
The performance expectations and the cost planning will help with the evaluation of the storage alternative.
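A minimal sketch of such a synthetic generator, assuming a simple random read/write mix over a pre-populated file (the block size, mix ratio, and op count below are illustrative choices, not recommendations):

```python
import os
import random
import tempfile
import time

def run_synthetic_workload(path, file_size=1 << 20, block=4096,
                           ops=100, read_ratio=0.7):
    """Pre-populate a file, then issue a random read/write mix; return IOPS."""
    with open(path, "wb") as f:
        f.write(os.urandom(file_size))            # fill the test file
    start = time.time()
    with open(path, "r+b") as f:
        for _ in range(ops):
            f.seek(random.randrange(0, file_size - block))
            if random.random() < read_ratio:
                f.read(block)                     # simulated read op
            else:
                f.write(os.urandom(block))        # simulated write op
    elapsed = time.time() - start
    return ops / elapsed if elapsed > 0 else float("inf")

with tempfile.TemporaryDirectory() as d:
    iops = run_synthetic_workload(os.path.join(d, "testfile"))
    print(f"approx IOPS: {iops:.0f}")
```

A real benchmark would control caching, queue depth, and duration; dedicated tools exist for that, and this sketch only shows the shape of the idea.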
One thing to note here is that workload patterns can change over time. There may be certain seasons when the peak load occurs annually. Planning for the day-to-day load as well as the peak load therefore becomes important. Workload profiling can be repeated year round so that the average and the maximum are known for effective planning and estimation.
Storage is almost always available in tiers. It is important to recognize which tier the workload is most suited for. Public cloud providers publish guidelines for their tiers.  With the help of the workload profiling and the testing against the cloud configurations, the suitability of the storage system candidate can be evaluated.
Storage system planners know their workload profiles. While deployers view applications, services, and access control, storage planners see workload profiles and make their recommendations based exclusively on IO, cost, and performance. In the object storage world, we have the luxury of comparison with file systems. In a file system, we have several layers, each contributing to the overall I/O of data. A file system can be local or remote and may not be distributing load. A file system helps connect interoperable systems, such as Linux with a Windows file share and vice versa, with protocols such as CIFS. NFS helps with a Linux-to-Linux file share mount:
sudo mount -t nfs -o vers=3,sec=sys,proto=tcp ip.ip.ip.ip:/namespace/my_bucket/ /home/my/share
A file system may also be exported as object storage.
On the other hand, a bucket is independent of the file system. As long as it is filesystem-enabled, users get the convenience of a file system as well as the object storage. Moreover, the user account accessing the bucket can also be set up. Only IT can determine the correct strategy for the workload because they can profile it.


Sunday, August 5, 2018

bool isDivisibleBy8(uint n)
{
// A number is divisible by 8 iff its last three digits are.
var digits = toDigits(n);
if (digits.Count < 3) return n % 8 == 0;
var last3 = digits.GetRange(digits.Count - 3, 3);
uint last = last3.toInt();
return last % 8 == 0;
}

bool isDivisibleBy7(uint n)
{
// Repeatedly subtract twice the last digit from the number formed by
// the remaining digits; the result is divisible by 7 iff n is.
int m = (int)n;
while (m >= 10 || m <= -10)
{
    int lastDigit = m % 10;
    m = m / 10 - 2 * lastDigit;
}
return m % 7 == 0;
}

bool isDivisibleBy6(uint n)
{
// 6 = 2 x 3, so check both (isDivisibleBy2 can simply test n % 2 == 0).
return isDivisibleBy2(n) && isDivisibleBy3(n);
}

Saturday, August 4, 2018

bool isDivisibleBy3(uint n)
{
// A number is divisible by 3 iff its digit sum is.
uint sumOfDigits = 0;
while (n > 0)
{
    uint digit = n % 10;
    n = n / 10;
    sumOfDigits += digit;
}
return sumOfDigits % 3 == 0;
}
bool isDivisibleBy11(uint n)
{
// A number is divisible by 11 iff the alternating sum of its digits is.
var digits = toDigits(n);
int sum = 0;
for (int i = 0; i < digits.Count; i++)
{
    if (i % 2 == 0) {
        sum += digits[i];
    } else {
        sum -= digits[i];
    }
}
return (sum % 11 == 0);
}


Friday, August 3, 2018

We were discussing Object Storage and file systems.
In a file system we have several layers, each contributing to the overall I/O of data. With object storage, which can be created over a file system, we can introduce a layer that traps calls to the file system, translates them to the object storage, and hands the results back, without any change in the semantics of the application program. The only difference is that the operating system could recognize object storage as a first-class citizen just like the file system.
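A minimal sketch of such a translation layer, using an in-memory dict to stand in for a real object store (both classes here are hypothetical, for illustration only):

```python
# Sketch: file-style write/read/close calls trapped and translated into
# object get/put, so the application keeps its file semantics.

class ObjectStore:
    """Stand-in for a real object store: keys map to whole objects."""
    def __init__(self):
        self._objects = {}
    def put(self, key, data):
        self._objects[key] = bytes(data)
    def get(self, key):
        return self._objects[key]

class FileLikeObject:
    """Traps file semantics and translates them to object get/put."""
    def __init__(self, store, key):
        self._store, self._key = store, key
        self._buf = bytearray(store._objects.get(key, b""))
        self._pos = 0
    def write(self, data):
        self._buf[self._pos:self._pos + len(data)] = data
        self._pos += len(data)
    def read(self, n=-1):
        data = self._buf[self._pos:] if n < 0 else self._buf[self._pos:self._pos + n]
        self._pos += len(data)
        return bytes(data)
    def seek(self, pos):
        self._pos = pos
    def close(self):
        self._store.put(self._key, self._buf)   # flush back as one object

store = ObjectStore()
f = FileLikeObject(store, "my_bucket/report.txt")
f.write(b"hello object world")
f.close()
print(store.get("my_bucket/report.txt"))
```

A production shim would also handle partial updates, concurrency, and caching; the point here is only that the application-facing semantics need not change.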
During the discussion of file systems, we talked about NFS but not CIFS. This means we don't export a CIFS file system as an object store. Technically there is nothing differentiating one file system from another as long as the file system is exported as object storage. The advantage of CIFS is that it allows file-share connectivity between interoperable operating systems: a file share on Linux becomes available to use from a Windows machine. Since a file system export works just as well with object storage, we enable more clients to connect by allowing more forms of exports.
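For illustration, a CIFS share exported by a Windows or Samba server can be mounted from Linux along these lines (the server, share, mount point, and username are placeholders):

```shell
sudo mount -t cifs //server/share /mnt/share -o username=myuser,vers=3.0
```

This parallels the NFS mount shown below; either export style makes the same data reachable from more kinds of clients.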
A bucket is independent of the file system since it is an object storage concept. A bucket can be used much like a file system share, especially if it is filesystem-enabled. This means the users get the convenience of using a file system while they are using object storage. Moreover, the user account accessing the bucket can also be set up.
sudo mount -t nfs -o vers=3,sec=sys,proto=tcp ip.ip.ip.ip:/namespace/my_bucket/ /home/my/share
#codingexercise
Determine if a given large number is divisible by 9

bool isDivisibleBy9(uint n)
{
// A number is divisible by 9 iff its digit sum is.
uint sumOfDigits = 0;
while (n > 0)
{
    uint digit = n % 10;
    n = n / 10;
    sumOfDigits += digit;
}
return sumOfDigits % 9 == 0;
}

Thursday, August 2, 2018

#codingexercise
Get count of subarrays with sum divisible by k
int GetCountDivByK(List<int> A, int k)
{
var mod = new int[k];
int sum = 0;
for (int i = 0; i < A.Count; i++)
{
    sum += A[i];
    mod[((sum % k) + k) % k] += 1;
}
// Two prefixes with the same remainder bound a divisible subarray.
int result = 0;
for (int i = 0; i < k; i++)
    if (mod[i] > 1)
        result += (mod[i] * (mod[i] - 1)) / 2;
result += mod[0]; // subarrays that start at index 0
return result;
}



Find the number of zeros in the binary representation of a number: 
int GetCountUnsetBits(int n)
{
    int set = 0;
    int unset = 0;
    while (n > 0)
    {
        if ((n & 0x1) != 0) {
             set++;
        } else {
             unset++;
        }
        n = n >> 1;
    }
    return unset; // zeros among the significant bits only
}

Wednesday, August 1, 2018

#codingexercise
We were discussing a technique to count the subsets of an array whose sum is divisible by m:
For the set {1,2,3}, the dp table (rows = number of items considered, columns = subset sums 0..6) is:

      0  1  2  3  4  5  6
  0   1
  1   1  1
  2   1  1  1  1
  3   1  1  1  2  1  1  1

count = 3
Solution: The possible subsets are {1}, {2}, {3}, {1,2}, {2,3}, {1,3} and {1,2,3} with sums 1, 2, 3, 3, 5, 4 and 6 respectively. The sums divisible by m = 3 are 3, 3 and 6, therefore the count is 3.
int GetCountSubseqSumDivisibleBy(List<int> A, int m)
{
var dp = new int[A.Count() + 1, A.Sum() + 1];
for (int i = 0; i < A.Count(); i++) {
     dp[i, 0]++;
}
for (int i = 1; i <= A.Count(); i++) {
     dp[i, A[i-1]]++;
     for (int j = 1; j <= A.Sum(); j++) {
         if (dp[i-1, j] > 0) {
             dp[i, j]++;
             if (j + A[i-1] <= A.Sum()) {
                 dp[i, j + A[i-1]]++;
             }
         }
     }
}
int count = 0;
for (int j = 1; j <= A.Sum(); j++) {
     if (dp[A.Count(), j] > 0 && j % m == 0)
         count += dp[A.Count(), j];
}
return count;
}