Saturday, January 27, 2018

One of the trends in operational practice is to rely on tools that set thresholds and raise alerts. This translates to incident response instead of active and strenuous polling. As part of the response, we search the logs. Most of these searches are interactive command-line executions, but each step may be time-consuming due to the volume of the logs. One way to mitigate this is to run a sequential batch script that repeats the commands on smaller chunks of data. This, however, means we lose the aggregations unless we store intermediary data. Fortunately, this is possible using files. However, most log archive systems are read-only, so intermediary files cannot be written there. This also restricts parallelizing tasks with a library such as Celery, because such libraries require network access to a message broker and the only access allowed is SSH. One way to overcome this is to scatter and gather data from multiple SSH sessions. This is easier to automate because the controller does not have to be local to the log server.
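As a minimal sketch of this scatter-gather approach, assuming key-based (passwordless) SSH to a hypothetical host named logserver and hypothetical chunk paths, each task runs grep over one chunk in its own SSH session and the controller sums the counts:

// Scatter: run grep remotely over each log chunk in its own ssh session.
// Gather: sum the per-chunk match counts locally.
// Requires System.Diagnostics, System.Linq, and System.Threading.Tasks.
int CountMatchesAcrossChunks(List<string> chunkPaths, string pattern)
{
    var tasks = chunkPaths.Select(path => Task.Run(() =>
    {
        var psi = new ProcessStartInfo("ssh", $"logserver grep -c {pattern} {path}")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using (var process = Process.Start(psi))
        {
            string output = process.StandardOutput.ReadToEnd().Trim();
            process.WaitForExit();
            return int.TryParse(output, out int count) ? count : 0;
        }
    })).ToArray();
    Task.WaitAll(tasks);
    return tasks.Sum(t => t.Result);
}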
Another option is to leave the log server as-is and draw all the data into a log index. Then the search and reporting stacks can use the index. Since the index is designed to grow to arbitrary size, we can put all the logs in it. Also, the search stack enables as many search sessions as necessary to perform the task. They may even be made available via API, SDK, and UI, which enables applications to leverage parallelism as appropriate. For example, the SDK can be used with task-parallel libraries such as Celery so that the same processing can be done in batches over partitioned data. The data can be partitioned on a historical timeline or on other attributes. The log index server also helps the application preserve search artifacts so that they can be reused later or in other searches. The reporting stack sits over the search stack because the input to the reporting dashboard is the results of search queries. These search queries may be optimized, parallelized, or parameterized so that they have near real-time performance. The presence of search and reporting stacks in new log indexing products indicates that these are separate areas of concern that cannot be mixed with conventional log readers into a monolithic console session.
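To make the partitioned, parallel search concrete, here is a minimal sketch; the SearchClient type and its QueryAsync method are hypothetical stand-ins for whatever API or SDK the log index exposes:

// Partition a historical timeline into equal windows and query them in
// parallel, then merge the per-window results.
// Requires System.Linq and System.Threading.Tasks; SearchClient is hypothetical.
async Task<List<string>> SearchPartitionedAsync(SearchClient client, string query,
    DateTime start, DateTime end, int partitions)
{
    long ticksPerWindow = (end - start).Ticks / partitions;
    var windowTasks = Enumerable.Range(0, partitions)
        .Select(i => client.QueryAsync(query,
            start.AddTicks(ticksPerWindow * i),
            start.AddTicks(ticksPerWindow * (i + 1))))
        .ToList();
    var partial = await Task.WhenAll(windowTasks); // gather all windows
    return partial.SelectMany(r => r).ToList();    // merge partitioned results
}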

Friday, January 26, 2018

Today we continue our discussion on the AWS papers on software architecture, which suggest five pillars:
- Operational Excellence for running and monitoring business-critical systems.
- Security to protect information, systems, and assets with risk assessments and mitigation strategies.
- Reliability to recover from infrastructure or service disruptions.
- Performance Efficiency to ensure efficient usage of resources.
- Cost Optimization to help eliminate unneeded cost and keep the system trimmed and lean.
The guidelines to achieve the above pillars include:
1. Infrastructure capacity should be estimated, not guessed.
2. Systems should be tested at production scale to eliminate surprises.
3. Architectural experimentation should be made easier with automation.
4. There should be flexibility to evolve architectures.
5. Changes to the architecture should be driven by data.
6. Plan for peak days and test at these loads to observe areas of improvement.
We looked at the Operational Excellence, Reliability, and Security pillars and reviewed the associated best practices.
One of the trends in operational practice is to rely on tools that set thresholds and raise alerts. This translates to incident response instead of active and strenuous polling. As part of the response, we search the logs. Most of these searches are interactive command-line executions, but each step may be time-consuming due to the volume of the logs. One way to mitigate this is to run a sequential batch script that repeats the commands on smaller chunks of data. This, however, means we lose the aggregations unless we store intermediary data. Fortunately, this is possible using files. However, most log archive systems are read-only, so intermediary files cannot be written there. This also restricts parallelizing tasks with a library such as Celery, because such libraries require network access to a message broker and the only access allowed is SSH. One way to overcome this is to scatter and gather data from multiple SSH sessions. This is easier to automate because the controller does not have to be local to the log server.

Thursday, January 25, 2018

Today we continue our discussion on the AWS papers on software architecture, which suggest five pillars:
- Operational Excellence for running and monitoring business-critical systems.
- Security to protect information, systems, and assets with risk assessments and mitigation strategies.
- Reliability to recover from infrastructure or service disruptions.
- Performance Efficiency to ensure efficient usage of resources.
- Cost Optimization to help eliminate unneeded cost and keep the system trimmed and lean.
The guidelines to achieve the above pillars include:
1. Infrastructure capacity should be estimated, not guessed.
2. Systems should be tested at production scale to eliminate surprises.
3. Architectural experimentation should be made easier with automation.
4. There should be flexibility to evolve architectures.
5. Changes to the architecture should be driven by data.
6. Plan for peak days and test at these loads to observe areas of improvement.
We looked at the Operational Excellence, Reliability, and Security pillars and reviewed the associated best practices.
#codingexercise
Find the nth multiple of k in the Fibonacci series.
Solution 1: iterate through the Fibonacci series, testing each term for divisibility by k and counting the successes.
Solution 2: Fibonacci multiples of a number occur at periodic positions. Depending on k, determine the period and hence the position of the result.
// Iterative Fibonacci with F(1) = 1, F(2) = 1; long covers indices up to F(92).
long GetFibonacci(int n)
{
    long previous = 0, current = 1;
    for (int i = 0; i < n; i++)
    {
        long next = previous + current;
        previous = current;
        current = next;
    }
    return previous;
}

// Solution 2: the Fibonacci numbers divisible by k sit at indices that are
// multiples of the first such index; find that index and scale it by n.
long GetNthMultipleFibonacci(int k, int n)
{
    const int maxIndex = 92; // stay within the range of long
    for (int i = 1; i <= maxIndex; i++)
    {
        if (GetFibonacci(i) % k == 0)
            return GetFibonacci(n * i); // the nth multiple is at index n * i
    }
    return -1; // no multiple of k found within the bound
}
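As a quick check of the sketch above: for k = 2, the first Fibonacci number divisible by 2 is F(3) = 2, so GetNthMultipleFibonacci(2, 3) returns F(9) = 34, the third even Fibonacci number (after 2 and 8).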

Wednesday, January 24, 2018

Today we continue our discussion on the AWS papers on software architecture, which suggest five pillars:
- Operational Excellence for running and monitoring business-critical systems.
- Security to protect information, systems, and assets with risk assessments and mitigation strategies.
- Reliability to recover from infrastructure or service disruptions.
- Performance Efficiency to ensure efficient usage of resources.
- Cost Optimization to help eliminate unneeded cost and keep the system trimmed and lean.
The guidelines to achieve the above pillars include:
1. Infrastructure capacity should be estimated, not guessed.
2. Systems should be tested at production scale to eliminate surprises.
3. Architectural experimentation should be made easier with automation.
4. There should be flexibility to evolve architectures.
5. Changes to the architecture should be driven by data.
6. Plan for peak days and test at these loads to observe areas of improvement.
We looked at the Operational Excellence, Reliability, and Security pillars and reviewed the associated best practices.
Next we review the Performance Efficiency pillar, which includes the ability to use computing resources efficiently even as demand fluctuates and technology evolves.
It includes five design principles:
Vendor-aware deployments - This implies that we don't need to host and run a new technology ourselves. Databases, machine learning, and encodings are best done at the cloud level by dedicated teams so that our service may simply use them.
Global availability - We deploy the system in multiple regions around the world so that it provides lower latency and more availability.
Serverless architectures - This notion eliminates ownership of servers for the computations, and storage services can act as static websites. Even the event services can be used to host the code.
Experiment more often - With virtual and automatable resources, we can quickly carry out comparative testing and evaluate which T-shirt size works for us.
Mechanical sympathy - This calls for using the technology that best helps us achieve what we want with our service.
The four best practice areas in this regard are:
Selection - As workloads vary, the solution becomes more nuanced about the choice of products and often involves a hybrid approach to overcome trade-offs. If the choices are revisited on a cyclical basis, the solution improves over time.
Review - This is about evaluating newer technologies and retiring older ones. The cloud services, for example, become available in new regions and upgrade their services and features.
Monitoring - This gives continuous feedback on the systems as deployed so that alarms can be set in place for actions to be taken.
Trade-offs - The initial design may have considered trade-offs such as consistency, durability, and space versus time or latency to deliver higher performance, but these also need to be revisited with subsequent change management.
#codingexercise
Find the nth multiple of k in the Fibonacci series.
Solution 1: iterate through the Fibonacci series, testing each term for divisibility by k and counting the successes.
Solution 2: Fibonacci multiples of a number occur at periodic positions. Depending on k, determine the period and hence the position of the result.
// Iterative Fibonacci with F(1) = 1, F(2) = 1; long covers indices up to F(92).
long GetFibonacci(int n)
{
    long previous = 0, current = 1;
    for (int i = 0; i < n; i++)
    {
        long next = previous + current;
        previous = current;
        current = next;
    }
    return previous;
}

// Solution 2: the Fibonacci numbers divisible by k sit at indices that are
// multiples of the first such index; find that index and scale it by n.
long GetNthMultipleFibonacci(int k, int n)
{
    const int maxIndex = 92; // stay within the range of long
    for (int i = 1; i <= maxIndex; i++)
    {
        if (GetFibonacci(i) % k == 0)
            return GetFibonacci(n * i); // the nth multiple is at index n * i
    }
    return -1; // no multiple of k found within the bound
}

Tuesday, January 23, 2018

Today we continue our discussion on the AWS papers on software architecture, which suggest five pillars:
- Operational Excellence for running and monitoring business-critical systems.
- Security to protect information, systems, and assets with risk assessments and mitigation strategies.
- Reliability to recover from infrastructure or service disruptions.
- Performance Efficiency to ensure efficient usage of resources.
- Cost Optimization to help eliminate unneeded cost and keep the system trimmed and lean.
The guidelines to achieve the above pillars include:
1. Infrastructure capacity should be estimated, not guessed.
2. Systems should be tested at production scale to eliminate surprises.
3. Architectural experimentation should be made easier with automation.
4. There should be flexibility to evolve architectures.
5. Changes to the architecture should be driven by data.
6. Plan for peak days and test at these loads to observe areas of improvement.
We looked at the Security pillar and reviewed its best practices.
They include identity and access management, monitoring controls, infrastructure protection, data protection, and incident response.
Next we review the Reliability pillar, which includes the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions.
It includes five design principles:
Test recovery procedures - The cloud lets us simulate failures as well as test recovery procedures because the resources are elastic. With simulations we can test and rectify before a real failure.
Automatic recovery from failure - Automation can be event-based so that it kicks in only when a threshold is reached. Notification and tracking of failures no longer require polling. Instead, recovery and repair can be automated.
Scale horizontally to increase aggregate system reliability - We replace one large resource with multiple small resources to reduce the impact of a single resource failure.
Capacity no longer needs to be guessed - Prior to the cloud, resources were frequently saturated, leading to more failures. Instead, the cloud lets us monitor demand and react accordingly.
Manage change in automation - Changes to the infrastructure are done using automation.
The best practice areas for reliability in the cloud include:
Foundations - Network topology and service limits are two key criteria for establishing a foundation. In the past, bandwidth, compute, and storage limits were often overrun. The cloud lets us manage the foundation better; in AWS, we just have to pay attention to the topology and service limits.
Change Management - Every change may violate the SLA, therefore change control processes are necessary. How a system adapts to changes in demand, how the resources are monitored, and how the change is executed determine this best practice.
Failure Management - Becoming aware of failures, responding to them, and preventing them from happening again are part of this best practice.
The AWS Well-Architected Framework does not itself give an example of these pillars used in prominence in the industry today. One such example is the management of Governance, Risk and Compliance (GRC). The financial services industry is highly regulated and has an increasing need to break the trade-off between compliance and innovation.
#codingexercise
Given a number N, find the number of ways you can draw N chords in a circle with 2*N points such that no 2 chords intersect.
If we draw a chord between any two points, the remaining points are divided into two smaller sets, and there can be no chords going from one set to the other. The solution for each smaller set is then an optimal sub-problem, so a recurrence over the different configurations of smaller sets is possible. These configurations range from one set holding 0 pairs up to it holding N-1 pairs, as shown in the sketch below.
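A minimal sketch of that recurrence (these are the Catalan numbers; the method name is our own):

// ways[p] = number of non-intersecting configurations of p chords on 2*p points.
// Fixing one chord so that `left` pairs fall on one side and p - 1 - left on
// the other gives: ways[p] = sum over left of ways[left] * ways[p - 1 - left].
long CountChordWays(int n)
{
    var ways = new long[n + 1];
    ways[0] = 1; // zero points: exactly one (empty) configuration
    for (int pairs = 1; pairs <= n; pairs++)
    {
        for (int left = 0; left < pairs; left++)
        {
            ways[pairs] += ways[left] * ways[pairs - 1 - left];
        }
    }
    return ways[n]; // e.g. n = 3 gives 5
}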



Monday, January 22, 2018

Identity – A score for who you are, what you are, and where you are

Introduction:
Identity management is a necessity for every online retail business, but it involves management chores such as providing various sign-in options to users so that they may be authenticated and authorized, complying with standards, and providing utmost convenience, all of which may prove distracting from their line of business. Federated identity management stepped in to consolidate these activities. You could now sign in to different retail domains and subsidiaries with a single account. Moreover, protocols were developed so that identity may be deferred to providers. Interestingly, in recent years social network providers increasingly became veritable identity providers by themselves. This write-up introduces the notion of a score for an identity as an attribute that may be passed along with the identity to subscribing identity consumers. As more and more businesses participate, the score becomes more meaningful metadata for the customer.
Description:
Using scores to represent consumers probably started more than half a century ago, when Fair, Isaac and Co used statistical analysis to translate financial history into a simple score. We may have come a long way in how we measure credit scores for end users, but the data belonged to credit bureaus. Credit card companies became authorities in tracking how consumers spend their money, and their customers started carrying cards instead of cash. With the rise of mobile phones, mobile payment methods started gaining popularity. Online retail companies want a share of that spend, and the only way they could authenticate and authorize a user to do so was with identity management. Therefore they shared the umbrella of identity management while maintaining their own siloed data, regardless of whether they were in the travel, transportation, or insurance industry. They could tell what the user did on the last vacation, the ride he took when he was there, or the claim he made when he was in trouble, but there is nothing requiring them to share this data with an identity provider. Social networks and mobile applications became smart enough to know the tastes a user may have or acquire, and they can make ads more personalized with recommendations, but there is no federation of trends and history pertaining to a user across these purchases.
On the other hand, the classic problem of identity and access management has been to connect trusted users to sensitive resources irrespective of where these users are coming from and irrespective of where these resources are hosted. The term classic here is used to indicate what does not change. In contrast, business models to make these connections have changed. Tokens were invented to represent user access to a client’s resources so that the identity provider does not have to know where the resources are. Moreover, tokens were issued not only to users but also to devices and applications on behalf of the user so that they may have access to different scopes for a limited time. Other access models that pertain to tokens as a form of identity are mentioned here.
In the case of integrating domains and web sites with the same identity provider, the data pertaining to a customer only increases with each addition. An identity provider merely has to accumulate scores from all these retailers to make a more generalized score associated with the user. This way existing retail companies can maintain their own data while the identity provider keeps a score for the user.
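As a small sketch of that score accumulation (the weighted-average scheme here is our own illustrative assumption, not a prescribed method):

// Fold per-retailer scores into one composite identity score. The weight
// might reflect, say, transaction volume or data quality at each retailer.
double CompositeScore(List<(double score, double weight)> retailerScores)
{
    double weightedSum = 0, totalWeight = 0;
    foreach (var r in retailerScores)
    {
        weightedSum += r.score * r.weight;
        totalWeight += r.weight;
    }
    return totalWeight == 0 ? 0 : weightedSum / totalWeight;
}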
Conclusion:
An identity and access management solution can look forward to more integrated collaboration with participating clients in order to increase the pie of meaningful information associated with an account holder.

Sunday, January 21, 2018

Today we continue our discussion on the AWS papers on software architecture, which suggest five pillars:
- Operational Excellence for running and monitoring business-critical systems.
- Security to protect information, systems, and assets with risk assessments and mitigation strategies.
- Reliability to recover from infrastructure or service disruptions.
- Performance Efficiency to ensure efficient usage of resources.
- Cost Optimization to help eliminate unneeded cost and keep the system trimmed and lean.
The guidelines to achieve the above pillars include:
1. Infrastructure capacity should be estimated, not guessed.
2. Systems should be tested at production scale to eliminate surprises.
3. Architectural experimentation should be made easier with automation.
4. There should be flexibility to evolve architectures.
5. Changes to the architecture should be driven by data.
6. Plan for peak days and test at these loads to observe areas of improvement.
We looked at the Security pillar and reviewed its best practices.
They include identity and access management, monitoring controls, infrastructure protection, data protection, and incident response.
Identity and access management ensures that only authenticated and authorized users can access the resources. In AWS, there is a dedicated IAM service that supports multi-factor authentication.
Monitoring controls are used to identify a potential security incident. In AWS, CloudTrail logs, AWS API calls, and CloudWatch provide monitoring of metrics with alarming.
Infrastructure protection includes defense-in-depth control methodologies. In AWS, this is enforced in the Compute Cloud, Container Service, and Beanstalk with the Amazon Machine Image.
Data protection involves techniques such as securing data, encrypting it, and putting access controls in place. In AWS, Amazon S3 provides exceptional resiliency.
Incident response means putting controls and prevention in place to mitigate security incidents. In AWS, logging and events provide this service. AWS CloudFormation can be used to recreate an environment for study in a sandbox.
IAM is the AWS service that is essential to security and enables this pillar of software architecture.

#codingexercise
// Returns the element of sortedSquares closest in value to number,
// using binary search over the sorted list.
int GetClosest(List<int> sortedSquares, int number)
{
    int start = 0;
    int end = sortedSquares.Count - 1;
    int closest = sortedSquares[start];
    while (start < end)
    {
        // Track the nearer of the two current endpoints.
        closest = Math.Abs(sortedSquares[start] - number) < Math.Abs(sortedSquares[end] - number)
            ? sortedSquares[start]
            : sortedSquares[end];
        int mid = (start + end) / 2;
        if (mid == start) return closest; // interval is down to two neighbors
        if (sortedSquares[mid] == number) return number; // exact match
        if (sortedSquares[mid] < number)
        {
            start = mid; // answer lies in the upper half
        }
        else
        {
            end = mid; // answer lies in the lower half
        }
    }
    return closest;
}
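For example, with a hypothetical list of perfect squares, the call below returns the square nearest to the input:

var squares = new List<int> { 1, 4, 9, 16, 25, 36, 49 };
int nearest = GetClosest(squares, 27); // returns 25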