Tuesday, February 14, 2023

 

One of the architectural patterns for application migration is about managing AWS Service Catalog products in multiple AWS accounts and AWS Regions. AWS Service Catalog is used to create, share, organize, and govern curated IaC templates, which simplifies and accelerates the governance and distribution of infrastructure. AWS CloudFormation templates define the collection of AWS resources, known as a stack, required for a solution or a product. StackSets extend this functionality by enabling us to create, update, or delete stacks across multiple accounts and AWS Regions with a single operation.

If a CloudFormation template must be made available to other AWS accounts or organizational units, then the portfolio is typically shared. A portfolio is a container that includes one or more products.  

On the other hand, this architectural pattern is an alternative approach based on AWS CloudFormation StackSets. Instead of sharing the portfolio, we use StackSet constraints to set the AWS Regions and accounts where the resources can be deployed and used. This approach provisions the Service Catalog products in multiple accounts, OUs, and AWS Regions while managing them from a central location, which meets governance requirements.

The benefits of this approach are the following: 

1.       The product is provisioned and managed from a primary account and is not shared with other accounts.

2.       This approach provides a consolidated view of all provisioned products (stacks) that are based on a specific set of templates.

3.       The use of a primary account makes the configuration with the AWS Service Management Connector easier.

4.       It is easier to query and use products from the AWS Service Catalog. 

The architecture involves an AWS management account and a target Organizational Unit (OU) account. The CloudFormation template and the Service Catalog product live in the management account. The CloudFormation stack and its resources live in the target OU account. The user creates an AWS CloudFormation template, in JSON or YAML format, to provision AWS resources. The template is registered as a product in AWS Service Catalog and added to a portfolio. The user then creates a provisioned product, which creates CloudFormation stacks in the target accounts. Each stack provisions the resources specified in the CloudFormation template.
The steps to provision products across accounts are the following:

1.       Create a portfolio, for example with the AWS Command Line Interface.

2.       Create the template that describes the resources.

3.       Create a product with a version title and description.

4.       Apply constraints to the portfolio to configure product deployment options such as the target AWS accounts, Regions, and permissions.

5.       Provide permissions to users so that they can launch the products in the portfolio.
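A minimal sketch of these steps with the AWS SDK for Python (boto3) follows. It assumes a CloudFormation-based product and a STACKSET constraint; the names, account IDs, Regions, role ARN, and template URL are hypothetical placeholders, and the exact constraint parameter shape should be checked against the CreateConstraint documentation.

import json
import boto3

sc = boto3.client("servicecatalog", region_name="us-east-1")

# 1. Create the portfolio that holds the product (managed from the primary account).
portfolio = sc.create_portfolio(
    DisplayName="shared-infrastructure",
    ProviderName="platform-team",
    Description="Curated IaC templates governed from the management account",
)
portfolio_id = portfolio["PortfolioDetail"]["Id"]

# 2 and 3. Register the CloudFormation template as a versioned product.
product = sc.create_product(
    Name="s3-bucket-baseline",
    Owner="platform-team",
    ProductType="CLOUD_FORMATION_TEMPLATE",
    ProvisioningArtifactParameters={
        "Name": "v1.0",
        "Description": "Initial version",
        "Info": {"LoadTemplateFromURL": "https://example-bucket.s3.amazonaws.com/template.yaml"},
        "Type": "CLOUD_FORMATION_TEMPLATE",
    },
)
product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]
sc.associate_product_with_portfolio(ProductId=product_id, PortfolioId=portfolio_id)

# 4. Apply a STACKSET constraint restricting the target accounts and Regions.
#    (Placeholder values; admin and execution roles may also be required per the docs.)
sc.create_constraint(
    PortfolioId=portfolio_id,
    ProductId=product_id,
    Type="STACKSET",
    Parameters=json.dumps({
        "Version": "2.0",
        "Properties": {
            "AccountList": ["111111111111", "222222222222"],
            "RegionList": ["us-east-1", "us-west-2"],
        },
    }),
)

# 5. Allow an IAM principal to launch products in the portfolio.
sc.associate_principal_with_portfolio(
    PortfolioId=portfolio_id,
    PrincipalARN="arn:aws:iam::111111111111:role/DevTeamRole",
    PrincipalType="IAM",
)

The provisioned product is then launched from the primary account, and the StackSet constraint determines in which accounts and Regions the stacks land.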


Monday, February 13, 2023

 

How to draw a graph image?

A simple way to do it is to use prebuilt libraries that implement layout algorithms such as Kamada-Kawai and Fruchterman-Reingold.

Sample implementation:

from igraph import *

g = Graph()
g.add_vertices(5)
g.vs["label"] = ['A', 'B', 'C', 'D', 'E']
g.add_edges([(0, 1), (1, 2), (2, 3), (3, 4),
             (4, 0), (2, 4), (0, 3), (4, 1)])

layout = g.layout("kamada_kawai")
plot(g, layout=layout)

If instead we want to control the spacing ourselves and spread the vertices so that edges have little or no crossing, we can use simulated annealing to minimize a cost that penalizes crossing lines and overlapping nodes.

# The following program lays out a graph with little or no crossing lines.
from PIL import Image, ImageDraw
import math
import random

vertex = ['A', 'B', 'C', 'D', 'E']
links = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'),
         ('E', 'A'), ('C', 'E'), ('A', 'D'), ('E', 'B')]

# Each vertex gets an (x, y) coordinate constrained to the range 10..370.
domain = [(10, 370)] * (len(vertex) * 2)

def randomoptimize(domain, costf):
    best = 999999999
    bestr = None
    for i in range(1000):
        # Create a random solution
        r = [random.randint(domain[i][0], domain[i][1]) for i in range(len(domain))]
        # Get the cost
        cost = costf(r)
        # Compare it to the best one so far
        if cost < best:
            best = cost
            bestr = r
    return bestr

def annealingoptimize(domain, costf, T=10000.0, cool=0.95, step=1):
    # Initialize the values randomly
    vec = [float(random.randint(domain[i][0], domain[i][1]))
           for i in range(len(domain))]

    while T > 0.1:
        # Choose one of the indices
        i = random.randint(0, len(domain) - 1)
        # Choose a direction to change it
        dir = random.randint(-step, step)
        # Create a new list with one of the values changed
        vecb = vec[:]
        vecb[i] += dir
        if vecb[i] < domain[i][0]: vecb[i] = domain[i][0]
        elif vecb[i] > domain[i][1]: vecb[i] = domain[i][1]

        # Calculate the current cost and the new cost
        ea = costf(vec)
        eb = costf(vecb)
        p = pow(math.e, -(eb - ea) / T)
        # Is it better, or does it make the probability cutoff?
        if (eb < ea or random.random() < p):
            vec = vecb

        # Decrease the temperature
        T = T * cool
    return vec

 

def crosscount(v):
    # Convert the number list into a dictionary of vertex:(x,y)
    loc = dict([(vertex[i], (v[i*2], v[i*2+1])) for i in range(0, len(vertex))])
    total = 0

    # Loop through every pair of links
    for i in range(len(links)):
        for j in range(i+1, len(links)):

            # Get the locations
            (x1, y1), (x2, y2) = loc[links[i][0]], loc[links[i][1]]
            (x3, y3), (x4, y4) = loc[links[j][0]], loc[links[j][1]]

            den = (y4-y3)*(x2-x1) - (x4-x3)*(y2-y1)

            # den==0 if the lines are parallel
            if den == 0: continue

            # Otherwise ua and ub are the fraction of the
            # line where they cross
            ua = ((x4-x3)*(y1-y3) - (y4-y3)*(x1-x3)) / den
            ub = ((x2-x1)*(y1-y3) - (y2-y1)*(x1-x3)) / den

            # If the fraction is between 0 and 1 for both lines
            # then they cross each other
            if ua > 0 and ua < 1 and ub > 0 and ub < 1:
                total += 1

    for i in range(len(vertex)):
        for j in range(i+1, len(vertex)):
            # Get the locations of the two nodes
            (x1, y1), (x2, y2) = loc[vertex[i]], loc[vertex[j]]

            # Find the distance between them
            dist = math.sqrt(math.pow(x1-x2, 2) + math.pow(y1-y2, 2))
            # Penalize any nodes closer than 50 pixels
            if dist < 50:
                total += (1.0 - (dist/50.0))
    return total

                       

def drawnetwork(loc):
    # Create the image
    img = Image.new('RGB', (400, 400), (255, 255, 255))
    draw = ImageDraw.Draw(img)
    # Create the position dict
    pos = dict([(vertex[i], (loc[i*2], loc[i*2+1])) for i in range(0, len(vertex))])
    # Draw links
    for (a, b) in links:
        draw.line((pos[a], pos[b]), fill=(255, 0, 0))
    # Draw vertex labels
    for (n, p) in pos.items():
        draw.text(p, n, (0, 0, 0))
    img.save('graph.jpg', 'JPEG')
    img.show()

 

sol = randomoptimize(domain, crosscount)
print(crosscount(sol))          # cost of the random solution
sol = annealingoptimize(domain, crosscount, step=50, cool=0.99)
print(crosscount(sol))          # cost after simulated annealing
drawnetwork(sol)

 

Sunday, February 12, 2023

Previous article continued

Transformers changed that and, in fact, were developed for the purpose of translation. Unlike RNNs, they can be parallelized, which means transformers can be trained on very large data sets. GPT-3, which writes poetry, code, and conversations, was trained on almost 45 terabytes of text data, including a crawl of the world wide web. The architecture scales really well with huge data sets.

Transformers work very well because of three components: 1. positional encoding, 2. attention, and 3. self-attention. Positional encoding is about enriching the data with positional information rather than encoding word order in the structure of the network. As the network is trained on lots of text data, the transformer learns to interpret those positional encodings. This makes transformers easier to train than RNNs. Attention refers to a concept that originated in the paper aptly titled "Attention is all you need". It is a structure that allows a text model to look at every single word in the original sentence when deciding how to translate a word in the output. A heat map of attention weights helps with understanding which words the model aligns and their grammar. While attention is about the alignment of words, self-attention is about the underlying meaning of a word, so as to disambiguate it from other usages. This often involves an internal representation of the word, also referred to as its state. When attention is directed toward the input text, the model can distinguish between, say, "Server, can I have the check" and "I crashed the server", and interpret the references to a human versus a machine. The context of the surrounding words establishes this state.
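As a rough illustration only, and not the internals of any particular model, here is a minimal NumPy sketch of sinusoidal positional encoding and single-head scaled dot-product self-attention; the sequence length, model dimension, and random inputs are arbitrary.

import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding: even dimensions use sine, odd dimensions use cosine.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angle[:, 0::2])
    enc[:, 1::2] = np.cos(angle[:, 1::2])
    return enc

def scaled_dot_product_attention(q, k, v):
    # Each output row is a weighted mix of the value vectors, weighted by
    # how strongly that position's query matches every key.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

seq_len, d_model = 5, 8
# Fake word embeddings plus positional information.
x = np.random.rand(seq_len, d_model) + positional_encoding(seq_len, d_model)
out, attn = scaled_dot_product_attention(x, x, x)   # self-attention: q = k = v = x
print(attn.round(2))

The printed matrix of attention weights is the kind of heat map described above: row i shows how strongly position i attends to every other position.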

BERT, an NLP model, makes use of attention and can be used for a variety of purposes such as text summarization, question answering, classification, and finding similar sentences. BERT also helps power Google Search and Google Cloud AutoML Natural Language. Google has made BERT available for download via the TensorFlow library, while the company Hugging Face makes transformer models available through its Python library.
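As a small illustrative example of that availability, the Hugging Face transformers library exposes BERT-style models through its pipeline API; the model name below is just one example checkpoint and is downloaded on first use.

from transformers import pipeline

# Question answering with a BERT-style model fine-tuned on SQuAD.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
answer = qa(question="What can BERT be used for?",
            context="BERT is an NLP model used for text summarization, question answering, "
                    "classification and finding similar sentences.")
print(answer)   # a span from the context plus a confidence score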

Saturday, February 11, 2023

A conjecture about Artificial Intelligence:

ChatGPT is increasingly in the news for its ability to mimic a human conversationalist and for being versatile. It introduces a significant improvement over sequential encoding, building on the GPT-3 family's parallelizable learning. This article wonders whether the state can be considered to be in summation form, so that as the text continues to be encoded, the overall state is continuously accumulated in a streaming manner. But first, a general introduction to the subject.

ChatGPT is based on a type of neural network called a transformer. These are models that can translate text, write poems and op-eds, and even generate code. Newer natural language processing (NLP) models like BERT, GPT-3, or T5 are all based on transformers. Transformers are incredibly impactful compared to their predecessors, which were also based on neural networks. Just to recap, neural networks are models for analyzing complicated data by discovering hidden layers that represent the latent semantics of the data; these hidden layers sit between an input layer of data and an output layer of encodings. Neural networks can handle a variety of data including images, video, audio, and text. There are different types of neural networks optimized for different types of data. If we are analyzing images, we would typically use a convolutional neural network, one that applies convolution filters over the input and often begins by embedding the original data into a common shared space before it undergoes synthesis, training, and testing. The embedding space is constructed from an initial collection of representative data. For example, it could refer to a collection of 3D shapes drawn from real-world images. The space organizes latent objects between images and shapes, which are complete representations of objects. The use of a CNN is a technique that focuses on the salient, invariant embedded objects rather than the noise.

CNNs worked great for detecting objects in images but did not do as well for language tasks such as summarizing or generating text. The Recurrent Neural Network was introduced for language tasks such as translation, where an RNN would take a sentence in the source language and translate it to the destination language one word at a time, sequentially generating the translation. Sequential processing is important because word order matters for the meaning of a sentence, as in "The dog chased the cat" versus "The cat chased the dog". RNNs had a few problems. They could not handle long sequences of text, like long paragraphs. They were also not fast enough to handle large data because they were sequential. The ability to train on large data is considered a competitive advantage when it comes to NLP tasks because the models become better tuned.

(to be continued...)

Friday, February 10, 2023

 

The previous post discussed moving enterprise builds and deployments to the cloud for massive repositories. This article explores other significant challenges that must also be overcome to do so.

Environment provisioning and manual testing of the environment are critical for DevOps processes to gain adoption. As with any DevOps practice, the principles on which they are founded must always include a focus on people, process, and technology. With the help of Infrastructure-as-Code and blueprints, resources, policies, and accesses can be packaged together and become a unit of provisioning for the environment. Not all such deployments will cater to all teams. Instead of allowing teams to customize endlessly, there must be some T-shirt sizing and a set of pre-defined flavors for the environment. Common examples of flavors are development, stage, and production, while sizes might be small, medium, or large (a small catalog of this kind is sketched below). Roles could be administrator, contributor, and reader, while naming conventions can include well-understood prefixes and/or suffixes. When organizations look for environments to deploy code, they might not plan for all of the resources or the best practices for their configuration. Well-thought-out, pre-defined environment templates allow them to focus more on their business objectives. Unfortunately, the task of creating environments might be daunting for a centralized, organization-wide IT staff, especially when there are cross-cutting concerns. In such cases, it is better to delegate the choices to the customers and then use a consensus to prepare the environments. When the environment choices and configurations are baked, they can be made reusable, and a registry can be used to maintain their versions and share them out.
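As a sketch of the T-shirt sizing idea, a hypothetical catalog of flavors and sizes could be rendered into the parameters a pipeline passes to an IaC template; all names, SKUs, and values below are made up for illustration.

import json

# Hypothetical catalog of pre-defined environment flavors and T-shirt sizes.
SIZES = {
    "small":  {"vm_sku": "Standard_B2s",  "node_count": 1},
    "medium": {"vm_sku": "Standard_D4s",  "node_count": 3},
    "large":  {"vm_sku": "Standard_D16s", "node_count": 6},
}
FLAVORS = {
    "development": {"size": "small",  "roles": ["contributor", "reader"]},
    "stage":       {"size": "medium", "roles": ["contributor", "reader"]},
    "production":  {"size": "large",  "roles": ["reader"]},   # admins granted separately
}

def render_parameters(team, flavor):
    # Produce the parameter set a pipeline would pass to the IaC template.
    choice = FLAVORS[flavor]
    return {
        "resource_prefix": f"{team}-{flavor}",   # naming convention: prefix per team/flavor
        "roles": choice["roles"],
        **SIZES[choice["size"]],
    }

print(json.dumps(render_parameters("payments", "stage"), indent=2))

The point of the catalog is that teams pick a flavor and a size rather than hand-crafting every resource configuration.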

Container images, source code repositories, and release repositories are all examples of reusability of the artifacts that they store. The public cloud comes with many forms of automation, functions, runbooks, and logic apps that can help ease the customizations that individual teams need. Creating a pipeline often involves programmability for these teams to leverage, but it is not possible to anticipate all of their logic or provide a library of common routines upfront. Only as usage increases over time does it become possible to curate scripts.

In build and development, there are always cyclical dependencies encountered at various scopes and levels. These must be broken by forming an initial artifact or piece of logic with which the subsequent cycles can work. The creation of a dependency container, for example, might require the creation of a dependency, which in turn might require the container to package the dependency. This can be broken by creating the first version of the dependencies and then allowing subsequent builds to use the iterative versions. Cyclical dependencies are a symptom of trouble and must be actively sought out and eliminated with more streamlined processes so that no remediation is required.

The DevOps adoption roadmap has evolved over time. What used to be Feature Driven Development around 1999 gave way to Lean thinking and Lean software development around 2003, which was followed by product development flow in 2009 and Continuous Integration/Delivery in 2010. The DevOps Handbook and the DevOps Adoption Playbook are recent, from the last 5-6 years. Principles inform practices that resolve challenges, and they align accordingly. For example, the elimination of risk happens with automated testing and deployments, which resolves the challenges of manual testing, processes, deployments, and releases.

Thursday, February 9, 2023

Massive Code Repositories, scalable builds and DevOps Processes hosted in the cloud.

As enterprises and organizations survey their applications and assets to be moved to the cloud, one of the often-overlooked processes involves the entrenched and almost boutique build systems that they have invested in over the years. The public clouds advocate the use of cloud-native DevOps pipelines and automations that work well for new repositories and small projects, but when it comes to billion-dollar-plus revenue-generating source code assets, the transition of build and deployment to the cloud becomes surprisingly challenging.

New code projects and businesses can start out with a code repository in GitHub or GitHub Enterprise, with files conforming to the 100 MB limit and repository sizes conforming to the 5 GB limit. When we start clean on cloud-based DevOps, managing the inventory so that only text stays in source control and binaries move to object storage is easy. When enterprises have accrued massive repositories over time, even a copy operation becomes difficult to automate. What used to be a robocopy on Windows involving large payloads must now become a transfer over S3.
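A minimal sketch of that inventory step, assuming boto3 and a hypothetical bucket name: walk the working tree, flag files above the 100 MB per-file limit, and stage them in object storage so that only text stays in the repository.

import os
import boto3

LIMIT = 100 * 1024 * 1024           # GitHub's per-file size limit
BUCKET = "example-build-artifacts"  # hypothetical bucket name
s3 = boto3.client("s3")

for root, _, files in os.walk("."):
    if ".git" in root:
        continue
    for name in files:
        path = os.path.join(root, name)
        if os.path.getsize(path) >= LIMIT:
            key = os.path.relpath(path, ".").replace(os.sep, "/")
            print(f"moving {key} ({os.path.getsize(path)} bytes) to s3://{BUCKET}/{key}")
            s3.upload_file(path, BUCKET, key)
            # The file would then be removed from the repo and referenced by its S3 key.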

One of the first challenges in the movement of build and capacity-planning infrastructure to the cloud is preparing the migration. External dependencies and redundancies can cause these repositories to become very large, not to mention branches and versions. Using a package manager or its equivalent to separate the dependencies into their own packages helps their reusability; Bundler, Node's package manager, and Maven are testament to this effect. Object storage, Artifactory, and their equivalents can store binary data and executables. Backup and restore can easily be added when they are not already provided by the respective cloud services.

Another challenge is the proper mapping of infrastructure to handle the large processes involved in Continuous Integration and Continuous Deployment. GitHub Enterprise can provide up to 32 cores and 50,000 minutes/month for public repositories of sizes up to 50 GB. The cloud, on the other hand, offers virtually limitless compute, storage, and networking, all with the convenience of pay-as-you-go billing. If the DevOps automations are transformed effectively, both the infrastructure required and the automations they support become easier to host in the cloud. As with the first challenge, taking stock of the inventory of infrastructure resources and automation logic can be daunting. Consequently, some form of organization and nomenclature to divide the inventory into sizeable chunks can help with the transformation and even parallelization.

A third challenge is environment provisioning and manual testing. Subscriptions, resource groups, regions, and configurations proliferate in the cloud when such DevOps processes are transformed and migrated. This infrastructure and its state become assets to guard in the same way as the source code delivered with the DevOps processes. Importing and exporting these Infrastructure-as-Code templates and their states, and forming blueprints that can include policies and reconcile state, become a necessity. A proper organization and naming convention are needed for these as well.

Other miscellaneous challenges include but are not limited to forming best practices and centers of excellence, creating test data, providing manual deployments and overrides, ensuring suppliers, determining governance, integrating an architecture for tools say in the form of runbooks, manual releases, determination of telemetry, determining teams and managing accesses, supporting regulatory compliance, providing service virtualization, and  providing education for special skillsets. In addition, managing size and inconsistencies, maintaining the sanctity as a production grade system and providing an escalation path for feedback and garnering collaboration across a landscape of organizations and teams must be dealt with.

Finally, people, process and technology must come together in a planned and streamlined manner to make this happen. These provide a glimpse of the roadmap towards the migration of build and deployments to the cloud.

Wednesday, February 8, 2023

 

Public clouds provide an adoption framework for businesses that helps to create an overall cloud adoption plan that guides programs and teams in their digital transformation. The plan methodology provides templates to create backlogs and plans to build necessary skills across the teams. It helps rationalize the data estate, prioritize the technical efforts, and identify the data workloads. It’s important to adhere to a set of architectural principles which help guide development and optimization of the workloads. A well-architected framework stands on five pillars of architectural excellence which include:

-          Reliability (REL)

-          Security (SEC)

-          Cost Optimization (COST)

-          Operational Excellence (OPS)

-          Performance efficiency (PERF)

The elements that support these pillars are a review, a cost and optimization advisor, documentation, patterns, support and service offers, reference architectures, and design principles.

This guidance provides a summary of how these principles apply to the management of the data workloads.
 
Cost optimization is one of the primary benefits of using the right tool for the right solution. It helps to analyze the spend over time as well as the effects of scale-out and scale-up. An advisor can help improve reusability, on-demand scaling, and reduced data duplication, among many other things.

Performance is usually based on external factors and is very close to customer satisfaction. Continuous telemetry and reactiveness are essential to tuning performance. The shared environment controls for management and monitoring create alerts, dashboards, and notifications specific to the performance of the workload. Performance considerations include storage and compute abstractions, dynamic scaling, partitioning, storage pruning, enhanced drivers, and multilayer caching.

Operational excellence comes with security and reliability. Security and data management must be built into the system in layers, for every application and workload. The data management and analytics scenario focuses on establishing a foundation for security. Although workload-specific solutions might be required, the foundation for security is built with the Azure landing zones and managed independently from the workload. Confidentiality and integrity of data, including privilege management, data privacy, and appropriate controls, must be ensured. Network isolation and end-to-end encryption must be implemented. SSO, MFA, conditional access, and managed service identities are used to secure authentication. Separation of concerns between the Azure control plane and the data plane, as well as RBAC access control, must be used.

The key considerations for reliability are how to detect change and how quickly the operations can be resumed. The existing environment should also include auditing, monitoring, alerting and a notification framework.

In addition to all the above, some consideration may be given to improving individual service level agreements, redundancy of workload specific architecture, and processes for monitoring and notification beyond what is provided by the cloud operations teams.

Each pillar contains questions whose answers relate to technical and organizational decisions that are not directly about the features of the software to be deployed. For example, software that allows people to post comments must honor use cases where some people can write and others can read. But the system developed must also be safe and sound enough to handle all the traffic and should incur reasonable cost.

Since the most crucial pillars are OPS and SEC, they should never be traded away to get more out of the other pillars.

The security pillar consists of Identity and access management, detective controls, infrastructure protection, data protection and incident response. Three questions are routinely asked for this pillar:

1.       How is access controlled for the serverless API?

2.       How are the security boundaries managed for the serverless application?

3.       How is the application security implemented for the workload?

The operational excellence pillar is made up of four parts: organization, preparation, operation, and evolution. The questions that drive the decisions for this pillar include:

1.       How is the health of the serverless application known?

2.       How is the application lifecycle management approached?

The reliability pillar is made of three parts: foundations, change management, and failure management. The questions asked for this pillar include:

1.       How are the inbound request rates regulated?

2.       How is resiliency built into the serverless application?

The cost optimization pillar consists of five parts: cloud financial management practice, expenditure and usage awareness, cost-effective resources, demand management and resources supply, and optimizations over time. The questions asked for cost optimization include:

1.       How are the costs optimized?

The performance efficiency pillar is composed of four parts: selection, review, monitoring and tradeoffs. The questions asked for this pillar include:

1.       How is the performance optimized for the serverless application?

In addition to these questions, there’s quite a lot of opinionated and even authoritative perspectives into the appropriateness of a framework and they are often referred to as lenses. With these forms of guidance, a well-architected framework moves closer to reality.