Cluster computing

Saturday, October 14, 2023

This is a continuation of the articles on Infrastructure-as-Code, or IaC for short. There is no denying that IaC helps to create and manage infrastructure, and that it can be versioned, reused and shared, all of which helps to provision resources quickly and manage them consistently throughout their lifecycle. Unlike software product code, which must be general purpose, provide a strong foundation for system architecture and aspire to be a platform for many use cases, IaC often varies a lot and must be manifested in different combinations depending on environment, purpose and scale, and it must encompass the complete development process. It can even include CI/CD platforms, DevOps and testing tools. The DevOps-based approach is critical to rapid software development cycles. As a result, IaC is spread over a variety of forms. The more articulated the IaC, the more predictable and cleaner the deployments.

The IaC architecture is almost always dominated by the choice of technology stack. There is no universal system architecture, only a DevOps-oriented, tailored approach with all the tools necessary to keep the deployments consistent and repeatable. Technology varies across cloud-native forms, tools with their own configuration languages like Ansible and Terraform, and tools such as Pulumi that use general-purpose programming languages. IaC can be curated as a set of machine-readable files, a descriptive model, or a configuration template. There are two approaches for writing it: imperative and declarative. The imperative approach lets users specify the exact steps to be taken for a change, and the system does not deviate from them, while the declarative approach specifies the final form, and the tool or platform involved goes through the motions of provisioning it.
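
The difference between the two approaches can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how any particular tool works: the Reconciler class, its plan method and the resource names are all hypothetical. In the declarative style the author states only the desired resources, and the steps are derived by comparing them with what currently exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: derive the steps from a desired state (declarative)
// instead of listing the steps by hand (imperative).
class Reconciler {

    // Compares desired resources (name -> setting) with the current ones and
    // returns the actions needed to converge, in sorted name order.
    static List<String> plan(Map<String, String> desired, Map<String, String> current) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(desired).entrySet()) {
            String existing = current.get(e.getKey());
            if (existing == null) {
                actions.add("create " + e.getKey());
            } else if (!existing.equals(e.getValue())) {
                actions.add("update " + e.getKey());
            }
        }
        // Anything that exists but is no longer declared gets removed.
        for (String name : new TreeMap<>(current).keySet()) {
            if (!desired.containsKey(name)) {
                actions.add("delete " + name);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> desired = Map.of("vnet", "10.0.0.0/16", "vm", "small");
        Map<String, String> current = Map.of("vm", "large", "disk", "100GB");
        System.out.println(plan(desired, current));
    }
}
```

An imperative script would instead spell out the update, create and delete calls directly, which is why it cannot adapt when the current state differs from what its author expected.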

Infrastructure can be made available as a service and shared as code. Provisioning infrastructure can itself be a cloud service, and many public clouds offer it in their service portfolio. These so-called native infrastructures are great for leveraging the built-in features of the public cloud, but more often than not, organizations build a veritable library of assets and prefer that it not be limited to the resources of any one cloud. It can even include on-premises infrastructure. No matter what choices are made and how the IaC landscape is navigated, it is unquestionable that IaC reduces shadow IT within organizations, integrates directly with CI/CD platforms, brings infrastructure and configuration changes under version control, standardizes infrastructure, manages configuration drift effectively, and makes it possible to scale up or out without increasing CapEx or OpEx.

Configuration Management is separate from infrastructure management although tools like Ansible provide hybrid solutions. True configuration management is demonstrated by software like CFEngine while infrastructure management is demonstrated by providers like Terraform and Pulumi. Businesses can mix and match any tool and use them in their CI/CD pipelines depending on their custom requirements.

As a real-world example, a developer writes application code and the configuration management related instructions that will trigger actions from the virtualization environment. When the code is delivered, the configuration management and infrastructure management provide a live operational environment for testing. When the tests run and the error detection and resolution occur, the new code changes become ready for deployment to customer facing environments. Managing the state drift as changes keep propagating is one of the core management routines for Infrastructure-as-code.

#codingexercise

Q: An array A of N elements has each element within the range 0 to N-1. Find the smallest index P such that every value that occurs in A also occurs in the sequence A[0], A[1], ..., A[P].

For example, for A = [2,2,1,0,1] the smallest value of P is 3, because the elements 2,2,1,0 contain all the values that occur in A.

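public int getPrefix(int[] A) {
    // Index of the latest first-time occurrence seen so far;
    // remains Integer.MIN_VALUE if A is empty.
    int prefix = Integer.MIN_VALUE;
    int n = A.length;
    // visited[v] == 1 once the value v has been seen (values are in 0..n-1).
    int[] visited = new int[n];
    for (int i = 0; i < n; i++) {
        if (visited[A[i]] == 0) {
            visited[A[i]] = 1;
            prefix = i;
        }
    }
    return prefix;
}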
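
The same scan can be written without relying on the 0 to N-1 range by tracking the seen values in a set. This is an alternative sketch rather than the solution above; the class and method names are made up for illustration, and returning -1 for an empty array is an assumed convention.

```java
import java.util.HashSet;
import java.util.Set;

class PrefixFinder {
    // Returns the smallest index P such that a[0..P] contains every distinct
    // value of a, or -1 for an empty array (an assumed convention).
    static int smallestPrefix(int[] a) {
        Set<Integer> seen = new HashSet<>();
        int prefix = -1;
        for (int i = 0; i < a.length; i++) {
            // add() returns true only the first time a value is seen, so
            // prefix ends up at the index of the last first occurrence.
            if (seen.add(a[i])) {
                prefix = i;
            }
        }
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(smallestPrefix(new int[] {2, 2, 1, 0, 1})); // prints 3
    }
}
```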

Friday, October 13, 2023

This is a continuation of articles on Infrastructure-as-code. This one talks about the locking of resources. Locks, policies and IaC sometimes compete to protect resources and can step on each other, producing conflicts that require resolution. Each of them has a part to play and cannot be done away with, and the hope is that each pipeline run is smooth and leaves a clean state behind.

This might be wishful thinking when the resources that need to be created, modified or deleted have sub-resources or are associated with other resources. With such dependencies, operations might fail with an error stating that one or the other has a lock on it. Locking is essential to prevent accidental modifications to the resources. It is assumed that authorized operations will unlock the resources prior to the change and lock them again afterwards. Take the example of a private endpoint on an Azure public cloud resource that provides a private IP address for incoming traffic: the associated resources, including the parent resource, private links and DNS zones, might all be locked. The operations succeed only when all the locks are released. This makes it hard to know upfront which locks to acquire and release.

One of the approaches with pipeline automation is the cascaded unlocking of all resources in a resource hierarchy, such as at the resource group or subscription level. The identity with which the Azure operations are performed must be privileged, since only the Owner and User Access Administrator built-in roles can create and delete management locks. The corresponding permissions are the Microsoft.Authorization/* or Microsoft.Authorization/locks/* actions. Custom roles having these permissions could also be sufficient. It might be time consuming to go through all the resources and sub-resources in a resource hierarchy to unlock them before the operations begin and to lock them again at the end, and the script often needs some wait time to be specified. But this leaves the resources in a clean state for the changes to be propagated from the IaC to the management portal for these resources. It is also possible to run these steps conditionally, for changes that carry certain labels or distinguishing features, such as a filter on operations.
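
The unlock, apply and relock sequence can be sketched as follows. This is a minimal in-memory illustration and makes no real cloud calls: the LockedDeployment class, its methods and the resource identifiers are hypothetical, and scope membership is approximated with a simple string prefix. The point it shows is that relocking belongs in a finally block so the locks are restored even when the change fails.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a pipeline step; no cloud API is called.
class LockedDeployment {
    // resource id -> lock level currently on it (for example "CannotDelete")
    private final Map<String, String> locks = new LinkedHashMap<>();
    final List<String> log = new ArrayList<>();

    void lock(String resourceId, String level) {
        locks.put(resourceId, level);
        log.add("lock " + resourceId);
    }

    boolean isLocked(String resourceId) {
        return locks.containsKey(resourceId);
    }

    // Removes every lock under the scope prefix, runs the change, then
    // restores the same locks even if the change throws.
    void applyUnlocked(String scopePrefix, Runnable change) {
        Map<String, String> removed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : locks.entrySet()) {
            if (e.getKey().startsWith(scopePrefix)) {
                removed.put(e.getKey(), e.getValue());
            }
        }
        for (String id : removed.keySet()) {
            locks.remove(id);
            log.add("unlock " + id);
        }
        try {
            change.run();
        } finally {
            removed.forEach(this::lock);
        }
    }
}
```

A caller would register the existing locks and then wrap the IaC application in applyUnlocked with the resource group or subscription prefix as the scope.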

A policy might act like a catch-all that applies locking where locks have been missed on resources, but a policy on the Azure public cloud has a compliance interval of 24 hours. It is also a default-allow and explicit-deny system. If a resource violates a policy, it is marked as non-compliant. The effects that a policy takes are detection or prevention. The IaC code is the ultimate source of truth for the resources, and there are ways to specify locks in the IaC for resources that must behave independently from the collective approach taken by the policy. Anytime a policy changes the locks and the IaC is unaware, there is a conflict. It is preferable to keep locking as simple as possible, without any customizations for any subset of resources, so that the pipeline automation is sufficient to coordinate the locking and unlocking.

Finally, it is much easier to do locking and unlocking with the command-line interface than to execute it elsewhere. Both pipeline scripts and public cloud automations can execute these commands, and although Runbooks might not be able to execute them in PowerShell, the az CLI can certainly be run via functions or other such resources. Invoking a script for locking or unlocking does not require a resource or its state to change.

Wednesday, October 11, 2023

This is a continuation of a series of articles on the shortcomings and resolutions of Infrastructure-as-code (IaC). One of the commonly encountered situations is when IaC must be defined differently for non-production and production environments. There is a separation that must be maintained between actions taken on non-production and production environments because they require different maintenance. The diligence and rigor for production environments is usually high.

Although software product code and infrastructure-as-code can both be written with reusable modules and a centralized repository and pipeline, infrastructure can vary between business objectives and between environments. This results in distinct sets of resources which, as such, have their own lifecycle and maintenance requirements. Therefore, while the emphasis in software development has been on system architecture and microservices frameworks, the emphasis for infrastructure is on variations and independent management. This calls for separation not only in declaration and definition but also in pipelines and resource deployments.

It is important to call out that this requirement to keep different sets of infrastructure resources available for different purposes also manifests in IaC as separate folders for various business objectives. Each folder will have its own set of templates, variables, parameters and so on and constitute a logical holistic declaration of all the resources used towards that objective. This might result in an explosion of folders and organizational units within them which makes it difficult to enforce consistency and best practices. The increase in folders must also be matched with investments that enforce consistency possibly as pipeline automation if they fall outside what the compiler can support. Investments in pipeline automations and tests or validations are just as important as they are with the code for software products.

Some of the hierarchy is determined by the IaC compiler, and almost all compilers favor locality, with all the resources and associated definitions available within the same folder for generating a plan. The restriction comes from the compiler requiring a root folder for the project to build. In some cases, there are features that allow imports by referencing external modules. A common theme in organizing IaC is the use of common modules that act like wrappers over primitives so that all the consumers from various projects depend on a single point of definition. This is great for enforcing consistency as well as for introducing optional attributes to resources. Large IaC assets also manifest maturity in their naming conventions, terse definitions and avoidance of unnecessary declarations and dependencies. This suits IaC because declarations are usually made on a resource-by-resource basis.

Another frequently borrowed functionality from automations is scriptability. While IaC manifests the resource declarations, scriptability is sometimes unavoidable when working with various resources, not all of which have feature parity with IaC syntax. This calls for scripts to be made part of IaC and the use of pseudo-resources for this purpose is even facilitated by the compiler. However, it is important to remember that the idempotent and deterministic nature of IaC wins over the changes that scripts go through.

Tuesday, October 10, 2023

This is a summary of the book “How to stay smart in a smart world – why human intelligence still beats algorithms” written by Gerd Gigerenzer and published by MIT Press 2022. He is a psychologist known for his work on bounded rationality and directs the Harding Center for Risk Literacy at the University of Potsdam. He is also a partner at Simply Rational – The Decision Institute.

Recent advances in artificial intelligence have juxtaposed a different form of intelligence with ours and pose a question about the role of each. With reactions ranging from embracing it openly to being apprehensive about its prevalence or dominance, the author picks a cautious approach that plays to its strengths and avoids its weaknesses. With several examples and case studies, he argues that one form of intelligence works well in stable environments with well-defined rules while the other will never lose its relevance outside that world.

The salient points from this book include the assertions that AI excels in stable environments and follows rules dictated by humans, and that AI systems do not perform well in dynamic environments filled with uncertainty. Humans must try out AI to get the best results. In unexplored territory, simple and transparent algorithms perform better than complex ones. Among the negative impacts, the ad-based model of social media platforms can be cited. It is possible to separate human interaction from machine supervision with clear demarcation. For example, self-driving cars could be given their own dedicated lanes where possible. Market hype and profit incentives can lead companies to overpromise and underdeliver on digital technologies.

AI wins hands down in many games such as chess and Go because the game rules it learns are fixed, it is tuned by human experts and it uses brute calculation to determine the best possible move. The better defined and more stable the premise, the better the performance. The flip side is evident with facial recognition, for instance, which works 99.6% of the time. In dynamic environments, the number drops significantly. When UK police scanned the faces of 170,000 soccer fans in a stadium for matches against a criminal database, 93% of the matches were false.

AI is good at making correlations with huge amounts of data, even some that would have escaped humans, but it cannot recognize scenarios and deal with ambiguity. For example, Maine’s divorce rate and the United States’ per capita consumption of margarine have a significant correlation, but it makes no sense. It is these false findings by AI that are hard to replicate, leading to a lot of waste and error in areas such as health science and biotechnology, to the tune of hundreds of billions of dollars. Assertions made today, such as eat blueberries to prevent memory loss, eat bananas to get a higher verbal SAT score, or eat kiwis late at night to sleep better, may turn out to be just the opposite in due time.

Whenever the effectiveness of AI decreases, human intervention can significantly boost its performance. The human brain has a remarkable ability to adapt to constantly changing cues, contexts and situations in what is termed vicarious functioning. Staying smart means leveraging singularity capabilities but staying in charge. AI lacks four components of common sense: a capacity to think causally, an awareness of others’ intentions and feelings, a basic understanding of space, time and objects, and a longing to join in group norms. Some tasks, like recommending the nearest restaurant, do not need common sense, but detecting whether a person crossing the road in a war zone is a threat requires it.

Complex problems do not justify complex solutions. Google Flu Trends tried to predict the spread of flu with approximately 160 search terms, but it still overpredicted doctors’ visits. In comparison, an algorithm from the Max Planck Institute for Human Development used just one data point, recent visits to the doctor from the CDC website, and performed much better in predicting the flu’s spread.

Information served subliminally or without our knowing has the potential to alter our behavior. This is why the ad-based model for social media can be harmful by creating distractions. With attention control technology, the user is held captive by these algorithms. Texting while driving caused 3000 deaths per year in the United States between 2010 and 2020. In areas other than driving, too, smartphones have proven to be very distracting.

Finally, the business aspect of artificial intelligence must be realized in the context of historical trends with killer technologies and the commerce behind it. The author says we should be able to profit from AI but not be easily misled with expectations and predictions.

Earlier book summaries: BookSummary10.docx

Monday, October 9, 2023

Locking:

Resources can be locked to prevent unexpected changes. A subscription, resource group or resource can be locked to prevent other users from accidentally deleting or modifying critical resources. The lock overrides any permissions the users may have. The lock level can be set to CannotDelete or ReadOnly, with ReadOnly being more restrictive. When a lock is applied at a parent scope, all resources within that scope inherit the same lock. Some considerations still apply after locking. For example, a CannotDelete lock on a storage account does not prevent data within that account from being deleted. A ReadOnly lock on an application gateway prevents you from getting the backend health of the application gateway because that operation uses POST. Only Owner and User Access Administrator role members are granted access to Microsoft.Authorization/locks/* actions.
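
Lock inheritance and the relative restrictiveness of the two levels can be illustrated with a small model. This is a sketch under assumptions, not the Azure implementation: the LockResolver class is hypothetical, scopes are simplified to slash-separated strings rather than real resource IDs, and the most restrictive lock found along the parent chain is taken as the effective one.

```java
import java.util.HashMap;
import java.util.Map;

class LockResolver {
    // Ordered from least to most restrictive, as described above.
    enum LockLevel { NONE, CANNOT_DELETE, READ_ONLY }

    private final Map<String, LockLevel> locksByScope = new HashMap<>();

    void setLock(String scope, LockLevel level) {
        locksByScope.put(scope, level);
    }

    // Walks from the resource up through its parent scopes and keeps the most
    // restrictive lock found, so a child inherits locks set on any ancestor.
    LockLevel effectiveLock(String scope) {
        LockLevel result = LockLevel.NONE;
        String current = scope;
        while (current != null) {
            LockLevel level = locksByScope.getOrDefault(current, LockLevel.NONE);
            if (level.compareTo(result) > 0) {
                result = level;
            }
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return result;
    }

    // Both lock levels block deletion.
    boolean canDelete(String scope) {
        return effectiveLock(scope) == LockLevel.NONE;
    }

    // Only the read-only level blocks modification.
    boolean canModify(String scope) {
        return effectiveLock(scope) != LockLevel.READ_ONLY;
    }
}
```

With a CANNOT_DELETE lock set on "sub1/rg1", a resource at "sub1/rg1/storage1" can still be modified but not deleted; adding a READ_ONLY lock on "sub1" blocks modification as well.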

When the IaC is applied, it can be quite frustrating to find the resources locked in the public cloud, which prevents the IaC actions from completing. For example, a resource might have a private endpoint which in turn might be associated with a DNS zone and have a private NIC. These sub-resources might be locked, which prevents the private endpoint from being deleted, which in turn fails the IaC application. The resolution for the owner of the subscription is to delete the lock from the said resource via the Azure Portal or the command-line interface and then proceed to apply the changes, iterating over the ‘apply’ and ‘unlock’ steps until there are no further obstructions.

While this works for a role with elevated privileges, many developers who use the CI/CD pipeline credentials to make changes to the subscription do not have that privilege and might find the experience harrowing to resolve without external intervention. One way to overcome this is to apply the unlock commands via a pipeline step prior to the application of the IaC. Fortunately, there are ways to unlock at the subscription scope rather than resource by resource. Even so, it might not be clear when the locks reappear, and the unlocking might need to be repeated. It is good practice to check the policies to make sure that locking is not enforced automatically, since that interferes with infrastructure changes made by code, and the policies can also reveal the intent behind the locking. If the locking is simply meant to prevent accidental deletions across a broad range of resources, then unlocking before applying the changes is straightforward.

Let us make a specific association between say a firewall and a network resource such as a gateway. The firewall must be associated with the gateway to prevent traffic flow through that appliance. When they remain associated, they remember the identifier and the state for each other. Initially, the firewall may remain in detection mode where it is merely passive. It becomes active in the prevention mode. When the modes are attempted to be toggled, the association prevents it. Neither end of the association can tell what state to be in without exchanging information and when they are deployed or updated in place, neither knows about nor informs the other.

The above resolutions are easy when the error messages are descriptive and indicate that the failure of the IaC is exclusively due to locks. There are other forms of errors where the cause may not be straightforward. In such cases, the activity log on the resources or at the subscription level can be quite helpful when the JSON content of a logged event explains exactly what happened. This feature also helps to find out whether something happened through actions other than the deployment of the infrastructure changes.

Sunday, October 8, 2023

These are some more additions to the common errors faced during the authoring and deployment of Infrastructure-as-Code aka IaC artifacts along with their resolutions:

First, resources might pass the identifier of one to another by virtue of one being created before the other, and in some cases, these identifiers might not exist at compile time. For example, code that needs to assign an RBAC role based on the managed identity of another resource might not have that identity at compile time and will find it only when it is created at execution time. The RBAC IaC requires a principal_id, for which the managed identity of the created resource is needed. This might require two passes of the execution: one to generate the principal id and another to generate the role assignment with that principal id.

The above works for newly created resources with two passes, but it is still broken for existing resources that might not have an associated managed identity, where the RBAC IaC tries to apply a principal id that is empty. In such cases, no matter how many times the role assignment is applied, it will fail due to the incorrect principal id. The workaround is to check for the existence of the principal id before it is applied.
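
The existence check in the workaround can be modeled as a guard that produces a role assignment only when a principal id is actually present. This is a language-neutral illustration written in Java, not IaC syntax; the RoleAssignmentGuard class, the planRoleAssignment method and the output string are hypothetical.

```java
import java.util.Optional;

class RoleAssignmentGuard {
    // Returns a description of the role assignment only when a non-blank
    // principal id exists; otherwise returns empty so the caller can skip
    // the assignment instead of failing on every apply.
    static Optional<String> planRoleAssignment(String principalId, String role) {
        return Optional.ofNullable(principalId)
                .filter(id -> !id.trim().isEmpty())
                .map(id -> "assign " + role + " to " + id);
    }

    public static void main(String[] args) {
        // A resource without a managed identity yields no assignment.
        System.out.println(planRoleAssignment(null, "Reader").isPresent());
        // A resource with an identity yields the assignment to apply.
        System.out.println(planRoleAssignment("1234", "Reader").orElse("skipped"));
    }
}
```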

A second type of case occurs when the application requires an IP address to be assigned upfront, because its elaborate firewall rules are written against the IP address value rather than a reference, and the IP address is provisioned in the portal before the IaC is applied. The IaC then needs to import the existing, pre-created IP address into the state so that the IaC and the state match.

Third, there may be objects in the Key Vault that were created as part of the prerequisites for the IaC deployment and now their ids need to be reconciled with the IaC. Again, the import of that resource into the state would help the IaC provider to reconcile the actual with the expected resource.

Fourth, the friendly names are often references to actual resources that may have long been dereferenced, orphaned, changed, expired, or even deleted. The friendly names, also called keys, are just references and hold value to the author in a particular context but the same author might not guarantee that the moniker is in fact consistently used unless there are some validations and review involved.

Fifth, there are always three distinct stages between the design and deployment of Infrastructure-as-code: “init”, “plan” and “apply”. Success in one stage does not guarantee success in the next, which holds especially true between the plan and apply stages. Another limitation is that the plan can be easily validated on the development machine, but the apply stage can be performed only as part of pipeline jobs in commercial deployments. The workaround is to scope it down or target a different environment for applying.

Sixth, the ordering and sequence can only be partially manifested with corresponding attributes to explain dependencies between resources. Even if resources are self-descriptive, combination of resources must be carefully put-together by the system for a deterministic outcome.

These are only some of the articulations for the carefulness required for developing and deploying IaC.

Saturday, October 7, 2023

This is a summary of the book “How to say it for First-Time Managers” written by Jack Griffin and published by Prentice Hall Press, 2010. This book teaches winning words and strategies for earning your team’s confidence.

Managers must be able to communicate with their reports. If newly appointed managers can’t communicate their ideas, directions and instructions, the areas they supervise will fall apart. By paying attention to what needs to be said, and how and when it needs to be said, this book provides invaluable advice to newcomers. The author suggests the best words to use, those to avoid and even the body language that an inexperienced manager must adopt.

The language of leadership is both verbal and non-verbal. Effective leadership requires effective communication. The best posture is one that imparts a sense of relaxed energy. The eyes must be wide open during direct communications. Fidgeting or yawning must be avoided. Signaling an engagement by nodding or leaning forward is necessary. Eyes, ears, or nose must not be rubbed because they signal doubt. Similarly, scratching your head signals confusion. Smiling is very helpful.

Leadership language fluency helps new managers establish authority and credibility. The language of business concerns money and time. Words explain, motivate, encourage, discourage, inspire, depress, demand, invite, guide, mislead, clarify, confuse, hearten, and terrify. The author mentions ten touchstones for day-to-day communications: 1. Accountability, where someone is responsible for something, 2. Collaboration, where teamwork is essential to business, 3. Decisions, where conflicts are resolved and trade-offs are balanced, 4. Ethics, for guarding against falls, 5. Evaluations, for making value judgements, 6. Excellence, for leading the reports to high-quality work, 7. Learning, which involves distilling knowledge from experience, 8. Mission, for a well-defined sense of purpose, 9. Performance, for continuous improvement, and 10. Quality, for business that can succeed with excellence.

“Every Manager needs a useful, effective, and productive vocabulary.” Part of the vocabulary builds with “active listening” because by repeating what the other person says, co-operation is earned. Avoiding shaking the head that signals a rejection, keeping eye contact for the person to feel engaged, never lowering the chin because it signals defensiveness and avoiding or alleviating “rapid breathing” because it suggests anxiety, are some of the ways in which negatives can be balanced.

On the first day as a manager, always speak from knowledge, says the author, and if there is doubt, do not say anything. Plan how to conduct the meetings, including the preamble, the body and the epilogue. Pausing before speaking can imply confidence and self-assuredness. Focusing on what one is going to say helps both the speaker and the listener.

Clarity in written and spoken communication depends on speaking to the point and staying focused. The five W’s approach, delineating who, what, when, where, and why, can help in this regard. Using a step-by-step format in chronological order is much better than a long narrative. All rules, policies and procedures must be written out. Do not delegate work by starting out with a pep talk. Goals must be specified in an order in which the intent is laid out, the benefits are explained, the fit within the big picture is shown, the reachability of the goal is discussed, the necessary tasks are called out, those tasks are delegated, and what must be completed and by when is explained. Praise is much better than criticism for motivation, but give it with a story. Supportive words include reset, overcome, self-starter, and retry. Negative responses must be provided with an explanation. Meetings must have an agenda and must never be a monologue; ideas must be requested and also examined.