Wednesday, June 14, 2023

 

How to address IaC shortcomings – Part 3?

A previous article discussed a resolution to IaC shortcomings when declaring resources with configurations not yet supported by an IaC provider. This article discusses another case where IaC does not fully address all concerns for packaging a solution, specifically blueprints.

As a recap, Azure Blueprints can be leveraged to allow an engineer or architect to sketch a project’s design parameters and to define a repeatable set of resources that implements and adheres to an organization’s standards, patterns, and requirements. It is a declarative way to orchestrate the deployment of various resource templates and other artifacts such as role assignments, policy assignments, ARM templates, and resource groups. Blueprint objects are stored in Cosmos DB and replicated to multiple Azure regions. Since a blueprint is designed to set up the environment, it is different from resource provisioning. This package fits nicely into a CI/CD pipeline.
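To make the recap concrete, here is a minimal sketch, using the Terraform azurerm provider, of assigning an already-published blueprint to a subscription. The blueprint name and version are hypothetical placeholders.

```hcl
# A minimal sketch: assign an already-published blueprint to the
# current subscription. Names and versions are hypothetical.
data "azurerm_subscription" "current" {}

data "azurerm_blueprint_definition" "standards" {
  name     = "org-standards" # hypothetical blueprint name
  scope_id = data.azurerm_subscription.current.id
}

data "azurerm_blueprint_published_version" "v1" {
  scope_id       = data.azurerm_blueprint_definition.standards.scope_id
  blueprint_name = data.azurerm_blueprint_definition.standards.name
  version        = "v1.0" # hypothetical published version
}

resource "azurerm_blueprint_assignment" "standards" {
  name                   = "assign-org-standards"
  target_subscription_id = data.azurerm_subscription.current.id
  version_id             = data.azurerm_blueprint_published_version.v1.id
  location               = "eastus"

  # The assignment deploys through a managed identity.
  identity {
    type = "SystemAssigned"
  }
}
```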

With Azure templates, one or more Azure resources can be described in a document, but that document does not exist natively in Azure and must be stored locally or in source control. Once those resources are deployed, there is no active connection or relationship to the template.

Other IaC providers, such as Terraform, track the state of the real-world resources, which makes Day-2 and onward operations easier and more powerful. Similarly, with Azure Blueprints, the relationship between what should be deployed and what was deployed is preserved. This connection supports improved tracking and auditing of deployments, and it even works across several subscriptions with the same blueprint.

Typically, the choice is not between a blueprint and a resource template, because one comprises the other, but between an Azure Blueprint and a Terraform tfstate. They differ in their organization methodology as top-down versus bottom-up. Blueprints are great candidates for compliance and regulation, while Terraform is preferred by developers for its flexibility. Blueprints manage Azure resources only, while Terraform can work with various resource providers.

Once the choice is made, some challenges must be tackled next. The account with which the IaC is deployed, and the secrets it must know for those deployments to occur correctly, are managed centrally rather than left in the hands of individual end-users. Packaging and distributing solutions for end-users is easier when these can be read from a single source of truth in the cloud, so at a minimum the location in the cloud from which the solution reads and deploys the infrastructure must be known beforehand.

The organization can make use of the best of both worlds with a folder structure that separates the Terraform templates into a folder called ‘module’ and the ARM templates into another folder at the same level, named something like ‘subscription-deployments’, which includes native blueprints and templates. The GitHub workflow definitions will handle either location appropriately, triggering the workflow on any changes to either of these locations, as the layout below suggests.
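One possible layout, with illustrative file and folder names, could look like the following; the workflow definitions would then use path filters scoped to the ‘module’ and ‘subscription-deployments’ folders:

```
repo-root/
├── module/                      # Terraform templates (HCL)
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── subscription-deployments/    # native blueprints and ARM templates
│   ├── blueprint.json
│   └── template.json
└── deployment-scripts/          # PowerShell helpers for the pipelines
```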

Finally, PowerShell scripts can help with both the deployment and the pipeline automation. There are a few caveats with scripts: the general preference is for declarative and idempotent IaC rather than scripts, where those attributes are harder to enforce, and script logic quickly expands to cover a lot more than originally anticipated. All scripts can be stored in folders with names ending in ‘scripts’.
These measures are sufficient to address the above-mentioned shortcomings in Infrastructure-as-Code.

Tuesday, June 13, 2023

 

How to address IaC shortcomings – Part 2?

A previous article discussed a resolution to IaC shortcomings when declaring dependencies between resources for deployment to the public cloud. This article discusses another case, where the IaC does not yet fully capture the configurations possible for a resource.

As a recap, almost all IaC providers try to keep pace with the new features being added to a resource type, and while the format of the template can vary between, say, Azure Resource Manager and Terraform, the IaC provider is usually the resource provider as well. Terraform is universally extendable through providers that furnish IaC for resource types. It is a one-stop shop for any infrastructure, service, and application configuration. It can handle complex order-of-operations and composability of individual resources and encapsulated models. It is also backed by an open-source community for many providers and their modules, with public documentation and examples. Microsoft also works directly with the maker of Terraform on building and maintaining the related providers, and this partnership has gained widespread acceptance and usage. Perhaps one of its best features is that it tracks the state of the real-world resources, which makes Day-2 and onward operations easier and more powerful.
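This extensibility shows up directly in the configuration. As a minimal sketch, the block below pins the azurerm provider; any other resource provider could be declared alongside it in the same way:

```hcl
# Providers are declared and versioned explicitly; additional
# providers can be added to the same required_providers block.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  # The features block is required by the azurerm provider.
  features {}
}
```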

ARM templates come entirely from Microsoft and are consumed internally and externally as the de facto standard for describing resources on Azure, with their import and export options. There is a dedicated cloud service, the Azure Resource Manager service, that expects and enforces this convention for all resources so as to provide effective validation, idempotency, and repeatability.

Some features, including the sought-after preview features, are delayed from inclusion in the Terraform templates until General Availability, but they may already be available through an ARM template. In such cases, leveraging mixed templates in the IaC source helps to bridge the gap between what can be defined in the IaC and what is available from the public cloud. We strive to reduce the drift between the public cloud and the IaC.
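One way to mix the templates is sketched below, under the assumption that the feature in question is expressible in an ARM template stored in the ‘subscription-deployments’ folder: an ARM deployment is embedded inside the Terraform configuration. The resource group name and file path are hypothetical.

```hcl
resource "azurerm_resource_group" "preview" {
  name     = "rg-preview-features" # hypothetical name
  location = "eastus"
}

# Bridge the gap: deploy an ARM template for a feature that the
# Terraform provider does not yet expose natively.
resource "azurerm_resource_group_template_deployment" "preview" {
  name                = "preview-feature-deployment"
  resource_group_name = azurerm_resource_group.preview.name
  deployment_mode     = "Incremental"
  template_content    = file("${path.module}/../subscription-deployments/template.json")
}
```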

The folder structure will separate the Terraform templates into a folder called ‘module’, and the ARM templates will be located in another folder at the same level, named something like ‘subscription-deployments’, which includes native blueprints and templates. The GitHub workflow definitions will handle either location appropriately, triggering the workflow on any changes to either of these locations.

Finally, PowerShell scripts can help with both the deployment and the pipeline automation. There are a few caveats with scripts: the general preference is for declarative and idempotent IaC rather than scripts, where those attributes are harder to enforce, and script logic quickly expands to cover a lot more than originally anticipated. All scripts can be stored in folders with names ending in ‘scripts’.

It is preferable not to save state in the IaC source code repository; if necessary, it can be stored in the public cloud itself, as the sketch below shows.
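A minimal sketch of such remote state is the Terraform azurerm backend, assuming a pre-created storage account; the names below are hypothetical:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state" # hypothetical names
    storage_account_name = "tfstateaccount"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
  }
}
```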

These measures are sufficient to address the above-mentioned shortcomings in Infrastructure-as-Code.

Monday, June 12, 2023

 

How to address IaC shortcomings?

Dependencies between instances of the same resource type go undetected in Infrastructure-as-Code, aka IaC, but are still important to resource owners. The knowledge that two resources of the same resource type have a caller-callee dependency cannot remain tribal knowledge, and it impacts the sequence at the time of both creation and destruction. Different IaC providers have different syntax and semantics for expressing dependencies, but none can do away with them. At the same time, their documentation suggests using these directives as a last resort, and often for one resource type’s dependency on another. In such cases, some prudence is necessary.

When the dependency is an entire module, this directive affects the order in which the deployment rolls out. The IaC runtime will process all the resources and data sources associated with that module. If a resource requires information generated by another resource, such as its dynamically assigned public IP address, then those references can be made part of the attributes of this resource without requiring the directive that declares dependencies; the runtime knows that such references imply an implicit dependency. In those cases, it is not necessary to manually define dependencies on other resources. When the dependencies are hidden, such as when access control policies must be managed and actions must be taken that require those policies to be present, the directive becomes necessary. This directive does not impact replacements when the dependency undergoes a change. For that reason, a different directive is specified to cascade parent resource replacements when there is a change to the referenced resource or attribute. The sketch below contrasts these cases.
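The following sketch, with hypothetical resource names throughout, shows an implicit dependency through attribute references, a hidden dependency declared with depends_on, and a cascading replacement declared with replace_triggered_by:

```hcl
resource "azurerm_resource_group" "rg" {
  name     = "rg-deps-demo" # hypothetical names throughout
  location = "eastus"
}

# Implicit dependency: referencing the resource group's attributes is
# enough for the runtime to infer the order; no directive is needed.
resource "azurerm_storage_account" "sa" {
  name                     = "depsdemostorage"
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

data "azurerm_client_config" "current" {}

resource "azurerm_role_assignment" "blob_writer" {
  scope                = azurerm_storage_account.sa.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = data.azurerm_client_config.current.object_id
}

# Hidden dependency: nothing in the container's attributes reveals
# that the role assignment must exist first, hence the directive.
resource "azurerm_storage_container" "data" {
  name                  = "data"
  storage_account_name  = azurerm_storage_account.sa.name
  container_access_type = "private"

  depends_on = [azurerm_role_assignment.blob_writer]
}

# Cascading replacement uses a different directive: this container
# is replaced whenever the referenced container changes.
resource "azurerm_storage_container" "mirror" {
  name                  = "mirror"
  storage_account_name  = azurerm_storage_account.sa.name
  container_access_type = "private"

  lifecycle {
    replace_triggered_by = [azurerm_storage_container.data]
  }
}
```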

At the time of deployment, none of the resources are operational, so caller-callee relationships alone do not justify specifying a depends_on directive. Also, if for any reason one resource were required to be present for the other to be created, the idempotency of the IaC allows the deployment to be run again; if the creation order is not met on the first pass, the run will succeed the next time over, because at least one of the two resources has no dependency and will go through. If a dependency must still be specified to get the order right the first time between resources of the same resource type, it is possible to specify them sequentially in the IaC source code. Finally, if the program order is not maintained correctly, it is possible to introduce pseudo-attributes on these two resources of the same resource type that hold references to other, hybrid resource types whose order is predetermined by virtue of being different resource types. These marginal references can be made to local-only resources, such as those for generating private keys, issuing self-signed TLS certificates, or generating random ids. They serve as glue to help connect “real” infrastructure objects. These local resources have the added advantage of being shallow: they are visible only to the IaC runtime, not to the cloud, and are persisted only in the state referenced by the runtime. The sketch below illustrates this glue.
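A minimal sketch of that glue, reusing the hypothetical storage account from the previous sketch, uses a local-only random_id to relate two queues of the same resource type:

```hcl
# Local-only resource: visible to the Terraform runtime and its
# state, never to the cloud.
resource "random_id" "glue" {
  byte_length = 4
}

resource "azurerm_storage_queue" "callee" {
  name                 = "callee-${random_id.glue.hex}"
  storage_account_name = azurerm_storage_account.sa.name # from the previous sketch
}

# The caller's reference to the callee's name implies the order:
# the callee is created first, without any depends_on directive.
resource "azurerm_storage_queue" "caller" {
  name                 = "caller-${random_id.glue.hex}"
  storage_account_name = azurerm_storage_account.sa.name
  metadata = {
    callee = azurerm_storage_queue.callee.name
  }
}
```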

Finally, the dependency information can indeed be used as a last resort by storing all dependencies in a separate store that can be queried dynamically at the time the dependency information becomes relevant.

Sunday, June 11, 2023

 

Thoughts on Healthcare data and technology integrations

The following case study pertains to a massive Healthcare mission to modernize data and technology operations. A previous article discussed public cloud adoption for the same cause and the journey therein. This article discusses future improvements in terms of overall ambition and growth.

First, it must be said that the mission is just over a decade old and has made significant strides in all areas of computing, including migrations and modernizations towards the public cloud. For instance, the detection and redressal of incorrect or fraudulent payment activities alone is in the range of hundreds of millions of dollars each year. Similarly, the use of technology stacks is as exhaustive as it could possibly be: all manner of recent languages, libraries, packages, and hosting solutions, including Golang and Python, significant investments in GPU-based machine learning models, and old and new data science and analysis software, including statistical analysis, with the most recent versions of each used to reduce technical debt and overhead. There are internal portals to request all kinds of compute, storage, and network resources as well as cloud infrastructure and Platform-as-a-Service stacks, demonstrating the best practices in the industry.

Second, there are multiple levels of development, architecture, and cross-cutting initiatives that the venture has already tried and tested, which, although they began on-premises, have also nimbly moved to the cloud. Leaders and pioneers have forged significant trends and patterns with little or no concern about organizational mindsets that favor remaining on-premises or adhering to old and outdated practices.

With this background, some of the improvements that can be called out are in the areas of data and machine learning pipelines, because the infrastructure is a significant contributor to the spirit and practice of exploring new opportunities for business improvements. The next sections focus on these independently:

Data pipelines:

Some of the lessons learned from centralizing data and moving on-premises data to the cloud are that access to the data and the manner of its use are just as significant as making the data available. Pass-through authentication allows a variety of clients to access data so that the callers are individually responsible for their actions.

Another significant area of challenge is that network utilization increasingly becomes a bottleneck, as data transfers must be spread across the clock and the calendar to justify the bandwidth they consume.

The third area of challenge involves setting up a variety of central data stores for structured, unstructured, and event or streaming data architectures.

Saturday, June 10, 2023

 

Nesting Azure Active Directory Security Groups:

Role-based access control, aka RBAC, is a technique for granting access to resources based on roles assigned to users. Roles and resources are independent of each other, and so are the security principals, which can be users or groups. Groups help to manage RBAC assignments so that users can be added to or removed from groups without changing the assignments. Roles, groups, and assignments are used to control access to resources, and only as many of them are created as necessary.

Some examples of roles include the owner, contributor, and reader roles. These are universally applicable to users and groups across resources. When the current practice involves creating an Active Directory group for the purpose of controlling access to each specific resource, organizations tend to generate a lot of security groups. While resources can also be grouped and roles assigned at the scope of resource groups, those resources likely have independent use cases, which makes the proliferation difficult to avoid.

Two techniques come to the rescue of the business to tame the number of groups, roles, and assignments. The first is the use of a custom role, and the second is the nesting of groups.

The custom role is inherently an entitlement to a collection of permissions. These permissions must be included on the basis of the least privileges required, yet cover as many permissions as necessary to facilitate one or more use cases. Since a use case could involve multiple resources, the permissions in the set could be a mixed lot. The higher the scope, the greater the number of resources, and thereby the more mixed the set of permissions. By being more inclusive in the customization of roles and assignments, we come up with definitions that fit better than the built-in definitions from the resource providers. A sketch of such a custom role follows.
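The sketch uses a hypothetical role name and an illustrative mixed set of control-plane and data-plane permissions; it also previews the permission categories discussed further below:

```hcl
data "azurerm_subscription" "current" {}

resource "azurerm_role_definition" "blob_auditor" {
  name        = "Blob Auditor" # hypothetical custom role
  scope       = data.azurerm_subscription.current.id
  description = "Least-privilege role spanning the control and data paths."

  permissions {
    # Control-plane permissions (allow).
    actions = [
      "Microsoft.Storage/storageAccounts/read",
      "Microsoft.Storage/storageAccounts/blobServices/read",
    ]
    # Carve-outs that demarcate what the role must never allow.
    not_actions = [
      "Microsoft.Storage/storageAccounts/delete",
    ]
    # Data-plane permissions, called out independently.
    data_actions = [
      "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    ]
  }

  assignable_scopes = [data.azurerm_subscription.current.id]
}
```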

Some people like to keep the permissions independent even for customizations, because they can be assigned to different resources with the notion that one resource can be done away with while the other persists longer. This can arise from changing business needs where ownership and operations are no longer managed together. Besides, resource grouping is generally done purely from the point of view that resources in a group will have the same lifetime and that deletion of the group will remove all the resources. Expansive roles and combined resources can therefore require different security groups for each, even for the same set of users. One way to manage this is to treat two sets of security groups as one requiring the other: one security group allows the users to be grouped, and nesting that group into another gives those same users the roles and permissions necessary to complete their use cases. Nesting of security groups might not have been allowed on-premises, but it is most likely allowed in the cloud, as in the sketch below.
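This sketch uses hypothetical group names with the azuread and azurerm Terraform providers: the users are collected in one security group, which is nested into another group that carries the role assignment.

```hcl
data "azurerm_subscription" "current" {}

# The inner group collects the users.
resource "azuread_group" "team" {
  display_name     = "data-team-users" # hypothetical names
  security_enabled = true
}

# Nesting the inner group confers the outer group's role
# assignments on the same users.
resource "azuread_group" "role_holders" {
  display_name     = "storage-readers"
  security_enabled = true
  members          = [azuread_group.team.object_id]
}

resource "azurerm_role_assignment" "readers" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Reader"
  principal_id         = azuread_group.role_holders.object_id
}
```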

The permission sets must also have some categories, as they cannot all be privileges that allow access. Some must be called out as deny permissions to clearly demarcate the utility of the custom role. Similarly, the permissions for the control path and the data path must be called out independently of one another. There can be other categories as well, but these are the widely used ones.

Lastly, there are several ways to organize hybrid entities, but order and method make it easier. Prioritizing the use cases and determining the impact they have helps to address the most important ones rather than trying to cover all of them. Maintaining a sorted order from high to low privileges, or being mindful of grading with each articulation and organization, helps to be exhaustive where necessary.


Friday, June 9, 2023

 

Shared access signatures are popular for storage accounts to give web-based access to blobs, containers, and accounts. They are a secure way to delegate access to resources and provide granular control over how that data is accessed in terms of resources, permissions, and durations.

Among the three types of SAS, user delegation, service, and account, the first is preferred, while the last can be easily compromised. A user delegation SAS is created with Azure AD credentials, but it must be set up properly. This article explains a few of those settings so that the pesky error messages are resolved.

The SAS token is a string that is generated on the client side; this is one of the common misunderstandings about the SAS URL. It is not generated, stored, or tracked by the Azure Storage service in any way. Once the SAS URL is created, it can be distributed as many times as necessary. When the service receives a SAS URL, it can validate it based entirely on the contents of the SAS token itself.

A SAS token can be signed with a user delegation key that was created using Azure Active Directory credentials; a user delegation SAS is one signed with such a key. The key can be generated only when the Azure AD security principal requesting it is assigned a role that has the Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey action.

Often, it is assumed that the above permission is available to all users of a storage account, but this is not necessarily the case. The permissions for blob service operations differ between the built-in roles of Storage Blob Data Owner, Storage Blob Data Contributor, and Storage Blob Data Reader.

Another permission, required both for listing the blobs and for reading their contents, is Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read. Together, this permission and the generateUserDelegationKey permission above make it possible to create a SAS URL.
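As a minimal sketch, a custom role granting just these two permissions could be defined as follows; the role name is hypothetical, and note that the control-plane operation carries an ‘/action’ suffix in role definitions:

```hcl
data "azurerm_subscription" "current" {}

resource "azurerm_role_definition" "sas_issuer" {
  name  = "User Delegation SAS Issuer" # hypothetical custom role
  scope = data.azurerm_subscription.current.id

  permissions {
    # Control-plane action required to request the user delegation key.
    actions = [
      "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action",
    ]
    # Data-plane action required to list blobs and read their contents.
    data_actions = [
      "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    ]
  }

  assignable_scopes = [data.azurerm_subscription.current.id]
}
```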

Another area of attention is the authorizations. On the Shared Access Signature menu of the Azure Storage Account navigation sidebar, there are options to permit on the basis of allowed services, allowed resource types, allowed permissions, blob versioning permissions, and allowed blob index permissions, along with allowed IP addresses, protocols, and the preferred routing tier. If any of these options are inconsistent with the operations to be permitted, the net result is the typical error message of:

You do not have permissions to use the access key to list the data

Or

You do not have permission to generate user-delegation SAS for this folder.

It is advisable to restrict those options that do not apply, such as file, queue, or table among the services, or service and container among the resource types, but to check the blob service and the object resource type if that is the granularity desired.

These are some of the caveats with properly setting up the Azure storage account for generating the Shared Access Signature URL.