Ha ha - my post would not publish because genuine IaC conflicts with Google's policies, cheers.
Saturday, July 22, 2023
Friday, July 21, 2023
Considerations when using private endpoint for
communication between Azure Application Gateway and Azure App Services.
An App Service has a public, internet-facing
endpoint. An Application Gateway can communicate with an App Service over a
service endpoint, which allows traffic only from a specific subnet within the Azure
Virtual Network to which the gateway is deployed and blocks everything else.
Enabling the service endpoint on that subnet and setting an access restriction
on the App Service ensures that the communication is private and secure.
The above configuration can be achieved using
various tools such as the CLI, the portal, and the SDK.
A sample command would look like this:
az webapp config access-restriction add
--resource-group myRG --name myWebApp --rule-name AppGwSubnet --priority 200
--subnet mySubNetName --vnet-name myVnetName
The service endpoint causes all the traffic
leaving the subnet toward the App Service to be tagged with that specific
subnet's identity.
A similar effect can be achieved with an
alternative that uses a private endpoint, but there are a few things we need to
establish first. We need to ensure that the Application Gateway can resolve
the private IP address of the backend pool member via DNS and override the hostname in the
HTTP settings.
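As a sketch, the hostname override can be set on the gateway's HTTP settings with the CLI. The resource and settings names below are placeholders; adjust them to your environment.

```shell
# Override the Host header so the gateway presents the App Service's
# default hostname even though the backend pool targets the private
# endpoint's FQDN or private IP.
az network application-gateway http-settings update \
  --resource-group myRG \
  --gateway-name myAppGw \
  --name myHttpSettings \
  --host-name myWebApp.azurewebsites.net \
  --protocol Https \
  --port 443
```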
The gateway caches DNS lookup results, so if
we use FQDNs and rely on the DNS lookup to get the private IP address, then we
may need to restart the Application Gateway if the DNS update or the link to the Azure
private DNS zone was made after the backend pool was configured. The Application Gateway can
be restarted by stopping and starting the instance, as with the commands
shown below.
az network application-gateway stop
--resource-group myRG --name myAppGw
az network application-gateway start
--resource-group myRG --name myAppGw
Since it is customary for resources to be
succinctly described with IaC for deployment, the following section
illustrates how to do that.
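A minimal CLI-based sketch of the private-endpoint setup might look like the following; all resource names are placeholders, and a fuller IaC template would declare the same resources declaratively.

```shell
# Create a private endpoint for the App Service in the vnet.
az network private-endpoint create \
  --resource-group myRG \
  --name myWebAppPe \
  --vnet-name myVnetName \
  --subnet myPeSubnet \
  --private-connection-resource-id "$(az webapp show -g myRG -n myWebApp --query id -o tsv)" \
  --group-id sites \
  --connection-name myWebAppPeConn

# Create the private DNS zone App Services resolve through.
az network private-dns zone create \
  --resource-group myRG \
  --name privatelink.azurewebsites.net

# Link the zone to the vnet so the gateway can resolve the private IP.
az network private-dns link vnet create \
  --resource-group myRG \
  --zone-name privatelink.azurewebsites.net \
  --name myDnsLink \
  --virtual-network myVnetName \
  --registration-enabled false
```

If the zone link is created after the backend pool, remember the DNS-cache caveat above: the gateway may need a stop/start to pick up the private IP.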
Wednesday, July 19, 2023
Previous articles
in this regard have discussed resolutions for shortcomings in the use of
Infrastructure-as-Code (IaC) in various scenarios. This section discusses the
resolution for the case where there are cascading resource locks.
Resources can be locked to prevent unexpected changes. A
subscription, resource group, or resource can be locked to prevent other users
from accidentally deleting or modifying critical resources. The lock overrides
any permissions the users may have. The lock level can be set to CannotDelete
or ReadOnly, with ReadOnly being the more restrictive. When lock inheritance is
applied at a parent scope, all resources within that scope inherit the
same lock. Some considerations still apply after locking. For example, a
CannotDelete lock on a storage account does not prevent data within that
account from being deleted. A read-only lock on an application gateway prevents
you from getting the backend health of the application gateway because that operation uses
POST. Only members of the Owner and User Access Administrator roles are granted access
to Microsoft.Authorization/locks/* actions.
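As a sketch with placeholder names, a lock can be applied at resource-group scope and inspected with the CLI; child resources then inherit it.

```shell
# Apply a CannotDelete lock at the resource-group scope.
az lock create \
  --name myRgLock \
  --lock-type CannotDelete \
  --resource-group myRG

# List the locks in effect for the group.
az lock list --resource-group myRG -o table
```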
When the IaC is applied, it can be quite frustrating to find
resources locked in the public cloud, preventing the IaC actions from
completing. For example, a resource might have a private endpoint, which in turn
might be associated with a DNS zone and a private network interface, and these
sub-resources might carry locks that prevent the private endpoint from being
deleted, which in turn fails the IaC application. The resolution for the owner
of the subscription is to delete the lock from the affected resource via the Azure
Portal or the command-line interface and then re-apply the IaC,
iterating over the 'unlock' and 'apply' steps until there are no further
obstructions.
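The 'unlock' step can be scripted as a pipeline stage. A minimal sketch, assuming a resource group named myRG and sufficient permissions on Microsoft.Authorization/locks:

```shell
# Remove every lock in the resource group before re-applying the IaC.
for lockId in $(az lock list --resource-group myRG --query "[].id" -o tsv); do
  az lock delete --ids "$lockId"
done
```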
While this works for a role with elevated privileges,
many developers using the credentials of a CI/CD pipeline to make changes to
the subscription do not have that privilege and might find the experience
harrowing to resolve without external intervention. One way they overcome
this is by applying the unlock commands via a pipeline step prior to
the application of the IaC. Fortunately, there are ways to unlock at the
subscription scope rather than resource by resource. Even so,
it might not be clear when the locks reappear, and the unlocking might need to
be repeated. Checking the policies to make sure that locking is not
enforced automatically, which would interfere with infrastructure
changes made by code, is a good practice and one that can potentially reveal
the intent behind the locking. If the locking were simply to prevent accidental
deletions across a broad range of resources, then the unlocking is
straightforward for applying the changes.
The above resolutions are easy when the error messages are
descriptive and indicate that the failure of the IaC is exclusively due to
locks. There are other forms of errors where the cause may not be
straightforward. In such cases, the activity log on the resources or at the
subscription level can be quite helpful, as the JSON content of a logged event
explains exactly what happened. This feature is also helpful to know
whether something transpired through actions other than the deployment of
the infrastructure changes.
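As a sketch with placeholder names, recent activity-log events for a resource group can be pulled with the CLI and scanned for operations that did not originate from the deployment:

```shell
# List the last hour of activity-log events, showing the operation,
# its outcome, and who initiated it.
az monitor activity-log list \
  --resource-group myRG \
  --offset 1h \
  --query "[].{op:operationName.value, status:status.value, caller:caller}" \
  -o table
```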
Tuesday, July 18, 2023
IaC Resolutions Part 8:
Previous articles in this regard have discussed resolutions for shortcomings in the use of Infrastructure-as-Code (IaC) in various scenarios. This section discusses the resolution for the case when changes to resources involve breaking a deadlock in state awareness between, say, a pair of resources.
Let us make a specific association between, say, a firewall and a network resource such as a gateway. The firewall must be associated with the gateway to prevent unwanted traffic from flowing through that appliance. While they remain associated, they remember each other's identifier and state. Initially, the firewall may remain in detection mode, where it is merely passive; it becomes active in prevention mode. When an attempt is made to toggle the modes, the association prevents it. Neither end of the association can tell what state to be in without exchanging information, and when they are deployed or updated in place, neither knows about nor informs the other.
There are two ways to overcome this limitation.
First, a direction is established between the resources where an update to one forcibly updates the state of the other. This is supported by the gateway when it allows the state change of one resource to be written through to the other.
Second, the IaC provider makes the changes first to one resource and then to the other, so that the update to the second picks up the state of the first during its change. In this mode, the firewall can be activated after the gateway knows that there is such a firewall.
If the IaC tries to form an association while updating the state of one, the other might end up with an inconsistent state. One of the two resolutions above works to mitigate this.
This is easy when there is a one-to-one relationship between resources. Sometimes there are one-to-many relationships. For example, a gateway might have more than a dozen app services as its backend members and each member might be allowing public access. If the gateway must consolidate access to all the app services, then there are changes required on the gateway to route traffic to each app service as intended by the client and a restriction on the app services to allow only private access from the gateway.
Consider the sequence in which these changes must be made, given that the final operational state of the gateway is acceptable only when every app service, barring none, remains reachable for a client through the gateway.
If the app services toggle their access from public to gateway-only before the gateway becomes operational, there is some downtime for them, and the duration is not necessarily bounded if one app service fails to listen to the gateway. The correct sequence would involve first making the change in the gateway to set up proper routing and then restricting the app services to accept only the gateway. Finally, the gateway validates all the app-service flows from a client before enabling them.
Each app service might have nuances about whether the gateway can reach it one way or another. Usually, if they are part of the same vnet, this is not a concern; otherwise peering might be required. Even if peering is available, routing by address or resolution by name, or both, might be required unless the services are universally known on the world wide web. If public access is disabled, then private links must be established, and this might require changes on both the gateway and the app service. Lastly, with each change, an app service must maintain its inbound and outbound rules properly for bidirectional communication, so some vetting is required on the app service side independent of the gateway.
Putting this all together via IaC requires that the changes be made in stages and each stage validated independently.
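The staging described above can be sketched with the CLI; all names are placeholders, and each stage would be a separate, validated pipeline step.

```shell
# Stage 1: add the App Service to the gateway's backend pool so the
# gateway routes to it before access is restricted.
az network application-gateway address-pool update \
  --resource-group myRG \
  --gateway-name myAppGw \
  --name myBackendPool \
  --add backendAddresses fqdn=myWebApp.azurewebsites.net

# Validate the flow before cutting off public access.
az network application-gateway show-backend-health \
  --resource-group myRG \
  --name myAppGw

# Stage 2: restrict the App Service to traffic from the gateway subnet.
az webapp config access-restriction add \
  --resource-group myRG --name myWebApp \
  --rule-name AppGwSubnet --priority 200 \
  --subnet mySubNetName --vnet-name myVnetName
```

Repeating the backend-health check after stage 2 confirms that no app service dropped out of rotation when its access was restricted.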
Monday, July 17, 2023
Your guide to sustainability with cloud solutions.
Newly implemented industry solutions benefit from
a set of principles that provide prescriptive guidance for improving the quality
of their deployments. As the industry
moves from digital adoption to digital transformation to digital acceleration,
the sustainability journey requires a strong digital foundation. It is the best
preparation for keeping pace with this rapid change.
This is true for meeting new sustainability requirements, avoiding
the worst impacts of climate change and other business priorities such as
driving growth, adapting to industry shifts, and navigating energy consumption
and economic conditions. It helps to track and manage data at scale, unifying
data and improving visibility across the organization.
As an
aside, the well-architected framework consists of five pillars. These are
reliability (REL), security (SEC), cost optimization (COST), operational
excellence (OPS) and performance efficiency (PERF). The elements that support these pillars are a review, a
cost and optimization advisor, documentation, patterns-support-and-service
offers, reference architectures and design principles.
Sustainability is a journey. Cloud solutions can be
developed by fostering growth and controlling costs while contributing to
sustainability goals. The journey can be shaped by building resilience with an
ESG strategy, resizing opportunities to control costs, improving efficiency
with energy reduction, and tracking progress for environmental impact.
The well-architected framework helps to reliably
report your sustainability impact, driving meaningful progress and finding gaps
where the most impact can be delivered, but a leader's insights and perspectives
can tune the sustainability investments to create opportunities that align with
other goals. While some of the industry-tested learnings are brought into this
article, meeting sustainability goals is a big win for an organization of any
size. That is why a tiny country like Bhutan can claim to be a leader with its
carbon-negative footprint. Companies that meet sustainability goals are favored
by investors; consumer satisfaction and employee retention also improve,
with clear indicators such as customers' willingness to pay more for
sustainable options. Knowing where we are today, setting future goals, and making
data-driven decisions to create steps toward realizing those dreams are
invaluable exercises for your blue-ocean strategy.
Eventually, there will be regulations mandating similar work, but a
determination of ESG metrics provides competitive differentiation for your
solutions. Granted, there might be a cultural shift involved, not unlike
the one experienced in adopting the cloud, but the potential for dramatic
discoveries and strong storytelling can unite the people behind the
vision. Some tools, such as Microsoft Sustainability Manager, can be
instrumental in realizing this
vision, as it unifies the data to monitor and manage the environmental impact.
An ideal tool and central vantage point can assess the value of
corporate sustainability. But an even more important consideration is that the
scope for similar assessment and impact can be delegated to different
organizational units and departments and assist with incremental progress
towards the sustainability journey when those participants learn to
self-evaluate their path and metrics.
Sustainability is also about consumption, efficiency and
digitizing the supply chain.
Transparency and tighter upstream and downstream collaboration are
needed if cyclical efficiencies are to be discovered in products and processes.
A single unified platform can help with visualizing the data from disparate
devices and systems. Adopting recyclable and repairable software goes a long
way towards sustainability, just as much as such devices do. A strong digital foundation can drive both
sustainability and transformational goals. The urgency, scope, and scale of the
task cited in this article can help you with the journey from pledges to
progress.
References: IaC
Shortcomings and resolutions.
Sunday, July 16, 2023
Some methods of organization for large-scale
Infrastructure-as-Code deployments.
The purpose of IaC is to provide a dynamic, reliable, and
repeatable infrastructure suitable for cases where manual approaches and
management practices cannot keep up. When automation increases to the point of
becoming a cloud-based service responsible for the deployment of cloud
resources and stamps that provision other services that are diverse, consumer
facing and public cloud general availability services, some learnings can be
called out that apply universally across a large spectrum of industry clouds.
A service that deploys other services must accept IaC
deployment logic with templates, intrinsics, and deterministic execution,
much like any other workflow management system. This helps determine the order
in which tasks run and how they are retried. The tasks are self-describing. The automation consists of a
scheduler to trigger scheduled workflows and submit tasks to the executor,
an executor to run the tasks, a web server for a management interface, a
folder for the directed acyclic graphs representing the deployment logic
artifacts, and a metadata database to store state. The workflows don't restrict
what can be specified as a task, which can be an Operator (a predefined task
using, say, Python), a Sensor (which is entirely about waiting for an external
event to happen), or a custom task specified via a Python function
decorated with @task.
The organization
of such artifacts posed two necessities. First, to leverage the built-in
templates and deployment capabilities of the target IaC provider, as well as
their packaging in the format suitable to the automation, which demands certain
declarations, phases, and sequences to be called out. Second, the coordination
of context-management switches between the automation service and the IaC provider.
This involved a preamble and an epilogue around each context switch for bookkeeping
and state reconciliation.
This taught us
that large IaC authors are best served by uniform, consistent, and global naming
conventions; registries that can be published by the system for cross-subscription
and cross-region lookups; parametrizing diligently at every scope,
including hierarchies; leveraging dependency declarations; and reducing the
need for scriptability in favor of system- and user-defined organizational units
of templates. Leveraging supportability via read-only stores and frequently
publishing continuous, up-to-date information on the rollout helps decouple
operations from the design and development of IaC.
IaC writers
frequently find themselves in positions where the separation between pipeline
automation and IaC declarations is not clean or self-contained, or requires
extensive customization. One approach that worked on this front is to
make multiple passes over the development: one pass provides initial deployment
capability, and another consolidates and provides best practice via
refactoring and reusability. Enabling the development pass to be DevOps-based,
feature-centric, and agile helps converge to a working solution with learnings
that can be carried from iteration to iteration. The refactoring pass is more
generational in nature; it provides cross-cutting perspectives and non-functional
guarantees.
A library of
routines, operators, data types, global parameters, and registries is almost
inevitable with large-scale IaC deployments, but unlike the support for
programming-language packages, these are often organically curated
and self-maintained. Leveraging the tracking and versioning support of source
control, it's possible to provide compatibility as capabilities are made native
to the IaC provider or the automation service.
Reference: IaC shortcomings and resolutions.
Saturday, July 15, 2023
Improve
workloads and solution deployments:
Newly implemented industry solutions benefit from
a set of principles that provide prescriptive guidance for improving the quality
of their deployments. As the industry
moves from digital adoption to digital transformation to digital acceleration,
the sustainability journey requires a strong digital foundation. It is the best
preparation for keeping pace with this rapid change.
This is true for meeting new sustainability requirements, avoiding
the worst impacts of climate change and other business priorities such as
driving growth, adapting to industry shifts, and navigating energy consumption
and economic conditions. It helps to track and manage data at scale, unifying
data and improving visibility across the organization. This helps to reliably
report your sustainability impact, driving meaningful progress and finding gaps
where the most impact can be delivered.
The
well-architected framework consists of five pillars. These are reliability
(REL), security (SEC), cost optimization (COST), operational excellence (OPS)
and performance efficiency (PERF). The elements that support these pillars are a review, a cost and
optimization advisor, documentation, patterns-support-and-service offers,
reference architectures and design principles.
This guidance provides a summary of how these principles
apply to the management of the data workloads.
Cost optimization
is one of the primary benefits of using the right tool for the right solution.
It helps to analyze spend over time as well as the effects of scale-out and
scale-up. An advisor can help improve reusability, on-demand scaling, and reduced
data duplication, among many other things.
Performance is usually based on external factors and is
very close to customer satisfaction. Continuous telemetry and reactiveness are
essential to well-tuned performance. The shared environment controls for
management and monitoring create alerts, dashboards, and notifications specific
to the performance of the workload. Performance considerations include storage
and compute abstractions, dynamic scaling, partitioning, storage pruning,
enhanced drivers, and multilayer caching.
Operational excellence comes with security and
reliability. Security and data management must be built into the system
at every layer, for every application and workload. The data management and analytics
scenario focuses on establishing a foundation for security. Although workload-specific
solutions might be required, the foundation for security is built with
the Azure landing zones and managed independently from the workload.
Confidentiality and integrity of data, including privilege management, data
privacy, and appropriate controls, must be ensured. Network isolation and
end-to-end encryption must be implemented. SSO, MFA, conditional access, and
managed service identities are involved in securing authentication. Separation of
concerns between the Azure control plane and data plane, as well as RBAC access
control, must be used.
The key considerations for reliability are how to detect
change and how quickly the operations can be resumed. The existing environment
should also include auditing, monitoring, alerting and a notification
framework.
In addition to all the above, some consideration may be
given to improving individual service level agreements, redundancy of workload
specific architecture, and processes for monitoring and notification beyond
what is provided by the cloud operations teams.
Each pillar contains questions for which the answers
relate to technical and organizational decisions that are not directly related
to the features of the software to be deployed. For example, software that
allows people to post comments must honor use cases where some people can
write and others can read. But the system developed must also be safe and
sound enough to handle all the traffic and should incur reasonable costs.
Since the most crucial pillars are OPS and SEC, they
should never be traded away to get more out of the other pillars.
The security pillar consists of Identity and access
management, detective controls, infrastructure protection, data protection and
incident response. Three questions are routinely asked for this pillar: How is
the access controlled for the serverless api? How are the security boundaries
managed for the serverless application? How is the application security
implemented for the workload?
The operational excellence pillar is made up of four
parts: organization, preparation, operation, and evolution. The questions that
drive the decisions for this pillar include: How is the health of the serverless
application known? How is the application lifecycle management approached?
The reliability pillar is made of three parts:
foundations, change management, and failure management. The questions asked for
this pillar include: How are the inbound request rates regulated? How is
resiliency built into the serverless application?
The cost optimization pillar consists of five parts:
cloud financial management practice, expenditure and usage awareness,
cost-effective resources, demand management and resources supply, and
optimizations over time. The questions asked for cost optimization include: How
are the costs optimized?
The performance efficiency pillar is composed of four
parts: selection, review, monitoring and tradeoffs. The questions asked for
this pillar include: How is the
performance optimized for the serverless application?
In addition to these questions, there’s quite a lot of
opinionated and even authoritative perspectives into the appropriateness of a
framework and they are often referred to as lenses. With these forms of
guidance, a well-architected framework moves closer to an optimized
realization.