Saturday, November 13, 2021

 

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

·        Resources can be locked to prevent unexpected changes. A subscription, resource group, or resource can be locked to prevent other users from accidentally deleting or modifying critical resources. The lock overrides any permissions the users may have. The lock level can be set to CannotDelete or ReadOnly, with ReadOnly being the more restrictive. When a lock is applied at a parent scope, all resources within that scope inherit the same lock. Some considerations still apply after locking. For example, a CannotDelete lock on a storage account does not prevent data within that account from being deleted. A ReadOnly lock on an application gateway prevents you from getting the backend health of the application gateway, because that operation uses POST. Only members of the Owner and User Access Administrator roles are granted access to Microsoft.Authorization/locks/* actions.
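As a minimal sketch with Azure PowerShell (the lock and resource group names below are illustrative), a lock might be applied at the resource group scope so that contained resources inherit it:

# Apply a CannotDelete lock at the resource group scope; resources in the group inherit it
New-AzResourceLock -LockName "no-delete" -LockLevel CannotDelete `
    -ResourceGroupName "sampleproject-dev-global" `
    -LockNotes "Protect critical resources from accidental deletion"
# Verify the lock and, when it is no longer needed, remove it
Get-AzResourceLock -ResourceGroupName "sampleproject-dev-global"
Remove-AzResourceLock -LockName "no-delete" -ResourceGroupName "sampleproject-dev-global"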

·        A blob in the archive tier can be rehydrated to either the hot or cool tier. There are two options for rehydrating a blob that is stored in the archive tier: a) copy the archived blob to an online tier, referencing the source blob by name or URL, or b) change the blob's access tier from archive to hot or cool. Rehydration might take several hours, but several blobs can be rehydrated concurrently. A rehydration priority can also be set.
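A hedged sketch of the copy-based option with Azure PowerShell (the account, container, and blob names are illustrative):

$ctx = New-AzStorageContext -StorageAccountName "samplestorage" -UseConnectedAccount
# Copy the archived blob to a new hot-tier blob with standard rehydration priority
Start-AzStorageBlobCopy -SrcContainer "archive" -SrcBlob "data.bin" `
    -DestContainer "active" -DestBlob "data.bin" `
    -StandardBlobTier Hot -RehydratePriority Standard -Context $ctx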

·        Virtual network peering allows us to connect virtual networks in the same region or across regions, as in the case of Global VNet Peering, through the Azure backbone network. When the peering is set up, traffic to the remote virtual network, traffic forwarded from the remote virtual network, and traffic transiting a virtual network gateway or Route Server can each be allowed; traffic to the remote virtual network is allowed by default.
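A minimal sketch (the network and resource group names are illustrative); note that a peering must be created in each direction before the link becomes active:

$vnet1 = Get-AzVirtualNetwork -Name "vnet-hub" -ResourceGroupName "rg-network"
$vnet2 = Get-AzVirtualNetwork -Name "vnet-spoke" -ResourceGroupName "rg-network"
# Peer hub to spoke, then spoke to hub
Add-AzVirtualNetworkPeering -Name "hub-to-spoke" -VirtualNetwork $vnet1 -RemoteVirtualNetworkId $vnet2.Id
Add-AzVirtualNetworkPeering -Name "spoke-to-hub" -VirtualNetwork $vnet2 -RemoteVirtualNetworkId $vnet1.Id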

·        Transaction processing in Azure is not on by default. A transaction locks and logs records so that others cannot use them; transactions can be bound to partitions or enabled as distributed transactions with the two-phase commit protocol. Distributed transaction processing requires a request and a response between the transaction coordinator and each resource manager in each of the two phases, which is costly for a datacenter in Azure. It does not scale, because the number of network calls grows with the number of resource managers: one resource manager requires 4 network calls, four require 16, and a hundred require 400. Besides, the datacenter contains thousands of machines, failures are expected, and the system must deal with network partitions. Waiting for responses from all resource managers carries a costly communication overhead.

·        Diagnostic settings can be authored to send platform logs and metrics to different destinations. Logs include Azure activity logs and resource logs. Platform metrics are collected by default and stored in the Azure Monitor metrics database. Each Azure resource requires its own diagnostic setting, and a single setting can define no more than one of each of the destinations. The available categories vary for different resource types. The destinations for the logs can include a Log Analytics workspace, Event Hubs, and Azure Storage. Metrics are sent automatically to Azure Monitor Metrics. Optionally, settings can be used to send metrics to Azure Monitor Logs for analysis with other monitoring data using log queries. Multi-dimensional metrics (MDM) are not supported; they must be flattened.
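A hedged sketch that routes a resource's logs and metrics to a Log Analytics workspace (the resource IDs and setting name are illustrative):

$resourceId = "/subscriptions/<subscription-id>/resourceGroups/rg-app/providers/Microsoft.KeyVault/vaults/kv-sample"
$workspaceId = "/subscriptions/<subscription-id>/resourceGroups/rg-monitor/providers/Microsoft.OperationalInsights/workspaces/law-sample"
# Enable the available log and metric categories for the resource and send them to the workspace
Set-AzDiagnosticSetting -Name "to-workspace" -ResourceId $resourceId `
    -WorkspaceId $workspaceId -Enabled $true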

Friday, November 12, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

There are several references to best practices throughout the series of articles posted on the documentation for the Azure Public Cloud. This article focuses on the antipatterns to avoid, specifically the cloud readiness antipatterns.

Antipatterns are experienced when planning a cloud adoption. Misaligned operating models can lead to increased time to market, misunderstandings, and increased workload on IT departments. Companies choose the wrong operating model when they assume that platform as a service decreases costs without their involvement. Sometimes a change of direction in the business can lead to radical changes in architecture, requiring replacement projects that become complex and cost intensive.

A model articulates types of accountability, landing zones, and focus, and the company chooses a model based on its strategic priorities and the scope of its portfolio. Assigning too much responsibility to a small team can result in a slow adoption journey. Such a team is burdened with approving measures only after fully understanding their impact on the business, operations, and security, and it is worse still when these aren't the team's main areas of expertise.

Cloud readiness antipatterns are those that are experienced during the readiness phase of cloud adoption.

Assuming released services are ready for production is the first cloud readiness antipattern we discuss.

Services mature as they age, and not all services are mature. Preview services are not backed by a Service-Level Agreement (SLA), and newly released services tend to be less stable. When organizations decide that a new or preview service fits their use case and take it to production, they take a huge risk because the guarantees of an SLA may not apply. This can lead to unexpected downtime, disaster recovery exercises, and availability issues. When such incidents occur, the perception becomes that cloud services in general are unreliable, which is not the case and makes the problem worse.

Another antipattern is assuming that all cloud services are more resilient and available than their on-premises counterparts. Resiliency implies recovery after failures, and availability implies running in a healthy state with little or no downtime. Cloud services do offer these advantages, but not all of them do, and even when they do, the capabilities might come at a premium or as an additional feature.

Take availability, for instance: it depends on service models such as PaaS and SaaS and on technical architectures such as load-balanced availability sets and availability zones. A single VM may be highly available, but it is still a single point of failure, and its downtime can leave the services it hosts in an unrecoverable state.

Another common antipattern occurs when companies try to make their internal IT department a cloud provider. IT becomes responsible for reference architectures while providing PaaS or SaaS to business units. This antipattern severely hampers usability, efficiency, resiliency, and security. Sometimes IT is even tasked with providing monolithic end-to-end services, which results in, say, an order for a fully managed cloud VM as a service where IT controls who can access and use the entire platform, and business units do not get to take full advantage of the cloud portal or get SSH or RDP access. This kind of wrapper over cloud services, which can be numerous and change frequently, does not lower the cost of release that business units want. Instead, a mature cloud operating model, such as centralized operations with guardrails like governance, can empower the business units.

Finally, the choice of the right model improves the cloud adoption roadmap.

 

Thursday, November 11, 2021

Cosmos DB RBAC access

 


Introduction: The focus of this article is the provisioning of role-based access control for Cosmos DB data access.

Description: One of the frequently encountered errors after successful provisioning of a Cosmos DB instance is the following error message:
Response status code does not indicate success: Forbidden (403); Substatus: 5302; ActivityId: 9f80d692-0d31-4aab-918b-e84586cb11fb; Reason: (Message: { "Errors":["Request is blocked because principal [0cd8f3af-37e3-49cb-9bea-b84a6dc67f50] does not have the required RBAC permissions to perform action [Microsoft.DocumentDB\/databaseAccounts\/sqlDatabases\/containers\/items\/create] with OperationType [0] and ResourceType [2] on resource [dbs\/API\/colls\/ApiActionStateStore]. Learn more: https:\/\/aka.ms\/cosmos-native-rbac This could be because the user's group memberships were not present in the AAD token."]}
ActivityId: 9f80d692-0d31-4aab-918b-e84586cb11fb, Request URI: /apps/bebfc2ab-b138-45af-8a32-3fe539d00d75/services/3869c06c-7fef-4642-8185-1eb90808b36f/partitions/1244f14f-3de3-40d6-888c-9683e5e13def/replicas/132741653163445857p/, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.22000 cosmos-netstandard-sdk/3.22.2)

The reason it is frequently encountered is that users often assume role-based access control applies only to the control plane, where the objects used to store data, such as the account, databases, and containers, are secured by roles such as Contributor or Reader. In addition to securing control plane access, the same must be done for data plane access. Specific examples of data plane actions include “Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/read” and “Microsoft.DocumentDB/databaseAccounts/readMetadata”. Azure Cosmos DB exposes built-in role definitions: the Cosmos DB Built-in Data Reader, which grants permission to perform the following data actions:

Microsoft.DocumentDB/databaseAccounts/readMetadata

Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/read

Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/executeQuery

Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/readChangeFeed

and the Cosmos DB Built-in Data Contributor, which grants permission to perform the following data actions:

Microsoft.DocumentDB/databaseAccounts/readMetadata

Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/*

Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/*

Custom role definitions can also be created, but these built-in definitions cover the minimum required actions.
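For instance, a custom role definition like the “ReadWrite” role used in the assignment below might be created with a hedged sketch such as this (the chosen data actions and the account-wide assignable scope are illustrative):

New-AzCosmosDBSqlRoleDefinition -AccountName "sampleprojectdev" `
    -ResourceGroupName "sampleproject-dev-global" `
    -Type CustomRole -RoleName "ReadWrite" `
    -DataAction @(
        "Microsoft.DocumentDB/databaseAccounts/readMetadata",
        "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/*") `
    -AssignableScope "/"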

The role definitions can be fetched with the command: Get-AzCosmosDBSqlRoleDefinition -AccountName $accountName  -ResourceGroupName $resourceGroupName

Once the role is defined via one of the interactivity methods, such as the SDK, PowerShell, the CLI, or REST, it must then be assigned to users and groups. When this assignment is missing, the error message shown above is returned to the caller. Assignment requires the proper privilege. The remedy is shown with the following command:

PS C:\users\ravirajamani\source\repos> New-AzCosmosDBSqlRoleAssignment -ResourceGroupName sampleproject-dev-global -AccountName sampleprojectdev -RoleDefinitionName ReadWrite -PrincipalId 0cd8f3af-37e3-49cb-9bea-b84a6dc67f50 -Scope /subscriptions/ad7cfdd8-8685-44b5-8390-284363464cc4/resourceGroups/sampleproject-dev-global/providers/Microsoft.DocumentDB/databaseAccounts/sampleprojectdev

Id : /subscriptions/ad7cfdd8-8685-44b5-8390-284363464cc4/resourceGroups/sampleproject-dev-global/providers/Microsoft.DocumentDB/databaseAccounts/sampleprojectdev/sqlRoleAssignments/899ad926-b869-42a0-bb28-16fdeba32992

Scope : /subscriptions/ad7cfdd8-8685-44b5-8390-284363464cc4/resourceGroups/sampleproject-dev-global/providers/Microsoft.DocumentDB/databaseAccounts/sampleprojectdev

RoleDefinitionId : /subscriptions/ad7cfdd8-8685-44b5-8390-284363464cc4/resourceGroups/sampleproject-dev-global/providers/Microsoft.DocumentDB/databaseAccounts/sampleprojectdev/sqlRoleDefinitions/00000000-0000-0000-0000-000000000001

PrincipalId : 0cd8f3af-37e3-49cb-9bea-b84a6dc67f50

The account and principal IDs from the actual usage of the command have been substituted with fake identifiers.


There can be up to 100 role definitions and up to 2000 role assignments per account.  Role definitions can be assigned to the Azure AD identities belonging to the same Azure AD tenant as the Azure Cosmos DB account. Azure AD group resolution is not currently supported for identities belonging to more than 200 groups. The Azure AD token is currently passed as a header with each individual request sent to the Azure Cosmos DB service.

 

Wednesday, November 10, 2021

 

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

There are several references to best practices throughout the series of articles posted on the documentation for the Azure Public Cloud. This article focuses on the antipatterns to avoid.

Antipatterns are experienced when planning a cloud adoption. Misaligned operating models can lead to increased time to market, misunderstandings, and increased workload on IT departments. Companies choose the wrong operating model when they assume that platform as a service decreases costs without their involvement. Sometimes a change of direction in the business can lead to radical changes in architecture, requiring replacement projects that become complex and cost intensive.

A model articulates types of accountability, landing zones, and focus, and the company chooses a model based on its strategic priorities and the scope of its portfolio. Assigning too much responsibility to a small team can result in a slow adoption journey. Such a team is burdened with approving measures only after fully understanding their impact on the business, operations, and security, and it is worse still when these aren't the team's main areas of expertise.

Subject matter experts want to use cloud services, so business units increase the pressure, and if this is left unregulated, shadow IT emerges. Instead of this antipattern, operating models can be evaluated and a readiness plan built. The four most common cloud operating models are decentralized operations, centralized operations, enterprise operations, and distributed operations, from which a choice is made based on strategic priorities and motivations and the scope of the portfolio to be managed. Strategic priorities can be innovation, control, democratization, or integration. Portfolio scope can be workload, landing zone, cloud platform, or full portfolio; it identifies the largest scope that a specific operating model is designed to operate.

Decentralized operations is the least complex of the common operating models. In this form of operations, all workloads are operated independently by dedicated teams. Innovation is prioritized over control, and speed is maximized at the expense of cross-workload standardization. It introduces risk when managing a portfolio of workloads, and it is limited to workload-level decisions. The advantages include easy mapping of the cost of operations, greater workload optimization, and responsibilities shifted to DevOps and automation; DevOps and development teams are the most empowered by this approach and experience the least resistance to driving market change. Many public cloud services are incubated and nurtured in this manner.

Centralized operations suit a stable-state environment that might not require as much focus on the architecture or the distinct operational requirements of individual workloads. Commercial off-the-shelf applications and products with a slow release cadence benefit most from this model. The advantages include economies of scale when services are shared across several workloads, reduced responsibilities for the workload-focused teams, and improved standardization and operations support. Build tools and release pipelines are examples of centralized operations.

Enterprise operations is the suggested target state for all cloud operations. It balances the need for control and innovation by democratizing decisions and responsibilities. Central IT is replaced by a cloud center of excellence, which holds workload teams accountable for decisions rather than controlling or limiting their actions. The advantages include cost management, cloud-native tools, guardrails for consistency, clear processes, and greater impact of the centralized experts, along with separation of duties.

Distributed operations are unavoidable when the existing operating model is too ingrained or when restrictions prevent specific business units from making a change. This model prioritizes the integration of multiple existing operating models. Since there is no commitment to a primary operating model, it requires a management group hierarchy to lower the risks. Its distinct advantage is the integration of common operating-model elements from each business unit.

Finally, the choice of the right model improves the cloud adoption roadmap.

Tuesday, November 9, 2021

Cost comparisons between standalone products and cloud native solutions

 

Introduction:

This article uses a TCO calculation to compare the cost of an isolated storage appliance with that of a solution native to public cloud computing.

Description:

Many datacenter products are sold as separate, isolated, standalone appliances that start out lean enough to fit on a single host and eventually justify their own expansion to several racks. The backend processing for many IT operations is delegated to these appliances. Object storage is one such example, where each organization can choose to have its own private cloud storage.

This is a comparison of the features and their relative costs, noted as high or low:

Feature/Subsystem: Organization
Standalone appliance (cost: high): A multi-layered, multi-component monolithic application that requires significant bare-metal libraries.
Cloud native DIY solution (cost: low): Staged and pipelined execution composed of several pre-built Azure resources.

Feature/Subsystem: Cluster-based architecture for scale-out
Standalone appliance (cost: high): Involves deploying specific types of components to control and data nodes, with added costs for a coordinator.
Cloud native DIY solution (cost: low): State-based reconciliation of control-plane resources, including scale-out and replicas.

Feature/Subsystem: Microservices for each component, for ease of integration, testing, and programmability
Standalone appliance (cost: high): Each component targets the same core storage layer, which, if distributed between clusters, relies on message-based consistency algorithms. Depending on code organization, maintenance, and individual component health, the costs of shipping software releases accumulate over time.
Cloud native DIY solution (cost: low): Each service can be included in an app service and a plan, while components are replaced by efficient use of resources. Packing and unpacking of multi-layer blobs and independent user-access-resolution layers are replaced by pipelined services that add minimal code to existing resources. Message broker, message passing, pub-sub, and other routines are eliminated in favor of dedicated products like Service Bus while the algorithm remains the same. Code reduction and independent releases result in cost savings.

Feature/Subsystem: Layered, independent implementation of the user namespace hierarchy, user object management, web user interface, and virtual data centers, so that the flexibility to provide business functionality can remain shallow and restricted to the upper layers or frontend
Standalone appliance (cost: high): Behind the scenes, the system architecture restricts changes to the frontend or middle tier, including data access. Most features can be added in a single-shot feature delivery, but the cost often includes metadata changes that might also be persisted to the store. Most features that require persistence reuse the store.
Cloud native DIY solution (cost: low): Behind the staged pipeline and region-based storage accounts, feature implementations rely on nothing more than a message queue and a database. Custom logic can be added via extensions and functions without impacting the rest of the organization.

Feature/Subsystem: DIY libraries and code
Standalone appliance (cost: high): Significant investment.
Cloud native DIY solution (cost: low): Little or no investment; available resources are leveraged.

Feature/Subsystem: Replication of objects owned by a virtual data center (VDC) within a replication group
Standalone appliance (cost: high): Code must be written to replicate readable objects from one virtual data center to another. Three nodes might be chosen from a pool of cluster nodes for the writes. For example, the storage engine records the disk locations of a chunk in a chunk-location index, and the disk locations corresponding to the chunk are written to three different disks/nodes. The index locations are chosen independently of the object chunk locations. The VDC needs to know the location of the object, and directories, such as those for object locations, might be designated for different purposes.
Cloud native DIY solution (cost: low): Syncing across availability zones is built into Azure resources. Although this might not be exposed to resource invokers, they are welcome to create regions for read-write and read-only; Cosmos DB, for instance, supports automatic replication across regions. If a storage engine layer must be written on top of cloud resources, it may still have to implement its own replication, but usages involving existing data stores can leverage an Azure store, cache, or CDN with automatic replication.

Feature/Subsystem: Query execution engine
Standalone appliance (cost: high): A storage engine could offer standard query operators for the query language if the entire dataset were treated as enumerable. To collapse the enumeration, efficient lookup data structures such as B+ trees are used, and these indexes can be saved directly in the storage to enable faster lookups later.
Cloud native DIY solution (cost: low): Instead of preparation, resolution, compilation, plan creation, plan optimization, and caching of plans, objects, and their heuristics, cloud services provide simpler indexing and search capabilities that transcend even document types, let alone documents. Besides the operational advantages of using these services from the cloud, this simplifies the search experience.

Feature/Subsystem: Analysis engine
Standalone appliance (cost: high): The reporting stack has always been a read-only stack, which makes it possible to interchange analysis stacks independently of strict or eventually consistent writes. A storage engine with its own reporting stack is a significant investment for the product even if the query interfaces are exposed as standard query operators.
Cloud native DIY solution (cost: low): Many analytical stacks can easily connect to the storage via existing and available connectors, reducing the need for integration. Analysis services from the public cloud are rich, robust, and very flexible to work with.

 

Conclusion:

The TCO calculation makes the case for reimagining the storage appliance as something built for the cloud, so that the on-premises footprint of individual organizations is minimized.

Sunday, November 7, 2021

 

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

This article focuses on support for compute in Azure.

The compute requirements of a modern cloud app typically involve load-balanced compute nodes that operate together with control nodes and databases.

VM Scale sets provide scale, customization, availability, low cost and elasticity.

VM scale sets in Azure Resource Manager have a type and a capacity. App deployments allow VM extension updates just like OS updates.
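A hedged sketch of creating a scale set with a given VM size (the type) and instance count (the capacity); the resource names and image are illustrative:

$cred = Get-Credential
New-AzVmss -ResourceGroupName "rg-compute" -VMScaleSetName "web-vmss" `
    -Location "eastus" -ImageName "Win2019Datacenter" -Credential $cred `
    -VmSize "Standard_D2s_v3" -InstanceCount 3 -UpgradePolicyMode "Automatic"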

Container infrastructure layering allows even more scale because it virtualizes the operating system. While traditional virtual machines enable hardware virtualization and Hyper-V containers offer isolation plus performance, containers are cheap and barely anything more than applications.

Azure Container Service serves both Linux and Windows containers. It has standard Docker tooling and API support, with streamlined provisioning of DC/OS and Docker Swarm.

Azure is an open cloud because it supports open-source infrastructure tools such as Linux, Ubuntu, and Docker, layered with databases and middleware such as Hadoop, Redis, and MySQL; app frameworks and tools such as Node.js, Java, and Python; applications such as Joomla and Drupal; management applications such as Chef and Puppet; and finally DevOps tools such as Jenkins, Gradle, and Xamarin.

Job-based computations use larger sets of resources, such as compute pools, that involve automatic scaling and regional coverage, with automatic recovery of failed tasks and input/output handling.
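Azure Batch is one such service; a hedged sketch of setting up a job against it (the account, group, and job names are illustrative, and $poolInfo is assumed to reference an existing pool):

New-AzBatchAccount -AccountName "batchsample" -ResourceGroupName "rg-batch" -Location "eastus"
$context = Get-AzBatchAccount -AccountName "batchsample" -ResourceGroupName "rg-batch"
# Jobs and their tasks are created against this context once a pool exists
New-AzBatchJob -Id "nightly-transcode" -PoolInformation $poolInfo -BatchContext $context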

Solutions on Azure involve a lot of fine-grained, loosely coupled microservices, such as an HTTP listener, page content, authentication, usage analytics, order management, reporting, and product inventory and customer databases.

 

Efficient Docker image deployment for scenarios with intermittent, low-bandwidth connectivity requires eliminating Docker image pulls. An alternative deployment mechanism can compensate for the restrictions by utilizing an Azure Container Registry, signature files, a file share, and an IoT Hub for pushing a manifest to devices. The deployment path involves pushing the image to the containerized device. The devices can send back messages, which are collected in a device-image register. An image is a collection of layers, where each layer represents a set of file-system differences, stored merely as folders and files. A SQL database can be used to track the state of what is occurring on the target devices and in the Azure-based deployment services, which helps both during and after the deployment process.

 

Resource groups are created to group resources that share the same lifecycle. They have no bearing on the cost management of resources other than to help with querying, and they can be used with tags to narrow down the resources of interest. Metadata about the resources is stored in a particular region. Resources can be moved from one resource group to another, or even to another subscription. Finally, resource groups can be locked to prevent actions such as delete or write by users who otherwise have access.
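As a minimal sketch (the resource and group names are illustrative), a resource might be moved like this:

$resource = Get-AzResource -ResourceGroupName "rg-old" -Name "appsvc-sample"
# Moving to another subscription would additionally take -DestinationSubscriptionId
Move-AzResource -DestinationResourceGroupName "rg-new" -ResourceId $resource.ResourceId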

As with its applicability to many deployments, Azure Monitor provides tremendous insight into the operations of Azure resources. It is recommended to create multiple Application Insights resources, usually one per environment. This results in better separation of telemetry, alerts, work items, configurations, and permissions. Limits, such as the web test count, throttling, and the data allowance, are spread across resources, and it also helps with cross-resource queries.
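A hedged sketch of the one-resource-per-environment recommendation (the resource group and naming scheme are illustrative):

# Create one Application Insights resource per environment
foreach ($env in @("dev", "staging", "prod")) {
    New-AzApplicationInsights -ResourceGroupName "rg-monitor" `
        -Name "appinsights-$env" -Location "eastus"
}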

 

 

Saturday, November 6, 2021

 

This post is a continuation of a series of articles on Azure, such as this one: https://1drv.ms/w/s!Ashlm-Nw-wnWhKdsT81vgXlVl38AfA?e=cZz5jq . It describes the Azure architecture for startups:

Azure saves significant costs for big businesses, but it is a compelling value provider for startups as well. The core startup stack architecture makes use of the factors that differentiate startup requirements: speed, cost, and options. Business needs change rapidly, and a sound architecture and system design can minimize the impact of that change on the system while the solution is built incrementally.

Many startups start with a single monolithic application because their requirements are not yet mature enough to warrant a complex microservices pattern. This is best served by an architecture that involves:

·        Azure App Service, to provide a simple app server for deploying scalable applications without configuring servers, load balancers, or other infrastructure

·        an Azure database such as Azure Database for PostgreSQL, for relational storage without the hassle of maintenance and performance tuning

·        Azure Virtual Network, to segment network traffic and keep internal services protected from internet threats

·        GitHub Actions, to build a continuous integration and continuous deployment pipeline

·        blob storage, for unstructured data

·        a content delivery network (CDN), for distributing data throughout the global network with reduced latency

·        Azure Monitor, to analyze happenings across the application's infrastructure
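A hedged sketch of provisioning the app server portion of this stack (the plan and app names are illustrative):

New-AzAppServicePlan -ResourceGroupName "rg-startup" -Name "plan-web" `
    -Location "eastus" -Tier "Standard" -NumberofWorkers 1 -WorkerSize "Small"
New-AzWebApp -ResourceGroupName "rg-startup" -Name "startup-web" `
    -Location "eastus" -AppServicePlan "plan-web"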

The core startup stack components are layered minimally so that the product can get off the ground and into the hands of customers. Eighty percent of startups will use a stack comprising a storage or persistence layer, a compute or static content layer, and a CDN to distribute content to different clients.

With few customers at the start, a CDN might seem premature, but adding one serves two purposes: it avoids the significant cost of retrofitting later on, and it provides a façade behind which the API and architecture can be refined.

The app server is where the code runs. This platform should make deployments easy while requiring the least possible operational input, and the app server should scale horizontally. With the help of PaaS, challenges concerning the traditional use of bare metal, web servers, virtual machines, and the like are avoided.

Static content does not have to reside on the app server. Proper use of a CI/CD pipeline can build and deploy static assets with each release, as most production web frameworks do.

When the app is running, it needs to store data in a database for online transaction processing. Using a managed instance of a relational database reduces operational overhead and improves app optimization.

Log aggregation is extremely useful for debugging and troubleshooting, and frequent integration avoids divergent code bases that lead to merge conflicts.

By leveraging GitHub Actions, many of the compliance, regulatory, privacy, and bounty-program processes can be automated. These automations can similarly be expanded to relieve pain points for the startup during its initial growth phase.