Saturday, September 24, 2022

 Multitenant application and monitoring:  

Monitoring is very much a multitenant solution provider’s concern. In fact, unlike the applications and databases that proliferate across tenants and vary in number and size, monitoring must provide a consistent story and experience across tenants, and it therefore deserves to be studied separately.

Public cloud services for monitoring and log analytics can certainly be leveraged for a cloud-native multitenant application, and we can start from there before changing the topology to include on-premises deployments. As with all multitenant applications, deployments can take many forms and modes, spanning on-premises, public cloud, and hybrid environments.

A cloud-native monitoring service helps us leverage the best practices of monitoring solutions in the cloud and reduces the complexity of managing something native to the multitenant application, while allowing both the monitoring service and the multitenant application to scale independently and remain highly available. A cloud monitoring service is also a complete solution for collecting, analyzing, and acting on the telemetry from the multitenant solution in the cloud environment. In the case of Azure, this offering comprises an application performance management system called Application Insights; host monitoring with VM Insights and Container Insights; Log Analytics, which allows drill-down into the monitoring data; smart alerts and automated actions, which help support operations at scale; and visualizations with dashboards and workbooks. The data collected by this comprehensive solution becomes part of Azure Monitor Metrics.

Azure Monitor is not only about metrics but also about logs, and it allows us to gather insights, visualize, analyze, respond, and integrate. The monitoring data platform works for both metrics and logs. While events and traces become part of logs, metrics are numerical values that quantify application performance at a given point in time. The metrics store with its visualization in Metrics Explorer, and the log data with its filtering in Log Analytics, are simply applications dedicated to their respective data. Azure Monitor uses the Kusto Query Language, which is suitable for simple log queries but also includes advanced functionality such as aggregations, joins, and smart analytics. Kusto benefits from both SQL and Splunk querying practices.
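
As a minimal sketch of what running such a Kusto query looks like from code, the following uses the azure-monitor-query Python SDK. The workspace ID is a placeholder, and the AppRequests table is used only as an illustrative, workspace-based Application Insights table; adjust the query to whatever tables your workspace actually contains.

```python
# A minimal sketch of querying Azure Monitor Logs with KQL from Python.
# Assumes the azure-monitor-query and azure-identity packages are installed;
# the workspace ID and AppRequests table are placeholders for illustration.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

credential = DefaultAzureCredential()
client = LogsQueryClient(credential)

# Kusto query: aggregate request counts and average duration per operation.
query = """
AppRequests
| where TimeGenerated > ago(1h)
| summarize count(), avg(DurationMs) by OperationName
| order by count_ desc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",   # placeholder
    query=query,
    timespan=timedelta(hours=1),
)

for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))
```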

One of the most interesting aspects of a cloud monitoring service for a multitenant solution provider is that it collects metrics from the multitenant solution, the guest OS, Azure resources, subscriptions, and of course tenants, which pretty much covers the depth and breadth of the systems involved. Additionally, Alerts and Autoscale help determine the appropriate thresholds and actions that become part of the monitoring stack, so the data and the intelligence sit together and are easily navigated via the dashboard. The dashboards available for monitoring provide a variety of charts that illustrate the data to viewers better than raw query results. Workbooks provide a flexible canvas for data analysis and the creation of rich visual reports in the Azure portal. The analysis is not restricted to just these two. Power BI remains the robust solution for analysis and interactive visualizations across a variety of data sources, and it can automatically import log data from Azure Monitor. Azure Event Hubs is a streaming platform and event ingestion service that permits real-time analytics as opposed to batching or storage-based analysis. The Azure Monitor APIs help with reading and writing data as well as configuring and retrieving alerts.
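
To make the metrics side concrete, here is a hedged sketch of reading platform metrics for a single resource with the azure-monitor-query SDK. The resource URI and the "Percentage CPU" metric name (a virtual machine metric) are placeholders; the same pattern applies to elastic pool or database metrics.

```python
# A minimal sketch of reading platform metrics from Azure Monitor Metrics.
# The resource URI and metric name below are placeholders for illustration.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

client = MetricsQueryClient(DefaultAzureCredential())

response = client.query_resource(
    "<resource-uri-of-a-vm-or-elastic-pool>",       # placeholder resource ID
    metric_names=["Percentage CPU"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.AVERAGE],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.average)
```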

Let us take a closer look at how the monitoring data is gathered and analyzed internally within the cloud. The architecture behind Azure Monitor is a diagnostics pipeline. This pipeline consists of an ingestion gateway, a delivery service, a distributed graph, a normalizer and scrubber service, a logs-to-metrics converter, and an uploader to a global database. The pipeline supports ingestion, streaming, transformations, and querying; this is its hallmark. All these paths are supported end-to-end by the pipeline without interfering with one another.

The idea behind the monitoring pipeline is one of queuing and pub-sub mechanisms. Logs and metrics flow from gateways to storage queues, where blobs are listened for, scrubbed, forwarded to event hubs, and uploaded to different destinations such as Cosmos DB, Azure Data Lake Storage (ADLS), and delivery services. The rate of flow to the queues can be throttled, and schema hints can be propagated to the storage, where the schema and notifications power the analytics. The accumulation of metrics in an MDM store facilitates the logic for throttling and rate adjustments, while the schemas are mostly published and queried via Kusto.
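
The following is a purely illustrative sketch of this queue-and-pub-sub idea, not Azure’s internal implementation. The scrub(), to_metrics(), and forward() helpers are hypothetical stand-ins for the dedicated scrubber, converter, and delivery services described here.

```python
# Illustrative sketch only: gateway -> throttled queue -> scrubber ->
# logs-to-metrics converter -> fan-out to destinations. All names are hypothetical.
import queue
import re

ingestion_queue: "queue.Queue[dict]" = queue.Queue(maxsize=1000)  # maxsize acts as a crude throttle

def scrub(record: dict) -> dict:
    """Mask sensitive fields before the record leaves the pipeline."""
    scrubbed = dict(record)
    if "message" in scrubbed:
        scrubbed["message"] = re.sub(r"\b\d{16}\b", "****", scrubbed["message"])
    return scrubbed

def to_metrics(record: dict) -> dict:
    """Convert a log record into a numeric metric sample (logs-to-metrics converter)."""
    return {"name": record.get("operation", "unknown"), "value": record.get("durationMs", 0)}

def forward(record: dict, destinations: list) -> None:
    """Fan the record out to the configured destinations (event hub, Kusto, ADLS...)."""
    for destination in destinations:
        destination.append(record)

# Simulated flow through the pipeline.
event_hub, kusto = [], []
ingestion_queue.put({"operation": "GetTenant", "durationMs": 42, "message": "card 4111111111111111"})
while not ingestion_queue.empty():
    raw = ingestion_queue.get()
    clean = scrub(raw)
    forward(clean, [event_hub])
    forward(to_metrics(clean), [kusto])
```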

Configurations for different storage containers, queues, and hubs are defined between the collection and delivery services. These are called monikers, and a moniker is a pairing of an event hub and a storage account. The ingestion service is responsible for connecting the monitoring agent with its storage account. The use of this service reduces the number of monikers, the number of blob writes to storage, and the complexity of the distributed graph representation. Storage is billed in terms of transactions, and what would earlier take hundreds of transactions and blob writes requires only tens of transactions using the ingestion, or ingress, service. It can also aggregate the blobs before writing them to the storage account.
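
A moniker can be pictured simply as a named pairing of an event hub and a storage account. The data structure below is a hypothetical representation for illustration; the names are made up.

```python
# A hypothetical representation of a moniker: the pairing of an event hub and
# a storage account that defines where collected data lands.
from dataclasses import dataclass

@dataclass(frozen=True)
class Moniker:
    name: str
    event_hub: str          # the event hub the data is published to
    storage_account: str    # the storage account backing the queue/blob writes

# The ingestion service aggregates many agent writes behind a few monikers,
# cutting down on storage transactions and blob writes.
tenant_logs = Moniker(name="tenant-logs", event_hub="eh-tenant-logs", storage_account="sttenantlogs")
print(tenant_logs)
```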

The corresponding egress service is the delivery service, which can be considered an equivalent of Apache Kafka. It comes with a set of producer and consumer definitions, and this pub-sub service operates at the event level. An application programming interface is provided for consumers who would like to define the monikers rather than control individual events. The setup of monikers determines where and how the data is delivered, and the use of monikers reduces the bill much as the ingress service does. The destinations are usually Kusto clusters and event hubs. The delivery service forms the core of the pipeline, with agents and ingestion pouring data into storage defined by monikers. At the other end of the pipeline are the event hubs and Kusto clusters.

Collection and storage have prerequisites. For example, when virtual machines are created, they automatically have a monitoring agent (MA) installed. This agent reaches out to a collection service with an intent to write and to define a namespace. The handshake between the monitor and the agent gives the agent the configuration necessary to direct its data to a destination moniker, which can scale the storage account automatically.

Unlike collection and storage, which are automatically provisioned, the delivery paths are set up by the customer using the application programming interfaces in the extensibility SDK associated with the delivery services. The delivery service then concerns itself merely with resolving monikers, listening on them, filtering events, and delivering them to the Kusto clusters and event hubs. If a destination is unreachable or unavailable, the data is handed off to the snapshot delivery service, which reads the delivery service namespaces for retries. The data is never held in memory; the delivery service forwards it to a cache under a namespace key. The snapshot delivery service acts as the standby destination in place of the unreachable one.
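
The fallback behavior can be sketched as follows. This is illustrative only: the snapshot_cache, deliver(), and retry_snapshots() names are hypothetical, and the point is simply that undeliverable payloads are parked under a namespace key for later retry rather than lost.

```python
# Illustrative sketch: park undeliverable payloads under a namespace key so a
# snapshot delivery service can retry them later. All names are hypothetical.
from collections import defaultdict

snapshot_cache: dict[str, list] = defaultdict(list)   # namespace key -> pending payloads

def deliver(namespace: str, payload: dict, destination) -> None:
    try:
        destination.send(payload)
    except ConnectionError:
        # Destination unavailable: hand the payload to the snapshot delivery
        # service, which later reads the namespace and retries the delivery.
        snapshot_cache[namespace].append(payload)

def retry_snapshots(destination) -> None:
    for namespace, payloads in list(snapshot_cache.items()):
        still_pending = []
        for payload in payloads:
            try:
                destination.send(payload)
            except ConnectionError:
                still_pending.append(payload)
        snapshot_cache[namespace] = still_pending
```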

Access to the data is controlled by a storage access service key, also called a blob token, which is issued at the time the blob is written; the destination uses that token to import the data and handle single-cast or multi-cast delivery to event handlers or the appropriate Kusto cluster. Data copying and rewriting are avoided by exchanging only the payload information and blob tokens, with the delivery service absorbing the process of fetching permissions from GCS on behalf of the customer.

The role of the distributed graph may be the outstanding question at this point. It is a service used for searching logs and for transformations. It consists of a front-end service and a back-end service, with each component within the FE and BE cloud services running as an individual microservice performing a specific task. The front-end service allows customers to set up query conditions such as job scope and interval period.

All the monitoring services are region-bound and can be repeated in other regions. Availability within a region, such as for disaster recovery purposes, requires the use of availability zones. The back-end service merely schedules the workers for the appropriate handling of the logs to the customer’s storage account.

Many miscellaneous activities are specific to whether the data is logs or metrics, such as scrubbing, logs-to-metrics transformation, normalization, and uploading; these are handled by dedicated services and enhance the pipeline described so far. The monitoring architecture is generic: one that requires queues, blobs, collections, schedulers, pub-sub and producer-consumer accounts, analysis and reporting stacks, and their configurations.

Most of the resources for Azure monitoring are region-scoped, which enables Azure Monitor to be set up in each region. Some data and resources shared across these regions may exist in a dedicated region, which powers monitoring use cases via the Azure portal.

Azure Monitor also performs continuous monitoring, which refers to processes and tools for monitoring each phase of the DevOps and IT operations lifecycles. It helps to continuously ensure the health, performance, and reliability of the application and the infrastructure as it moves from deployment to production. It builds on continuous integration and continuous deployment (CI/CD), which are ubiquitously embraced by organizations for software development. Azure Monitor is a unified monitoring solution that provides transparency into the application, runtime, host, and cloud infrastructure layers. As a continuous monitoring tool, Azure Monitor allows gates and rollback of deployments based on monitoring data. Services hosted in the cloud have very short software development cycles, and releases must pass through multiple stages and environments before they are made public. Monitoring data allows any number of environments to be introduced without sacrificing the controls for software quality and gated release across environments. The data allows not only thresholds to be set but also alerts, so that appropriate action may be taken. As the software makes its way to the final production environment, the alerts increase in level and become more relevant and useful for eliminating risks from the production environment.
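
As a hedged sketch of such a gate, the snippet below queries the recent failure rate from workspace-based Application Insights and decides whether to promote or roll back. The workspace ID, the 1% threshold, and the AppRequests table and columns are assumptions for illustration, not a prescribed implementation.

```python
# A sketch of a monitoring-based release gate: block promotion if the failure
# rate over the last 30 minutes exceeds a threshold. Workspace ID, threshold,
# and table/column names are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

FAILURE_RATE_THRESHOLD = 0.01   # 1% failed requests blocks the promotion

query = """
AppRequests
| where TimeGenerated > ago(30m)
| summarize failed = countif(Success == false), total = count()
| extend failureRate = todouble(failed) / todouble(total)
| project failureRate
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace("<workspace-id>", query, timespan=timedelta(minutes=30))

failure_rate = float(response.tables[0].rows[0][0]) if response.tables[0].rows else 0.0
if failure_rate > FAILURE_RATE_THRESHOLD:
    print(f"Gate failed ({failure_rate:.2%} failures): roll back the deployment")
else:
    print(f"Gate passed ({failure_rate:.2%} failures): promote to the next environment")
```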

It may be argued that tests and other forms of software quality control achieve this as the software goes through the CI/CD pipeline. While this is true, software quality is enhanced by monitoring data because it is neither intrusive nor vulnerable to the flakiness that many tests are prone to in different environments. The monitoring data and its visualization with dashboards need to be set up only once, even as the code and tests change over time. The investment in continuous monitoring and its implications boost the planning and predictability of software releases.

When a cloud monitoring service such as Azure Monitor Logs is used to monitor the elastic pools and databases of a multitenant application, typically only one account and namespace is needed for the entire integration, unlike key vaults, storage accounts, application instances, and databases, which proliferate in size and number with the number of tenants. A single geographical region and a single instance of Azure Monitor Logs are sufficient for the multitenant application solution.
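
A brief sketch of that point: one shared Log Analytics workspace can serve every tenant, with per-tenant or per-database filtering done in the query itself. The AzureMetrics table, metric name, and workspace ID below are used illustratively and may differ from your diagnostic settings.

```python
# A sketch of querying one shared workspace for per-database elastic pool
# metrics across all tenants; table, metric, and workspace ID are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

query = """
AzureMetrics
| where ResourceProvider == "MICROSOFT.SQL"
| where MetricName == "dtu_consumption_percent"
| summarize avg(Average) by Resource, bin(TimeGenerated, 15m)
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace("<shared-workspace-id>", query, timespan=timedelta(hours=4))
for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))
```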

 

Friday, September 23, 2022

 

Tenancy in Active Directory

Multitenant applications can switch between single-tenant and multitenant deployment modes. Active Directory can handle both tenancies, and this post discusses how. Single-tenant applications enable sign-in only for users from the home tenant. Multitenant applications are available to users signing in from both their home tenant and other tenants.

Public clouds like Azure enable an application to be configured for single-tenancy or multi-tenancy via the management portal. The “Accounts in any organizational directory” option must be specifically set. The AppID URI of the application must be unique. Global uniqueness is guaranteed by requiring the AppID URI to have a hostname that matches a verified domain of the Active Directory tenant.
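
The same registration can be scripted through Microsoft Graph instead of the portal. The sketch below is hedged: token acquisition is elided, and the display name and identifier URI are placeholders that must map to a verified domain; "AzureADMultipleOrgs" corresponds to the “Accounts in any organizational directory” setting.

```python
# A hedged sketch of registering a multitenant application via Microsoft Graph.
# The bearer token, display name, and identifier URI are placeholders.
import requests

app_definition = {
    "displayName": "contoso-multitenant-app",
    # "AzureADMultipleOrgs" == "Accounts in any organizational directory".
    "signInAudience": "AzureADMultipleOrgs",
    # The AppID URI must be globally unique, hosted on a verified domain.
    "identifierUris": ["https://contoso.com/multitenant-app"],
}

response = requests.post(
    "https://graph.microsoft.com/v1.0/applications",
    headers={"Authorization": "Bearer <access-token>", "Content-Type": "application/json"},
    json=app_definition,
)
response.raise_for_status()
print(response.json()["appId"])
```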

When these applications are developed, there can be a number of challenges arising from the different policies that administrators can configure.

The best practices for developing sign-in experience on multitenant applications include the following:

1.       The application must be tested in a tenant that has conditional access policies configured.

2.       The principle of least user access must be honored to ensure that the application only requests the permissions that it needs.

3.       Appropriate names and descriptions for any permissions must be provided. This helps users and administrators know what they are agreeing to before they give consent.

When sign-ins are configured to be accepted from any Active Directory tenant, the application becomes multitenant. If the application uses an existing proprietary account system or accounts from other cloud providers, Active Directory sign-in can still be added via OAuth2, OpenID Connect, or SAML integration by adding a published UI control to its sign-in page.

In a single-tenant application, sign-in requests are sent to the tenant’s sign-in endpoint. In a multitenant application, the originating tenant from which the user is attempting to sign in is not known. Instead, requests are sent to an endpoint that multiplexes across all Azure AD tenants. When the Microsoft identity platform receives a request on the /common endpoint, it signs the user in, discovers the tenant the user belongs to, and sets this value in the issuer field of the token response to the application. The application can read the issuer field in the token response and take it to correspond to the user’s tenant. The application must validate the token before reading this field from the multiplexer. There can also be multiple issuer values. Each public cloud AD tenant has a unique issuer value in the form of a URI containing a GUID. A single-tenant application can validate the tenant by matching the GUID in the response, but a multitenant application only receives a templatized URI from the multiplexer and cannot validate it this way. It must maintain a registry of valid tenants and check the issuer value or the tid claim value in the token. It can also choose to admit users based on a userId registry and ignore the tenant ID altogether.
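
A minimal sketch of that tenant check follows, assuming the token’s signature and standard claims have already been validated elsewhere: admit the caller only if the tid claim (or the GUID embedded in the issuer) appears in a registry of known tenants. The allow-list entries are placeholders.

```python
# A sketch of the tenant registry check; runs after token signature validation.
# The tenant IDs in the allow-list are placeholders.
import re

ALLOWED_TENANTS = {
    "00000000-1111-2222-3333-444444444444",
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
}

def tenant_is_allowed(claims: dict) -> bool:
    """Return True when the validated token claims belong to a registered tenant."""
    tenant_id = claims.get("tid")
    if not tenant_id:
        # Fall back to the GUID embedded in the issuer URI,
        # e.g. https://login.microsoftonline.com/{tenant-id}/v2.0
        match = re.search(r"[0-9a-fA-F-]{36}", claims.get("iss", ""))
        tenant_id = match.group(0) if match else None
    return tenant_id in ALLOWED_TENANTS

# Example with an already-validated set of claims:
claims = {"iss": "https://login.microsoftonline.com/00000000-1111-2222-3333-444444444444/v2.0",
          "tid": "00000000-1111-2222-3333-444444444444", "preferred_username": "user@contoso.com"}
print(tenant_is_allowed(claims))
```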

 


Thursday, September 22, 2022

 Some rules, applications, and guidelines for multitenant application extension development: 

When multitenant application extensions are developed, there are a few best practices that can be called out. This article covers some of them. 

Some of the common pitfalls in multitenant application development include the following: 

  1. Prefix/suffix missing. This is required to ensure a healthy app ecosystem that avoids name collisions. 

  1. DataClassification missing or set incorrectly – There is a tool to detect these, and it is not hard to automate. 

  1. Required translation files missing – These are required for specifying additional languages. 

  1. Missing permission sets - The least-privilege execution policy requires the proper permission sets to be granted. 

  1. Permission errors – These must not be shown unless they entail a necessary action from the user. 

  1. Missing application area tagging – Tenant applications can only be categorized with tagging. 

  1. Usage category not set – This is required for search because the property helps to provide hints. 

  1. Business open/close events for a tenant imply a common handler for the invocation of tenant code. These handlers must be properly maintained. 

  1. Upgrade procedures – An application can be upgraded properly if the standard operating procedure is followed. 

  1. Use logic that owns the artifacts – artifacts should not be updated or maintained where there is no ownership. 

  1. Testing with elevated privileges hides the errors users might encounter. 

  1. Testing does not cover a specific scenario because the documentation for the scenario is inadequate. 

  1. The configuration and environment settings can alter the behavior of the product and they must be set properly before testing. 

The tenant application extension lifecycle also differs significantly enough from that of mainstream single-tenant applications to be called out here: 

  1. Migration – Data might need to be migrated between extensions. The process applies to both large- and small-scale data migrations. Upgrade can be treated as a large-scale data migration. Small scale data migrations are where a few objects are hand-picked between extensions. The migration’s direction is dependent on the dependency graph. 

  1. Translations can be applied to multiple properties and these captions are scoped to their entities. They can be changed from multiple places and might live in different artifacts, but they are associated with the entities they are for. 

  1. Tests can be isolated as database transactions in the tenant namespace. The difference between a normal run and a test run is that a failure in the former stops execution, while a failure in the latter allows skipping to the next test. 

  1. Publishing and installing an extension requires an external registry and installation in individual tenant namespaces. The extension then becomes available to the users in the client. 

  1. Updating and upgrading differ in the behavior of the code before and after. An upgrade strives to maintain backward compatibility, while an update can replace the code wholesale. 

  1. Deprecation best practices involve the use of conditional directives to surround the code to be obsoleted.