Multitenant application and monitoring:
Monitoring is not divorced from a multitenant solution provider’s concerns. In fact, unlike the applications and databases that proliferate across tenants and vary in number and size, monitoring must present a consistent story and experience across tenants, and it therefore deserves to be studied separately.
As with all multitenant applications, deployments come in many forms and span on-premises, public cloud, and hybrid environments. Public cloud services for monitoring and log analytics can certainly be leveraged for a cloud-native multitenant application, so we can start there before changing the topology to include on-premises deployments.
A cloud-native monitoring service lets us leverage the best practices of monitoring solutions in the cloud and reduces the complexity of managing something native to the multitenant application, while allowing both the monitoring service and the multitenant application to scale independently and remain highly available. A cloud monitoring service is also a complete solution for collecting, analyzing, and acting on the telemetry from the multitenant application in the cloud environment. In the case of Azure, this offering comprises an application performance management system called Application Insights, host monitoring systems called VM Insights and Container Insights, the Log Analytics solution that allows drill-down into the monitoring data, smart alerts and automated actions that help support operations at scale, and visualizations with dashboards and workbooks. The data collected from this comprehensive solution becomes part of Azure Monitor Metrics.
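As an illustration of how the multitenant application can feed such a service, the sketch below instruments an application with Application Insights using the azure-monitor-opentelemetry package; the connection-string environment variable and the tenant identifier attribute are assumptions, not prescribed names.

import os

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

# Route telemetry (requests, dependencies, traces) to Application Insights,
# which in turn feeds Azure Monitor Metrics and Logs.
configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
)

tracer = trace.get_tracer(__name__)

# Tag spans with a tenant identifier so that per-tenant drill-down is possible later.
with tracer.start_as_current_span("provision-tenant") as span:
    span.set_attribute("tenant.id", "contoso")  # hypothetical tenant id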
Azure Monitor is not only about metrics; it is also about logs, and it allows us to gather insights, visualize, analyze, respond, and integrate. The monitoring data platform works for both metrics and logs. Events and traces become part of logs, while metrics are numerical values that quantify application performance at a given point in time. The metrics store with its visualization in Metrics Explorer, and the log data with its filtering in Log Analytics, are simply applications dedicated to their respective data. Azure Monitor uses the Kusto query language, which is suitable for simple log queries but also includes advanced functionality such as aggregations, joins, and smart analytics. Kusto borrows from both SQL and Splunk querying practices.
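To make this concrete, here is a minimal sketch of a Kusto log query issued through the azure-monitor-query client library, assuming a Log Analytics workspace and Azure AD credentials are available; the table and column names are illustrative of workspace-based Application Insights data.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Aggregation and time bucketing expressed directly in KQL.
query = """
AppRequests
| where Success == false
| summarize failures = count() by bin(TimeGenerated, 1h), AppRoleName
| order by TimeGenerated asc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)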
One of the aspects of a cloud monitoring service most interesting to a multitenant solution provider is that it collects metrics from the multitenant solution, the guest OS, Azure resources, subscriptions, and of course tenants, which covers the depth and breadth of the systems involved. Additionally, Alerts and Autoscale help determine the appropriate thresholds and actions that become part of the monitoring stack, so the data and the intelligence live together and are easily navigated via the dashboard. The dashboards available for monitoring provide a variety of charts that illustrate the data to viewers better than raw query results. Workbooks provide a flexible canvas for data analysis and the creation of rich visual reports in the Azure portal. The analysis is not restricted to just these two. Power BI remains the robust solution for analysis and interactive visualizations across a variety of data sources, and it can automatically import log data from Azure Monitor. Azure Event Hubs is a streaming platform and event ingestion service that permits real-time analytics as opposed to batching or storage-based analysis. APIs from Azure Monitor help with reading and writing data as well as configuring and retrieving alerts.
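For the metrics side, a similarly minimal sketch reads platform metrics for a single Azure resource with the same client library; the resource URI, metric name, and aggregation are placeholders for whatever the multitenant solution actually exposes.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

client = MetricsQueryClient(DefaultAzureCredential())

response = client.query_resource(
    resource_uri="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
                 "Microsoft.Web/sites/<app-name>",  # placeholder resource
    metric_names=["Requests"],
    timespan=timedelta(hours=4),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.COUNT],
)

# Walk the time series returned for each requested metric.
for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.count)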
Let us take a closer look at how the monitoring data is gathered and analyzed internally within the cloud. The architecture behind Azure Monitor is a diagnostics pipeline. This pipeline consists of an ingestion gateway, a delivery service, a distributed graph, a normalizer service and scrubber, a logs-to-metrics converter, and an uploader to a global database. The pipeline supports ingestion, streaming, transformations, and querying end-to-end, without these paths interfering with one another; this is its hallmark.
The idea behind the monitoring pipeline is one of queuing and pub-sub mechanisms. Logs and metrics flow from gateways to storage queues, where blobs are listened for, scrubbed, forwarded to event hubs, and uploaded to different destinations such as Cosmos DB, Azure Data Lake Storage (ADLS), and delivery services. The rate of flow to the queues can be throttled, and schema hints can be propagated to the storage, where the schema and notifications power the analytics. The accumulation of metrics in an MDM facilitates the logic for throttling and rate adjustments, while the schemas are mostly published and queried via Kusto.
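The following rough sketch approximates that queue-and-pub-sub flow with public Azure SDK primitives rather than the internal pipeline's own services; the queue name, event hub, connection strings, and scrubbing rule are all hypothetical.

import json

from azure.storage.queue import QueueClient
from azure.eventhub import EventHubProducerClient, EventData

queue = QueueClient.from_connection_string(
    conn_str="<storage-connection-string>", queue_name="diagnostic-blobs")
producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hub-connection-string>", eventhub_name="scrubbed-logs")

# Listen for blob notifications on the queue, scrub them, and forward downstream.
for message in queue.receive_messages(max_messages=32):
    record = json.loads(message.content)
    record.pop("client_ip", None)          # hypothetical scrubbing rule

    batch = producer.create_batch()
    batch.add(EventData(json.dumps(record)))
    producer.send_batch(batch)

    queue.delete_message(message)          # acknowledge only once forwarded

producer.close()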
Configurations for different storage containers, queues, and hubs are defined between the collection and delivery services. These are called monikers; a moniker is a pairing of an event hub and a storage account. The ingestion service is responsible for connecting the monitoring agent with its storage account. The use of this service reduces the number of monikers, the number of blob writes to storage, and the complexity of the distributed graph representation. Storage is billed in terms of transactions, and what would earlier take hundreds of transactions and blob writes requires only tens of transactions using the ingestion, or ingress, service. It can also aggregate blobs before writing them to the storage account.
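Since monikers are internal to the pipeline, the sketch below only illustrates the pairing they represent and the kind of blob aggregation the ingestion service performs to keep storage transactions low; none of these names are a real API.

from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Moniker:
    """A named pairing of an event hub and a storage account."""
    name: str
    event_hub_namespace: str
    storage_account: str


def aggregate_blobs(payloads: List[bytes],
                    max_batch_bytes: int = 4 * 1024 * 1024) -> List[bytes]:
    """Coalesce many small payloads into fewer, larger blob writes,
    which is how the ingress service keeps storage transactions down."""
    batches, current, size = [], [], 0
    for payload in payloads:
        if size + len(payload) > max_batch_bytes and current:
            batches.append(b"".join(current))
            current, size = [], 0
        current.append(payload)
        size += len(payload)
    if current:
        batches.append(b"".join(current))
    return batches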
The corresponding egress service is the delivery service, which can be considered an equivalent of Apache Kafka. It comes with a set of producer and consumer definitions, and this pub-sub service operates at the event level. An application programming interface is provided for consumers who would rather define the monikers than control the individual events. The setup of monikers determines where and how the data is delivered, and the use of monikers reduces the bill in the same way the ingress service did. The destinations are usually Kusto clusters and event hubs. The delivery service forms the core of the pipeline, with agents and ingestion pouring data into storage defined by monikers; at the other end of the pipeline are the event hubs and Kusto clusters.
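Purely as an illustration, a consumer of the delivery service could be described as data that names the monikers to listen on, an event-level filter, and the destination to deliver to; none of these names are the service's actual interface.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeliverySubscription:
    consumer: str
    monikers: List[str]                 # where the data is picked up
    destination: str                    # Kusto cluster or event hub
    event_filter: Callable[[dict], bool] = lambda event: True


subscription = DeliverySubscription(
    consumer="tenant-usage-analytics",
    monikers=["prod-westus-logs"],
    destination="https://usage.westus.kusto.windows.net",  # placeholder
    event_filter=lambda event: event.get("level") == "Error",
)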
Collection and storage have prerequisites. For example, when virtual machines are created, they automatically have a monitoring agent (MA) installed. This agent reaches out to a collection service with an intent to write and to define a namespace. The handshake between the monitor and the agent gives the agent the configuration necessary to direct its data to a destination moniker, whose storage account can scale automatically.
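The handshake can be pictured as follows; the function and field names are hypothetical stand-ins for an exchange that is internal to the platform.

from dataclasses import dataclass


@dataclass
class AgentConfiguration:
    namespace: str          # namespace the agent declared an intent to write
    moniker: str            # destination moniker resolved by the service
    storage_account: str    # auto-scaling storage account behind the moniker
    upload_interval_s: int  # how often the agent flushes its local buffers


def register_agent(namespace: str) -> AgentConfiguration:
    """Stand-in for the collection service: the agent announces the namespace
    it wants to write to and receives back where and how to send its data."""
    return AgentConfiguration(
        namespace=namespace,
        moniker=f"{namespace}-prod-westus",
        storage_account="monitoringstorewestus",
        upload_interval_s=60,
    )


config = register_agent("contoso-billing")   # hypothetical namespace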
Unlike collection and storage, which are automatically provisioned, the delivery paths are set up by the customer using the application programming interfaces in the extensibility SDK associated with the delivery service. The delivery service then concerns itself merely with resolving monikers, listening on them, filtering events, and delivering them to the Kusto clusters and event hubs. If the destination is unreachable or unavailable, the data is handed off to the snapshot delivery service, which reads the delivery service namespaces for retries. The data is never put in memory when the delivery service forwards it to a cache under a namespace key. The snapshot delivery service acts as the standby destination in place of the unreachable one.
Access to the data is controlled by a storage access service key, also called a blob token, that is issued at the time the blob is written, so that the destination can use the token to import the data and handle single-cast or multi-cast as appropriate to event handlers or the appropriate Kusto cluster. Data copying and rewriting is avoided by merely exchanging the payload information and blob tokens, with the delivery service absorbing the process of fetching the permissions from GCS on behalf of the customer.
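A public-cloud analogue of the blob token is a short-lived SAS token scoped to a single blob, sketched below; the account, container, and blob names are placeholders.

from datetime import datetime, timedelta, timezone

from azure.storage.blob import generate_blob_sas, BlobSasPermissions

sas_token = generate_blob_sas(
    account_name="monitoringstore",            # placeholder
    container_name="scrubbed-logs",
    blob_name="2024/05/01/events-000001.json",
    account_key="<storage-account-key>",       # placeholder
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# Only the blob URL plus this token travels to the destination; the payload
# itself stays in place until the destination imports it.
blob_url = (
    "https://monitoringstore.blob.core.windows.net/"
    f"scrubbed-logs/2024/05/01/events-000001.json?{sas_token}"
)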
The role of the distributed graph may stand out as a question at this point. It is a service used for searching logs and for transformations. It consists of a front-end service and a back-end service, with each individual component within the FE and BE cloud services being a micro-service that performs a specific task. The front-end service allows customers to set up query conditions such as job scope and interval period.
All the monitoring services are region-bound and can be repeated in other regions. Availability within a region, such as for disaster recovery purposes, requires the use of availability zones. The back-end service merely schedules the workers for the appropriate handling of the logs into the customer’s storage account.
Many miscellaneous activities are specific to the data and to whether the data is logs or metrics, such as scrubbing, logs-to-metrics transformation, normalization, and uploading; these are handled by dedicated services and serve to enhance the pipeline described so far. The monitoring architecture is generic, requiring queues, blobs, collections, schedulers, pub-subs, producer-consumer accounts, analysis and reporting stacks, and their configurations.
Most of the resources for Azure monitoring are region-scoped. This enables Azure Monitor to be set up in each region. Some data and resources shared across these regions may exist in a dedicated region that powers use cases such as monitoring via the Azure portal.
Azure Monitor also supports continuous monitoring, which refers to the processes and tools for monitoring each phase of the DevOps and IT operations lifecycles. It helps continuously ensure the health, performance, and reliability of the application and the infrastructure as they move from deployment to production. It builds on continuous integration and continuous deployment, which are ubiquitously embraced by organizations for software development. Azure Monitor is a unified monitoring solution that provides transparency across the application, runtime, host, and cloud infrastructure layers. As a continuous monitoring tool, Azure Monitor allows gates and rollbacks of deployments based on monitoring data. Services hosted in the cloud have very short software development cycles, and their releases must pass through multiple stages and environments before they are made public. Monitoring data allows any number of environments to be introduced without sacrificing the controls for software quality and gated release across environments. The data not only allows thresholds to be set but also alerts, so that appropriate action may be taken. As the software makes its way to the final production environment, the alerts increase in level and become more relevant and useful for eliminating risks from the production environment.
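As an illustrative release-gate check, the sketch below queries the last hour of request telemetry for a newly deployed role and fails the pipeline stage if the failure rate crosses a threshold; the table, role name, and threshold are assumptions rather than a prescribed gate mechanism.

import sys
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

FAILURE_RATE_THRESHOLD = 0.02   # 2%, hypothetical gate threshold

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",   # placeholder
    query="""
    AppRequests
    | where AppRoleName == "web-canary"
    | summarize failed = countif(Success == false), total = count()
    """,
    timespan=timedelta(hours=1),
)

row = response.tables[0].rows[0]
failed, total = row[0], row[1]
rate = failed / total if total else 0.0

# A non-zero exit code fails the pipeline stage and blocks (or rolls back)
# the promotion to the next environment.
sys.exit(0 if rate <= FAILURE_RATE_THRESHOLD else 1)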
It may be argued that tests and other forms of software quality control achieve this as the software goes through the CI/CD pipeline. While this is true, software quality is further enhanced by monitoring data because it is not intrusive or vulnerable to the flakiness that many tests are prone to in different environments. The monitoring data and its visualization with dashboards need to be set up only once, even as the code and tests change over time. The investments in continuous monitoring and its implications boost the planning and predictability of software releases.
When a cloud monitoring service such as Azure Monitor Logs is used to monitor the elastic pools and databases of a multitenant application, typically only one account and namespace is needed for the entire integration, unlike the key vaults, storage accounts, application instances, and databases that proliferate in size and number with the number of tenants. A single geographical region and instance of Azure Monitor Logs is sufficient for the multitenant application solution.
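This is also why one workspace suffices: per-tenant questions become a dimension in the query rather than separate monitoring deployments. The sketch below assumes the application tags its telemetry with a tenant.id custom dimension, as in the earlier instrumentation example.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",   # the same single workspace
    query="""
    AppRequests
    | extend tenant = tostring(Properties["tenant.id"])
    | summarize requests = count(), failures = countif(Success == false) by tenant
    | order by requests desc
    """,
    timespan=timedelta(days=7),
)

# One query, one workspace, a per-tenant breakdown of traffic and failures.
for row in response.tables[0].rows:
    print(row)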