Sunday, May 14, 2023

Queries for Operational Engineering of Cloud Resources

Introduction: This article focuses on the operational engineering aspects of Cloud DevOps and solutions development. While both the public cloud and the technology products that companies own play up their strengths for operational engineering, no single solution or product addresses all the concerns of this discipline. The public cloud does come with immense capabilities for monitoring and managing resources, but it leaves the authoring of rules, alerts, action groups, and dashboards, and their organization by scope and level, to its users. Individual teams end up with custom practices and approaches for a single pane of management, or often do without one, relying instead on the built-in products and on repeated manual effort, usually by engineers on rotation.

Operational engineering is just as much about asking the right questions for the smooth running of operations as it is about troubleshooting and remediation for current and future deployments. Instead of asking the same questions time and again through different means and different personnel, it might be better to articulate them and invest in systems that automate the effort and increase the insights available to operational engineers. While different companies may already have developed tools and techniques for this purpose, this article suggests that the purpose of those systems is to answer questions that can be curated and automated.

With different products in the technology landscape vying for user acceptance in answering some of these questions, developing a reporting stack might seem like overhead, regardless of the size and scale of the deployments it reports on. Yet it is precisely the limitations of those products, together with the convenience and consistency of the answers, that allow any system built to answer these questions to bring additional value and even become a staple.

Another angle of a custom approach to meeting the operational engineering needs of deployments from various teams is a virtualized view of the different data sources, so that queries do not lose their relevance to the quirks and demands of the products that store the data. Let us say the incidents stored in an IT Operations Management database served by a SaaS provider need to be related to the resources in the different subscriptions and resource groups of the cloud provider for a holistic view of the pain-point resources. In such a case, no correlation is possible out of the box, because there is simply no data store or common query that spans both. A data store provides order, and a query provides a method for arriving at the answers that management might want, such as which resource types have caused the most incidents in the last N days, or how those incidents correlate with costs. The question could very well be answered in parts by the SaaS provider, by the cloud provider, or by both, but not as a complete answer.
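As a minimal sketch of such a combined query, assume the incidents and the resource inventory have already been exported from their respective providers into plain records. The record shapes, the field names (including a `resource_id` on each incident that links it to a cloud resource), and the sample data below are all hypothetical and chosen only for illustration; no particular SaaS or cloud provider API is implied.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    incident_id: str
    resource_id: str      # hypothetical link from the incident to a cloud resource
    opened_at: datetime


@dataclass(frozen=True)
class Resource:
    resource_id: str
    resource_type: str
    subscription: str
    resource_group: str


def incidents_by_resource_type(incidents, resources, days, now):
    """Count incidents opened in the last `days` days, grouped by resource type."""
    cutoff = now - timedelta(days=days)
    # Build a lookup from the cloud inventory so each incident can be joined to a type.
    type_by_id = {r.resource_id: r.resource_type for r in resources}
    counts = Counter()
    for incident in incidents:
        if incident.opened_at < cutoff:
            continue  # outside the reporting window
        # Incidents that reference a resource missing from the inventory stay visible.
        counts[type_by_id.get(incident.resource_id, "unknown")] += 1
    # most_common() returns (resource_type, count) pairs, highest count first.
    return counts.most_common()


# Illustrative data only; real records would come from the two providers.
now = datetime(2023, 5, 14)
resources = [
    Resource("vm-1", "virtual_machine", "sub-a", "rg-web"),
    Resource("db-1", "sql_database", "sub-a", "rg-data"),
]
incidents = [
    Incident("INC001", "vm-1", now - timedelta(days=2)),
    Incident("INC002", "vm-1", now - timedelta(days=5)),
    Incident("INC003", "db-1", now - timedelta(days=3)),
    Incident("INC004", "db-1", now - timedelta(days=40)),
]

top_types = incidents_by_resource_type(incidents, resources, days=30, now=now)
print(top_types)
```

The join happens in the reporting layer rather than in either provider, which is the point of the virtualized view: the same question can keep being asked even if the product behind either data source changes.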
