Introduction: This article focuses on the operational engineering aspects of Cloud DevOps and solutions development. While the public cloud and the technology landscapes that companies own play up their strengths for operational engineering, no single solution or product addresses all the concerns of this discipline. The public cloud does come with immense capabilities for monitoring and managing resources, but it leaves the authoring of rules, alerts, action groups, and dashboards, and their organization by scope and level, to the teams that use it. Individual teams end up with a custom practice and approach for a single pane of management, or they do without one and fall back on the built-in products and on repeated manual effort, usually by engineers on rotation.
Operational engineering is just as much about asking the
right questions for the smooth running of operations as it is about
troubleshooting and remediation for current and future deployments. Instead of
asking the same questions time and again through different tools and
personnel, it might be better to articulate them once and invest in systems
that automate the effort and increase the insights available to operational
engineers. While different companies may already have developed tools and
techniques for this purpose, this article suggests that the purpose of those
systems is to answer questions that can be curated and automated.
With different products in the technology landscape vying
for user acceptance in answering some of these questions, the notion of
developing a reporting stack, regardless of the size and scale of the
deployments it reports on, might seem like overhead. But it is precisely the
limitations of those products, together with the convenience and consistency
of the answers, that allow any system built to answer these questions to bring
additional value and even become a staple.
Another angle of a custom approach to meeting the
operational engineering needs of the deployments from various teams is a
virtualized view of the different data sources, so that queries do not lose
their relevance to the quirks and demands of the products that store the data.
Let us say the incidents stored in an IT Operations Management database served
by a SaaS provider need to be related to the resources in the different
subscriptions and resource groups of the cloud provider for a holistic view of
the pain point resources. In such a case, no correlation is possible out of the
box because there is no single data store or common query that spans both. A
data store provides order, and a query provides a method to come up with the
answers that management might want, such as which resource types have caused
the most incidents in the last N days, or how those incidents correlate with
costs. Each part of the question could very well be answered by the SaaS
provider or by the cloud provider, but neither gives a complete answer.
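The correlation described above can be sketched in a few lines once both data sets are available side by side. The following is only an illustration under stated assumptions: the incident records, the resource inventory, their field names, and the function incidents_by_resource_type are all hypothetical and do not correspond to any particular SaaS or cloud provider API. The sketch joins incidents to resources by a shared resource id, counts incidents per resource type over the last N days, and sums the monthly cost of the affected resources.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical incident records, as if exported from the SaaS incident store.
incidents = [
    {"id": "INC001", "resource_id": "vm-01", "opened": datetime(2024, 5, 28)},
    {"id": "INC002", "resource_id": "vm-02", "opened": datetime(2024, 5, 30)},
    {"id": "INC003", "resource_id": "db-01", "opened": datetime(2024, 5, 29)},
    {"id": "INC004", "resource_id": "vm-01", "opened": datetime(2024, 4, 1)},
    {"id": "INC005", "resource_id": "gone-99", "opened": datetime(2024, 5, 31)},
]

# Hypothetical resource inventory, as if exported from the cloud provider.
resources = {
    "vm-01": {"type": "VirtualMachine", "subscription": "sub-a",
              "resource_group": "rg-web", "monthly_cost": 120.0},
    "vm-02": {"type": "VirtualMachine", "subscription": "sub-b",
              "resource_group": "rg-api", "monthly_cost": 80.0},
    "db-01": {"type": "SqlDatabase", "subscription": "sub-a",
              "resource_group": "rg-data", "monthly_cost": 300.0},
}


def incidents_by_resource_type(incidents, resources, now, days):
    """Return (resource type, incident count, monthly cost of affected
    resources) tuples for incidents opened in the last `days` days,
    ordered by incident count, highest first."""
    cutoff = now - timedelta(days=days)
    counts = Counter()
    affected = {}  # resource type -> set of distinct resource ids
    for incident in incidents:
        if incident["opened"] < cutoff:
            continue  # outside the reporting window
        resource = resources.get(incident["resource_id"])
        # Incidents whose resource is missing from the inventory are kept
        # visible under "Unknown" rather than silently dropped.
        rtype = resource["type"] if resource else "Unknown"
        counts[rtype] += 1
        if resource:
            affected.setdefault(rtype, set()).add(incident["resource_id"])

    report = []
    for rtype, count in counts.most_common():
        cost = sum(resources[rid]["monthly_cost"]
                   for rid in affected.get(rtype, ()))
        report.append((rtype, count, cost))
    return report


report = incidents_by_resource_type(
    incidents, resources, now=datetime(2024, 6, 1), days=7)
for rtype, count, cost in report:
    print(f"{rtype}: {count} incident(s), monthly cost {cost}")
```

In a real reporting stack the two inputs would come from separate connectors, and the join key is the hard part: the sketch assumes that every incident already carries the cloud resource id, which is exactly the kind of convention a virtualized view has to establish.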