Cluster computing: Writing automation that retrieves data via APIs versus retrieving it from Kusto.

Introduction:

Many partners want access to the data that a service team maintains in its inventory. One popular way to expose data about the resources provisioned by the service is a shared Kusto database and cluster. APIs provide real-time access to the data, while Kusto provides a continuously replicated copy. The refresh of the shared Kusto database can lag, and the lag varies from table to table depending on each table's size and use. Apart from that lag, the data is expected to be the same in both, and this article explores when to use one rather than the other.
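One way to make the lag explicit in an automation is to compare the time of the last replicated record against a tolerance and fall back to the API when the copy is too stale. The following is a minimal sketch under assumptions: the function names, the `max_lag` tolerance and the idea that the automation can obtain a last-ingested timestamp for a table are all hypothetical and not taken from any particular service.

```python
from datetime import datetime, timedelta


def replication_lag(last_ingested, now):
    """Return how far the replicated copy trails the current time."""
    return now - last_ingested


def choose_source(last_ingested, now, max_lag):
    """Pick the replicated copy when it is fresh enough, else the API.

    All three arguments are supplied by the caller; how the last-ingested
    timestamp is obtained is left open here (assumption).
    """
    if replication_lag(last_ingested, now) <= max_lag:
        return "kusto"
    return "api"


if __name__ == "__main__":
    now = datetime(2022, 2, 8, 12, 0, 0)
    print(choose_source(datetime(2022, 2, 8, 11, 55, 0), now, timedelta(minutes=15)))
    print(choose_source(datetime(2022, 2, 8, 10, 0, 0), now, timedelta(minutes=15)))
```

Because the lag differs from table to table, a tolerance like this would likely be chosen per table rather than once for the whole database.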

Description:

Kusto data access is a way to read the data directly from a database. It is very helpful when we want to browse or explore the data. APIs encapsulate logic together with access to the data and provide validation, error translation and response formatting, along with diagnosability, telemetry and troubleshooting help. They are owned by the same service team that owns the data. The APIs are also versioned, which gives the applications that use them some reassurance about compatibility and a migration path. APIs are also performant and already have fast paths for critical scenarios. They are continuously maintained for streamlined access to the data, with proper controls and overrides for desired custom behavior. APIs guarantee robustness, predictability, and SLAs for the existing and new capabilities that the service team authors. In this sense, they offer managed access to the data.

Compare this with Kusto data access, where the compute required to extract, transform, and load the data is similar in nature to the implementation of the API but now falls on the consuming team as a do-it-yourself effort. If client teams invest in building access to the data via Kusto queries, they also own the maintenance and the total cost of ownership, which accumulates substantially over extended periods of time. One of the unaccounted costs of Kusto comes from its fragility. The queries, the formatting of the data, and the semantics and deprecation of the schema associated with the data are all susceptible to change without any notification to the applications and their authors. Even the values of the data can change, and assumptions made about them can break. As a result, the cost of data access is higher for Kusto.
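Since schema and value changes can arrive unannounced, an automation that reads query results directly may want to check its assumptions before using the data. The following is a minimal sketch, assuming the result has already been fetched as a list of column names plus a list of rows; the column names, the expected types and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical columns and types that this automation assumes (not from any real table).
EXPECTED_COLUMNS = {"ResourceId": str, "Region": str, "CoreCount": int}


def find_schema_problems(columns, rows, expected=EXPECTED_COLUMNS):
    """Return a list of human-readable problems; an empty list means the assumptions hold."""
    problems = []
    positions = {name: i for i, name in enumerate(columns)}

    # A renamed or deprecated column shows up as a missing name.
    for name in expected:
        if name not in positions:
            problems.append(f"missing column: {name}")

    # A changed value format shows up as an unexpected type.
    for row_number, row in enumerate(rows):
        for name, kind in expected.items():
            if name not in positions:
                continue
            value = row[positions[name]]
            if not isinstance(value, kind):
                problems.append(
                    f"row {row_number}: {name} is {type(value).__name__}, "
                    f"expected {kind.__name__}"
                )
    return problems


if __name__ == "__main__":
    print(find_schema_problems(["ResourceId", "Region", "Cores"], [["r1", "westus", 4]]))
```

A check like this is part of the maintenance cost described above: with the API, comparable validation is done by the service team behind a versioned contract.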

Another dimension of comparison is storage. The APIs provide both read and write capabilities, while Kusto is essentially read-only, which mandates local storage for stashed results and transformations during vectorized executions. The size of that storage also becomes a consideration when the data is accessed frequently. An application that wishes to enrich the data must make a copy of the original, transform it and save it in a local or remote store, because it cannot write back to the source. If the data supports user-defined objects and dictionaries, the APIs provide a way to extend them so that the additional state is persisted and returned on the next access.
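The copy-then-transform pattern for read-only data can be sketched as follows. This is only an illustration under assumptions: rows are represented as plain dictionaries, and the function name, the `annotate` callback and the field names are hypothetical.

```python
import copy


def enrich_copy(source_rows, annotate):
    """Return an enriched copy of the rows, leaving the source rows untouched.

    `annotate` is a caller-supplied function (assumption) that takes one row
    and returns a dictionary of extra fields to add to the local copy.
    """
    local_rows = copy.deepcopy(source_rows)  # never modify what was read from the source
    for row in local_rows:
        row.update(annotate(row))
    return local_rows


if __name__ == "__main__":
    source = [{"ResourceId": "r1", "CoreCount": 4}]
    local = enrich_copy(source, lambda row: {"IsLarge": row["CoreCount"] >= 4})
    print(local)
    print(source)
```

The enriched copy then has to be stored and refreshed by the application itself, which is the storage cost noted above.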

APIs are therefore the first choice for accessing data, but Kusto can be useful in automations that cannot wait for the functionality to become available via the APIs published by the service team. Kusto queries are also very useful for one-off automations that are special-purpose or dedicated and have no impact on customers. Most commercial systems rely on APIs for interaction between services, especially in production environments and at cloud scale. In-house projects and reporting dashboards can make use of Kusto directly, of Azure Data Explorer, or of automations based on them. Kusto queries can also be quite elaborate or custom-defined to suit specific needs, and they are faster and more lightweight to deliver than features that must pass through the staged management, release pipelines and scheduled delivery of the APIs. The ability to include such queries in background automation is sometimes useful because these automations do not interact with the customer and can be kicked off periodically or on demand.

Both Kusto and API data access can be programmatic, involving a query provider and an HTTP client respectively. The code for Kusto data access, however, will likely involve more packing and unpacking into objects as well as type conversions, whereas the requests and responses for the API arrive already composed and are versioned along with the API. This investment is worthwhile if the query language provides something useful that is not available otherwise or that would require much more code on the API side. No-code and low-code scenarios prefer the Kusto approach, but those scenarios do not include cases where the transfer of data must be formal.
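The difference in unpacking effort can be sketched without calling any real client. The example below assumes that a query result arrives as column names plus rows of loosely typed values and that an API response arrives as a JSON body with named fields; the `Resource` type, the column and field names, and the `value` wrapper are hypothetical and do not describe any real service or SDK.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Resource:
    resource_id: str
    region: str
    created: datetime
    core_count: int


def rows_to_resources(columns, rows):
    """Unpack a tabular result by position, converting each value explicitly."""
    index = {name: i for i, name in enumerate(columns)}
    return [
        Resource(
            resource_id=str(row[index["ResourceId"]]),
            region=str(row[index["Region"]]),
            created=datetime.fromisoformat(row[index["Created"]]),
            core_count=int(row[index["CoreCount"]]),
        )
        for row in rows
    ]


def api_body_to_resources(body):
    """Read an already composed JSON body whose field names come from the API contract."""
    payload = json.loads(body)
    return [
        Resource(
            resource_id=item["resourceId"],
            region=item["region"],
            created=datetime.fromisoformat(item["created"]),
            core_count=item["coreCount"],
        )
        for item in payload["value"]
    ]


if __name__ == "__main__":
    table = rows_to_resources(
        ["ResourceId", "Region", "Created", "CoreCount"],
        [["r1", "westus", "2022-02-08T10:00:00", "4"]],
    )
    body = '{"value": [{"resourceId": "r1", "region": "westus", "created": "2022-02-08T10:00:00", "coreCount": 4}]}'
    print(table == api_body_to_resources(body))
```

In the tabular path the automation owns the column lookup and every conversion; in the API path the shape of the response is fixed by the versioned contract.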

Conclusion:

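The mode and delivery of data access are governed by factors such as data freshness, cost of ownership, storage and write requirements, and how soon the functionality is needed; together these weigh in favor of one option over the other.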

Tuesday, February 8, 2022
