Access to data in Azure Kusto (Azure Data Explorer) differs between producers and consumers. This article explains how to ingest data into Kusto as a producer and how to retrieve data as a consumer.
Producers:
There are a few products for publishing data to Kusto (a minimal ingestion sketch follows this list). These include:
· Metadata Mover: Cosmos DB changes produce a change feed that can be consumed to ingest data into Kusto tables.
· Azure Data Factory (ADF): a service designed to bridge disparate data sources. A preconfigured data pipeline is quick and easy to use, connects to both a SQL DB and a KQL (Kusto Query Language) cluster, and allows the creation of scheduled triggers. A pipeline runs as a whole; it will not run an individual activity on its own. ADF requires a system-assigned managed identity (formerly called Managed Service Identity, MSI) and does not support user-assigned managed identities.
· Studio Jobs: replicates the source to the destination in full on every run, including new columns.
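For concreteness, here is a minimal sketch of queued ingestion with the Kusto Python SDK (azure-kusto-ingest). It is a sketch under assumptions, not any of these products' actual implementation: the cluster URL, database, table, and file name are placeholders.

```python
# Minimal queued-ingestion sketch using the azure-kusto-ingest SDK.
# All names below (cluster URL, database, table, file) are placeholders.
from azure.kusto.data import KustoConnectionStringBuilder
from azure.kusto.data.data_format import DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties

# Queued ingestion goes through the cluster's ingestion endpoint
# (note the "ingest-" prefix on the URI).
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://ingest-mycluster.westus.kusto.windows.net"
)
client = QueuedIngestClient(kcsb)

props = IngestionProperties(
    database="MyDatabase",       # placeholder database
    table="MyTable",             # placeholder destination table
    data_format=DataFormat.CSV,  # format of the staged file below
)

# Hand the file to the ingestion queue. Kusto pulls and commits it
# asynchronously, so a successful call does not mean the rows are
# queryable yet.
client.ingest_from_file("changes.csv", ingestion_properties=props)
```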
These products primarily take one of two approaches:
· Change tracking: The source must support change tracking and publish changes through a change feed. Each delta should be described in terms of its scope, type, and nature, and must carry a version so that every change can be referred to by its version. The changes can then be applied to Kusto.
· E2E workflow: This is two-stage publishing.
o The first stage performs an initial load from the source to the destination, recording a watermark such as a version or a timestamp for the data transferred.
o The second stage performs periodic incremental loads.
Incremental updates need some progress indicator. Where changes must be applied, prefer overwriting the destination to reading and merging the changes (see the sketch below).
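As an illustration of that guidance, here is a hedged sketch of watermark-driven incremental loading. The source and destination objects and the fetch_changes_since call are hypothetical placeholders; the pattern is what matters: persist the last applied version, request only newer deltas, apply them by overwriting, then advance the watermark.

```python
# Watermark-driven incremental load (sketch). The source/destination
# objects and fetch_changes_since() are hypothetical placeholders.
import json
from pathlib import Path

WATERMARK_FILE = Path("watermark.json")  # placeholder progress store

def load_watermark() -> int:
    """Return the last version applied to the destination (0 = never ran,
    which makes the first run the initial full load)."""
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["version"]
    return 0

def save_watermark(version: int) -> None:
    """Persist the progress indicator only after a successful apply."""
    WATERMARK_FILE.write_text(json.dumps({"version": version}))

def run_incremental_load(source, destination) -> None:
    since = load_watermark()
    # Hypothetical source API: each delta carries a scope, type,
    # description, and a monotonically increasing version.
    changes = source.fetch_changes_since(since)
    if not changes:
        return
    # Overwrite the affected rows rather than reading and merging,
    # per the guidance above.
    destination.overwrite(changes)
    save_watermark(max(c["version"] for c in changes))
```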
There can also be a hybrid man-in-the-middle implementation that acts as a consumer for the source and a producer for the destination, but that means implementing both the producer and the consumer in code rather than leveraging the capabilities of these technologies in an ADO pipeline.
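Here is a hedged sketch of that hybrid shape with the Kusto Python SDKs: one process queries the source cluster as a consumer (azure-kusto-data), stages the rows, and queues them into the destination cluster as a producer (azure-kusto-ingest). The cluster URLs, database and table names, and the query are placeholder assumptions.

```python
# Hybrid consumer/producer sketch: query a source Kusto cluster and
# ingest the results into a destination cluster. All URLs, names, and
# the query itself are placeholders.
import csv
import tempfile

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.data_format import DataFormat
from azure.kusto.ingest import QueuedIngestClient, IngestionProperties

# Consumer side: run a query against the source cluster.
source_kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://source-cluster.kusto.windows.net"
)
consumer = KustoClient(source_kcsb)
response = consumer.execute("SourceDb", "SourceTable | where Version > 42")

# Stage the rows as a CSV file, since queued ingestion works on
# files and blobs rather than in-memory row sets.
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".csv", delete=False, newline=""
) as tmp:
    writer = csv.writer(tmp)
    for row in response.primary_results[0]:
        writer.writerow(row.to_list())
    staged_path = tmp.name

# Producer side: hand the staged file to the destination's ingestion
# endpoint (note the "ingest-" prefix).
dest_kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://ingest-dest-cluster.kusto.windows.net"
)
producer = QueuedIngestClient(dest_kcsb)
producer.ingest_from_file(
    staged_path,
    ingestion_properties=IngestionProperties(
        database="DestDb", table="DestTable", data_format=DataFormat.CSV
    ),
)
```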
ADO (Azure DevOps) is a widely used way to create pipelines in Azure, and the task of adding data to Kusto requires connecting your data source to Kusto.
The Azure DevOps project is the fundamental container where data is stored when it is added to Azure DevOps. Since it is both a repository for packages and a place for users to plan, track progress, and collaborate on building workflows, it must scale with the organization. When a project is created, a team with the same name is created alongside it. For an enterprise, a collection-project-team structure is preferable: it gives teams a high level of autonomy and allows administrative tasks to occur at the appropriate level.