Wednesday, October 27, 2021

Versioning stored documents in Azure Cosmos DB


Introduction

The history of data is often as important as the data itself. For example, the finance, healthcare and insurance industries often track the history of portions of their data for audit and reporting purposes. CosmosDB forms the storage layer for many microservices in Azure. This article explains the ‘change feed’ feature associated with this storage.

Description:

CosmosDB exposes an API for the underlying log of changes to the documents in a collection. For users familiar with the SQL Server relational store, this is roughly the equivalent of change data capture. The changes are recorded incrementally and can be distributed across one or more consumers for parallel processing, which enables a variety of applications. The change feed covers inserts, updates and other forms of writes, but not deletions. Usually, only the most recent change to an item is available; intermediate changes made between reads are not visible.
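The behavior described above can be illustrated with a small in-memory model. This is a toy sketch, not the CosmosDB SDK: the `ToyContainer` class, its method names and the integer sequence numbers are all assumptions made for illustration. It shows why a consumer sees only the latest version of an item and why a hard delete leaves nothing to read.

```python
class ToyContainer:
    """Hypothetical in-memory stand-in for a container with a change feed."""

    def __init__(self):
        self.items = {}      # item id -> (sequence number, body)
        self.next_seq = 1    # increases with every write

    def upsert(self, item_id, body):
        # An insert or update overwrites the stored item and stamps it
        # with a new sequence number; the previous version is lost.
        self.items[item_id] = (self.next_seq, dict(body))
        self.next_seq += 1

    def delete(self, item_id):
        # A hard delete removes the item, so the feed has nothing to report.
        self.items.pop(item_id, None)

    def read_changes(self, since=0):
        # Return the current version of every item written after `since`,
        # oldest write first, plus a token to resume from on the next read.
        changed = [(seq, item_id, body)
                   for item_id, (seq, body) in self.items.items()
                   if seq > since]
        changed.sort(key=lambda entry: entry[0])
        token = changed[-1][0] if changed else since
        return [{"id": item_id, **body} for _, item_id, body in changed], token


container = ToyContainer()
container.upsert("order-1", {"status": "created"})
container.upsert("order-1", {"status": "paid"})      # hides the "created" version
container.upsert("order-2", {"status": "created"})
container.delete("order-2")                          # invisible to the feed

changes, token = container.read_changes()
print(changes)
```

In this model the "created" version of `order-1` and the whole life of `order-2` never reach the consumer, which is the gap the versioning pattern and soft markers discussed in this article are meant to close.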

The change feed is not targeted at solving all the versioning requirements from the CosmosDB store. That requires a Document Versioning Pattern which involves the following:

1. Intent – This ensures that each entity in a collection maintains its history of changes when it is updated.

2. Motivation – This tracks the history of entities throughout their lifecycle.

3. Applicability – This covers usages such as auditing, reporting and analysis.

4. Structure – In order to keep the state of the objects, every update must be turned into an append operation.

5. Participants – The change feed, which makes a materialized view possible.

6. Consequences – This should work for both short and long histories. If it suffers performance degradation, it might not apply to all use cases for versioning.
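The structure of the pattern listed above, where every update becomes an append and a materialized view is derived from the appended documents, can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: `VersionedStore`, `materialize_current` and the `entityId` and `version` fields are hypothetical names, and the plain list stands in for the collection whose change feed would drive the view.

```python
from collections import defaultdict


class VersionedStore:
    """Hypothetical append-only store: each save adds a new version document."""

    def __init__(self):
        self.documents = []               # append-only list of version documents
        self._latest = defaultdict(int)   # entity id -> highest version written

    def save(self, entity_id, body):
        # Turn the update into an append: never overwrite, always add a version.
        self._latest[entity_id] += 1
        version = self._latest[entity_id]
        doc = {"id": f"{entity_id}:v{version}",
               "entityId": entity_id,
               "version": version,
               **body}
        self.documents.append(doc)
        return doc

    def history(self, entity_id):
        # Full history of one entity, in the order the versions were written.
        return [d for d in self.documents if d["entityId"] == entity_id]


def materialize_current(documents):
    # Fold a stream of version documents into a view of the latest state,
    # which is the role a change feed consumer would play.
    current = {}
    for doc in documents:
        seen = current.get(doc["entityId"])
        if seen is None or doc["version"] > seen["version"]:
            current[doc["entityId"]] = doc
    return current


store = VersionedStore()
store.save("policy-7", {"premium": 100})
store.save("policy-7", {"premium": 120})
store.save("policy-9", {"premium": 80})

view = materialize_current(store.documents)
print([d["version"] for d in store.history("policy-7")])
print(view["policy-7"]["premium"])
```

Because each version is a separate document, the history grows with every update. That is the trade-off noted under Consequences: long histories may need a check on query and storage cost.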

Because the change feed does not record deletes, a “soft marker” can be set on an item with an update instead of deleting it outright, and consumers can filter on that marker when processing items in the change feed. This makes it possible to record deletes even though they are not supported directly. Inserts and updates are recorded by the change feed automatically.
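A consumer that honors such a soft marker might look like the sketch below. The field name `deleted` and the helper functions are assumptions chosen for illustration; the source does not prescribe a particular marker, and the list of dictionaries stands in for items read from the change feed.

```python
def soft_delete(item):
    # Instead of deleting, return an updated copy carrying the marker.
    # Writing this copy is an update, so the change feed can record it.
    return {**item, "deleted": True}


def apply_changes(view, changes):
    # Apply feed items to a downstream view, treating marked items as deletes.
    for item in changes:
        if item.get("deleted"):
            view.pop(item["id"], None)
        else:
            view[item["id"]] = item
    return view


first = {"id": "a", "name": "first"}
second = {"id": "b", "name": "second"}

# Simulated feed: two inserts followed by a soft delete of the first item.
changes = [first, second, soft_delete(first)]

view = apply_changes({}, changes)
deleted_ids = [item["id"] for item in changes if item.get("deleted")]
print(sorted(view))
print(deleted_ids)
```

The same filter can feed an audit log: the marked items are exactly the deletes that would otherwise be missing from the change feed.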

Change feed items come in the order of their modification time. This sort order is guaranteed per logical partition key.

In a multi-region Azure Cosmos DB account, failover of the write region is supported: the change feed continues to work across a manual failover operation and remains contiguous.

Conclusion:

This approach captures data changes in a form that can be used for auditing, reporting and analysis.
