This is a continuation of an article that describes operational considerations for hosting
solutions on the Azure public cloud. Throughout this series, we refer to best practices
drawn from the Azure public cloud documentation. The previous article focused on
antipatterns to avoid, specifically the noisy neighbor antipattern. This article focuses
on performance tuning for applications that use Cosmos DB.
One example of an application that uses Cosmos DB is a drone delivery application that
runs on Azure Kubernetes Service. A fleet of drones sends position data in real time to
Azure IoT Hub. A function app receives the events, transforms the data into GeoJSON
format, and writes it to Cosmos DB. Cosmos DB can index geospatial data for efficient
spatial queries, which lets a client application find all drones within a given distance
of a location or all drones inside a polygon. Azure Functions is used to write the data
because the processing is lightweight: there is no need for a full-fledged stream
processing engine that joins streams, aggregates data, or processes data across time
windows, and Cosmos DB can sustain high write throughput.
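As a rough illustration of the transform step, the sketch below converts one telemetry event into a document whose `location` field is a GeoJSON Point, and builds a parameterized Cosmos DB SQL query that uses the built-in `ST_DISTANCE` function to find drones within a radius. The event fields (`deviceId`, `lat`, `lon`), the `location` property, and both helper names are hypothetical, and the query spec is only constructed here, not sent to a real account.

```python
from typing import Any


def to_geojson_document(event: dict[str, Any]) -> dict[str, Any]:
    """Convert a hypothetical drone telemetry event into a Cosmos DB document."""
    return {
        "id": event["deviceId"],
        "deviceId": event["deviceId"],  # hypothetical partition key property
        "location": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [event["lon"], event["lat"]],
        },
    }


def drones_within_query(lon: float, lat: float, meters: float) -> dict[str, Any]:
    """Build a parameterized query spec for drones within `meters` of a point."""
    point = {"type": "Point", "coordinates": [lon, lat]}
    return {
        # ST_DISTANCE returns the distance in meters between two GeoJSON geometries.
        "query": (
            "SELECT c.deviceId FROM c "
            "WHERE ST_DISTANCE(c.location, @point) < @meters"
        ),
        "parameters": [
            {"name": "@point", "value": point},
            {"name": "@meters", "value": meters},
        ],
    }
```

With the `azure-cosmos` Python SDK, a spec like this would typically be passed to a container's `query_items` method through its `query` and `parameters` arguments. The polygon case works the same way with `ST_WITHIN` and a GeoJSON Polygon parameter.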
Monitoring data for Cosmos DB can show HTTP 429 (Too Many Requests) status codes in
responses. Cosmos DB returns this error when it is temporarily throttling requests,
usually because the caller is consuming more Request Units (RUs) than have been
provisioned. A different status code, 409 (Conflict), is returned when an item being
created already exists in the store.
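The Cosmos DB SDKs already retry throttled requests on their own, but the logic is easy to picture. This sketch retries an operation that fails with status 429, waiting for the server-suggested interval (Cosmos DB reports it in the `x-ms-retry-after-ms` response header) and falling back to exponential backoff when no hint is present. `ThrottledError`, `call_with_retry`, and the default limits are hypothetical choices for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical error raised when the service answers with HTTP 429."""

    def __init__(self, retry_after_ms: Optional[int] = None) -> None:
        super().__init__("429 Too Many Requests")
        self.status_code = 429
        # Server-suggested wait, e.g. parsed from the x-ms-retry-after-ms header.
        self.retry_after_ms = retry_after_ms


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on 429 until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # out of attempts: surface the throttling error
            if err.retry_after_ms is not None:
                delay = err.retry_after_ms / 1000.0  # honor the server hint
            else:
                delay = base_delay_s * (2 ** (attempt - 1))  # exponential backoff
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable; in production the default `time.sleep` is used. Every one of these waits is time the caller spends idle, which is why sustained throttling shows up as latency.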
When a 429 response is followed by a wait of about 600 ms before the operation is
retried, the trace shows periods of waiting with no corresponding activity. A chart of RU
consumption per partition versus provisioned RUs per partition helps pinpoint the root
cause of the 429 errors that precede these waits. It may show that RU consumption has
exceeded the provisioned RUs.
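To make the "waiting with no activity" pattern concrete, the sketch below scans a simplified list of Cosmos DB calls, orders them by start time, and reports the idle gap that follows each 429 response before the next call begins. The `Call` record and its fields are a hypothetical stand-in for the spans a tracing tool would export.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical, simplified trace span for one Cosmos DB request."""
    start_ms: float
    end_ms: float
    status: int


def throttle_waits(calls: list[Call]) -> list[float]:
    """Return the idle time (ms) after each 429 response before the next call starts."""
    ordered = sorted(calls, key=lambda c: c.start_ms)
    waits = []
    for current, nxt in zip(ordered, ordered[1:]):
        if current.status == 429:
            # Gap between the throttled response and the retry: time with no activity.
            waits.append(max(0.0, nxt.start_ms - current.end_ms))
    return waits
```

Summing these gaps gives a rough measure of how much latency throttling adds; the per-partition RU chart then explains why the throttling happened.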
Another common cause of Cosmos DB performance problems is incorrect use of partition
keys. Queries that do not include a partition key become cross-partition queries, which
are inefficient and can lead to high latency when multiple database partitions are
queried serially. On the write side, hot partitions can result when documents are
written without a partition key value, because all such documents land in the same
logical partition.
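A quick simulation shows why a missing partition key produces a hot partition. Documents without a partition key value share one logical partition and therefore hash to the same physical partition, while a well-distributed key such as a device ID spreads the writes out. The MD5-based hashing, the `<undefined>` placeholder, and the fixed partition count are simplifying assumptions; Cosmos DB uses its own hashing scheme and manages physical partitions itself.

```python
import hashlib
from collections import Counter
from typing import Optional


def physical_partition(key: Optional[str], partition_count: int) -> int:
    """Map a partition key value to a physical partition (simplified model)."""
    # A missing key is modeled as one shared value, i.e. a single logical partition.
    raw = "<undefined>" if key is None else key
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def write_distribution(keys: list[Optional[str]], partition_count: int = 4) -> Counter:
    """Count how many writes land on each physical partition."""
    return Counter(physical_partition(k, partition_count) for k in keys)
```

With every key missing, all writes pile onto a single partition; with one key per drone, they spread across all four.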
A partition heat map helps here because it shows, for each partition, the headroom
between allocated and consumed Request Units.
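The numbers behind such a heat map are simple to compute. Provisioned throughput on a container is divided evenly across its physical partitions, so each partition's headroom is its share of the provisioned RU/s minus what it consumed. This sketch assumes the `consumed` mapping lists every physical partition; the partition IDs and figures in the usage check are made up for illustration.

```python
def partition_headroom(provisioned_rus: float, consumed: dict[str, float]) -> dict[str, float]:
    """Return RU/s headroom per partition; negative values mean over budget."""
    if not consumed:
        return {}
    # Assumes `consumed` covers all physical partitions, so throughput splits evenly.
    per_partition = provisioned_rus / len(consumed)
    return {pid: per_partition - used for pid, used in consumed.items()}


def hot_partitions(headroom: dict[str, float]) -> list[str]:
    """Partitions with no headroom left are the ones likely to return 429s."""
    return sorted(pid for pid, room in headroom.items() if room <= 0)
```

For example, a container provisioned at 4,000 RU/s with four partitions consuming 900, 1,300, 400, and 1,000 RU/s uses only 3,600 RU/s overall, yet the second partition is 300 RU/s over its 1,000 RU/s share. That is exactly the imbalance a heat map exposes and a container-level total hides.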
Cosmos DB provides snapshot isolation for transactions within a logical partition, but a
read-modify-write sequence issued by a client can still lose updates, so it is important
to include a version check with update operations. Every item has a system-defined _etag
property that the server generates and updates automatically each time the item is
written. A client can send the _etag value it last read in an If-Match request header,
which lets the server decide whether the item can be conditionally updated. If the item
has changed since it was read, its _etag no longer matches and the server rejects the
request with HTTP 412 (Precondition Failed). The application can treat this as a signal
to re-read the item, reapply its updates, and retry the original client request.
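The read-modify-write loop described above can be sketched with an in-memory stand-in for a container. `InMemoryContainer`, `PreconditionFailed`, and `update_with_etag` are hypothetical; the toy store mimics the server by generating a new `_etag` on every write and rejecting a replace whose If-Match value is stale, the way Cosmos DB answers with status 412. In the real SDKs, the same idea is expressed by passing the item's `_etag` as an If-Match condition on the replace call.

```python
import copy
import uuid
from typing import Any, Callable


class PreconditionFailed(Exception):
    """Hypothetical error mirroring HTTP 412 when If-Match does not match _etag."""
    status_code = 412


class InMemoryContainer:
    """Toy store that assigns a fresh _etag on every write."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    def upsert(self, item: dict[str, Any]) -> dict[str, Any]:
        stored = copy.deepcopy(item)
        stored["_etag"] = uuid.uuid4().hex  # new version token on every write
        self._items[stored["id"]] = stored
        return copy.deepcopy(stored)

    def read(self, item_id: str) -> dict[str, Any]:
        return copy.deepcopy(self._items[item_id])

    def replace(self, item: dict[str, Any], if_match: str) -> dict[str, Any]:
        if self._items[item["id"]]["_etag"] != if_match:
            raise PreconditionFailed("item changed since it was read")
        return self.upsert(item)


def update_with_etag(
    container: InMemoryContainer,
    item_id: str,
    mutate: Callable[[dict[str, Any]], None],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Read, apply `mutate`, and write back only if nobody else wrote in between."""
    for _ in range(max_attempts):
        item = container.read(item_id)
        mutate(item)  # reapply the update to the latest version
        try:
            return container.replace(item, if_match=item["_etag"])
        except PreconditionFailed:
            continue  # another writer won the race: re-read and retry
    raise PreconditionFailed(f"gave up after {max_attempts} attempts")
```

Because `mutate` is reapplied to a fresh read on every attempt, a concurrent write is never silently overwritten; the cost is an extra round trip whenever a conflict occurs.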