Wednesday, December 1, 2021

Continued from previous post

 

The next step is to increase the Request Units (RU) provisioned for this operation. When the RUs are quadrupled, the throughput rises from 19 requests per second to 23 requests per second, and the average latency drops from 669 ms to 569 ms. The maximum throughput is not significantly higher, but the change eliminates all of the 429 (request rate too large) errors that were previously encountered, which is still a significant win.
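Until enough RUs are provisioned, 429 responses have to be absorbed client-side; the usual pattern is a retry that honors the server's retry-after hint. A minimal sketch, with a hypothetical `ThrottledError` standing in for the real SDK's throttling exception:

```python
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the database."""
    def __init__(self, retry_after_ms):
        self.retry_after_ms = retry_after_ms

def call_with_backoff(request_fn, max_attempts=5):
    """Retry a throttled request, waiting the server-suggested delay each time."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the throttle to the caller
            time.sleep(err.retry_after_ms / 1000.0)

# Simulated request: throttled twice before the extra RUs absorb the load.
attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottledError(retry_after_ms=10)
    return "ok"

result = call_with_backoff(fake_request)
```

Retries like this hide the 429s from the caller but still add their wait time to the latency, which is why provisioning enough RUs to avoid throttling altogether is the better outcome.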

There is still sufficient headroom between the RUs provisioned and the RUs consumed. At this point we could increase the RUs per partition, but let us review another angle by plotting the number of database calls per successful operation. The count drops from 11 to 9, which matches the query plan: the database call is a cross-partition query that targets all nine physical partitions. The client must fan the query out to every partition and collect the results, but the per-partition queries are executed one after the other. The operation therefore takes as long as the sum of all the queries, and the problem will only get worse as the data grows and more physical partitions are added.
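The cost model is easy to see in miniature: with a sequential fan-out, the operation's latency is the sum of the per-partition latencies, so every added physical partition adds its full query time to every operation. A sketch with illustrative numbers (made up, not measured from this workload):

```python
# Illustrative per-partition latencies (ms) for a nine-partition fan-out.
partition_latencies_ms = [60, 55, 70, 65, 58, 62, 68, 72, 50]

def sequential_fanout_latency(latencies_ms):
    """Each per-partition query waits for the previous one to finish,
    so the whole operation takes the SUM of the individual latencies."""
    total = 0
    for lat in latencies_ms:
        total += lat
    return total

total_ms = sequential_fanout_latency(partition_latencies_ms)

# Adding a tenth partition adds its full latency to every operation.
with_extra_partition_ms = sequential_fanout_latency(partition_latencies_ms + [64])
```

The linear growth with partition count is the reason this pattern degrades as the data set scales out.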

If the per-partition queries were executed in parallel, the latency would decrease and the throughput would increase, enough for the throughput to keep pace with the load. One side effect of the higher throughput is that RU consumption would also rise, shrinking the headroom between provisioned and consumed RUs. That would call for scaling the database out, but an alternative is to optimize the query itself. The cross-partition query is a concern, especially because it runs on every operation rather than selectively. The query filters the data by the owner and the time of the call, so switching the collection to a partition key based on the owner ID eliminates the cross-partition fan-out. This dramatically improves the throughput and keeps it steady, much like the other calls observed in the monitoring data. A consequence of the improved performance is that node CPU utilization also improves, and when this happens we know the bottleneck has been eliminated.
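The parallel-versus-sequential claim can be demonstrated with plain threads: running the same simulated per-partition queries concurrently brings the elapsed time down from roughly the sum of the latencies to roughly the maximum of them. A sketch (the queries are stand-ins that just sleep):

```python
import concurrent.futures
import time

def query_partition(latency_s):
    """Stand-in for one per-partition query; sleeps to model its latency."""
    time.sleep(latency_s)
    return latency_s

latencies = [0.05, 0.04, 0.06]  # three simulated partitions

# Sequential fan-out: elapsed time is roughly sum(latencies).
start = time.perf_counter()
for lat in latencies:
    query_partition(lat)
sequential_s = time.perf_counter() - start

# Parallel fan-out: elapsed time is roughly max(latencies).
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=len(latencies)) as pool:
    results = list(pool.map(query_partition, latencies))
parallel_s = time.perf_counter() - start
```

The partition-key fix goes further still: with the owner ID as the partition key, the query targets a single partition, and there is no fan-out left to parallelize at all.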

 
