Tuesday, April 24, 2018


Events as a measurement of Cloud Database performance
In the previous post, we talked about cloud databases. They come with the benefits of the cloud, can still offer performance on par with on-premises databases, and may offer better Service-Level Agreements. Today we discuss a test case that measures the performance of a cloud database, say Azure Cosmos DB, by using it as a queue. I could not find a comparable study for this test case.
Test case: Let us say I stream millions of events per minute to Cosmos DB, and each event bears a tuple <EventID, Byte[] payload>. The payload content is irrelevant. Its only purpose is to force a data copy for each event and thereby introduce processing time, so I use the same payload with every event.
The EventIDs range from 1 to Integer.MaxValue. I want to store the events in a big table where I use the 255-byte row key to support horizontal scaling of the table.
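A minimal sketch of the event generator and a row key, assuming a 1 KiB payload and a hypothetical key layout `<prefix>-<EventID>-<sequence>`. Neither detail comes from the post. The prefix (EventID modulo a hypothetical `PARTITION_COUNT`) spreads writes across partitions. The sequence number keeps keys unique, because the same EventID must be stored many times for the count query to be meaningful. `bytearray(...)` is used because it always copies, which gives the per-event data copy the test case asks for.

```python
import random
from dataclasses import dataclass

INT32_MAX = 2**31 - 1         # Integer.MaxValue for a signed 32-bit integer
SHARED_PAYLOAD = bytes(1024)  # assumption: 1 KiB of zeros; the contents are irrelevant
PARTITION_COUNT = 64          # hypothetical number of key prefixes to spread writes over
MAX_KEY_BYTES = 255           # row key budget mentioned in the post


@dataclass
class Event:
    event_id: int
    payload: bytearray


def make_event(rng: random.Random) -> Event:
    """Create one event; bytearray() copies the shared payload every time."""
    return Event(rng.randint(1, INT32_MAX), bytearray(SHARED_PAYLOAD))


def row_key(event: Event, sequence: int) -> str:
    """Hypothetical key layout: '<prefix>-<event id>-<sequence>'.

    The prefix spreads inserts across partitions, and the sequence number keeps
    keys unique, because the same EventID is expected to arrive many times.
    """
    prefix = event.event_id % PARTITION_COUNT
    key = f"{prefix:02d}-{event.event_id:010d}-{sequence:012d}"
    if len(key.encode("ascii")) > MAX_KEY_BYTES:
        raise ValueError("row key exceeds the 255-byte budget")
    return key


rng = random.Random(42)
events = [make_event(rng) for _ in range(5)]
keys = [row_key(e, seq) for seq, e in enumerate(events)]
print(keys)
```

Zero-padding the numbers keeps the keys sortable as strings. The whole key is 26 ASCII bytes, which is well under the 255-byte budget.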
Then I want to run a query every few seconds that returns the top 50 EventIDs by count in the table.
The query will be:
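SELECT DISTINCT events.ID AS id,
COUNT(*) OVER ( PARTITION BY events.ID ) AS event_count
FROM events
ORDER BY event_count DESC
LIMIT 50;
With DISTINCT, this has the same meaning as SELECT events.ID, COUNT(*) FROM events GROUP BY events.ID. Without DISTINCT, the window function would repeat the count on every event row instead of returning one row per EventID.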
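The equivalence of the two query forms can be checked locally. The following sketch uses SQLite through Python's built-in `sqlite3` module as a stand-in, not Cosmos DB. It assumes a hypothetical `events(id, payload)` table and SQLite 3.25 or newer, which is the version that added window functions. Both queries break ties by `id` so that their outputs can be compared row for row.

```python
import random
import sqlite3

# Window-function form (with DISTINCT) and GROUP BY form of the top-50 query.
WINDOW_SQL = """
    SELECT DISTINCT id,
           COUNT(*) OVER (PARTITION BY id) AS event_count
    FROM events
    ORDER BY event_count DESC, id
    LIMIT ?
"""

GROUP_BY_SQL = """
    SELECT id, COUNT(*) AS event_count
    FROM events
    GROUP BY id
    ORDER BY event_count DESC, id
    LIMIT ?
"""


def load_events(conn: sqlite3.Connection, n_events: int, max_id: int, seed: int = 0) -> None:
    """Insert n_events rows whose ids are drawn from 1..max_id, all sharing one payload."""
    rng = random.Random(seed)
    payload = bytes(16)
    conn.execute("CREATE TABLE events (id INTEGER NOT NULL, payload BLOB)")
    conn.executemany(
        "INSERT INTO events (id, payload) VALUES (?, ?)",
        ((rng.randint(1, max_id), payload) for _ in range(n_events)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_events(conn, n_events=10_000, max_id=500)
by_window = conn.execute(WINDOW_SQL, (50,)).fetchall()
by_group = conn.execute(GROUP_BY_SQL, (50,)).fetchall()
print(by_window == by_group)
```

The GROUP BY form is usually the clearer of the two. The window form is useful when per-row detail has to be kept next to the count.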
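This is a read-only query, so it should not block the writes to the table. In other words, the analysis query is separate from the transactional queries that insert the events into the table. In Cosmos DB, however, queries and inserts draw on the same provisioned Request Units, so a heavy analysis query can still compete with the writes for throughput.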
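Cosmos DB is not SQLite, but the idea of a separate read-only analysis connection can be tried locally. This sketch swaps in SQLite's write-ahead logging (WAL) mode, in which readers and a writer do not block each other. The reader opens a read transaction, the writer commits more events, and the reader keeps seeing its original snapshot until it ends that transaction. The table layout and the row counts are made up for illustration.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def count_events(conn: sqlite3.Connection) -> int:
    (n,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    return n


def snapshot_demo() -> Tuple[int, int, int]:
    """Return the row counts the reader sees before, during, and after its read transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.db")

        # Writer connection: plays the role of the ingestion path.
        writer = sqlite3.connect(path)
        writer.execute("PRAGMA journal_mode=WAL").fetchone()  # readers and the writer stop blocking each other
        writer.execute("CREATE TABLE events (id INTEGER NOT NULL, payload BLOB)")
        writer.executemany("INSERT INTO events VALUES (?, ?)", [(i % 10 + 1, b"x") for i in range(100)])
        writer.commit()

        # Reader connection: used only for the read-only analysis query.
        reader = sqlite3.connect(path, isolation_level=None)  # we issue BEGIN/COMMIT ourselves
        reader.execute("BEGIN")        # deferred; the snapshot starts at the first read
        before = count_events(reader)

        # Ingestion continues and commits while the read transaction is still open.
        writer.executemany("INSERT INTO events VALUES (?, ?)", [(1, b"x")] * 50)
        writer.commit()

        during = count_events(reader)  # still the old snapshot
        reader.execute("COMMIT")
        after = count_events(reader)   # a new read sees the committed inserts

        reader.close()
        writer.close()
    return before, during, after


print(snapshot_demo())
```

The writer's commit succeeds even though the reader's transaction is still open, which is the property the analysis query relies on. The reader simply works on a slightly stale but consistent view of the table.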
To tune the performance of the cloud database, I configure the client as follows:
· Use Direct connection mode
· Use TCP as the protocol
· Avoid startup latency on the first request by opening the client before sending traffic
· Collocate clients in the same Azure region as the database
· Increase the number of threads/tasks
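The last setting, increasing the number of threads/tasks, amounts to keeping many inserts in flight at once. This is a minimal sketch using `concurrent.futures.ThreadPoolExecutor`. `InMemoryTable` is a hypothetical stand-in for the database client and not a Cosmos DB SDK class, and the worker count of 32 is an arbitrary assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple


class InMemoryTable:
    """Hypothetical stand-in for the database client; it is not a Cosmos DB API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.rows: Dict[str, bytes] = {}

    def insert(self, key: str, payload: bytes) -> bool:
        # A real client would make a network call here; that I/O wait is what
        # extra threads overlap with each other.
        with self._lock:
            self.rows[key] = payload
        return True


def insert_all(table: InMemoryTable, events: Iterable[Tuple[str, bytes]], workers: int = 32) -> int:
    """Fan the inserts out over a pool of worker threads and return how many succeeded."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda kv: table.insert(*kv), events))


table = InMemoryTable()
written = insert_all(table, ((f"key-{i:06d}", b"payload") for i in range(10_000)))
print(written, len(table.rows))
```

In Python, threads pay off mainly when each insert spends most of its time waiting on the network, as real database calls do. The right worker count is something the test itself would have to find.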
Together, these settings help squeeze the processing time of the events as they make their way to the table. The test will attempt to push the throughput beyond 100,000 Request Units (RU) per second.
At such a high throughput, the data size grows massively, so the analysis query may become slower over time.
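A back-of-the-envelope estimate connects the event rate, the RU target and the data growth. This sketch assumes an average charge of 5 RU per insert and a 1 KiB payload. Both are illustrative assumptions, and the real charge per write should be measured during the test. Under these assumptions, 1.2 million events per minute lands exactly on 100,000 RU/s.

```python
from typing import Tuple

SECONDS_PER_DAY = 86_400


def estimate_load(events_per_minute: float, ru_per_write: float, payload_bytes: int) -> Tuple[float, float]:
    """Return (RU/s needed for the inserts, GiB of raw payload written per day).

    ru_per_write is an assumed average charge per insert; measure it in the
    real test. The storage figure ignores indexes, keys, and system metadata.
    """
    events_per_second = events_per_minute / 60
    ru_per_second = events_per_second * ru_per_write
    gib_per_day = events_per_second * payload_bytes * SECONDS_PER_DAY / 2**30
    return ru_per_second, gib_per_day


ru, gib = estimate_load(events_per_minute=1_200_000, ru_per_write=5, payload_bytes=1024)
print(f"{ru:,.0f} RU/s, {gib:,.0f} GiB/day")  # prints: 100,000 RU/s, 1,648 GiB/day
```

Roughly 1.6 TiB of raw payload per day is why the top-50 query, which scans every EventID, can be expected to slow down as the test runs.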
Conclusion: This test case and its pricing will help determine whether it is indeed practical to store high-volume traffic, such as that from a mobile fleet or IoT devices, in the database.

