This is a continuation of a series of articles on the operational engineering aspects of the Azure public cloud, the most recent of which discussed Azure Data Lake, a full-fledged, generally available service that provides Service Level Agreements comparable to others in its category. Azure Data Lake is suited to storing and handling Big Data. It is built over Azure Blob Storage, so it provides native support for web-accessible documents. It is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style. In this article, we focus on Data Lake monitoring.
As we might expect from the use of an Azure storage account, monitoring for Azure Data Lake leverages the monitoring of the underlying storage account. Azure Storage Analytics performs logging and provides metrics data for a storage account. This data can be used to trace requests, analyze usage trends, and diagnose issues with the storage account.
Storage Analytics must be enabled individually for each service that needs to be monitored; the Blob, Queue, Table, and File services are all subject to monitoring. The aggregated log data is stored in blobs in a well-known container designated for logging ($logs), and the metrics are stored in well-known tables; both can be accessed using the Blob service and Table service APIs. There is a 20 TB limit for the aggregated analytics data, and this is separate from the capacity provisioned for our own data, so we do not have to worry about resizing. When we monitor a storage service, we study its health, capacity, availability, and performance. The service health can be observed from the portal, and notifications can be subscribed to.
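If we were scripting this rather than clicking through the portal, a minimal sketch along the following lines could turn on logging and metrics for the Blob service; it assumes the azure-storage-blob v12 Python SDK and a connection string exposed through an AZURE_STORAGE_CONNECTION_STRING environment variable (both assumptions of mine), and the Queue and Table service clients expose analogous properties:

import os

from azure.storage.blob import (
    BlobAnalyticsLogging,
    BlobServiceClient,
    Metrics,
    RetentionPolicy,
)

client = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# Keep seven days of analytics data; Storage Analytics prunes older entries.
retention = RetentionPolicy(enabled=True, days=7)

client.set_service_properties(
    # Log read, write, and delete requests to the well-known $logs container.
    analytics_logging=BlobAnalyticsLogging(
        read=True, write=True, delete=True, retention_policy=retention
    ),
    # Record hourly and per-minute transaction metrics, including per-API rows.
    hour_metrics=Metrics(enabled=True, include_apis=True, retention_policy=retention),
    minute_metrics=Metrics(enabled=True, include_apis=True, retention_policy=retention),
)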
Capacity data for the Blob service is recorded in the well-known $MetricsCapacityBlob table. Storage Metrics records this data once per day; the capacity is measured in bytes, and both a ContainerCount and an ObjectCount are available in each daily entry.
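Purely as an illustration, and assuming the azure-data-tables Python SDK can address the metrics tables through the account's Table service endpoint (the column names follow the documented Storage Analytics schema, while the filter value is my assumption), the daily capacity entries could be read like this:

import os

from azure.data.tables import TableServiceClient

tables = TableServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
capacity = tables.get_table_client("$MetricsCapacityBlob")

# RowKey 'data' rows describe user data; 'analytics' rows describe the
# analytics data itself. PartitionKey encodes the UTC day of the entry.
for entity in capacity.query_entities("RowKey eq 'data'"):
    print(
        entity["PartitionKey"],
        entity["Capacity"],        # bytes
        entity["ContainerCount"],
        entity["ObjectCount"],
    )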
Availability is monitored in the hourly and minute metrics tables that record primary transactions against blobs, tables, and queues; the availability figure is a column in these tables. Performance is measured in the AverageE2ELatency and AverageServerLatency columns.
AverageE2ELatency is recorded only for successful requests and includes the time the client takes to send the data and to receive acknowledgements from the storage service, whereas AverageServerLatency covers only the processing time within the service. A high value for the first and a low value for the second implies that the client is slow or the network connectivity is poor.
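A similar sketch (under the same assumptions as above) can scan the hourly Blob transaction metrics and flag hours where end-to-end latency is far above server latency, which is exactly the client-or-network symptom just described; the 'user;All' RowKey aggregates all user requests:

import os

from azure.data.tables import TableServiceClient

tables = TableServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
hourly = tables.get_table_client("$MetricsHourPrimaryTransactionsBlob")

for entity in hourly.query_entities("RowKey eq 'user;All'"):
    e2e = entity.get("AverageE2ELatency", 0)
    server = entity.get("AverageServerLatency", 0)
    # A wide gap between the two points at the client or the network rather
    # than the storage service itself.
    if server and e2e > 2 * server:
        print(entity["PartitionKey"], "E2E", e2e, "ms vs server", server, "ms,",
              "availability", entity.get("Availability"))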
Nagle's algorithm is a TCP optimization on the sender, designed to reduce network congestion by coalescing small send requests into larger TCP segments, so small segments are held back until a larger segment is available to send. It does not work well with delayed acknowledgements, which are an optimization on the receiver side: when the receiver delays the ack and the sender waits for that ack before sending a small segment, the data transfer stalls. Turning off these optimizations can improve table, blob, and queue usage, particularly for workloads with many small requests.
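The knob itself lives at the socket layer; in the .NET storage client the switch is ServicePointManager.UseNagleAlgorithm. Purely as a conceptual sketch, this is what disabling Nagle looks like on a raw socket in Python:

import socket

# TCP_NODELAY turns off Nagle's algorithm so small writes (small table
# entities, queue messages) are sent immediately instead of being held back
# while the peer's delayed ACK is outstanding.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)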
The requests to create blobs for logging and to create table entities for metrics are billable. Older logging and metrics data can be archived or deleted; this is governed by the retention policy set as part of the analytics configuration for each service. Metrics are stored both for the service as a whole and for the individual API operations of that service, including the counts and percentages of certain status messages. These features can help analyze the cost aspect of the usage.
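When no retention policy is configured, the logs can also be trimmed by hand. A sketch (again assuming the azure-storage-blob v12 SDK; the 30-day cutoff is an arbitrary choice) that deletes aging blobs from the well-known $logs container and so caps the billable storage the logs consume:

import os
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobServiceClient

client = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
logs = client.get_container_client("$logs")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for blob in logs.list_blobs():
    # last_modified is a timezone-aware datetime on the blob properties.
    if blob.last_modified < cutoff:
        logs.delete_blob(blob.name)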