Monday, September 26, 2016

Today we look at an interesting problem space of metrics reporting.
Almost all application owners want to have an understanding of their resource usage. These resource usage metrics may apply across clouds, per cloud, or even at the granularity of individual services. For example, we may have a microservice that provisions a single resource for the application. It may have APIs for create-update-delete that return different status codes. A simple metric may be to find the success rate in terms of the total calls made. The success in this case is determined by the number of HTTP success status codes sent across all the API calls. Since this kind of metric comes from individual services, the application owner may want to see these metrics across all services. This is best shown in a dashboard with drill-downs to different services.
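To make that concrete, here is a minimal sketch of such a metric; the SuccessRateMetric class and its method names are hypothetical, chosen only for illustration of counting HTTP 2xx responses over total API calls:

using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: success rate measured as 2xx responses over total API calls.
class SuccessRateMetric
{
    private readonly List<int> statusCodes = new List<int>();

    // Record the HTTP status code returned by a create/update/delete call.
    public void Record(int statusCode)
    {
        statusCodes.Add(statusCode);
    }

    // Success rate = number of 2xx responses / total calls made.
    public double SuccessRate()
    {
        if (statusCodes.Count == 0) return 0.0;
        return statusCodes.Count(c => c >= 200 && c < 300) / (double)statusCodes.Count;
    }
}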

It's important to note that metrics data is inherently time-based. Although metrics can be spot measurements, cumulative over time, or deltas between time slices, they have an association with the time at which they were measured. Consequently, most metric data is stored in a time series database. Also, previous measurements can be summarized with statistics while the running measurements apply only to the current window of time. Therefore, some of these time-series databases can even be fixed-size.
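To illustrate the fixed-size idea, here is a minimal sketch; it is not modeled on any particular product, just a ring buffer that keeps only the most recent measurements and overwrites the oldest once full:

using System;

// Minimal sketch of a fixed-size time-series store: once full,
// the newest measurement overwrites the oldest one, so storage never grows.
class FixedSizeTimeSeries
{
    private readonly DateTime[] times;
    private readonly double[] values;
    private int next;

    public FixedSizeTimeSeries(int capacity)
    {
        times = new DateTime[capacity];
        values = new double[capacity];
    }

    public void Add(DateTime time, double value)
    {
        times[next] = time;               // overwrite the oldest slot once the buffer wraps
        values[next] = value;
        next = (next + 1) % times.Length;
    }
}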

There are many different tools available to present such charts from data sources. Almost all such reporting mechanisms want to pull data for their charts. However, they cannot directly call the services to query their metrics because this may affect the functionality and performance of those services. Therefore, they do it in a passive manner by finding information from log trails and database states. Most services have logging configured and their logs are usually indexed in a time series database of their own. Consequently, the information to pull can come from these logs.

The Grafana stack is one such custom reporting solution. It presents beautiful dashboards with immense flexibility to add and configure different charts from several time series data sources. As long as the metrics data makes its way to a database configured with Grafana, such as Graphite, InfluxDB, or Prometheus, the reporting solution is a breeze.

Generally, this information flow into the reporting database needs to be automated. In the example cited above, it would be pulled from the log index server and pushed into, say, Graphite. A collection statistics daemon usually works very well at periodically pulling information from wherever it is available. If the daemon has plugins for custom reporting data sources such as Graphite, or for full-service monitoring solutions such as Nagios or Zabbix, then the data automatically makes its way to charts for the end user.
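As a sketch of such automation under stated assumptions: the loop below periodically asks a log index for a success count (QueryLogIndexForSuccessCount is a placeholder, as are the host and metric names) and writes the sample to Graphite's Carbon listener using its plaintext protocol, i.e. one "metric value timestamp" line over TCP port 2003 by default:

using System;
using System.IO;
using System.Net.Sockets;
using System.Threading;

class MetricsForwarder
{
    // Push one sample to Graphite/Carbon using the plaintext protocol:
    // "<metric.path> <value> <unix timestamp>\n" sent to TCP port 2003 by default.
    static void PushToGraphite(string host, string metric, double value)
    {
        var timestamp = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        using (var client = new TcpClient(host, 2003))
        using (var writer = new StreamWriter(client.GetStream()))
        {
            writer.Write($"{metric} {value} {timestamp}\n");
        }
    }

    static void Main()
    {
        while (true)
        {
            // Placeholder for whatever query the log index server supports,
            // e.g. counting 2xx entries logged in the last minute.
            double successCount = QueryLogIndexForSuccessCount();
            PushToGraphite("graphite.example.com", "myapp.api.success_count", successCount);
            Thread.Sleep(TimeSpan.FromMinutes(1));   // pull-and-push on a fixed schedule
        }
    }

    static double QueryLogIndexForSuccessCount() => 0;  // stub for illustration only
}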

Since there are different technologies with overlapping functionalities, and even more vendors for those technologies in any IT deployment, the workforce often has to decide how the information flows. Sometimes this is straightforward with an out-of-box mechanism from purchased or re-licensed software, but other times it requires automation to route the data from existing data sources that can neither be replaced, because of their operational dependency, nor be configured to directly send data to the charts. It's in these cases that custom collection may need to be written.

Billing and monitoring are two aspects of IT operations that often have dependencies on this very same metric data. Therefore, the software products for those operations may very well support reporting out of the box.

This goes to show that there are many ways to tackle the same problem of metrics-to-charts data flow, but the rules of thumb of 1) reusing existing infrastructure and 2) keeping customization - be it automation or dashboards - as minimal as necessary, hold true in many contexts.

#codinginterview
In yesterday's post we discussed decoding digits as letters from a stream. That was batch processing; here we optimize it to process digits as they arrive:
void decodeStreamOptimized(Stream numbers, ref List<string> results)
{
    // Slide a two-digit window over the stream instead of decoding it in one batch.
    // Assumes the stream yields at least two digits before EOF.
    var last = numbers.Read();
    var current = numbers.Read();
    while (current != EOF)
    {
        // decode() is the recursive helper from yesterday's post.
        var digits = new List<int>() { last, current };
        var s = new StringBuilder();
        decode(digits, digits.Count, ref s, ref results);
        // a null entry separates consecutive evaluations
        results.Add(null);
        last = current;
        current = numbers.Read();
    }
}
The result of each evaluation round is a single character, denoted by z if the current digit is taken independently or by z' if the last and current digits are taken together.
Consecutive results R1 and R2 are therefore stitched together as one of two binary selections:
R1zR2z or R2z'

Another way to optimize would be to take a larger window and translate it before moving to the next.
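A rough sketch of that alternative, reusing the decode() helper from the previous post and leaving the window size k to the caller:

void decodeStreamWindowed(Stream numbers, int k, ref List<string> results)
{
    // Translate the stream one window of k digits at a time.
    var window = new List<int>();
    var current = numbers.Read();
    while (current != EOF)
    {
        window.Add(current);
        if (window.Count == k)
        {
            var s = new StringBuilder();
            decode(window, window.Count, ref s, ref results);  // decode() as in the previous post
            results.Add(null);   // separator between windows
            window.Clear();
        }
        current = numbers.Read();
    }
    if (window.Count > 0)
    {
        // translate the final partial window
        var s = new StringBuilder();
        decode(window, window.Count, ref s, ref results);
    }
}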
