Tuesday, January 31, 2017

Grafana - No More Non-Zero Flat Lines

This tutorial explains a workaround for a design feature of the Dropwizard metrics library. The undesired behavior appears when there is no activity to generate metric data. On the project I'm working on, requests to a web service generate the metric data, so the anomaly occurs whenever there are no requests. Justin Mason goes into great detail on the cause of this on his blog.

On that project, a web service uses the Dropwizard library with an Slf4jReporter that writes to Syslog. The Syslog data is fed into Graphite, and Grafana dashboards are built on the Graphite data. This tutorial assumes a working knowledge of Grafana.

Here's a scenario describing the undesired behavior. When there are no requests, the Dropwizard library continues to report the metric values of the most recent request. For example, a request made at 16:50 takes 173 ms to process, and no other requests are made until 02:50. See the image labeled "Original Graph". The metric data in Graphite will continue to show a 173 ms response time between 16:50 and 02:50. The untrained eye could interpret this as "at midnight, the response time for Web Service X was 173 ms", which is not true. How to work around this misinterpretation is the topic of this blog post.
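To make the scenario concrete, here is a minimal Python sketch of a reporter that keeps re-reporting the last observed value while idle. It is a simplified model, not Dropwizard's actual code: the function name, the tick-based timeline, and the 120 ms value for the second request are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def reported_values(requests: Dict[int, float], ticks: int) -> List[Optional[float]]:
    """Return what a 'repeat the last value' reporter emits at each tick.

    `requests` maps a tick index to the response time (ms) observed at that tick.
    """
    last: Optional[float] = None
    out: List[Optional[float]] = []
    for tick in range(ticks):
        if tick in requests:
            last = requests[tick]  # a real request updates the value
        out.append(last)           # idle ticks re-report the stale value
    return out


# Hypothetical timeline: tick 1 stands in for 16:50, tick 5 for 02:50.
# The 120.0 ms value is made up; only the 173 ms comes from the scenario.
series = reported_values({1: 173.0, 5: 120.0}, ticks=7)
print(series)
```

Ticks 2 through 4 carry 173.0 even though no request happened there, which is the flat line seen in the graph.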



Original Graph
The goal of this tutorial is to show you how to manipulate the display of the data so that flat lines go back to zero. For argument's sake, let's say the graph above is generated by the query below. Those of you with experience know it's not, so just pretend :-)


Base Query

The first step is to clone the base query twice and apply the derivative function to each clone. This results in the following three queries.
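For readers who want to see what the derivative step does to the numbers, here is a small Python sketch. It is a stand-in for the behavior described here (each point minus the previous point), not Graphite's own implementation, and the sample values other than 173 ms are invented.

```python
from typing import List, Optional

Series = List[Optional[float]]


def derivative(values: Series) -> Series:
    """Difference between each point and the one before it.

    The result is None wherever the current or previous point is missing,
    so the first point is always None.
    """
    out: Series = []
    prev: Optional[float] = None
    for value in values:
        if value is None or prev is None:
            out.append(None)
        else:
            out.append(value - prev)
        prev = value
    return out


# Hypothetical base series: flat runs are the stale, re-reported values.
base = [0.0, 173.0, 173.0, 173.0, 95.0, 95.0, 210.0]
change = derivative(base)
print(change)  # [None, 173.0, 0.0, 0.0, -78.0, 0.0, 115.0]
```

The flat stretches become zero, and every real change becomes a non-zero value, positive or negative.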


Displayed on its own, derivative query C looks as follows.



Next, divide the two derivative queries by each other. See the image labeled "Divide Query". What kind of nonsense is this, you ask? Bear with me. Any non-zero derivative divided by itself is one, so this results in a graph showing the points of change, whether positive or negative, as a spike with a value of one. See the image labeled "Divide Graph".


Divide Query

Divide Graph


To ensure you see the points in the graph above, make sure the Null point mode line option is set to "null as zero".
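The division trick can be sketched in Python as follows. This assumes that dividing by zero (or by a missing point) yields a null point rather than an error, which is consistent with needing "null as zero" to see the result. The function names and sample values are illustrative only.

```python
from typing import List, Optional

Series = List[Optional[float]]


def divide(dividend: Series, divisor: Series) -> Series:
    """Point-wise division; None where a point is missing or the divisor is zero."""
    out: Series = []
    for a, b in zip(dividend, divisor):
        if a is None or b is None or b == 0:
            out.append(None)
        else:
            out.append(a / b)
    return out


def null_as_zero(values: Series) -> List[float]:
    """Mimic the 'null as zero' display option: draw missing points as 0."""
    return [0.0 if v is None else v for v in values]


# Hypothetical derivative series: zeros mark the flat (stale) stretches.
change = [None, 173.0, 0.0, 0.0, -78.0, 0.0, 115.0]
mask = divide(change, change)
print(mask)                # [None, 1.0, None, None, 1.0, None, 1.0]
print(null_as_zero(mask))  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```

Both the rise (173.0) and the drop (-78.0) turn into a spike of exactly one, which is what makes the result usable as a mask in the next step.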


Next, multiply the original query by the division result. See the image labeled "Multiply Query". This results in a graph that looks like the original, except that the flat lines go to zero. See the image labeled "Multiply Graph".

Multiply Query
Multiply Graph
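As a rough illustration of the multiply step, the Python sketch below applies a one-or-null mask to a base series. It assumes that a null factor produces a null product, which the display then draws as zero. The names and sample data are hypothetical.

```python
from typing import List, Optional

Series = List[Optional[float]]


def multiply(left: Series, right: Series) -> Series:
    """Point-wise product; None if either factor is missing."""
    return [
        None if a is None or b is None else a * b
        for a, b in zip(left, right)
    ]


def null_as_zero(values: Series) -> List[float]:
    """Mimic the 'null as zero' display option: draw missing points as 0."""
    return [0.0 if v is None else v for v in values]


# Hypothetical data: `mask` is 1.0 where the value changed, None elsewhere.
base = [0.0, 173.0, 173.0, 173.0, 95.0, 95.0, 210.0]
mask = [None, 1.0, None, None, 1.0, None, 1.0]

shown = null_as_zero(multiply(base, mask))
print(shown)  # [0.0, 173.0, 0.0, 0.0, 95.0, 0.0, 210.0]
```

One side effect worth knowing: two genuine requests that happen to report exactly the same value in a row also have a zero derivative, so the second one would be drawn as zero too.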

To smooth out the jaggedness of the graph, you could use the summarize function:

Summarize Query
This results in the image below.
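The idea behind summarizing can be sketched in Python as grouping points into fixed-size buckets and aggregating each bucket. The bucket size of two points and the use of max are assumptions for illustration, since the interval and aggregation used in the query above are not stated in the text.

```python
from typing import Callable, Iterable, List, Optional

Series = List[Optional[float]]


def summarize(
    values: Series,
    points_per_bucket: int,
    func: Callable[[Iterable[float]], float] = sum,
) -> Series:
    """Aggregate consecutive points into buckets, ignoring missing points.

    A bucket with no data at all stays None.
    """
    out: Series = []
    for start in range(0, len(values), points_per_bucket):
        bucket = [v for v in values[start:start + points_per_bucket] if v is not None]
        out.append(func(bucket) if bucket else None)
    return out


# Hypothetical spiky series, as produced by the multiply step.
spiky = [None, 173.0, None, None, 95.0, None, 210.0, None]
smoothed = summarize(spiky, points_per_bucket=2, func=max)
print(smoothed)  # [173.0, None, 95.0, 210.0]
```

Wider buckets leave fewer gaps between spikes, at the cost of coarser time resolution.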




A future update to the Dropwizard library should allow us to use a custom reservoir, which would produce the required result at the source. However, it is not yet available. There are workarounds that use a custom reservoir today, but they require a local build of either Codahale or Dropwizard. I'd bet a paycheck my employer would frown upon this for production software.