Sensu Checks to report Metrics

Once you understand the basics of creating a Sensu check, creating a check that reports metrics is actually pretty simple. Here’s the lowdown.

The differences vs a standard check

Sensu needs to be told that this is a metrics check:

{
  "type": "metric", // <- Boom
  "command": "disk-usage-metrics.rb",
  // ...
}

The exit status (ok, warning, critical) can still be monitored by Sensu, so no change there. That said, my understanding is that most metrics checks only take care of the metrics aspect and aren’t built to alert with warning and critical statuses.

The textual output is expected to be in Graphite format. Most of the time it’s spread over multiple lines, to output multiple data points:

lolcathost.disk_usage.lolcats.used   9000000    1383246228
lolcathost.disk_usage.lolcats.avail   9000    1383246228
lolcathost.disk_usage.lolcats.used_percentage   99    1383246228
lolcathost.disk_usage.root.used   42    1383246229
...

The Graphite format is pretty simple: a path of words separated by dots, a numeric value and an optional timestamp (defaults to the time of reception if not supplied).
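
To make the three fields concrete, here is a minimal sketch that builds and parses such a line in plain Ruby. The helpers graphite_line and parse_graphite_line are hypothetical names made up for this illustration; they are not part of Sensu or the sensu-plugin gem, and a single space is assumed as the field separator.

```ruby
# Hypothetical helper: build one Graphite line, "<path> <value> <timestamp>".
# The timestamp defaults to the current time in seconds since the epoch.
def graphite_line(path, value, timestamp = Time.now.to_i)
  "#{path} #{value} #{timestamp}"
end

# Hypothetical helper: split a line back into its whitespace-separated fields.
# The timestamp is nil when the line does not carry one.
def parse_graphite_line(line)
  path, value, timestamp = line.strip.split(/\s+/)
  { path: path, value: Float(value), timestamp: timestamp && Integer(timestamp) }
end

line = graphite_line('lolcathost.disk_usage.lolcats.used', 9_000_000, 1_383_246_228)
puts line
p parse_graphite_line(line)
```

Any run of whitespace between fields parses the same way, which is why the column-aligned sample output above is still valid.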

Ruby is nicer

Once again, you can use the sensu-plugin gem to get a few things handled automatically for you. Here are the basics for building a metrics check.

  • You inherit from Sensu::Plugin::Metric::CLI::Graphite
  • You still implement run().
  • You still describe configurations with option and access them with config[].
  • You output each stat with output(name, value, timestamp).
  • You need to at least end with ok(); you can also use the other exit helpers if you want.

Here’s disk-usage-metrics.rb as an example:

#!/usr/bin/env ruby

require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-plugin/metric/cli'
require 'socket'

class DiskGraphite < Sensu::Plugin::Metric::CLI::Graphite

  option :scheme,
    :description => "Metric naming scheme, text to prepend to metric",
    :short => "-s SCHEME",
    :long => "--scheme SCHEME",
    :default => "#{Socket.gethostname}.disk_usage"

  def run
    # http://www.kernel.org/doc/Documentation/iostats.txt
    metrics = [
      'reads', 'readsMerged', 'sectorsRead', 'readTime',
      'writes', 'writesMerged', 'sectorsWritten', 'writeTime',
      'ioInProgress', 'ioTime', 'ioTimeWeighted'
    ]

    # File.foreach closes the file for us once iteration finishes.
    File.foreach("/proc/diskstats") do |line|
      stats = line.strip.split(/\s+/)
      _major, _minor, dev = stats.shift(3)
      # Skip devices whose counters are all zero (never used).
      next if stats.all? { |s| s == '0' }

      metrics.each_with_index do |metric, i|
        output "#{config[:scheme]}.#{dev}.#{metric}", stats[i]
      end
    end

    ok
  end
end
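
If you want to see what the loop in run produces without installing Sensu, the per-line logic can be sketched in plain Ruby. This is only an illustration: diskstats_to_graphite is a hypothetical helper, the sample line is made up rather than taken from a real machine, and the formatting stands in for what output does in the plugin.

```ruby
# Same field names as in the plugin above.
METRICS = %w[
  reads readsMerged sectorsRead readTime
  writes writesMerged sectorsWritten writeTime
  ioInProgress ioTime ioTimeWeighted
].freeze

# Hypothetical helper: turn one /proc/diskstats line into Graphite lines.
# Returns an empty array for devices whose counters are all zero.
def diskstats_to_graphite(line, scheme, timestamp)
  stats = line.strip.split(/\s+/)
  _major, _minor, dev = stats.shift(3)
  return [] if stats.all? { |s| s == '0' }

  METRICS.each_with_index.map do |metric, i|
    "#{scheme}.#{dev}.#{metric} #{stats[i]} #{timestamp}"
  end
end

# Made-up sample line: major, minor, device name, then eleven counters.
sample = '   8       0 sda 4200 12 336000 1500 900 30 72000 800 0 1200 2300'
puts diskstats_to_graphite(sample, 'lolcathost.disk_usage', 1_383_246_228)
```

Each device therefore contributes one line per counter, all sharing the same scheme prefix.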

Conventions

A few conventions have evolved in the sensu-community-plugins repository.

Most standard checks are named check-xxx and most metrics checks are named xxx-metrics.

A more interesting convention has also evolved around metrics checks. If you look back at the disk-usage-metrics.rb code up there, you’ll notice the --scheme option defaults to hostname + plugin name. Let’s come back to the example output I gave earlier:

lolcathost.disk_usage.lolcats.used   9000000    1383246228

The first two path segments (lolcathost.disk_usage) can be overridden with --scheme; the rest is decided by the plugin. There are a few scenarios where you might want to override the scheme.

  • If your systems have properly set FQDNs, Socket.gethostname will return that, which may give you metrics named like www1.app-name.phoenix-1.example.com.disk_usage.root.used. If that’s too unwieldy for you, you could, for example, invoke the check with --scheme $(hostname --short).disk_usage.
  • You may want to nest your Sensu-generated metrics deeper into an existing Graphite ontology: --scheme system.$(hostname).disk_usage.
  • If you’re using a cloud Graphite provider like HostedGraphite, you may need to prepend your account’s API key to the metric name you’re sending: --scheme deadbeef4242.$(hostname).disk_usage or, even better, --scheme :::hostedgraphite.apikey:::.$(hostname).disk_usage.
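
If you would rather bake one of these variations into the plugin's default than pass --scheme every time, the string handling is small. Here is a minimal sketch, assuming a hypothetical metric_scheme helper (not something sensu-plugin provides) that shortens the hostname and optionally prepends a prefix:

```ruby
require 'socket'

# Hypothetical helper: build a scheme such as "www1.disk_usage".
# short:  keep only the first label of the hostname (drops the domain part).
# prefix: optional leading segment, e.g. "system" or an API key.
def metric_scheme(hostname, plugin: 'disk_usage', prefix: nil, short: true)
  host = short ? hostname.split('.').first : hostname
  [prefix, host, plugin].compact.join('.')
end

puts metric_scheme(Socket.gethostname)
puts metric_scheme('www1.app-name.phoenix-1.example.com', prefix: 'system')
# → system.www1.disk_usage
```

The result could then serve as the :default of the :scheme option, while --scheme stays available for anything more exotic.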

Most metrics checks have this option. If you want to share your new metrics check, I would strongly recommend adding this same option as well.

Conclusion

So that’s it! I hope my lengthy prose still demonstrated clearly how easy it is to create standard and metrics Sensu checks. Let me know if you have questions or if you think I’ve overlooked anything!
