[chrony-users] Monitoring Chrony



Hey,

I'm a systems engineer and a contributor to the Prometheus monitoring system. I also maintain the servers for my company.

I've been following various ntpd replacement projects and I'm pretty impressed with the progress of Chrony.

One of the things I would need to do before switching is replace our existing monitoring of ntpd. We currently parse the output of `ntpq -np` to generate metrics.

Prometheus[0] uses a simple metric-plus-labels format, similar to systems like OpenTSDB.

Here's an example of what `ntpq -np` turns into:

# TYPE node_ntpd_delay_milliseconds gauge
node_ntpd_delay_milliseconds{remote="130.149.17.8"} 17.092
node_ntpd_delay_milliseconds{remote="193.190.230.65"} 4.937
node_ntpd_delay_milliseconds{remote="82.95.215.61"} 11.726
# TYPE node_ntpd_jitter_milliseconds gauge
node_ntpd_jitter_milliseconds{remote="130.149.17.8"} 0.494
node_ntpd_jitter_milliseconds{remote="193.190.230.65"} 0.770
node_ntpd_jitter_milliseconds{remote="82.95.215.61"} 0.722
# TYPE node_ntpd_offset_milliseconds gauge
node_ntpd_offset_milliseconds{remote="130.149.17.8"} 1.675
node_ntpd_offset_milliseconds{remote="193.190.230.65"} 0.135
node_ntpd_offset_milliseconds{remote="82.95.215.61"} -0.645
# TYPE node_ntpd_peer_status gauge
node_ntpd_peer_status{remote="130.149.17.8",reference=".GPS.",stratum="1",type="unicast"} 3
node_ntpd_peer_status{remote="193.190.230.65",reference=".MRS.",stratum="1",type="unicast"} 4
node_ntpd_peer_status{remote="82.95.215.61",reference=".PPS.",stratum="1",type="unicast"} 6

This allows us to keep time series of each peer's metrics and to write rules such as "node_ntpd_peer_status < 4" to find unsynced servers. See [1] for the mapping from status codes to values.
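To make the status rule concrete, here is a minimal Python sketch of how the `node_ntpd_peer_status` values can be derived from the tally character that `ntpq -p` prints in front of each remote, using the select codes listed in [1]. The function names `peer_status` and `unsynced` are hypothetical, and the sketch assumes the first character of each peer row is the tally code.

```python
from typing import Dict, List, Tuple

# Select-field codes of the NTP peer status word (see [1]), keyed by the
# tally character that `ntpq -p` prints in front of each remote address.
TALLY_TO_STATUS = {
    " ": 0,  # sel_reject
    "x": 1,  # sel_falsetick
    ".": 2,  # sel_excess
    "-": 3,  # sel_outlier
    "+": 4,  # sel_candidate
    "#": 5,  # sel_backup
    "*": 6,  # sel_sys.peer
    "o": 7,  # sel_pps.peer
}


def peer_status(line: str) -> Tuple[str, int]:
    """Return (remote, status) for one peer row of `ntpq -np` output.

    Assumes column 0 of the row is the tally code (hypothetical helper).
    """
    tally, remote = line[0], line[1:].split()[0]
    # Treat an unrecognised tally code as a reject (status 0).
    return remote, TALLY_TO_STATUS.get(tally, 0)


def unsynced(peers: Dict[str, int], threshold: int = 4) -> List[str]:
    """Return the peers matching node_ntpd_peer_status < threshold, sorted."""
    return sorted(remote for remote, status in peers.items() if status < threshold)


# Status values taken from the example metrics above.
print(unsynced({"130.149.17.8": 3, "193.190.230.65": 4, "82.95.215.61": 6}))
# ['130.149.17.8']
```

With the default threshold of 4, only outliers and worse are flagged, which matches the rule above: candidates (4), backups (5) and the system peer (6) are considered fine.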

The metrics above are generated by a bash script, which works but isn't my favorite way of getting metrics out of software.

So far, I haven't found a good programmatic way to extract stats with chronyc. There are a number of annoying parsing issues with commands like sourcestats: the offset carries a unit suffix, so I have to parse the unit and convert every value to a common unit. I also haven't found much documentation on the protocol between chronyc and chronyd.
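The unit normalisation this forces on a client looks roughly like the following minimal Python sketch. It assumes the Offset and Std Dev columns of `sourcestats` carry one of the suffixes `ns`, `us`, `ms` or `s`; it is not meant for the Span column. `to_milliseconds` is a hypothetical helper, not part of chrony.

```python
import re

# Assumed unit suffixes on chronyc's Offset / Std Dev columns, as milliseconds.
UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}

# A signed decimal number immediately followed by one of the suffixes above.
_VALUE = re.compile(r"^([+-]?\d+(?:\.\d+)?)(ns|us|ms|s)$")


def to_milliseconds(text: str) -> float:
    """Convert a chronyc value such as '+20us' or '-1234ns' to milliseconds."""
    match = _VALUE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised chronyc value: {text!r}")
    number, unit = match.groups()
    return float(number) * UNIT_TO_MS[unit]
```

For example, `to_milliseconds("42ms")` returns `42.0`, and values in `us` or `ns` come back as fractions of a millisecond, so every series ends up in one unit like the `_milliseconds` metrics above.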

A few specific questions:
* Would chrony be interested in supporting the Prometheus metrics format?
* Is there a mode that makes the various metrics outputs more machine-readable, e.g. JSON?
* Is there documentation for the chronyc protocol outside the code?
* Are there any non-C chronyc client implementations? (python/ruby/whatever)

[0]: http://prometheus.io/
[1]: https://www.eecis.udel.edu/~mills/ntp/html/decode.html#peer

- Ben Kochie

