[chrony-users] Monitoring Chrony

Hey,

I'm a systems engineer and a contributor to the Prometheus monitoring system. I also maintain the servers for my company.

I've been following various ntpd replacement projects and I'm pretty impressed with the progress of Chrony.

One of the things I would need to do in order to switch is replace our existing monitoring of ntpd. We currently parse the output of `ntpq -np` in order to generate metrics.

Prometheus[0] uses a simple metric+labels format, similar to things like OpenTSDB.

Here's an example of what `ntpq -np` turns into:

# TYPE node_ntpd_delay_milliseconds gauge

node_ntpd_delay_milliseconds{remote="130.149.17.8"} 17.092

node_ntpd_delay_milliseconds{remote="193.190.230.65"} 4.937

node_ntpd_delay_milliseconds{remote="82.95.215.61"} 11.726

# TYPE node_ntpd_jitter_milliseconds gauge

node_ntpd_jitter_milliseconds{remote="130.149.17.8"} 0.494

node_ntpd_jitter_milliseconds{remote="193.190.230.65"} 0.770

node_ntpd_jitter_milliseconds{remote="82.95.215.61"} 0.722

# TYPE node_ntpd_offset_milliseconds gauge

node_ntpd_offset_milliseconds{remote="130.149.17.8"} 1.675

node_ntpd_offset_milliseconds{remote="193.190.230.65"} 0.135

node_ntpd_offset_milliseconds{remote="82.95.215.61"} -0.645

# TYPE node_ntpd_peer_status gauge

node_ntpd_peer_status{remote="130.149.17.8",reference=".GPS.",stratum="1",type="unicast"} 3

node_ntpd_peer_status{remote="193.190.230.65",reference=".MRS.",stratum="1",type="unicast"} 4

node_ntpd_peer_status{remote="82.95.215.61",reference=".PPS.",stratum="1",type="unicast"} 6

This allows us to keep running timeseries metrics for peers, and write rules for things like "node_ntpd_peer_status < 4" to find unsynced servers. See here[1] for the code to status value map.

The above metrics are generated by a bash script, which works but isn't my favorite way to deal with getting metrics from software.
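To make the conversion concrete, here is a rough Python sketch of the same idea. It is not the bash script we actually run. It assumes the usual ten-column `ntpq -np` layout with the tally character in the first column, and it maps that character to a number using the select codes from [1]. The function names, the sample input (including its when/poll/reach values), and the decision to spell out only the "u" peer type are illustrative assumptions.

```python
# Tally character in column 1 of `ntpq -np` output -> numeric select code,
# per the peer status table at [1].
TALLY_CODES = {" ": 0, "x": 1, ".": 2, "-": 3, "+": 4, "#": 5, "*": 6, "o": 7}

# Assumption: only "u" is spelled out, since that is all the sample shows.
# Any other type letter is passed through unchanged.
PEER_TYPES = {"u": "unicast"}

# Illustrative input; the when/poll/reach columns are made up.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-130.149.17.8    .GPS.            1 u   33   64  377   17.092    1.675   0.494
+193.190.230.65  .MRS.            1 u   12   64  377    4.937    0.135   0.770
*82.95.215.61    .PPS.            1 u   40   64  377   11.726   -0.645   0.722
"""


def parse_ntpq(text):
    """Return one dict per peer line of `ntpq -np` output."""
    peers = []
    for line in text.splitlines():
        if not line or line[0] not in TALLY_CODES:
            continue  # blank line or the "====" separator
        fields = line[1:].split()
        if len(fields) != 10 or fields[0] == "remote":
            continue  # column header or an unexpected layout
        remote, refid, stratum, ptype = fields[:4]
        delay, offset, jitter = fields[7:]
        for value in (delay, offset, jitter):
            float(value)  # raises ValueError if a column is not numeric
        peers.append({
            "remote": remote,
            "reference": refid,
            "stratum": stratum,
            "type": PEER_TYPES.get(ptype, ptype),
            "delay": delay,
            "offset": offset,
            "jitter": jitter,
            "status": TALLY_CODES[line[0]],
        })
    return peers


def to_prometheus(peers):
    """Render parsed peers as Prometheus text-format lines."""
    lines = []
    for name in ("delay", "jitter", "offset"):
        metric = f"node_ntpd_{name}_milliseconds"
        lines.append(f"# TYPE {metric} gauge")
        for p in peers:
            lines.append(f'{metric}{{remote="{p["remote"]}"}} {p[name]}')
    lines.append("# TYPE node_ntpd_peer_status gauge")
    for p in peers:
        labels = (
            f'remote="{p["remote"]}",reference="{p["reference"]}",'
            f'stratum="{p["stratum"]}",type="{p["type"]}"'
        )
        lines.append(f"node_ntpd_peer_status{{{labels}}} {p['status']}")
    return lines


if __name__ == "__main__":
    print("\n".join(to_prometheus(parse_ntpq(SAMPLE))))
```

The sketch keeps the numeric columns as the strings ntpq printed, so values such as 0.770 are not reformatted. Label values are not escaped, which is fine for addresses and refids but would need handling for arbitrary strings.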

So far, I haven't been able to find a good programmatic way to extract stats with chronyc. There are a bunch of annoying parsing issues with things like the sourcestats command. For example, the offset is printed with a unit suffix, so I have to parse the unit and convert every value to a common unit. I also haven't seen much documentation on the protocol between chronyc and chronyd.
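To show the kind of unit juggling I mean, here is a minimal Python sketch that normalizes a single printed value to seconds. It assumes values look like `+13us` or `-1234ns` and that the only suffixes are ns, us, ms and s. That suffix list is my assumption from the output I've looked at, not something taken from documentation, and `to_seconds` is just a name I made up for the example.

```python
import re

# Assumed unit suffixes and their multipliers to seconds.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# Optional sign, a decimal number, then one of the assumed suffixes.
_VALUE_RE = re.compile(r"^([+-]?\d+(?:\.\d+)?)(ns|us|ms|s)$")


def to_seconds(text):
    """Convert a value such as '+13us' to a float number of seconds."""
    match = _VALUE_RE.match(text.strip())
    if match is None:
        # Fail loudly rather than guess at a suffix we do not know about.
        raise ValueError(f"unrecognised value: {text!r}")
    number, unit = match.groups()
    return float(number) * UNIT_SECONDS[unit]


if __name__ == "__main__":
    for sample in ("+13us", "-1234ns", "2.5ms", "1s"):
        print(sample, to_seconds(sample))
```

Raising on anything unrecognised is deliberate: if the output format has suffixes beyond the four assumed here, I would rather the exporter break than report a wrong offset.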

A few specific questions:

* Would chrony be interested in supporting the Prometheus metrics format?

* Is there a mode for the various metrics outputs to be more machine readable? (json?)

* Is there documentation for the chronyc protocol outside the code?

* Are there any non-C chronyc client implementations? (python/ruby/whatever)

[0]: http://prometheus.io/

[1]: https://www.eecis.udel.edu/~mills/ntp/html/decode.html#peer

- Ben Kochie