I'm a systems engineer and a contributor to the Prometheus monitoring system. I also maintain the servers at my company.
I've been following various ntpd replacement projects, and I'm pretty impressed with the progress of chrony.
One thing I would need before switching is a replacement for our existing ntpd monitoring. We currently parse the output of `ntpq -np` to generate metrics.
Prometheus uses a simple metric-plus-labels format, similar to OpenTSDB.
Here's an example of what `ntpq -np` turns into:
```
# TYPE node_ntpd_delay_milliseconds gauge
# TYPE node_ntpd_jitter_milliseconds gauge
# TYPE node_ntpd_offset_milliseconds gauge
# TYPE node_ntpd_peer_status gauge
```
This lets us keep running time-series metrics for each peer and write rules such as `node_ntpd_peer_status < 4` to find unsynced servers. See here for the code that maps peer status to a numeric value.
The above metrics are generated by a bash script, which works but isn't my favorite way to get metrics out of software.
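For comparison, here is a minimal Python sketch of what the bash script's output stage amounts to: rendering per-peer values as gauges in the Prometheus text exposition format. The metric names come from the example above; the `PeerStats` class and the `remote` label name are hypothetical, and `node_ntpd_peer_status` is left out because its status-to-value map isn't reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    remote: str       # peer address; "remote" is a hypothetical label name
    offset_ms: float
    delay_ms: float
    jitter_ms: float


def escape_label(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(peers: list[PeerStats]) -> str:
    # One "# TYPE" line per metric family, followed by one sample per peer.
    families = {
        "node_ntpd_delay_milliseconds": lambda p: p.delay_ms,
        "node_ntpd_jitter_milliseconds": lambda p: p.jitter_ms,
        "node_ntpd_offset_milliseconds": lambda p: p.offset_ms,
    }
    lines = []
    for name, getter in families.items():
        lines.append(f"# TYPE {name} gauge")
        for p in peers:
            label = escape_label(p.remote)
            lines.append(f'{name}{{remote="{label}"}} {getter(p)}')
    return "\n".join(lines) + "\n"


print(render([PeerStats("10.0.0.1", 0.25, 1.5, 0.1)]), end="")
```

The hard part isn't this rendering step; it's getting reliable numbers out of the daemon in the first place.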
So far, I haven't found a good programmatic way to extract stats with chronyc. There are a number of annoying parsing issues with commands like `sourcestats`. For example, the offset is printed with a varying unit, so I have to parse the unit and convert every value to a common one. I also haven't seen much documentation on the protocol between chronyc and chronyd.
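Here is a rough Python sketch of the unit normalization described above. It assumes chronyc prints values with an `ns`, `us`, `ms` or `s` suffix. It also assumes a `sourcestats` row has the name first and the offset and standard deviation as its last two whitespace-separated fields. Both are assumptions to check against the chronyc version in use. The sample row is illustrative, not captured output.

```python
import re

# Assumed unit suffixes printed by chronyc, mapped to a factor in milliseconds.
_UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}
_VALUE_RE = re.compile(r"^([+-]?\d+(?:\.\d+)?)(ns|us|ms|s)$")


def to_milliseconds(field: str) -> float:
    """Convert a value such as '+12us' or '-2034ns' to milliseconds."""
    m = _VALUE_RE.match(field.strip())
    if m is None:
        raise ValueError(f"unrecognised value: {field!r}")
    number, unit = m.groups()
    return float(number) * _UNIT_TO_MS[unit]


def parse_sourcestats_row(line: str) -> dict:
    """Parse one data row, assuming the column layout described above."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"unexpected row: {line!r}")
    return {
        "name": fields[0],
        "offset_ms": to_milliseconds(fields[-2]),
        "std_dev_ms": to_milliseconds(fields[-1]),
    }


print(parse_sourcestats_row("10.0.0.1 11 5 46m -0.001 0.045 1us 25us"))
```

This works, but it is tied to chronyc's human-oriented formatting. A machine-readable output mode would make it unnecessary.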
A few specific questions:
* Would chrony be interested in supporting the Prometheus metrics format?
* Is there a mode that makes the various statistics outputs more machine-readable (JSON?)
* Is there documentation for the chronyc protocol outside the code?
* Are there any non-C chronyc client implementations? (python/ruby/whatever)