Re: [chrony-users] Silent Failure -- Enhancement Request

On 2024-04-19 16:40, Chris Knox wrote:
> Bryah, thanks for the answer.  Yes, now that we have the scars, we're
> monitoring chronyd's health carefully.  But my question goes a bit

Glad you're back up and running.

Just to make sure since the details/constraints of your operational
setup were not mentioned yet - I take it you have seen the
"Installation, configuration, and monitoring" section on
https://chrony-project.org/links.html ?

It contains many pointers to third-party monitoring & alerting
tools. In particular the chrony_exporter for Prometheus, in combination
with Alertmanager, is just plain great and flexible enough for
any conceivable operational process.

In fact based on this thread I filed an issue:
https://github.com/SuperQ/chrony_exporter/issues/75
earlier today and it already resulted in a PR:
https://github.com/SuperQ/chrony_exporter/pull/76

(..just in case anybody else using Prometheus is reading this :)

Fundamentally it's not clear what chrony can or should do when
upstream servers are not available, because it's a bottomless pit
of compounding rules, problems and workarounds, all of which are
very environment- and process-dependent.

So instead of relying on a human to read syslog, it's IMHO probably
more reliable and stress-free to let a machine do the job of reading
the existing statistics, aggregating a metric that distinguishes a
warning from an error caused by wonky network delays, switch reboots
or data center movements, and then acting according to *your* specific
processes (email, SMS, reboot..)
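
To make the warning-vs-error idea a bit more concrete, here is a minimal
sketch in Python. It is only an illustration: the function name, the
"reachable sources per poll" input and the threshold are all assumptions
of mine, not anything chrony or chrony_exporter provides. You would feed
it from whatever collector you already run.

```python
from enum import Enum


class Health(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def classify(reachable_history, error_after=4):
    """Classify clock-sync health from a window of poll results.

    reachable_history: number of reachable upstream sources seen at each
    poll, oldest first (hypothetical input, collected by your own tooling).
    error_after: consecutive polls without any reachable source before the
    outage is escalated from a warning to an error (assumed threshold).
    """
    if not reachable_history:
        # No data at all means the collector itself is broken.
        return Health.ERROR

    # Count how many of the most recent polls saw no reachable source.
    outage = 0
    for reachable in reversed(reachable_history):
        if reachable > 0:
            break
        outage += 1

    if outage == 0:
        return Health.OK
    if outage < error_after:
        # Short gaps (switch reboot, delay spike) only warrant a warning.
        return Health.WARNING
    return Health.ERROR
```

What happens on WARNING or ERROR (email, SMS, reboot) is deliberately left
out, since that is exactly the part that depends on your own processes.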

cheers
Holger
