Re: [chrony-users] chronyd.service doesn't have Restart=on-failure?



On Mon, Jan 11, 2021 at 01:44:03PM +0000, Jamie Gruener wrote:
> Your comment about timeouts on top of timeouts is what I was thinking, too, and with maxpoll the same as minpoll, it wouldn't take long for us to run out of timeouts: only 16 seconds. I think this means that if chronyd can't reach a timeserver within maxpoll, it'll generate this error. Apparently this doesn't happen very often, because Google produces nearly zero hits for that error. In that sense it is working as configured. Having a short maxpoll was clearly a mistake.

The minpoll and maxpoll options control chronyd's polling
interval. If the source stops responding, the polling interval will
gradually increase up to maxpoll, but that shouldn't cause any issues.
That's an expected state. I think the fatal error could happen even if
the source was responding. The issue seems to be in the processing of
timeouts when it is so slow that received packets cannot be processed
between them.
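To illustrate why a short maxpoll only limits how far the interval can back off, here is a toy model. It assumes the poll value is the log2 of the interval in seconds (so maxpoll 4 means 16 s, matching the figure above) and that the exponent grows by one per missed response until it reaches maxpoll. The function name is hypothetical and this is not chronyd's actual backoff logic.

```python
def backoff_intervals(minpoll, maxpoll, missed):
    """Intervals (s) between polls while a source keeps not responding.

    Assumed toy model: the poll exponent starts at minpoll and grows by one
    per missed response, clamped to maxpoll. chronyd's real logic differs.
    """
    poll = minpoll
    intervals = []
    for _ in range(missed):
        intervals.append(2 ** poll)    # poll values are log2 of seconds
        poll = min(poll + 1, maxpoll)  # never back off past maxpoll
    return intervals


print(backoff_intervals(6, 10, 6))  # [64, 128, 256, 512, 1024, 1024]
print(backoff_intervals(4, 4, 3))   # [16, 16, 16]
```

With minpoll equal to maxpoll the interval simply stays fixed; an unresponsive source does not by itself run anything "out".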

The smaller minpoll and maxpoll values might make it more likely to
trigger the error, but I don't see how it could happen in normal
operation. I can only reproduce it in a debugger, or by inserting
a sleep in the right place.

Basically, the chronyd process needs to slow down so much that, after
a new timeout is added, the subsequent check for timeouts still waiting
to be processed already sees the new timeout as late. This needs to
happen several times in a row. Simply stopping and resuming the process
is not enough; it needs to stop at exactly the right place.
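The failure mode described above can be pictured with a simulated clock. This is a toy dispatcher, not chronyd's scheduler: the class, the handler, and the `max_rounds` safety limit are all hypothetical. A handler re-arms its timeout and then takes `cost` seconds before control returns to the dispatcher. If that cost is at least the timeout delay, every newly added timeout is already late when it is checked, the loop never drains, and the safety limit turns this into a fatal error.

```python
import heapq


class SimulatedScheduler:
    """Toy timeout dispatcher with a simulated clock (hypothetical)."""

    def __init__(self, max_rounds=20):
        self.now = 0.0                # simulated time, seconds
        self.queue = []               # heap of (deadline, seq, handler)
        self.seq = 0                  # tie-breaker so handlers are never compared
        self.max_rounds = max_rounds  # hypothetical limit on handlers per dispatch

    def add_timeout(self, delay, handler):
        heapq.heappush(self.queue, (self.now + delay, self.seq, handler))
        self.seq += 1

    def dispatch_timeouts(self):
        """Run every timeout whose deadline has passed; return how many ran."""
        done = 0
        while self.queue and self.queue[0][0] <= self.now:
            _, _, handler = heapq.heappop(self.queue)
            handler(self)
            done += 1
            if done > self.max_rounds:
                raise RuntimeError("timeouts expire faster than they are processed")
        return done


def make_handler(delay, cost):
    """Handler that re-arms itself after `delay`, then runs for `cost` seconds."""
    def handler(sched):
        sched.add_timeout(delay, handler)  # schedule the next timeout first
        sched.now += cost                  # slow execution: time passes before the next check
    return handler


fast = SimulatedScheduler()
fast.add_timeout(1.0, make_handler(delay=1.0, cost=0.01))
fast.now = 1.0
print(fast.dispatch_timeouts())  # 1: the new timeout is not yet due

slow = SimulatedScheduler()
slow.add_timeout(0.5, make_handler(delay=0.5, cost=1.0))
slow.now = 0.5
try:
    slow.dispatch_timeouts()
except RuntimeError as exc:
    print("fatal:", exc)
```

Note that merely pausing the process between dispatches does not trigger this in the model either; the slowdown has to land between adding a timeout and checking the queue, repeatedly.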

Was anything else happening around the time it crashed?
I'm not sure what that could be.

The only other report of this error that I know of turned out to be a
bug triggered by slow name resolution. That was many years ago, when
chronyd didn't even resolve names asynchronously in a separate thread.

> But I think the lines:
> > Restart=on-failure
> > RestartSec=30s
> Should be added in the [Service] section. If chronyd fails for any reason, given how important time is, I can't think why it wouldn't try to restart.

The issue is that restarting chronyd will allow the system clock to be
stepped again. That can be a serious issue in some environments. It
could be suppressed by the -R option, but there doesn't seem to be a
way to add it to the command line of the restarted chronyd. If you are
OK with that, your custom unit file should work fine. But I'd prefer
chronyd not to be so buggy that it needs to be restarted
automatically.
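A simplified model of why a restart re-enables stepping, assuming the configuration uses the makestep directive with a positive limit (for example a hypothetical `makestep 1.0 3`: step if the offset exceeds 1 second, but only during the first 3 clock updates). The update counter starts over with the new process, so a restarted chronyd may step again, while -R makes chronyd ignore makestep with a positive limit. The function name is hypothetical, initstepslew is not modeled, and this is not chronyd's code.

```python
def should_step(offset, update_index, threshold, limit, ignore_makestep=False):
    """Assumed model of `makestep threshold limit`.

    offset: measured clock offset in seconds
    update_index: 1-based count of clock updates since this chronyd started
    threshold, limit: hypothetical makestep values; a negative limit means no limit
    ignore_makestep: True when chronyd was started with -R
    """
    if ignore_makestep and limit > 0:
        return False   # -R ignores makestep with a positive limit
    if limit >= 0 and update_index > limit:
        return False   # past the first `limit` updates: slew only
    return abs(offset) > threshold


print(should_step(2.5, 100, 1.0, 3))                       # False: long-running daemon
print(should_step(2.5, 1, 1.0, 3))                         # True: freshly restarted daemon
print(should_step(2.5, 1, 1.0, 3, ignore_makestep=True))   # False: restarted with -R
```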

-- 
Miroslav Lichvar



