RE: [chrony-users] chronyd.service doesn't have Restart=on-failure?



Thanks for responding!

Here are the directives from our chrony.conf file as it was when it failed:
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony

We've since increased maxpoll to 10. At the time of the failure there were some network issues going on, either a flapping (virtual) port or some other issue, possibly DNS. This is an AWS EC2 instance, so our visibility into the physical network infrastructure is pretty minimal.

Yes, I did mean minpoll and maxpoll. 😊

Your comment about timeouts on top of timeouts is what I was thinking, too, and with maxpoll the same as minpoll, it wouldn't take long for us to run out of timeouts--only 16 seconds. I think this means that if chronyd can't reach a time server within maxpoll, it'll generate this error. Apparently this doesn't happen very often, because Google produces nearly zero hits for that error. In that sense it is working as configured. Having a short maxpoll was clearly a mistake.
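
For reference, a sketch of the arithmetic behind that 16 seconds: minpoll and maxpoll are exponents of 2, in seconds. The "after" line assumes that only maxpoll changed and every other option stayed as listed above.

```conf
# Before (commented out): polling interval pinned at 2^4 = 16 s
# server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4

# After: interval may range from 2^4 = 16 s up to 2^10 = 1024 s
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 10
```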

To be clear, I'm not looking to send chronyd any different values on restart; my point is only that the chronyd service *should* be set to restart after a failure.

The current default service file looks like this:

[Unit]
Description=NTP client/server
Documentation=man:chronyd(8) man:chrony.conf(5)
After=ntpdate.service sntp.service ntpd.service
Conflicts=ntpd.service systemd-timesyncd.service
ConditionCapability=CAP_SYS_TIME

[Service]
Type=forking
PIDFile=/var/run/chrony/chronyd.pid
EnvironmentFile=-/etc/sysconfig/chronyd
ExecStart=/usr/sbin/chronyd $OPTIONS
ExecStartPost=/usr/libexec/chrony-helper update-daemon
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=full

[Install]
WantedBy=multi-user.target

But I think the lines:
> Restart=on-failure
> RestartSec=30s
should be added to the [Service] section. If chronyd fails for any reason, given how important time is, I can't think why it wouldn't try to restart.
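
Until the packaged unit changes, the same effect could probably be had locally with a drop-in rather than by editing the shipped file. A minimal sketch, assuming a drop-in under /etc/systemd/system/chronyd.service.d/ (the file name restart.conf is just a placeholder of mine):

```ini
# /etc/systemd/system/chronyd.service.d/restart.conf  (hypothetical file name)
# Only the two directives proposed above; everything else comes from the packaged unit.
[Service]
Restart=on-failure
RestartSec=30s
```

After creating the file, running "systemctl daemon-reload" makes systemd pick it up; "systemctl edit chronyd.service" creates an equivalent drop-in interactively. A longer RestartSec may be the safer choice if chronyd were to crash soon after each start.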

--Jamie

Jamie Gruener | Director of IT & Security, Biospatial, Inc. | 919-624-9760 | jamie.gruener@xxxxxxxxxxxxx

-----Original Message-----
From: Miroslav Lichvar <mlichvar@xxxxxxxxxx> 
Sent: Monday, January 11, 2021 5:11 AM
To: chrony-users@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [chrony-users] chronyd.service doesn't have Restart=on-failure?



On Thu, Jan 07, 2021 at 10:13:12PM +0000, Jamie Gruener wrote:
> We had a scenario where an instance generated the following error after a brief (10-30s?) network issue:
> <26>Dec 16 05:15:45 [instance_name_redacted] chronyd[557]: Fatal error : Possible infinite loop in scheduling

That's bad. It is not supposed to happen unless the machine is extremely overloaded and chronyd has its execution delayed so much that its timeouts are just creating more timeouts without processing any input/output.

Can you post the chrony configuration in which this happened?

> My guess is that this problem occurred because our min and max retries were set the same at 4, but I don't know for sure. We've since changed that.

You mean minpoll and maxpoll?
>
> Here's my question. Why doesn't the default service file for chronyd have lines like the ones below?
>
> Restart=on-failure
> RestartSec=30s

That might help, maybe with a somewhat longer interval so as not to flood servers if chronyd were crashing soon after start, but there is an issue: it might break expectations about clock steps happening only on start (the makestep limit). AFAIK, there is no way to specify additional options in a systemd unit to be passed to a restarted chronyd.

--
Miroslav Lichvar


