Re: [chrony-users] chronyd.service doesn't have Restart=on-failure?


On Mon, Jan 11, 2021 at 05:01:42PM +0000, Jamie Gruener wrote:
> What I can see is that we were at 88%+ memory usage and mid-50% CPU usage during the period leading up to the failure and immediately afterwards. I do have detailed syslog data, though, and 10 minutes before chronyd died, clamav also died due to an error related to an out-of-memory condition. There's some other evidence (consul logs on other boxes) indicating that other instances were having trouble reaching the problem instance. Something was up with the box, obviously.

OK, that might be a good hint. If the system was running out of
memory, chronyd may have been stuck waiting for its pages to be
loaded back from disk before it could continue executing.

> My working theory is that this problem occurred because chronyd lost network connectivity, which would be very similar, conceptually, to losing name resolution. It would take more effort to replicate the behavior than I have time for, but setting the time server to some unreachable IP address and setting maxpoll to the same value as minpoll (or perhaps just to 4, regardless of what minpoll is) should be sufficient, I think.

If you have a reproducer, that would help a lot.

> Perhaps some more documentation about making sure that maxpoll is larger than minpoll?

There is no such requirement. Setting both minpoll and maxpoll to 4 is
perfectly fine and not expected to trigger any fatal errors, at least
not in normal conditions.
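
For illustration only, a chrony.conf line that sets both options to 4
for a single server could look like the one below. The address is an
assumption, not taken from the thread: 192.0.2.1 belongs to the
TEST-NET-1 documentation range (RFC 5737) and is used here as a
placeholder that should be unreachable, so the same line may also serve
as a starting point for the reproducer mentioned above.

    # Hypothetical example; the config file path varies by distribution.
    # 192.0.2.1 is a documentation-only address and should not answer.
    server 192.0.2.1 minpoll 4 maxpoll 4

With minpoll and maxpoll both set to 4, chronyd polls this source every
2^4 = 16 seconds and never backs off to a longer interval.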

-- 
Miroslav Lichvar
