On Mon, Jan 11, 2021 at 06:15:07PM +0100, Miroslav Lichvar wrote:
> On Mon, Jan 11, 2021 at 05:01:42PM +0000, Jamie Gruener wrote:
> > I can see is that we were at 88%+ memory usage and mid 50% CPU usage during the period leading up to the failure and immediately afterwards. I do have detailed syslog data, though, and 10 minutes before chronyd died clamav also died due to an error that is related to an out of memory condition. There's some other evidence (consul logs on other boxes) indicating that other instances were having trouble reaching the problem instance. Something was up with the box, obviously.
> Ok, that might be a good hint. If the system was running out of
> memory, maybe chronyd was stuck waiting for its pages to load from
> disk and execute.

I have committed a fix that triggers the error only when the rate of
dispatched timeouts exceeds 100 per second. That should not happen
when execution is merely slow under low-memory conditions.
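
To illustrate the idea, here is a minimal sketch of such a rate check.
It is not the committed code: the names TimeoutCounter, counter_reset,
counter_dispatch and MIN_TIMEOUTS are hypothetical, and only the
limit of 100 per second comes from the description above. The caller
is assumed to pass in a monotonic time in seconds.

```c
/* Limit from the description above: 100 dispatched timeouts per second. */
#define MAX_TIMEOUT_RATE 100.0

/* Hypothetical: minimum number of timeouts before the rate is evaluated,
   so that a few timeouts close together do not trip the check. */
#define MIN_TIMEOUTS 20

/* Hypothetical counter of timeouts dispatched since the last reset. */
typedef struct {
  double start;         /* monotonic time (seconds) of the last reset */
  unsigned long count;  /* timeouts dispatched since the last reset */
} TimeoutCounter;

/* Start a new counting interval at time 'now'. */
static void counter_reset(TimeoutCounter *c, double now)
{
  c->start = now;
  c->count = 0;
}

/* Record one dispatched timeout at time 'now'.
   Returns 1 if the average rate since the last reset exceeds
   MAX_TIMEOUT_RATE, 0 otherwise. */
static int counter_dispatch(TimeoutCounter *c, double now)
{
  double elapsed;

  c->count++;

  if (c->count < MIN_TIMEOUTS)
    return 0;

  elapsed = now - c->start;

  /* Many timeouts in no measurable time: treat as over the limit. */
  if (elapsed <= 0.0)
    return 1;

  return c->count / elapsed > MAX_TIMEOUT_RATE;
}
```

With a check like this, a process that is stalled on paging dispatches
few timeouts per second of real time and stays far below the limit,
while a real loop of timeouts firing back to back exceeds it quickly.
In a real scheduler the counter would presumably be reset whenever
something other than a timeout is handled, but that part is left out
here.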

Miroslav Lichvar

