RE: [chrony-dev] Chrony stuck in an endless loop



Our boards have an RTC, and it is disciplined by chrony. I think your
proposal should solve the issue. The function
RGR_FindBestRobustRegression is called in only two places, and the
tolerance (tol) value passed in both calls is high enough to keep incr
significant relative to blo and bhi.
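
For reference, here is a minimal standalone sketch of the proposed
clamp, written as a separate helper so it can be tried outside chrony.
The name choose_incr and its return convention are hypothetical and not
part of regress.c; the tol check follows the suggestion quoted below
that tol should be verified to be reasonable first.

```c
/* Hypothetical helper, not part of chrony's regress.c.
 * Picks the bracket step from the slope's standard deviation (sb),
 * but never lets it fall below 3 * tol.
 * Returns 1 and stores the step in *incr on success.
 * Returns 0 if tol is zero, negative or NaN, leaving *incr untouched. */
int choose_incr(double sb, double tol, double *incr)
{
  /* !(tol > 0.0) is also true for NaN, so NaN is rejected here. */
  if (!(tol > 0.0)) {
    return 0;
  }

  if (sb > tol) {
    *incr = 3.0 * sb;
  } else {
    /* Also taken when sb is NaN or essentially zero. */
    *incr = 3.0 * tol;
  }

  return 1;
}
```

With this, an sb that collapses to nearly zero still yields a step of
3 * tol, so the step can only become insignificant if tol itself is too
small compared with the slope estimate.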

> -----Original Message-----
> From: Bill Unruh [mailto:unruh@xxxxxxxxxxxxxx]
> Sent: Monday, November 12, 2012 19:19
> 
> It seems that the only places where RGR_FindBestRobustRegression is
> used are rtc.c and manual.c.
> 
> It seems that it is estimating the standard deviation of slope and
> getting essentially zero for that, which gives a tiny value for incr.
> Perhaps the lines setting incr could be changed to
>      if (sb > tol) {
>        incr = 3.0 * sb;
>      } else {
>        incr = 3.0 * tol;
>      }
> 
> Of course one should then check that tol has a reasonable value (e.g.
> not negative).
> 
> Do your boards have an onboard RTC at all?
> 
> 
> 
> On Mon, 12 Nov 2012, Hattink, Tjalling [FINT] wrote:
> 
> > Hi,
> >
> > Recently I've encountered a lockup of chrony on one of our embedded
> > boards running Linux. It was consuming all CPU resources and not
> > acting on any inputs anymore. I was able to attach a gdb debugger to
> > the process with symbols, so I could see the callstack and where it
> > was stuck.
> >
> > I was using version 1.24, and it got stuck in regress.c, lines 570
> > to 578. The while loop there never exited. I also checked the latest
> > regress.c in git and see the same loop at lines 600 to 608, so I
> > suspect the latest chrony is still affected by this issue.
> >
> > The loop got stuck because the blo and bhi variables never changed.
> > They both contained the same value, -1.9333333333334166e-06. The
> > incr variable contained the value 9.061309612524684e-25. blo and bhi
> > didn't change because incr is so small that it is insignificant when
> > added to or subtracted from them.
> >
> > After I changed incr to a much larger value, 1.0e-15, using gdb,
> > the loop exited and chrony was working properly again.
> >
> > The solution is to make sure incr contains a significant value
> > before going into the loop at lines 570 to 578, although I can't
> > think of a good trick to achieve this yet.
> >
> > What are your thoughts about this? How should we fix this?
> >
> > With best regards,
> >
> > Tjalling Hattink
> >
> > --
> > To unsubscribe email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with
> > "unsubscribe" in the subject.
> > For help email chrony-dev-request@xxxxxxxxxxxxxxxxxxxx with "help" in
> > the subject.
> > Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.
> >
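
The values from the original report quoted above can be used to check
the diagnosis outside chrony. The sketch below is an illustration only,
and the function name step_is_significant is hypothetical: it tests
whether adding or subtracting incr changes a double at all.

```c
/* Hypothetical helper for illustration, not part of chrony.
 * Returns 1 if both b - incr and b + incr differ from b, i.e. the
 * step is large enough to survive double-precision rounding.
 * Returns 0 if the step is absorbed by rounding in either direction. */
int step_is_significant(double b, double incr)
{
  /* volatile forces the results to be stored as doubles, so extra
   * precision in floating-point registers cannot hide the rounding. */
  volatile double lo = b - incr;
  volatile double hi = b + incr;

  return lo != b && hi != b;
}
```

For b = -1.9333333333334166e-06, the spacing between adjacent doubles
is about 4.2e-22. A step of 9.061309612524684e-25 is far below half of
that spacing and is rounded away, while the 1.0e-15 set through gdb is
not, which matches the behaviour seen in the stuck loop.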
