Re: [chrony-users] chronyd: Can't Synchronize WHY ?



On Sun, 11 Sep 2011, Ed W wrote:

On 09/09/2011 22:35, Miroslav Lichvar wrote:
The only (relatively) special thing: We have a multi-threaded user space application running
with a lot of threads (~70). About 6 to 10 of them run under the (almost) realtime
scheduler the standard Linux kernel provides (no RT patches applied), and 4 to 6 use
very high priorities (80 to 99).
Might this interfere with chrony?
Perhaps, can you try it without that application?

An experiment to rule out the possibility that the clock discipline is
broken would be to add the noselect option to all servers so chrony
doesn't touch the clock and see if the skew values get better.
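
A minimal sketch of that experiment's config change, assuming the servers are declared with plain `server` lines in chrony.conf. The helper name `add_noselect` and the hostnames are made up for illustration; only the `noselect` option itself comes from the suggestion above.

```python
def add_noselect(conf_text):
    """Return conf_text with 'noselect' appended to every 'server' line.

    Comment lines and other directives are passed through unchanged.
    Lines that already carry 'noselect' are left alone.
    """
    out = []
    for line in conf_text.splitlines():
        words = line.split()
        if words and words[0] == "server" and "noselect" not in words:
            line = line.rstrip() + " noselect"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical config; the hostnames are placeholders.
    sample = "\n".join([
        "server ntp1.example.net iburst",
        "server ntp2.example.net",
        "# server disabled.example.net",
        "driftfile /var/lib/chrony/drift",
    ])
    print(add_noselect(sample))
```

With every server marked noselect, chronyd should keep measuring the sources without steering the clock, so the reported skew reflects the measurements alone.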


Just a note, but I observed a fair amount of jitter running chrony on
my lightly loaded machine with a fast, modern processor and chrony
running at the default priority. I don't have the numbers to hand, but I
think it was at the low-ms level, with outliers apparently in the low
10s of ms.  This suggests that even a fast machine whose load comes mainly
from disk I/O can see scheduling delays of >10ms, and this, I think,
without any realtime processes on the machine.

The problem was not with jitter in the offsets, but huge variance (thousands
of PPM) in the rate. That is definitely odd.


I switched chrony to run at some "realtime" priority level and now the
jitter (as seen by the other machine of a pair directly connected to it)
drops to microsecs.

What I'm thinking is that in your case chrony might be getting scheduled
late by significant margins if you have any reasonable load
from those RT threads?  I'm unsure how certain scheduling options are
managed by various kernel versions, but my limited understanding would
be that SCHED_FIFO threads are not pre-empted by ordinary processes
(only by higher-priority realtime threads), and with 10s of
threads at that level you could potentially see significant scheduling
delays of user space processes?  I haven't read the code, so no idea how
big the critical regions in chrony are, but it's easy to imagine that if
you pre-empt chrony at the right point you can confuse its idea of
round trip delays?

It seems probable that there are some tools to help debug such a theory.
Never used them, but I think latencytop might be such a tool?
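
Short of a dedicated tool, a rough way to test the theory is to measure how late a process wakes up from a short sleep, with and without the RT application running. The following is only a sketch: the function name `measure_wakeup_overshoot` and its defaults are invented here, and it measures the wake-up lateness of a Python process, not of chronyd itself.

```python
import time


def measure_wakeup_overshoot(interval_s=0.001, samples=200):
    """Sleep repeatedly and record how late each wake-up is, in seconds.

    Each entry is (actual elapsed time) - (requested sleep time), so a
    larger value means the process was scheduled later than it asked for.
    """
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval_s)
        overshoots.append(time.monotonic() - start - interval_s)
    return overshoots


if __name__ == "__main__":
    late = sorted(measure_wakeup_overshoot())
    print("median overshoot: %.1f us" % (late[len(late) // 2] * 1e6))
    print("worst overshoot:  %.1f us" % (late[-1] * 1e6))
```

If the worst-case overshoot jumps by orders of magnitude once the application is started, that would at least be consistent with user space processes being starved by the RT threads.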

You might consider asking Miroslav about pushing chrony to a realtime
priority higher than your application? The theory would be that chrony
causes negligible scheduling delay to your other applications, but the
higher priority prevents it being pre-empted?  It *might* also be enough
to set chrony as SCHED_FIFO, but with a *lower* priority than your
application - the theory then being that SCHED_FIFO prevents chrony from
being pre-empted once scheduled, but being at a lower priority than your
app keeps it from scheduling ahead of your application..?
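
To see which policy and priority a process actually ended up with, something like the sketch below could be used on Linux. It relies on the `os.sched_*` calls in Python's standard library; the helper names `describe_scheduling` and `try_set_fifo` are hypothetical, and switching to SCHED_FIFO normally needs root or CAP_SYS_NICE. As far as I know chronyd can also request SCHED_FIFO itself through its -P option, which would be the more natural route than changing it from outside.

```python
import os

# Names for the policies discussed in this thread (Linux only).
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_scheduling(pid=0):
    """Return (policy name, realtime priority) for pid; 0 means this process."""
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    return POLICY_NAMES.get(policy, "policy %d" % policy), priority


def try_set_fifo(priority, pid=0):
    """Try to switch pid to SCHED_FIFO at the given priority.

    Returns True on success and False if the caller lacks permission.
    Raises ValueError if the priority is outside the kernel's FIFO range.
    """
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lowest <= priority <= highest:
        raise ValueError("priority must be in %d..%d" % (lowest, highest))
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False
    return True


if __name__ == "__main__":
    print("this process: %s, priority %d" % describe_scheduling())
```

Calling `describe_scheduling(pid)` with the PID of chronyd and of the application's RT threads would show whether chrony really sits above or below them.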

Curious problem though

Ed W



---
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "unsubscribe" in the subject.
For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "help" in the subject.
Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.


--
William G. Unruh   |  Canadian Institute for|     Tel: +1(604)822-3273
Physics&Astronomy  |     Advanced Research  |     Fax: +1(604)822-5324
UBC, Vancouver,BC  |   Program in Cosmology |     unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1     |      and Gravity       |  www.theory.physics.ubc.ca/


