AW: [chrony-users] chronyd: Can' Synchronize WHY ?
- To: <chrony-users@xxxxxxxxxxxxxxxxxxxx>
- Subject: AW: [chrony-users] chronyd: Can' Synchronize WHY ?
- From: <thomas.schmid@xxxxxxxxx>
- Date: Sun, 11 Sep 2011 14:30:16 +0000
Hi all,
>>> The only (relatively) special thing: We have a multi-threaded user space application running
>>> with a lot of threads (~70): About 6...10 of them are running using the (almost) realtime
>>> scheduler the standard Linux kernel provides (no RT-patches applied), and 4..6 are using
>>> very high priorities 80..99.
>>> Might this interfere with chrony ?
>> Perhaps, can you try it without that application?
>>
>> An experiment to rule out the possibility that the clock discipline is
>> broken would be to add the noselect option to all servers so chrony
>> doesn't touch the clock and see if the skew values get better.
>>
>Just a note, but I observed a fairly reasonable jitter running chrony on
>my lightly loaded machine with a fast, modern processor and chrony
>running with default "priority". I don't have the numbers to hand, but I
>think it was in the low ms level, with outliers apparently in the low
>10s of ms. This suggests that a fast machine with load mainly caused by
>disk io can apparently see scheduling delays of >10ms and this I think
>without any realtime processes on the machine.
>I switched chrony to run at some "realtime" priority level and now the
>jitter (as seen by the other machine of a pair directly connected to it)
>drops to microsecs.
>What I'm thinking is that in your case chrony might be hit by
>significant scheduling delays if you have any reasonable load
>from those RT threads? I'm unsure how certain scheduling options are
>managed by various kernel versions, but my limited understanding would
>be that threads that are sched_fifo are not pre-empted, and with 10s of
>threads at that level you could potentially see significant scheduling
>delays of user space processes? I haven't read the code, so no idea how
>big the critical regions in chrony are, but it's easy to imagine that if
>you pre-empt chrony at the right point you can confuse its idea of
>round trip delays?
>It seems probable that there are some tools to help debug such a theory.
>Never used them, but I think latencytop might be such a tool?
>You might consider asking Miroslav about pushing chrony to a realtime
>priority higher than your application? The theory would be that chrony
>causes negligible scheduling delay to your other applications, but the
>higher priority prevents it being pre-empted? It *might* also be enough
>to set chrony as sched_fifo, but with a *lower* priority than your
>application - the theory then being that sched_fifo prevents chrony from
>being pre-empted once scheduled, but being at a lower priority than your
>app keeps it from scheduling ahead of your application..?
>Curious problem though.
The application is using about 6..8 real-time priority threads, with 2 or 3 of them
running at very high priorities 90..99 and the others in 80..90, and it is really sensitive
if you mess with it at this level. So I do not dare to run something with priority 100.
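To make the "SCHED_FIFO, but below the application" suggestion concrete, here is a minimal sketch in Python of how such a priority could be picked and applied. It is only an illustration under assumptions: the function names are hypothetical, the example priorities are the rough 80..99 range mentioned above, and it relies on the Linux-only scheduling calls in Python's `os` module. On Linux the valid SCHED_FIFO range is 1..99, which the sketch queries from the kernel instead of hard-coding. As far as I know chronyd can also request a real-time priority itself (the `-P` command-line option or the `sched_priority` directive), which would probably be the cleaner way to do this.

```python
import os


def pick_chrony_priority(app_priorities, policy=os.SCHED_FIFO):
    """Return a real-time priority one step below the application's lowest one.

    The result is clamped to the range the kernel reports for the policy.
    """
    lowest = os.sched_get_priority_min(policy)
    highest = os.sched_get_priority_max(policy)
    target = min(app_priorities) - 1
    return max(lowest, min(highest, target))


def set_fifo_priority(pid, priority):
    """Switch a running process to SCHED_FIFO at the given priority.

    Needs root (or CAP_SYS_NICE); pid is assumed to be the chronyd process.
    """
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))


if __name__ == "__main__":
    # Hypothetical priorities of the application's real-time threads.
    app_priorities = [80, 85, 90, 95, 99]
    print("candidate chronyd priority:", pick_chrony_priority(app_priorities))
```

With the example values this suggests 79, so chronyd would run ahead of all normal (non-real-time) processes but could never delay the application's real-time threads.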
What is bugging me is that we have run this system successfully (in the lab) for
years, with the time provided by ntpd using 2 reference servers in
the same LAN (stratum = 5). These reference servers were themselves connected
to the company's time services, which in turn were connected to upstream servers.
While I never watched ntpd as closely as I now watch chrony, I did not notice
any reference host switching at intervals like the ones I see now. But at that time the
reference servers were reachable entirely over the LAN (not even routers in between),
whereas now I have a mix of WAN and LAN connections to the various NTP
references. Additionally, I now see the same reference host switching when using
ntpd, so I think that 1) the application has not suddenly introduced some malfunction,
and 2) chrony is working correctly.
I'll still run a test on a system without the application running, once the test
I described previously has finished tomorrow morning (48h of data).
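For the noselect experiment suggested above, the configuration change could be prepared with something like the following Python sketch. It is just an illustration: the function name and the example host names are made up, and it assumes the sources are listed as plain `server` lines in chrony.conf. It appends `noselect` to every `server` line that does not have it yet and leaves all other lines untouched.

```python
def add_noselect(conf_text):
    """Append 'noselect' to every 'server' line that lacks it.

    conf_text is the content of a chrony.conf file as one string.
    Comments and all other directives are passed through unchanged.
    """
    result = []
    for line in conf_text.splitlines():
        fields = line.split()
        is_server = bool(fields) and fields[0] == "server"
        if is_server and "noselect" not in fields[2:]:
            line = line.rstrip() + " noselect"
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Hypothetical configuration; the host names are placeholders.
    example = (
        "server ntp1.example.com minpoll 4\n"
        "server ntp2.example.com\n"
        "driftfile /var/lib/chrony/drift\n"
    )
    print(add_noselect(example), end="")
```

With all sources marked noselect, chronyd keeps measuring them but does not steer the clock from them, so the logged skew values show whether the clock discipline itself is part of the problem.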
Is it OK if I post the measurement and statistics logs in this forum? Unfortunately
I have no publicly accessible web space I could put the data on.
Thomas