Re: [chrony-users] HW Timestamping fails with specific source

The NICs are Intel 10GB FC, 82599ES, on all sites.

I hadn't thought of that race condition. The site that has the problem is the most recent one, with newer switches, CPUs and motherboards, so that would make sense.

However, in my case 98% of the requests are Daemon/Daemon, so most of them pass all the tests (generally those below 60us), which is bad for time accuracy.

Looking at the logs I can also spot some crazy values with a 2-second peer delay (and those are Daemon/Kernel). Mixed timestamps? I will investigate more tomorrow, thanks.

2018-01-29 18:05 GMT+01:00 Miroslav Lichvar <mlichvar@xxxxxxxxxx>:
On Mon, Jan 29, 2018 at 05:17:48PM +0100, Thibaut BEYLER wrote:
> I recently investigated some servers on a site that struggle to use HW
> timestamps and spend most of their time in daemon/daemon mode instead of
> hardware/kernel. The weird part is that it occurs only with source 1; the
> clients stay stable in hw/kernel mode with the less accurate sources.

It's a race condition between receiving the TX timestamp of the client
request and receiving the response from the server. If the response is
so fast that it is received before the TX timestamp of the request,
the late timestamp will be ignored as there will not be a
corresponding request to which it could be applied.
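
The ordering problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class and function names (PendingRequest, Client, simulate) are hypothetical and are not chrony's internals, and the delays are arbitrary numbers chosen to show the two possible orderings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRequest:
    """Hypothetical record of a request that is still waiting for its response."""
    daemon_tx: float                # timestamp taken by the daemon when sending
    hw_tx: Optional[float] = None   # NIC TX timestamp, if it arrived in time


class Client:
    """Toy client: a TX timestamp can only be attached to a pending request."""

    def __init__(self):
        self.pending = None
        self.log = []

    def send_request(self, daemon_tx):
        self.pending = PendingRequest(daemon_tx)

    def on_tx_timestamp(self, hw_tx):
        if self.pending is None:
            # The response was already processed, so there is no request
            # left to which this timestamp could be applied.
            self.log.append("late TX timestamp ignored")
            return
        self.pending.hw_tx = hw_tx

    def on_response(self):
        request, self.pending = self.pending, None
        source = "hardware" if request.hw_tx is not None else "daemon"
        self.log.append("response processed with %s TX timestamp" % source)
        return source


def simulate(tx_timestamp_delay_us, response_delay_us):
    """Deliver both events in time order and return the TX timestamp source used."""
    client = Client()
    client.send_request(daemon_tx=0.0)
    events = sorted([
        (tx_timestamp_delay_us, "tx_timestamp"),
        (response_delay_us, "response"),
    ])
    source = None
    for _, kind in events:
        if kind == "tx_timestamp":
            client.on_tx_timestamp(hw_tx=0.0)
        else:
            source = client.on_response()
    return source


if __name__ == "__main__":
    print(simulate(20.0, 60.0))  # TX timestamp arrives before the response
    print(simulate(80.0, 60.0))  # response arrives before the TX timestamp
```

In the second call the response wins the race, so the measurement falls back to the daemon timestamp and the hardware timestamp that shows up afterwards is discarded.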

What NIC do the clients have? I've seen this with an Intel card. It
happened only for a minority of requests and they were all dropped due
to failing the test C, so overall it worked well with HW timestamping.

I'm not sure if it should be treated as a driver/HW issue or if
applications should really be expected to get TX timestamps so late.
I asked about this on the Intel development list some time ago, but
didn't get a response. I think it could be addressed in chrony by
introducing a new timeout for timestamps, but I'd rather avoid the
extra complexity.

As a workaround you could try to add another switch between the server
and clients to increase the peer delay. You could also try to lower
the priority of the chronyd process to give it more time to get the
timestamp. Someone reported it happened only when chronyd was running
with a high priority.
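
Before changing the priority it may help to check how the process is currently scheduled. The following is a minimal sketch, assuming a Linux host and Python 3; describe_scheduling is a hypothetical helper name, and the PID of chronyd has to be supplied by the caller (for example from the output of pidof chronyd).

```python
import os

# Map the scheduler constants available on this platform to readable names.
_POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}


def describe_scheduling(pid):
    """Return the scheduling policy and nice value of a process (Linux only)."""
    raw_policy = os.sched_getscheduler(pid)
    # Strip the reset-on-fork flag, which can be OR-ed into the policy value.
    policy = raw_policy & ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    return {
        "policy": _POLICY_NAMES.get(policy, str(policy)),
        "realtime": policy in (os.SCHED_FIFO, os.SCHED_RR),
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
    }


if __name__ == "__main__":
    # Inspect the current process as a stand-in; pass the chronyd PID instead.
    print(describe_scheduling(os.getpid()))
```

If "realtime" is reported as true for chronyd, the daemon was started with a real-time priority (for example through the sched_priority directive or the -P option), which is the situation in which the problem was reported.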

Miroslav Lichvar