[chrony-users] Accurately measuring clock drift

Hi list,

We'd like to achieve a sub-30 μs window of transmission accuracy across multiple devices for simulcast purposes. Ideally, sub-10 μs.

The system uses NTP and 1PPS, which works nearly all the time. We tested both NTP and chrony over a weekend, and the results were not what we expected. In the first plot, NTP+1PPS never drifted above 17 μs. In the second plot, chrony+1PPS drifted up to 281 μs.

Something is wrong with the test, the configuration, or most likely both.

The test data was generated by running the following program:

// SOF

#include <stdio.h>
#include <stdlib.h>
#include <sys/timex.h>
#include <unistd.h>

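int main( void ) {
  /* modes == 0 makes adjtimex() a read-only query. */
  struct timex t = { 0 };

  while( 1 ) {
    const int clockState = adjtimex( &t );

    if( clockState == -1 ) {
      perror( "adjtimex" );
      return EXIT_FAILURE;
    }

    /* esterror and maxerror are in microseconds; offset and jitter are in
     * nanoseconds if STA_NANO is set in t.status, microseconds otherwise. */
    printf( "%ld,%ld,%ld,%ld\n", t.esterror, t.maxerror, t.offset, t.jitter );
    fflush( stdout );
    sleep( 1 );
  }

  return 0;
}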

// EOF
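
A note on units for anyone reading the CSV columns: as far as we understand adjtimex(2), esterror and maxerror are always in microseconds, while offset and jitter are in nanoseconds only when STA_NANO is set in the status field (which the program would then also need to log). Below is a minimal sketch of the normalisation we would apply before plotting. The helper names to_ns and us_to_ns are ours and hypothetical, not part of any library.

```c
#include <assert.h>
#include <sys/timex.h>

/* Hypothetical helper: convert a timex field whose unit depends on
 * STA_NANO (offset, jitter) into nanoseconds. */
long long to_ns( long value, int status ) {
  /* Nanoseconds when STA_NANO is set, microseconds otherwise. */
  return ( status & STA_NANO ) ? (long long) value
                               : (long long) value * 1000LL;
}

/* Hypothetical helper: esterror and maxerror are always microseconds. */
long long us_to_ns( long value ) {
  assert( value >= 0 ); /* error bounds are not expected to be negative */
  return (long long) value * 1000LL;
}
```

With these, a sample would be logged as to_ns( t.offset, t.status ) and us_to_ns( t.esterror ), so that every column shares one unit.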

The configuration file for the chrony side of the test follows:

# SOF

server 10.10.10.200 minpoll 0 maxpoll 0 maxdelay 0.5 iburst

# Location of ID/key pairs for NTP authentication.
keyfile /etc/chrony/chrony.keys

# Stores new gain/loss rate, for compensating the system clock upon restart.
driftfile /var/lib/chrony/chrony.drift

logdir /var/log/chrony

# Stop bad estimates from upsetting the machine clock.
maxupdateskew 100.0

# Enables kernel synchronisation (every 11 minutes) of the real-time clock.
rtcsync

# Step the system clock instead of slewing it if the adjustment is larger than
# one second, but only in the first three clock updates.
makestep 1 3

# Enable hardware timestamping of NTP packets for all interfaces.
hwtimestamp *

# Synchronize time using the rising edge of a 1-pulse-per-second clock.
refclock PPS /dev/pps1 refid PPS1 poll 0 precision 1e-9 prefer

# EOF

When running chronyc sources -v, we see:

MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PPS1                          0   0   377     1   -502ns[ -534ns] +/- 1000ns
^- 10.10.10.200                  1   0   377     0    +33us[  +33us] +/- 1186us

The plots were generated using R:

ntp.data <- read.csv( 'ntp-drift.csv', header=TRUE );
ntp.data$id <- 1:nrow(ntp.data);
plot(x=ntp.data$id,ntp.data$est.err, xlab="ntp", ylab="estimated error (us)", type="l");

chrony.data <- read.csv( 'chrony-drift.csv', header=TRUE );
chrony.data$id <- 1:nrow(chrony.data);
plot(x=chrony.data$id,chrony.data$est.err, xlab="chrony", ylab="estimated error (us)", type="l");

I'm wondering, in no particular order:

Do the esterror values from calling adjtimex() yield an apples-to-apples comparison between chrony and NTP?
Do we need to combine PPS1 with the NTP server? If so, how?

What else would we need to do to achieve sub-30 μs clock drift (or sub-10 μs)?
How can we verify programmatically that the clock isn't drifting by more than 30 μs?
What API call returns the most recent clock drift adjustment value, in nanoseconds?
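
To make the last two questions concrete, this is roughly the kind of check we have in mind. It is only a sketch and rests on an assumption we can't confirm: that the offset field reported by adjtimex() is the right quantity to test under both ntpd and chrony, which is part of what we're asking. The function name within_limit and the 30000 ns limit are ours, for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/timex.h>

/* Hypothetical check: returns 1 if the kernel reports a synchronised
 * clock and |offset| does not exceed limit_ns, 0 otherwise.
 * clock_state is the return value of adjtimex(); offset and status
 * come from the struct timex it filled in. */
int within_limit( int clock_state, long offset, int status,
                  long long limit_ns ) {
  assert( limit_ns >= 0 );

  /* -1 means the call failed; TIME_ERROR means unsynchronised. */
  if( clock_state == -1 || clock_state == TIME_ERROR ) {
    return 0;
  }

  /* offset is nanoseconds with STA_NANO, microseconds without. */
  const long long ns = ( status & STA_NANO ) ? (long long) offset
                                             : (long long) offset * 1000LL;

  return llabs( ns ) <= limit_ns;
}
```

In the test program above, this would be called once per second as within_limit( clockState, t.offset, t.status, 30000 ).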

Any help is much appreciated.