Re: [chrony-users] makestep in Chrony
On Tue, 15 May 2018, Hei Chan wrote:
Hi Bill,
I think you are indeed confused. I want accuracy in 100s of ns range. But again I
want no jitter/extra latency in my application.
That is really tough. And you are operating with your hands tied behind your
back.
In all my measurements from point A to point B, the time span is less than 15 microseconds
99.9999% of the time (the other 0.0001% is the undesired jitter). The measurement is taken
probably 1.5 billion times a day (or more) across multiple cores (~10?). As you can see,
timestamping happens very frequently in my system. That is why I have the odd
idea of using an rdtsc-to-clock_gettime() map.
Sure. The designers of the Linux clock had the same idea.
I have to admit that I don't know how to use chrony/ntp's parameters very well.
What parameters would you recommend with an NTP source that is one hop away within the
same data center?
And how is that ntp source disciplined? How do you know that the time
delivered by that source has any accuracy whatsoever? And added to that, there
are the transmission problems. The hubs and routers between your machine and
that ntp source introduce jitter and delays. Contention for the ethernet
introduces jitter. The interrupt handling in your computer introduces jitter.
Then there is the abysmally slow network (even gigabit cable takes microseconds to send a
packet down the line), and the behaviour of the ethernet cards,
which will amass data and only send it when enough has accumulated and they
feel like sending something. If you want accurate times you HAVE to have
something like gps/pps, and to get tens of nanosecond precision you need to have a
pretty sophisticated one.
So what would you suggest I use to synchronize in a datacenter where PTP isn't
available and a GPS clock isn't allowed?
Here is one of the world's foremost watches. Now I want you to repair it, but you
must wear boxing gloves while doing so, and you are not allowed to remove them
for any reason.
And indeed I have thought about a better solution for quite some time because of the
conditions above and the temperature effect on TSC. But I can't think of a way to measure
from A to B without jitter and latency, and at the same time, I would like to know the
approximate epoch time of each "timestamping" (again, no jitter/latency is more
important than accuracy of the epoch time).
Approximate? To the century? year? day? second? millisecond? microsecond? nanosecond?
But make sure you never remove those gloves.
If you have a good suggestion, i am all ears.
And at a budget of $50? How much are you willing to spend?
Thanks!
On Tuesday, May 15, 2018, 2:58:52 PM GMT+8, Bill Unruh <unruh@xxxxxxxxxxxxxx> wrote:
On Tue, 15 May 2018, Hei Chan wrote:
> If I remember correctly, there was a post explaining why it wasn't a bug. The post
> mentioned that the value is written to shared memory (of some sort), and the writer and
> reader aren't protected by a lock for performance reasons, so the reader needs to spin
> (i.e. a while loop) to get the value out as soon as the writer finishes.
>
> I don't have an exact percentage of occurrence nor the exact delay. I vaguely remember
> it was like 200 nano or more.
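
A minimal sketch of the lock-free scheme described in the quoted text: a writer publishes values under a sequence counter and a reader retries until it sees a stable, even counter. This is an illustration of the general pattern in C11 atomics, not the kernel's vDSO code; the struct `shared_time`, its fields, and the functions `st_write` and `st_read` are hypothetical names, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record: a sequence counter plus two payload words. */
struct shared_time {
    atomic_uint seq;               /* odd while a write is in progress */
    _Atomic uint64_t base_ns;      /* payload: reference time in ns */
    _Atomic uint64_t base_cycles;  /* payload: counter value at base_ns */
};

/* Writer (single writer assumed): make seq odd, update, make seq even. */
static void st_write(struct shared_time *s, uint64_t ns, uint64_t cycles)
{
    unsigned v = atomic_load_explicit(&s->seq, memory_order_relaxed);
    atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->base_ns, ns, memory_order_relaxed);
    atomic_store_explicit(&s->base_cycles, cycles, memory_order_relaxed);
    atomic_store_explicit(&s->seq, v + 2, memory_order_release);
}

/* Reader: loop until the same even sequence number is observed before and
 * after copying the payload. Returns how many retries were needed, which is
 * the source of the occasional extra latency. */
static unsigned st_read(struct shared_time *s, uint64_t *ns, uint64_t *cycles)
{
    unsigned retries = 0;
    for (;;) {
        unsigned before = atomic_load_explicit(&s->seq, memory_order_acquire);
        *ns = atomic_load_explicit(&s->base_ns, memory_order_relaxed);
        *cycles = atomic_load_explicit(&s->base_cycles, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        unsigned after = atomic_load_explicit(&s->seq, memory_order_relaxed);
        if (before == after && (before & 1u) == 0)
            return retries;
        retries++;
    }
}
```

With this pattern the reader never blocks the writer, and the loop is only re-entered when a read overlaps an update. A retry is therefore a rare delay, not a failure of the call.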
I must say I am confused. You are wanting accuracy in the 10s of ns range, but you
are using pool servers to set your clock, which will give you accuracy in the
hundreds of usec range (on a good day). Or even a local server, which will
give you something like 10s of usec accuracy. There is a disconnect here.
If you really want ns accuracy you will have to use a refclock directly
connected to the machine. Even GPS has problems as it is only after the fact
that you can figure out the sawtooth time error on a really good gps timing
receiver and compensate for it.
Never mind the temperature changes which make the tsc wander away from its
rate. It is really unclear to me what you are trying to do, and why?
>
> Tho, the comparison between the latency of rdtsc and the latency of clock_gettime()
> (~20 nano vs ~50 nano) is widely available online.
>
> As I mentioned, jitter/latency is more important than accuracy in my case, so I
> compromised on accuracy a bit (with complexity).
>
>
> On Tuesday, May 15, 2018, 1:16:23 PM GMT+8, Bill Unruh <unruh@xxxxxxxxxxxxxx> wrote:
>
>
>
> On Tue, 15 May 2018, Hei Chan wrote:
>
> > Hi Bill,
> >
> > Here is the source:
> > https://elixir.bootlin.com/linux/v4.9/source/arch/x86/entry/vdso/vclock_gettime.c#L183
>
> >
> >
> > As you can see, clock_gettime() is in a while loop because sometimes it might
> > fail...
>
> Hm, yes. How much of a time delay do you get occasionally due to the while
> loop?
>
> Again that failure sounds like a bug.
>
>
> >
> > On Tuesday, May 15, 2018, 11:26:12 AM GMT+8, Bill Unruh <unruh@xxxxxxxxxxxxxx> wrote:
> >
> >
> > On Tue, 15 May 2018, Hei Chan wrote:
> >
> > > Thanks for your reply.
> > >
> > > See my comment inline.
> > >
> > > On Friday, May 11, 2018, 4:26:14 PM GMT+8, Miroslav Lichvar <mlichvar@xxxxxxxxxx>
> > > wrote:
> > >
> > >
> > > On Fri, May 11, 2018 at 12:30:30AM +0000, Hei Chan wrote:
> > > > Hi Bill,
> > > > Sorry that I wasn't clear.
> > > > What I tried to do is to call clock_gettime() and rdtsc(p) as soon as chrony finishes
> > > > synchronizing, so that I can get the best estimate when I try to derive time from the
> > > > (invariant) tsc.
> > >
> > > Ok, so the assumption here is that once the system clock is
> > > "synchronized" by chronyd there will be a linear function between the
> > > tsc and system time? And the goal is to have a clock that can be read
> > > in constant time and it doesn't have to be very accurate, but still
> > > track the real time?
> > >
> > > Yes to both :)
> > >
> > > I'm not sure if that's possible. The tsc is the direct source for the
> > > CLOCK_MONOTONIC_RAW clock. Its frequency doesn't change with chronyd's
> > > adjustments, i.e. it's sensitive to temperature changes etc. The
> > > constants of the linear function would have to be periodically updated
> > > and then you would need to deal with locking, which would increase the
> > > maximum latency in the reading of the clock.
> > >
> > > Here is the design I am thinking.
> > >
> > > I don't have chronyd running in the background; instead, periodically (through a cronjob), I issue the
> >
> > That is a terrible way of using chrony. One of the key features of both chrony
> > and ntpd is that it disciplines not only the offset but also the rate of
> > the clock. And the rate can only be determined over a (lengthy) time period.
> > Why would you run it like this?
> >
> > > command chronyd -q 'pool [some NTP server/switch which is 1 switch away] iburst', then
> > > as soon as it returns (the clock is synchronized, right?), I do something like:
> >
> > No. See above.
> >
> > > s = cpuid + rdtsc
> > > clock_gettime(CLOCK_REALTIME, &t)
> > > e = rdtscp + cpuid
> >
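
A rough C sketch of the bracketed read in the three quoted lines above: read the counter, call `clock_gettime(CLOCK_REALTIME)`, and read the counter again, so the wall-clock value is known to fall between the two counter values. Several things here are assumptions and not from the thread: `lfence` is used in place of `cpuid` as the ordering barrier, the non-x86 fallback counter exists only so the sketch compiles elsewhere, and the names `struct sample`, `take_sample`, and `take_best_sample` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
#include <x86intrin.h>
/* lfence keeps rdtsc from executing before earlier instructions finish. */
static inline uint64_t counter_begin(void) { _mm_lfence(); return __rdtsc(); }
/* rdtscp waits for earlier instructions; lfence holds back later ones. */
static inline uint64_t counter_end(void)
{
    unsigned aux;
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
#else
/* Assumption: on other architectures, use CLOCK_MONOTONIC ns as a stand-in. */
static inline uint64_t counter_begin(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
static inline uint64_t counter_end(void) { return counter_begin(); }
#endif

struct sample {
    uint64_t tsc_start;    /* counter just before clock_gettime() */
    uint64_t tsc_end;      /* counter just after clock_gettime()  */
    struct timespec wall;  /* CLOCK_REALTIME value read in between */
};

/* One bracketed reading; returns 0 on success, -1 if clock_gettime fails. */
static int take_sample(struct sample *out)
{
    out->tsc_start = counter_begin();
    int rc = clock_gettime(CLOCK_REALTIME, &out->wall);
    out->tsc_end = counter_end();
    return rc;
}

/* Keep the reading with the narrowest bracket out of `tries` attempts, so
 * that a reading stretched by an interrupt or a retry loop is discarded. */
static int take_best_sample(struct sample *out, int tries)
{
    int have = 0;
    for (int i = 0; i < tries; i++) {
        struct sample s;
        if (take_sample(&s) != 0)
            return -1;
        if (!have || s.tsc_end - s.tsc_start < out->tsc_end - out->tsc_start) {
            *out = s;
            have = 1;
        }
    }
    return have ? 0 : -1;
}
```

The bracket width `tsc_end - tsc_start` bounds how uncertain the pairing is. Logging it next to each entry would show which map entries were disturbed.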
> > >
> > > Then, log it.
> > >
> > > So after 24 hours, I have a map for rdtsc<->absolute epoch time in nano.
> >
> > You have a very sophisticated program whose whole purpose is to continuously
> > set the translation between the tsc and the UTC. And you throw it all away and
> > use it in the way that Unix time was disciplined 40 years ago.
> >
> >
> > >
> > > Then, I can use the map to estimate the TSC frequency every 2 t's with the assumption
> > > that t is correct and TSC will change between two t's.
> >
> >
> > >
> > > Then, for everything I track with rdtsc, I can estimate the absolute epoch time in
> > > nano.
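
The mapping proposed above amounts to linear interpolation between two logged (TSC, epoch time) pairs. A small sketch follows, assuming the counter rate is constant between the two pairs; as noted elsewhere in the thread, temperature drift makes that only approximately true. The names `calib_point`, `ns_per_tick`, and `tsc_to_epoch_ns` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One logged map entry: a counter value and the epoch time read with it. */
struct calib_point {
    uint64_t tsc;
    int64_t  epoch_ns;
};

/* Estimated nanoseconds per counter tick between two map entries. */
static double ns_per_tick(struct calib_point a, struct calib_point b)
{
    assert(b.tsc > a.tsc);
    return (double)(b.epoch_ns - a.epoch_ns) / (double)(b.tsc - a.tsc);
}

/* Convert a raw counter value to epoch nanoseconds using the line through
 * a and b. Values outside [a.tsc, b.tsc] are extrapolated. */
static int64_t tsc_to_epoch_ns(struct calib_point a, struct calib_point b,
                               uint64_t tsc)
{
    double rate = ns_per_tick(a, b);
    int64_t ticks = (int64_t)(tsc - a.tsc);   /* signed: may precede a */
    double ns = rate * (double)ticks;
    /* Round to nearest without needing libm. */
    return a.epoch_ns + (int64_t)(ns >= 0 ? ns + 0.5 : ns - 0.5);
}
```

For example, with entries 3,000,000 ticks and 1,000,000 ns apart (a 3 GHz counter), a reading halfway between them maps to 500,000 ns after the first entry. The result can be no more accurate than the `epoch_ns` values in the map, which is the objection raised in the replies.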
> > >
> > > You might question why I don't just have chronyd running in the background and call
> > > clock_gettime(CLOCK_REALTIME, &t) for all the stamping I do with rdtsc. The main issue
> > > is that clock_gettime(CLOCK_REALTIME) is great 99% of the time, but sometimes it just
> > > fails internally, loops, and then takes a long time to return.
> >
> > No idea what this is all about. I have never seen this. If it truly does
> > this, that is a bug, and needs to be reported.
> >
> >
> > >
> > > Any issue you see?
> > >
> > > P.S. Calling chronyd and creating the map file will be done by one dedicated core at
> > > C0 (i.e. off the OS scheduler to improve accuracy).
> > >
> > > > Ideally, I have a C application that calls chrony's API (if there is one), similar to
> > > > "chronyd -q", to block till it finishes or gets a callback.
> > > > Any suggestion?
> > >
> > > There is no C API for chrony (yet). Instead, you could use adjtimex()
> > > and check the frequency and maxerror fields. The maxerror value
> > > increases slowly and drops only when chronyd updates the clock. When
> > > it drops below a threshold and the frequency didn't change
> > > significantly, the system clock could be considered to be
> > > synchronized.
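
A hedged sketch of the adjtimex() check suggested above: read `maxerror` and `freq` with a read-only call, then treat the clock as synchronized when `maxerror` is under a limit and the frequency has barely moved since the previous reading. The limits and the names `read_clock_state` and `looks_synchronized` are assumptions for illustration; the thread does not give threshold values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/timex.h>

/* Read-only query (modes == 0 changes nothing and needs no privileges).
 * Returns the clock state from adjtimex(), or -1 on error.
 * maxerror is in microseconds; freq is in units of 2^-16 ppm. */
static int read_clock_state(long *maxerror_us, long *freq_scaled_ppm)
{
    struct timex tx = {0};
    int state = adjtimex(&tx);
    if (state == -1)
        return -1;
    *maxerror_us = tx.maxerror;
    *freq_scaled_ppm = tx.freq;
    return state;
}

/* Decision rule from the text: maxerror below a threshold and frequency
 * not changed significantly between two consecutive readings.
 * Both limits are caller-chosen and hypothetical. */
static bool looks_synchronized(long maxerror_us, long prev_freq, long cur_freq,
                               long maxerror_limit_us, long freq_change_limit)
{
    long change = labs(cur_freq - prev_freq);
    return maxerror_us < maxerror_limit_us && change <= freq_change_limit;
}
```

An application could poll `read_clock_state()` every second or so, keep the previous `freq` value, and take its TSC calibration sample once `looks_synchronized()` returns true. A `freq_change_limit` of 6554 corresponds to roughly 0.1 ppm, since 65536 units equal 1 ppm.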
> > >
> > > --
> > > Miroslav Lichvar
> > >
> > >
> > >
> >
> >
>
>