Re: [chrony-users] RTC Trimming Issues

[ Thread Index | Date Index | More chrony.tuxfamily.org/chrony-users Archives ]


Bill, thanks for responding.

I believe my response to Ed cover all you've asked for so I won't repeat that here.  As I mentioned in that response, I'm not sure what to make of the situation.

--
John Florian






From:        Bill Unruh <unruh@xxxxxxxxxxxxxx>
To:        chrony-users@xxxxxxxxxxxxxxxxxxxx
Date:        10/29/2012 18:57
Subject:        Re: [chrony-users] RTC Trimming Issues




Try using hwclock to set the rtc to some reasonable value (-systohc)
and see if that works. (run hwclock after that). If it does work, it would
indicate that hwclock's way of setting the rtc works while chonry's does not
and we could try to figure out why.
If it also does not work, then we would at least have consistancy, and the
suspicion that there is something wrong with the hwclock so it does not allow
itself to be set.


On Mon, 29 Oct 2012, John.Florian@xxxxxxxx wrote:

> Hello all,
>
> I administer several hundred embedded systems that are running a custom
> live spin of Fedora and consequently
> chrony-1.26-5.20110831gitb088b7.fc16.i686 and all share a common
> configuration enforced by puppet.  Out of all of these, a little over a
> hundred run atop PC/104 hardware made by Advantech, model PCM-3353 and out
> of all of those I have a dozen or so that just cannot seem to get their
> hardware clocks set.  When we deployed these we did nothing to tune the
> BIOS, not even setting the date/time since I knew our NTP processes should
> be able to do this better.  Originally we were running ntpd, but switched
> to chronyd to follow Fedora's defaults; our RTC setting issues have
> persisted all this time.  I've checked the on-board batteries and found
> nothing unusual, checked BIOS versions and found all to be the same.
>
> The problematic dozen devices power up with their clock set to December
> something 2003, which is apparently the epoch in use by this BIOS.  Most
> of the PCM-3353 boards boot up with the hardware clock set correctly (or
> as close as can be reasonably expected).  I'm trying to determine why the
> dozen systems cannot seem to do this correctly despite all things being
> seemingly equal.  Looking at one particular misfit, I see:
>
> #  grep chronyd /var/log/messages
> Dec 31 19:02:20 aos-000bab25fdfa chronyd[1434]: chronyd version
> 1.26-20110831gitb088b7 starting
> Dec 31 19:02:20 aos-000bab25fdfa chronyd[1434]: Linux kernel major=3
> minor=3 patch=0
> Dec 31 19:02:20 aos-000bab25fdfa chronyd[1434]: hz=100 shift_hz=7
> freq_scale=1.00000000 nominal_tick=10000 slew_delta_tick=833
> max_tick_bias=1000
> Dec 31 19:02:20 aos-000bab25fdfa chronyd[1434]: Frequency -243.568 +-
> 0.595 ppm read from /mnt/live/var/lib/chrony/drift
> Dec 31 19:02:29 aos-000bab25fdfa chronyd[1434]: Selected source
> 10.39.192.3
> Dec 31 19:02:29 aos-000bab25fdfa chronyd[1434]: System clock wrong by
> 310142834.657615 seconds, adjustment started
> Oct 29 10:49:43 aos-000bab25fdfa chronyd[1434]: System clock was stepped
> by 310142834.658 seconds
> Oct 29 10:51:45 aos-000bab25fdfa chronyd[1434]: Selected source 10.1.0.97
> Oct 29 10:54:06 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.577 seconds
> Oct 29 10:56:21 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.578 seconds
> Oct 29 11:09:05 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.579 seconds
> Oct 29 11:24:20 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.580 seconds
> Oct 29 11:39:38 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.581 seconds
> Oct 29 11:54:14 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.583 seconds
> Oct 29 12:09:21 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.583 seconds
> Oct 29 12:24:21 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.584 seconds
> Oct 29 12:39:34 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.586 seconds
> Oct 29 12:54:32 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.587 seconds
> Oct 29 13:09:27 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.588 seconds
> Oct 29 13:24:28 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.588 seconds
> Oct 29 13:39:34 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.589 seconds
> Oct 29 13:54:33 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.589 seconds
> Oct 29 14:09:32 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.589 seconds
> Oct 29 14:24:27 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.590 seconds
> Oct 29 14:39:35 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
> Oct 29 14:54:33 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
> Oct 29 15:09:31 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
> Oct 29 15:24:33 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
> Oct 29 15:39:37 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
> Oct 29 15:54:27 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
> Oct 29 16:09:32 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
> Oct 29 16:24:33 aos-000bab25fdfa chronyd[1434]: Trimming RTC, error =
> -310142834.591 seconds
>
>
> What strikes me as most bizarre is that chronyd sees the error and
> allegedly corrects the difference.  If that's true, why would it see the
> same correction being needed a few minutes later?  I have seen some of
> these do as above ("Trimming RTC, error = <BIG NUMBER> seconds") repeated
> for a few dozen times suddenly to be followed by "Trimming RTC, error =
> <SMALL NUMBER> seconds" where SMALL is right around one second.  Most, if
> not all, of the non-misfit PCM-3353 boards show "Trimming RTC, error =
> <SMALL NUMBER> seconds" and at one time I had assumed that just meant the
> RTC had drifted by a second again and was once again due for a trimming.
> I'm now inclined to believe that maybe these are struggling to work
> correctly too, but they just need a small adjustment now, rather than a
> big one.  For example, this one:
>
> #  grep chrony /var/log/messages
> Oct 18 10:16:28 aos-000bab25e0f8 chronyd[1414]: chronyd version
> 1.26-20110831gitb088b7 starting
> Oct 18 10:16:28 aos-000bab25e0f8 chronyd[1414]: Linux kernel major=3
> minor=3 patch=0
> Oct 18 10:16:28 aos-000bab25e0f8 chronyd[1414]: hz=100 shift_hz=7
> freq_scale=1.00000000 nominal_tick=10000 slew_delta_tick=833
> max_tick_bias=1000
> Oct 18 10:16:28 aos-000bab25e0f8 chronyd[1414]: Frequency -7.062 +- 0.093
> ppm read from /mnt/live/var/lib/chrony/drift
> Oct 18 10:16:36 aos-000bab25e0f8 chronyd[1414]: Selected source
> 10.11.192.3
> Oct 18 10:16:36 aos-000bab25e0f8 chronyd[1414]: System clock wrong by
> -0.734176 seconds, adjustment started
> Oct 18 09:18:38 aos-000bab25e0f8 chronyd[1414]: Selected source 10.1.0.97
> Oct 18 09:22:40 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.171 seconds
> Oct 19 05:52:31 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.006 seconds
> Oct 19 20:52:38 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.005 seconds
> Oct 20 11:52:31 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.011 seconds
> Oct 21 04:37:37 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.001 seconds
> Oct 21 15:22:29 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.014 seconds
> Oct 22 01:07:36 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.017 seconds
> Oct 22 22:37:36 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.016 seconds
> Oct 23 08:22:30 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.006 seconds
> Oct 23 17:37:40 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.010 seconds
> Oct 24 11:37:37 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.003 seconds
> Oct 25 06:52:31 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.008 seconds
> Oct 25 14:37:37 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.009 seconds
> Oct 26 07:07:37 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.011 seconds
> Oct 27 00:37:35 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.009 seconds
> Oct 27 20:22:30 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.011 seconds
> Oct 28 03:52:31 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.009 seconds
> Oct 28 13:22:31 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.012 seconds
> Oct 29 02:37:37 aos-000bab25e0f8 chronyd[1414]: Trimming RTC, error =
> 1.015 seconds
>
> We've been tracking the misfit boards a little more closely as of late
> (the problem has been ongoing for a few years) and it seems to be the same
> dozen which naturally makes me suspect of that hardware, but as I've said
> the batteries check out fine and everything else to our eyes is identical
> and these boards work fine otherwise.  The problem spans several Fedora
> releases and I can say for certain we saw the problem with
> chrony-1.24-4.20100428git73d775.fc13.i686 also.  The problem was also
> present with ntpd, although I don't recall if ntpd tries to fix the RTC as
> does chronyd.  I think our config is sane because most devices have no
> such problems.
>
> I've also tried tackling this with a cron job that has this at its core:
>
> adjust_rtc() {
>    chronyc <<EOF
> password $(awk '{print $2}' /etc/chrony.keys)
> trimrtc
> EOF
>    echo 'hardware RTC adjusted to system clock'
> }
>
> This function is only called if the system clock is synchronized, which I
> determine by "chronyc -n sources" showing an asterisk before a
> time-servers IP address.
>
> Any ideas how to get these RTCs to do the right thing and join us in this
> decade?
>
> --
> John Florian
>
>

--
William G. Unruh   |  Canadian Institute for|     Tel: +1(604)822-3273
Physics&Astronomy  |     Advanced Research  |     Fax: +1(604)822-5324
UBC, Vancouver,BC  |   Program in Cosmology |     unruh@xxxxxxxxxxxxxx
Canada V6T 1Z1     |      and Gravity       |  
www.theory.physics.ubc.ca/

--
To unsubscribe email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "unsubscribe" in the subject.
For help email chrony-users-request@xxxxxxxxxxxxxxxxxxxx
with "help" in the subject.
Trouble?  Email listmaster@xxxxxxxxxxxxxxxxxxxx.




Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/