[CentOS] Strange NTP problem

Wed May 21 00:49:28 UTC 2008
Alfred von Campe <alfred at von-campe.com>

On May 20, 2008, at 16:56, Paul Heinlein wrote:

> A slew of 5 min/24 hrs should be in the range of fixable.

If the NTP daemon were doing its job :-).
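
For scale, here is the arithmetic as a small Python sketch (the function
name is mine and purely illustrative). It converts an observed offset over
an interval into parts per million, the unit ntpd reports. The 512 ppm
figure is the "frequency tolerance" from the kerninfo output below.

```python
def drift_ppm(offset_seconds: float, interval_seconds: float) -> float:
    """Average clock drift, in parts per million, over the interval."""
    return offset_seconds / interval_seconds * 1_000_000

# 5 minutes gained or lost over 24 hours.
rate = drift_ppm(5 * 60, 24 * 60 * 60)
print(f"{rate:.0f} ppm")

# Frequency tolerance reported by "ntpdc -c kerninfo" on these machines.
KERNEL_TOLERANCE_PPM = 512
print(rate > KERNEL_TOLERANCE_PPM)
```

That works out to roughly 3472 ppm, well above the 512 ppm tolerance, so
if I have the units right this drift may be more than the kernel
discipline can slew away on its own.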

> This is very suspect. Are there any SELinux or other log messages  
> suggesting that ntpd isn't able to write to its drift file? Your  
> local clock is definitely drifting, so a 0.000 value is bogus. It  
> may indicate that there's a disconnect between ntpd and the  
> filesystem.

I grep'ed for ntp in the /var/log/messages file and there were no  
other instances of ntp.  SELinux is disabled, and all my systems are  
built with Kickstart from the same template file.  I'm fairly  
confident that the other 29 systems are configured identically to  
each other.

> I'd be interested in the output of "ntpdc -c kerninfo"; on most  
> systems the 'pll frequency' value is a close match to the figure in  
> the drift file.

# ntpdc -c kerninfo
pll offset:           0 s
pll frequency:        0.000 ppm
maximum error:        0.02724 s
estimated error:      0 s
status:               0001  pll
pll time constant:    6
precision:            1e-06 s
frequency tolerance:  512 ppm

On a system where it's working, it looks like this:

# ntpdc -c kerninfo
pll offset:           0.011131 s
pll frequency:        81.440 ppm
maximum error:        0.241978 s
estimated error:      0.005287 s
status:               0001  pll
pll time constant:    4
precision:            1e-06 s
frequency tolerance:  512 ppm
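
To compare the 'pll frequency' value against the drift file across all 30
systems, something like the following rough Python sketch might help. The
function names and the 1 ppm threshold are my own assumptions, and it
assumes the drift file holds a single number in ppm; the sample text is
copied from the two outputs above.

```python
def parse_kerninfo(text: str) -> dict:
    """Map each 'name: value' line of kerninfo output to its value string."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields

def pll_frequency_ppm(text: str) -> float:
    """Extract the number from a value such as '81.440 ppm'."""
    return float(parse_kerninfo(text)["pll frequency"].split()[0])

def matches_drift(kerninfo_text: str, drift_text: str,
                  threshold_ppm: float = 1.0) -> bool:
    """True if the pll frequency is within threshold_ppm (an arbitrary
    cutoff) of the value read from the drift file."""
    drift = float(drift_text.split()[0])
    return abs(pll_frequency_ppm(kerninfo_text) - drift) <= threshold_ppm

BROKEN = """\
pll offset:           0 s
pll frequency:        0.000 ppm
"""
WORKING = """\
pll offset:           0.011131 s
pll frequency:        81.440 ppm
"""

for host, output in (("broken", BROKEN), ("working", WORKING)):
    print(host, pll_frequency_ppm(output))
```

On a live system the text would come from running "ntpdc -c kerninfo" and
reading /var/lib/ntp/drift, rather than from the pasted strings used here.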

I restarted the NTP daemon around 5:30 this afternoon, and the clock  
is already off by 45 seconds.  Here are the latest entries from  
/var/log/messages:

May 20 17:25:25 hepdsw03 ntpd[4225]: ntpd 4.2.2p1@1.1570-o Sat Nov 10 12:33:50 UTC 2007 (1)
May 20 17:25:25 hepdsw03 ntpd[4226]: precision = 1.000 usec
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface wildcard, 0.0.0.0#123 Disabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface wildcard, ::#123 Disabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface lo, ::1#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface eth0, fe80::210:c6ff:feab:dd92#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface lo, 127.0.0.1#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface eth0, 10.66.42.109#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: kernel time sync status 0040
May 20 17:25:25 hepdsw03 ntpd[4226]: frequency initialized 0.000 PPM from /var/lib/ntp/drift
May 20 17:28:37 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
May 20 17:28:37 hepdsw03 ntpd[4226]: kernel time sync disabled 0001
May 20 17:30:47 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3
May 20 17:35:04 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
May 20 17:37:13 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3
May 20 17:38:18 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10

In comparison, here are the entries from a system where NTP is  
working as expected:

May 20 14:36:51 balboa01 ntpd[3374]: ntpd 4.2.2p1@1.1570-o Sat Nov 10 12:33:50 UTC 2007 (1)
May 20 14:36:51 balboa01 ntpd[3375]: precision = 1.000 usec
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface wildcard, 0.0.0.0#123 Disabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface wildcard, ::#123 Disabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface lo, ::1#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface eth0, fe80::21a:6bff:fe46:33d1#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface lo, 127.0.0.1#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface eth0, 10.66.43.100#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: kernel time sync status 0040
May 20 14:40:06 balboa01 ntpd[3375]: synchronized to LOCAL(0), stratum 10
May 20 14:41:10 balboa01 ntpd[3375]: synchronized to 10.101.32.104, stratum 3
May 20 15:00:26 balboa01 ntpd[3375]: time reset -1.533233 s
May 20 15:00:26 balboa01 ntpd[3375]: kernel time sync enabled 0001
May 20 15:03:45 balboa01 ntpd[3375]: synchronized to LOCAL(0), stratum 10
May 20 15:05:20 balboa01 ntpd[3375]: synchronized to 10.101.32.104, stratum 3
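
Both logs show ntpd switching between LOCAL(0) and 10.101.32.104. To
count those switches over a longer stretch of /var/log/messages, a rough
Python sketch along these lines could be used. The function name is
hypothetical, and the sample is a few hepdsw03 entries from above.

```python
import re
from collections import Counter

# Tolerates a line wrap between the comma and "stratum".
SYNC_RE = re.compile(r"ntpd\[\d+\]: synchronized to (\S+),\s+stratum (\d+)")

def sync_sources(log_text: str) -> list:
    """Return (source, stratum) for each 'synchronized to' entry, in order."""
    return [(m.group(1), int(m.group(2))) for m in SYNC_RE.finditer(log_text)]

SAMPLE = """\
May 20 17:28:37 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
May 20 17:30:47 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3
May 20 17:35:04 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
May 20 17:37:13 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3
May 20 17:38:18 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
"""

events = sync_sources(SAMPLE)
# A switch is any consecutive pair of entries naming different sources.
switches = sum(1 for a, b in zip(events, events[1:]) if a[0] != b[0])
print(Counter(source for source, _ in events))
print("source switches:", switches)
```

For the five sample entries this reports four switches, three selections
of LOCAL(0) and two of 10.101.32.104.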

> You said the machines are identical. Could there be any variation  
> in the BIOS revision level or its settings? Sometimes ACPI stuff  
> can mess up ntp.

Yes, there may be some BIOS revision differences, but I'm pretty sure  
that at least one of the other 29 systems has an identical BIOS  
revision.

> Also -- the log messages you provide have no "step time server"  
> reference. Do you have a valid /etc/ntp/step-tickers file?

The file exists but is empty on all my systems.

Alfred