On May 20, 2008, at 16:56, Paul Heinlein wrote:
A slew of 5 min/24 hrs should be in the range of fixable.
If the NTP daemon was doing its job :-).
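For a rough sense of scale, the drift quoted above can be converted to parts per million and set against the "frequency tolerance: 512 ppm" figure that ntpdc reports further down. This is only a back-of-the-envelope sketch; the function name is made up for illustration.

```python
def drift_ppm(seconds_off: float, elapsed_seconds: float) -> float:
    """Convert an observed clock error over an interval to parts per million."""
    return seconds_off / elapsed_seconds * 1_000_000


# 5 minutes of drift over 24 hours, the figure quoted above.
ppm = drift_ppm(5 * 60, 24 * 60 * 60)
print(f"{ppm:.0f} ppm")  # prints "3472 ppm"

# Frequency tolerance as reported by "ntpdc -c kerninfo" in this thread.
TOLERANCE_PPM = 512
print("within tolerance" if ppm <= TOLERANCE_PPM else "exceeds tolerance")
```

If that arithmetic holds, a steady 5 min/day drift is several times the reported tolerance, which may be part of why the daemon is struggling with this clock.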
This is very suspect. Are there any SELinux or other log messages suggesting that ntpd isn't able to write to its drift file? Your local clock is definitely drifting, so a 0.000 value is bogus. It may indicate that there's a disconnect between ntpd and the filesystem.
I grep'ed for ntp in the /var/log/messages file and there were no other instances of ntp. SELinux is disabled, and all my systems are built with Kickstart from the same template file. I'm fairly confident that the other 29 systems are configured identically to each other.
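One way to check the drift-file theory directly is to look at the file itself: what it contains, who owns it, and when it was last rewritten. The sketch below assumes the path /var/lib/ntp/drift from the log excerpt in this message; the function name is hypothetical, and the idea that a healthy ntpd rewrites the file roughly hourly is the usual behaviour rather than something confirmed on these hosts.

```python
import os
import tempfile
import time


def describe_drift_file(path="/var/lib/ntp/drift"):
    """Report whether the drift file exists, what it holds, and how old it is."""
    if not os.path.exists(path):
        return {"exists": False}
    st = os.stat(path)
    with open(path) as fh:
        value = fh.read().strip()
    return {
        "exists": True,
        "value": value,                        # e.g. "0.000" on the bad host
        "uid": st.st_uid,                      # compare with the uid ntpd runs as
        "mode": oct(st.st_mode & 0o777),
        "age_hours": (time.time() - st.st_mtime) / 3600,
    }


# Demo on a throwaway file so the sketch runs anywhere;
# on the affected host, call describe_drift_file() with the real path.
with tempfile.NamedTemporaryFile("w", suffix=".drift", delete=False) as tmp:
    tmp.write("0.000\n")
demo = describe_drift_file(tmp.name)
os.unlink(tmp.name)
print(demo)
```

A file that is many hours old while ntpd has been running, or one owned by a different uid than the daemon, would point toward the write problem suggested above.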
I'd be interested in the output of "ntpdc -c kerninfo"; on most systems the 'pll frequency' value is a close match to the figure in the drift file.
# ntpdc -c kerninfo
pll offset:           0 s
pll frequency:        0.000 ppm
maximum error:        0.02724 s
estimated error:      0 s
status:               0001  pll
pll time constant:    6
precision:            1e-06 s
frequency tolerance:  512 ppm
On a system where it's working, it looks like this:
# ntpdc -c kerninfo
pll offset:           0.011131 s
pll frequency:        81.440 ppm
maximum error:        0.241978 s
estimated error:      0.005287 s
status:               0001  pll
pll time constant:    4
precision:            1e-06 s
frequency tolerance:  512 ppm
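With 30 hosts to compare, it may be easier to pull the 'pll frequency' value out of the kerninfo output programmatically than to eyeball it. The following is a minimal sketch that parses the two outputs pasted above; the helper names are invented for this example, and it assumes the output keeps the "label: value" layout shown here.

```python
def parse_kerninfo(text: str) -> dict:
    """Parse 'label: value' lines from 'ntpdc -c kerninfo' output into a dict."""
    info = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the shell prompt line
            info[label.strip()] = value.strip()
    return info


def pll_frequency_ppm(info: dict) -> float:
    """Return the numeric part of a value such as '81.440 ppm'."""
    return float(info["pll frequency"].split()[0])


BROKEN = """pll offset: 0 s
pll frequency: 0.000 ppm
maximum error: 0.02724 s
estimated error: 0 s
status: 0001 pll
pll time constant: 6
precision: 1e-06 s
frequency tolerance: 512 ppm"""

WORKING = """pll offset: 0.011131 s
pll frequency: 81.440 ppm
maximum error: 0.241978 s
estimated error: 0.005287 s
status: 0001 pll
pll time constant: 4
precision: 1e-06 s
frequency tolerance: 512 ppm"""

for name, text in (("broken", BROKEN), ("working", WORKING)):
    ppm = pll_frequency_ppm(parse_kerninfo(text))
    flag = "suspicious: exactly zero" if ppm == 0.0 else "ok"
    print(f"{name}: {ppm:.3f} ppm ({flag})")
```

The same value could then be compared with the number stored in the drift file on each host.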
I re-started the NTP daemon around 5:30 this afternoon, and the clock is already off by 45 seconds. Here are the latest entries from /var/log/messages:
May 20 17:25:25 hepdsw03 ntpd[4225]: ntpd 4.2.2p1@1.1570-o Sat Nov 10 12:33:50 UTC 2007 (1)
May 20 17:25:25 hepdsw03 ntpd[4226]: precision = 1.000 usec
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface wildcard, 0.0.0.0#123 Disabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface wildcard, ::#123 Disabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface lo, ::1#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface eth0, fe80::210:c6ff:feab:dd92#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface lo, 127.0.0.1#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: Listening on interface eth0, 10.66.42.109#123 Enabled
May 20 17:25:25 hepdsw03 ntpd[4226]: kernel time sync status 0040
May 20 17:25:25 hepdsw03 ntpd[4226]: frequency initialized 0.000 PPM from /var/lib/ntp/drift
May 20 17:28:37 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
May 20 17:28:37 hepdsw03 ntpd[4226]: kernel time sync disabled 0001
May 20 17:30:47 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3
May 20 17:35:04 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
May 20 17:37:13 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3
May 20 17:38:18 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10
In comparison, here are the entries from a system where NTP is working as expected:
May 20 14:36:51 balboa01 ntpd[3374]: ntpd 4.2.2p1@1.1570-o Sat Nov 10 12:33:50 UTC 2007 (1)
May 20 14:36:51 balboa01 ntpd[3375]: precision = 1.000 usec
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface wildcard, 0.0.0.0#123 Disabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface wildcard, ::#123 Disabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface lo, ::1#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface eth0, fe80::21a:6bff:fe46:33d1#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface lo, 127.0.0.1#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: Listening on interface eth0, 10.66.43.100#123 Enabled
May 20 14:36:51 balboa01 ntpd[3375]: kernel time sync status 0040
May 20 14:40:06 balboa01 ntpd[3375]: synchronized to LOCAL(0), stratum 10
May 20 14:41:10 balboa01 ntpd[3375]: synchronized to 10.101.32.104, stratum 3
May 20 15:00:26 balboa01 ntpd[3375]: time reset -1.533233 s
May 20 15:00:26 balboa01 ntpd[3375]: kernel time sync enabled 0001
May 20 15:03:45 balboa01 ntpd[3375]: synchronized to LOCAL(0), stratum 10
May 20 15:05:20 balboa01 ntpd[3375]: synchronized to 10.101.32.104, stratum 3
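To compare hosts without reading every log by hand, the "synchronized to" entries can be extracted and the fallbacks to the local clock counted. The sketch below is only illustrative: the function names are hypothetical, the regular expression is fitted to the message format shown in these excerpts, and the sample lines are copied from the hepdsw03 log above.

```python
import re

# Matches e.g. "synchronized to LOCAL(0), stratum 10"
SYNC_RE = re.compile(r"synchronized to ([^,\s]+), stratum (\d+)")


def sync_sources(log_lines):
    """Return, in order, each source ntpd reported synchronizing to."""
    sources = []
    for line in log_lines:
        match = SYNC_RE.search(line)
        if match:
            sources.append(match.group(1))
    return sources


def fallbacks_to_local(sources):
    """Count switches from a real server back to the LOCAL clock driver."""
    return sum(
        1
        for prev, cur in zip(sources, sources[1:])
        if cur.startswith("LOCAL") and not prev.startswith("LOCAL")
    )


HEPDSW03 = [
    "May 20 17:28:37 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10",
    "May 20 17:28:37 hepdsw03 ntpd[4226]: kernel time sync disabled 0001",
    "May 20 17:30:47 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3",
    "May 20 17:35:04 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10",
    "May 20 17:37:13 hepdsw03 ntpd[4226]: synchronized to 10.101.32.104, stratum 3",
    "May 20 17:38:18 hepdsw03 ntpd[4226]: synchronized to LOCAL(0), stratum 10",
]

sources = sync_sources(HEPDSW03)
print(sources)
print("fallbacks to LOCAL:", fallbacks_to_local(sources))  # prints 2
print("time reset seen:", any("time reset" in line for line in HEPDSW03))
```

In the excerpts above, the working host logs "time reset" and "kernel time sync enabled", while the failing host logs "kernel time sync disabled" and keeps dropping back to LOCAL(0) within minutes.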
You said the machines are identical. Could there be any variation in the BIOS revision level or its settings? Sometimes ACPI stuff can mess up ntp.
Yes, there may be some BIOS revision differences, but I'm pretty sure that at least one of the other 29 systems has an identical BIOS revision.
Also -- the log messages you provide have no "step time server" reference. Do you have a valid /etc/ntp/step-tickers file?
The file exists but is empty on all my systems.
Alfred