[CentOS] Strange NTP problem

Thu May 22 15:31:55 UTC 2008
Jason Clark <jason at jasonandjessi.com>

I had a similar problem on a different server that I fixed last night.
Evidently it had a BIOS-level feature that tried to modify the CPU clock
rate, much like cpufreq does within the kernel, and in doing so it was
interfering with the system clock and affecting the RTC. My clock was
drifting all over the place until I found and disabled that feature
(Foxconn board; I believe the BIOS calls it something like foxstep). I'm
not sure whether your Lenovo boards have that feature, but I know that
some ASUS boards do.
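
To put the drift reported in the quoted message below into perspective,
here is a minimal sketch that converts "5 minutes lost over about 24
hours" into parts per million. The function name and the constant are
illustrative, not from the thread; the 500 PPM figure is the frequency
correction limit given in the ntpd documentation, and the 24-hour
interval is an assumption taken from the "5 min/24 hrs" estimate in the
quoted reply.

```python
# Assumption: roughly 5 minutes lost over roughly 24 hours, as quoted in the thread.
# 500 PPM is the frequency-correction limit stated in the ntpd documentation.
NTPD_MAX_SLEW_PPM = 500.0


def drift_ppm(lost_seconds: float, interval_seconds: float) -> float:
    """Return clock drift in parts per million (hypothetical helper)."""
    return lost_seconds / interval_seconds * 1_000_000


ppm = drift_ppm(5 * 60, 24 * 60 * 60)
print(f"{ppm:.0f} PPM")  # prints "3472 PPM"
if ppm <= NTPD_MAX_SLEW_PPM:
    print("within ntpd's frequency-correction range")
else:
    print("beyond ntpd's frequency-correction range")
```

A drift of that size is well outside what ntpd corrects by frequency
adjustment alone, although ntpd can still step the clock, so treat this
as a rough comparison rather than a diagnosis.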


Jason
www.cyborgworkshop.org


Paul Heinlein wrote:
> On Tue, 20 May 2008, Alfred von Campe wrote:
> 
>> I have 30 identical Lenovo desktop systems running CentOS 5.1.  On one
>> of those systems the clock is running slow (5+ minutes from yesterday
>> to this morning and another minute since this morning) despite the
>> fact that NTP is running on all of them and they all have the exact
>> same /etc/ntp.conf file (I compared the MD5 sums of that file on all
>> the systems).  Here is the output of "grep ntp /var/log/messages" on
>> the system with the problem since I restarted the NTP daemon earlier
>> today:
> 
> A slew of 5 min/24 hrs should be in the range of fixable.
> 
>> May 20 11:35:38 hepdsw03 ntpd[31792]: frequency initialized 0.000 PPM
>> from /var/lib/ntp/drift
> 
> This is very suspect. Are there any SELinux or other log messages
> suggesting that ntpd isn't able to write to its drift file? Your local
> clock is definitely drifting, so a 0.000 value is bogus. It may indicate
> that there's a disconnect between ntpd and the filesystem.
> 
> I'd be interested in the output of "ntpdc -c kerninfo"; on most systems
> the 'pll frequency' value is a close match to the figure in the drift file.
> 
>> May 20 11:38:55 hepdsw03 ntpd[31792]: synchronized to LOCAL(0),
>> stratum 10
>> May 20 11:38:55 hepdsw03 ntpd[31792]: kernel time sync disabled 0001
>> May 20 11:39:59 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104,
>> stratum 3
> 
> This is ungood. Sync-ing to local before your network time server means
> that your machine doesn't want to believe your server -- and you should
> see a "kernel time sync enabled" message once the machine has sync-ed
> with the time server.
> 
> You said the machines are identical. Could there be any variation in the
> BIOS revision level or its settings? Sometimes ACPI stuff can mess up ntp.
> 
> Also -- the log messages you provide have no "step time server"
> reference. Do you have a valid /etc/ntp/step-tickers file?
>
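
The log checks raised in the quoted reply can be sketched as a small
scanner. This is only an illustration: the check names and the function
are hypothetical, and the patterns are taken from the log lines and
message strings quoted in this thread, not from a list of everything
ntpd can log.

```python
import re

# Patterns copied from messages quoted in this thread; names are illustrative.
CHECKS = {
    "zero_drift": re.compile(r"frequency initialized 0\.000 PPM"),
    "synced_to_local": re.compile(r"synchronized to LOCAL\(\d+\)"),
    "kernel_sync_disabled": re.compile(r"kernel time sync disabled"),
    "kernel_sync_enabled": re.compile(r"kernel time sync enabled"),
    "stepped": re.compile(r"step time server"),
}


def scan_ntpd_log(lines):
    """Return the set of check names whose pattern appears in any line."""
    found = set()
    for line in lines:
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                found.add(name)
    return found


# The log lines from the original report, with wrapped lines rejoined.
SAMPLE = [
    "May 20 11:35:38 hepdsw03 ntpd[31792]: frequency initialized 0.000 PPM from /var/lib/ntp/drift",
    "May 20 11:38:55 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10",
    "May 20 11:38:55 hepdsw03 ntpd[31792]: kernel time sync disabled 0001",
    "May 20 11:39:59 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104, stratum 3",
]

if __name__ == "__main__":
    print(sorted(scan_ntpd_log(SAMPLE)))
```

On the sample lines this flags the zero drift value, the sync to
LOCAL(0), and the disabled kernel time sync, and finds neither a "kernel
time sync enabled" nor a "step time server" message, which matches the
observations in the reply above.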