On 05/09/2011 06:53 PM, Brandon Ooi wrote:
On Mon, Apr 25, 2011 at 12:47 PM, Denniston, Todd A CIV NAVSURFWARCENDIV Crane <todd.denniston@navy.mil> wrote:
> -----Original Message-----
> From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On
> Behalf Of Mailing List
> Sent: Monday, April 25, 2011 13:57
> To: CentOS mailing list
> Subject: Re: [CentOS] CentOs 5.6 and Time Sync
>
> List,
>
> I was not able to resolve my issue with the time on this machine. I
> went ahead and rolled the update back to 5.5 and disabled the update to
> 5.6.
>
> What I would like to know is if CentOS 6 might be ok when it rolls
> out, or am I just going to have to keep with 5.5 till EOL?
>
> Thanks to all for their help.
1) I hope you are only talking about having rolled back to the last kernel from 5.5 that worked for you, not the whole distribution.
2) If I were in your position and had time, my method would be[1]:
 a) get the srpm for the last known working kernel (2.6.18-194.32 ???)
 b) get the srpm for the first known non-working kernel (2.6.18-238 ???)
 c) expand each of the above srpms into its own rpm build tree, i.e.,
    rpmdev-setuptree; rpm -i kern1; mv rpmbuild rpmbuild.kern1
    rpmdev-setuptree; rpm -i kern2; mv rpmbuild rpmbuild.kern2
 d) start looking at the differences in the patches applied in kern1 vs. those in kern2, i.e., read/diff the kernel.spec files, and see if there are any new ones that seem likely to be causing the problem. RTFS if necessary to make better guesses. Rebuild kernel 2 with patches taken out or modified based on those investigations, test it, and see if I guessed right.
If no luck, think about opening a TUV bug with lots of the info you have sent here; they may be interested even if you don't have a subscription.
[1] Been there, done that: http://www.gossamer-threads.com/lists/drbd/users/9616
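Part of step d) can be automated. As a minimal sketch, assuming both srpms were installed into rpmbuild.kern1 and rpmbuild.kern2 as in step c) and that kernel.spec declares its patches on `PatchNNN:` lines, the hypothetical shell functions below list the patch files that appear in only one of the two kernels:

```shell
#!/bin/sh
# Sketch only: compare the patch sets of two kernel.spec files.
# Assumed paths (from step c): rpmbuild.kern1/SPECS/kernel.spec and
# rpmbuild.kern2/SPECS/kernel.spec. patch_names/diff_patches are
# hypothetical helper names, not part of any rpm tool.

# Print the sorted, de-duplicated patch file names from a spec file.
# The "PatchNNN:" prefix is stripped so a renumbered patch still matches.
patch_names() {
    grep '^Patch[0-9]*:' "$1" | sed 's/^Patch[0-9]*:[[:space:]]*//' | sort -u
}

# Show patches present only in the old spec ($1) and only in the new one ($2).
diff_patches() {
    old=$(mktemp) new=$(mktemp)
    patch_names "$1" > "$old"
    patch_names "$2" > "$new"
    echo "== only in $1 =="
    comm -23 "$old" "$new"
    echo "== only in $2 =="
    comm -13 "$old" "$new"
    rm -f "$old" "$new"
}
```

For example: `diff_patches rpmbuild.kern1/SPECS/kernel.spec rpmbuild.kern2/SPECS/kernel.spec`. The "only in" second list is the set of candidates for the rebuild-and-test loop; it still has to be read by hand (RTFS) to pick the likely suspects.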
At first I figured this was a misconfigured NTP setup, but I actually see this happening on one of my machines as well. There is nothing particularly interesting about the machine, but I verified that rolling back to the previous kernel (2.6.18-194.32.1.el5) solves the problem entirely. This happens whether NTP is enabled or disabled. I also get the following messages in dmesg, which are possibly related:
time.c: can't update CMOS clock from 59 to 0
time.c: can't update CMOS clock from 59 to 0
time.c: can't update CMOS clock from 59 to 0
time.c: can't update CMOS clock from 59 to 0
The time drift is significantly higher than normal. Because rolling back the kernel completely solves the issue, this must be a kernel bug.
[root@nexus4 ~]# date; ntpdate -u pool.ntp.org
Mon May 9 16:51:03 PDT 2011
9 May 16:50:21 ntpdate[22117]: step time server 207.182.243.123 offset -42.418572 sec

[root@nexus4 ~]# date; ntpdate -u pool.ntp.org
Mon May 9 16:50:33 PDT 2011
9 May 16:50:35 ntpdate[22127]: step time server 207.182.243.123 offset -0.692146 sec
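The second run gives a rough idea of how fast this clock drifts. As a sketch, assuming the first run stepped the clock to the correct time and that the two runs were about 14 seconds apart (16:50:21 to 16:50:35 in the ntpdate timestamps), the reported offset can be turned into a rate in parts per million with two hypothetical awk helpers:

```shell
#!/bin/sh
# Rough drift-rate estimate from ntpdate output (sketch, not a measurement tool).
# Assumption: the previous ntpdate run stepped the clock to the right time,
# so the offset reported now accumulated over the interval since then.

# Extract the offset in seconds from an ntpdate "step time server" line.
ntp_offset() {
    printf '%s\n' "$1" | awk '{ for (i = 1; i < NF; i++) if ($i == "offset") print $(i + 1) }'
}

# Convert an offset (seconds) accumulated over an interval (seconds) to ppm.
# Only the magnitude is reported; the sign just says which way the clock ran.
drift_ppm() {
    awk -v off="$1" -v secs="$2" 'BEGIN { if (off < 0) off = -off; printf "%.0f\n", off / secs * 1e6 }'
}

line='9 May 16:50:35 ntpdate[22127]: step time server 207.182.243.123 offset -0.692146 sec'
off=$(ntp_offset "$line")
# Interval between the two ntpdate runs, read off their timestamps (assumed ~14 s).
echo "offset ${off}s over 14s = $(drift_ppm "$off" 14) ppm"
```

Under those assumptions this prints `offset -0.692146s over 14s = 49439 ppm`, i.e. the clock gains roughly 5% of elapsed time. ntpd can only discipline a clock whose frequency error stays within 500 ppm, so drift of this size is far beyond anything NTP can compensate for.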
Yes, this is obviously a problem with the kernel interacting with the clock on some machines. IF we can figure out which ones and why, we can get upstream to fix it.
May I ask you to try upstream's current kernel first (http://people.redhat.com/jwilson/el5/), to make sure it's not already fixed there?
BTW: Those kernels have been very useful to me in the past and, as this example shows, may also be useful to others. The sad part is that the same doesn't apply to EL6, because they no longer make their dev kernels available.
Simon