-----Original Message-----
From: Todd Denniston
Sent: Thursday, August 12, 2010 9:07

Jason Pyeron wrote, On 08/12/2010 08:01 AM:

-----Original Message-----
From: Simon Billis
Sent: Thursday, August 12, 2010 7:36

Jason Pyeron sent a missive on 2010-08-12:
We have a local time server and all of our machines are pointed at it for the time.
How can the clock drift by a day and a half?
[root@devserver21 ~]# date
Fri Aug 13 14:43:29 EDT 2010
[root@devserver21 ~]# rdate -s 192.168.1.67
[root@devserver21 ~]# date
Thu Aug 12 07:02:39 EDT 2010
[root@devserver21 ~]# cat /etc/ntp.conf | grep -v ^# | grep -v ^$
restrict default nomodify notrap noquery
restrict 127.0.0.1
server 192.168.1.67
server 192.168.1.66
server 192.168.1.65
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
driftfile /var/lib/ntp/drift
broadcastdelay 0.008
keys /etc/ntp/keys
Hi,
It is unlikely that the machine in question drifted forward in time if ntpd was running. Have a look at the logs: /var/log/messages should contain the ntpd log messages.
[root@devserver21 ~]# grep ntpd /var/log/messages
</snip>
Jul 28 20:34:41 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
Jul 28 21:08:00 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 28 21:08:00 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
Jul 28 21:08:11 devserver21 ntpd[3475]: synchronized to 192.168.1.66, stratum 3
Jul 28 21:24:58 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
Jul 28 21:41:26 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
Jul 28 21:42:16 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 28 21:42:16 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
Jul 28 21:42:34 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
Jul 28 21:43:37 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
tolerance 500 PPM
Jul 28 22:12:07 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
Jul 28 22:13:13 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
Jul 28 22:14:17 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
Jul 28 22:15:11 devserver21 ntpd[3475]: synchronized to 192.168.1.66, stratum 3
Jul 28 22:31:41 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 28 22:31:41 devserver21 ntpd[3475]: frequency error -512 PPM exceeds tolerance 500 PPM
Jul 29 15:14:01 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 29 15:26:05 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
Jul 29 15:59:17 devserver21 ntpd[3475]: time reset -1.599691 s
Jul 29 16:03:31 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 29 16:05:38 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
Jul 29 16:08:46 devserver21 ntpd[3475]: synchronized to 192.168.1.66, stratum 3
Jul 29 16:11:55 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
Jul 29 17:23:57 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
Jul 29 17:24:59 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 29 17:30:46 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
Jul 29 17:47:24 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Aug 12 22:48:29 devserver21 ntpd[3475]: sendto(192.168.1.66): Operation not permitted
[root@devserver21 ~]# uptime
 08:10:19 up 164 days, 9:56, 2 users, load average: 0.20, 0.54, 0.81
[root@devserver21 ~]#
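For scale, here is a rough sketch of the arithmetic behind "it is unlikely that the machine drifted". It takes the two `date` outputs, the 500 PPM tolerance from the log and the 164-day uptime as given. It assumes GNU `date` and assumes that plain oscillator drift stays near that 500 PPM figure, which the log does not guarantee. All variable names are made up for illustration.

```shell
#!/bin/sh
# Timestamps copied from the two `date` outputs in the thread. Both are EDT,
# so parsing them as UTC leaves the difference unchanged. Requires GNU date.
before=$(date -u -d '2010-08-13 14:43:29' +%s)   # clock before rdate -s
after=$(date -u -d '2010-08-12 07:02:39' +%s)    # clock after rdate -s
offset=$((before - after))                        # seconds the clock was ahead

# Assumption: drift is capped near the 500 PPM tolerance named in the log.
ppm=500
drift_per_day_ms=$((86400 * ppm / 1000))          # ms gained or lost per day
uptime_days=164                                   # from the uptime output
max_drift_s=$((uptime_days * drift_per_day_ms / 1000))
days_needed=$((offset * 1000 / drift_per_day_ms))

echo "offset: ${offset} s"
echo "drift at ${ppm} PPM: ${drift_per_day_ms} ms/day"
echo "worst case over ${uptime_days} days: ${max_drift_s} s"
echo "days needed to drift ${offset} s: ${days_needed}"
```

Under those assumptions the clock was about 114050 s ahead, while 164 days at 500 PPM accounts for roughly two hours at most. That points toward a step or a mis-set clock rather than gradual drift.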
Assumption: this is not from any kind of virtual machine.
Correct.
Assumption: Your local time server is NOT a GPS with an ovenized crystal or even a cell phone time source, i.e. NOT very stable.
Correct.
Assumption: the time servers that you are following (192.168.1.6[57]) are: a) each following the same timeserver(s), or at least have one in common.
192.168.1.6[567] are one machine. Time on that one is/has been good. Other machines in the enterprise follow it accurately.
b) peering with one another
n/a
c) following time servers that are reasonably stable.
Appears to be so.
Assumption: the time farm is on real, non-busy hardware (an old Cisco router serving as the internet connection for 1000+ computers does not qualify as non-busy) and is configured to achieve maxpoll 10 or higher.
Unknown, assuming the latency is negligible. The important detail here is that all the machines on the LAN have the same time. There is no unusual latency there.
one problem that you have is that your timeserver farm (192.168.1.6[57]) is occasionally losing its servers, i.e. we see "synchronized to LOCAL(0)" occasionally, which should
That was on an NTP client, not the NTP server. Am I misunderstanding you?
not happen with a well configured time farm for hours to days, not minutes.
Agreed, see above.
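One way to put a number on how often the client falls back to LOCAL(0) is to tally the "synchronized to" lines by peer. The sketch below runs on a few lines copied from the log output earlier in the thread. On the real machine the input would be /var/log/messages. The field position (8) is an assumption that holds for the syslog format shown here.

```shell
#!/bin/sh
# Sample lines copied from the grep output in the thread; replace with
# /var/log/messages on the real machine.
log='Jul 29 16:03:31 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 29 16:05:38 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
Jul 29 16:08:46 devserver21 ntpd[3475]: synchronized to 192.168.1.66, stratum 3
Jul 29 16:11:55 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
Jul 29 17:23:57 devserver21 ntpd[3475]: synchronized to 192.168.1.67, stratum 3
Jul 29 17:24:59 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10
Jul 29 17:30:46 devserver21 ntpd[3475]: synchronized to 192.168.1.65, stratum 3
Jul 29 17:47:24 devserver21 ntpd[3475]: synchronized to LOCAL(0), stratum 10'

# Tally sync targets: field 8 is the peer, with its trailing comma stripped.
summary=$(printf '%s\n' "$log" |
  awk '/synchronized to/ { sub(/,$/, "", $8); n[$8]++ }
       END { for (p in n) print n[p], p }' | sort -k2)

# Count only the fallbacks to the local clock (fixed-string match).
local_count=$(printf '%s\n' "$log" | grep -cF 'synchronized to LOCAL(0)')

echo "$summary"
echo "fallbacks to LOCAL(0): $local_count"
```

A high share of LOCAL(0) entries on a client would be consistent with the concern about the local clock being configured at a usable stratum.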
the second problem is that a machine which is not intended to be a time server is configured with a local clock with a stratum better than 15.
I don't understand; I will have to read up more.
suggestion 1: 65 should have local clock at stratum 13, 66 and 67 should have local clock at stratum
They are presently one machine.
14 or 15, all other machines should not have a local clock or should not have one with a stratum better than 15. Yes I, after reading the ntp documentation, disagree with RedHat's default.
Ok.
net result should be that you don't get any local clock loops in the setup because you have a defined leader, but if even the defined leader is lost the other machines should do a stable drift.
suggestion 2: 65, 66 & 67 should ALL peer with one another for added stability in the time farm.
suggestion 3: client machines should 'prefer' one of your servers over the others.
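For the client side, suggestions 1 and 3 could be sketched as a filter over the ntp.conf posted earlier in the thread: drop the local clock driver lines and mark one server with `prefer`. Choosing 192.168.1.67 as the preferred server is an arbitrary assumption for illustration, and the variable names are hypothetical. Since the three addresses are reported to be one machine, the choice may matter little here.

```shell
#!/bin/sh
# Client ntp.conf as posted earlier in the thread (comment lines already stripped).
conf='restrict default nomodify notrap noquery
restrict 127.0.0.1
server 192.168.1.67
server 192.168.1.66
server 192.168.1.65
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
driftfile /var/lib/ntp/drift
broadcastdelay 0.008
keys /etc/ntp/keys'

# Suggestion 1: remove the local clock driver (127.127.1.0) on a pure client.
# Suggestion 3: mark one server as preferred (.67 is an arbitrary pick).
new_conf=$(printf '%s\n' "$conf" |
  grep -v '127\.127\.1\.0' |
  sed 's/^server 192\.168\.1\.67$/& prefer/')

printf '%s\n' "$new_conf"
```

On the servers, the matching changes would be `peer` lines for the other servers (suggestion 2) and a `fudge 127.127.1.0 stratum 13` style line for the local clock (suggestion 1). ntpd has to be restarted before any of this takes effect.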
suggestion 4: see if someone has been messing with the kernel ticks on the machine... run `tickadj` file:///usr/share/doc/ntp-4.2.2p1/tickadj.html
[root@devserver21 ~]# tickadj
tick = 10000
I had one computer where I needed to tweak the default value up or down by one (I don't remember which) to have it be real stable. This should be a last resort.
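To see why a change of one in `tick` is a coarse adjustment, here is a small sketch of the arithmetic. It assumes, as the tickadj documentation describes, that `tick` is the number of microseconds added to the system clock on each timer interrupt. The variable names are illustrative only.

```shell
#!/bin/sh
tick=10000                                  # value reported by tickadj, in microseconds
step_ppm=$((1000000 / tick))                # rate change from moving tick by 1
step_ms_per_day=$((86400 * 1000 / tick))    # the same change expressed in ms per day

echo "tick +/- 1 changes the clock rate by ${step_ppm} PPM"
echo "that is ${step_ms_per_day} ms per day"
```

With tick = 10000, a single step is 100 PPM, or 8.64 s per day. That is a fifth of the 500 PPM tolerance ntpd reports in the log, which is why this is best kept as a last resort.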
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                  PD Inc. http://www.pdinc.us     -
- Principal Consultant          10 West 24th Street #100        -
- +1 (443) 269-1555 x333        Baltimore, Maryland 21218       -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.