[CentOS] bizarre system slowness
Cal Webster
cwebster at ec.rr.com
Wed Apr 13 20:34:54 UTC 2011
On Wed, 2011-04-13 at 13:06 -0700, Florin Andrei wrote:
> Running v5 64bit on a Dell 1950.
>
> A cluster of 3 DB machines, identical hardware. One of them suddenly
> became slower 2 weeks ago.
>
> tar -zxf with a large file on this machine takes 1.5 minutes, but takes
> only 10 seconds on any of its siblings. CPU usage seems high while
> untarring, with lots of user and sys cycles being used, but almost no
> wait cycles. It doesn't matter whether I untar on a local disk, or on a
> fiber channel SAN volume, it's slow anyway.
>
> scp a file over the network is slow too: 6 MB/s to this machine, 70 MB/s
> to its siblings.
>
> However, this is just as fast on all systems, including the "sick" one:
>
> # time dd if=/dev/zero of=/dev/null bs=1M count=100000
> 100000+0 records in
> 100000+0 records out
> 104857600000 bytes (105 GB) copied, 2.59213 seconds, 40.5 GB/s
>
> real 0m2.600s
> user 0m0.025s
> sys 0m2.550s
>
> /proc/cpuinfo looks fine. Nothing suspect in dmesg.
>
> Reboot doesn't fix it. Power off / power on doesn't fix it. Single-user
> mode is slow too, and I tried a couple of different kernels.
>
> Dell's online diagnostics program could find nothing wrong with it.
>
> /var/log/messages was full of "ntpd[7313]: frequency error -1707 PPM
> exceeds tolerance 500 PPM" messages. There were also a lot of messages
> about "the system limit for the maximum number of semaphore sets has been
> exceeded"; there were indeed a lot of leftover semaphores created by NRPE
> (owned by the nagios user); I deleted them but nothing has changed, so
> they were a symptom, not the cause.

Are the system times different between the siblings?

Are all 3 siblings running ntpd and using the same time source
(server(s))?

Do the symptoms change with ntpd stopped/running?

Are the frequency offsets the same on each sibling?
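
One way to compare the frequency offsets is to pull the "frequency="
field out of the system variables that "ntpq -c rv" prints on each
sibling. The helper below is only a sketch: the function name
freq_from_rv is made up for illustration, and it assumes your ntpq
reports a "frequency=" variable in that output (check by running
"ntpq -c rv" by hand first).

```shell
# Hypothetical helper: extract the frequency= value (in PPM) from
# "ntpq -c rv" output supplied on stdin. Prints nothing if the
# variable is not present in the input.
freq_from_rv() {
    # Variables are comma-separated, so put each one on its own line,
    # then keep only the numeric value of frequency=.
    tr ',' '\n' | sed -n 's/^[[:space:]]*frequency=\([-+0-9.]*\).*/\1/p'
}

# Usage on each sibling (assumes ntpd is running and ntpq is in PATH):
#   ntpq -c rv | freq_from_rv
```

If the sick machine shows a value far from the other two, that points
at its clock rather than at the disks or the network.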

Since your log messages appear to be ntp related, you might try
resetting your frequency offset and drift values. Having a -1707 PPM
offset could cause many issues like you describe.

    service ntpd stop
    ntptime -f 0
    echo "0" > /var/lib/ntp/drift
    service ntpd start

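
To see whether the reset helped, you could check how large the
reported frequency error gets in the log before and after. A rough
sketch follows; the function name max_freq_error is hypothetical, and
it assumes the log lines have the same wording as the ntpd message you
quoted.

```shell
# Hypothetical helper: print the largest absolute frequency error (PPM)
# found in ntpd "frequency error ... exceeds tolerance" lines on stdin.
# Prints nothing if no such line is found.
max_freq_error() {
    # Pull out the signed PPM value from each matching line...
    sed -n 's/.*frequency error \(-\{0,1\}[0-9][0-9]*\) PPM exceeds.*/\1/p' |
    # ...then track the largest magnitude seen.
    awk '{ v = ($1 < 0) ? -$1 : $1; if (v > max) max = v }
         END { if (NR) print max }'
}

# Usage:
#   max_freq_error < /var/log/messages
```

If new "exceeds tolerance" lines keep appearing after the restart, the
clock itself is probably running off rate and ntpd cannot correct it.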
> I'm still kind of hoping it's a software issue, but chances are slim.
> OTOH, I can't imagine any hardware problem that would exhibit these
> symptoms.
>
> Any idea what to test?
>