I’m running CentOS 6.7 on my build servers, and on one of them the builds are taking almost an order of magnitude longer than usual. There are no runaway processes and there is plenty of free memory, so I suspected that file I/O might be slow, and sure enough, that appears to be the case. I ran a simple dd test and compared the results to a “normal” build server: 412 MB/s on the normal server vs. 31.7 MB/s on the slow one. Both are on similarly configured VGs on identical hardware. What could cause this performance degradation, and are there any other tests I can run before I reboot the server to see if things improve? There is nothing in /var/log/messages; are there other logs I should check?
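The dd test itself was nothing elaborate; roughly this sort of sequential write (the path and sizes here are just illustrative):

  # write 1 GB sequentially, bypassing the page cache
  dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct
  rm -f /tmp/ddtest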
Alfred
On Feb 3, 2016, at 1:30 PM, Alfred von Campe alfred@von-campe.com wrote:
I suspected that file I/O might be slow, and sure enough, that appears to be the case….What could cause this
A dying hard disk can do it. HDDs try to silently paper over I/O errors, but what they can’t hide is the time it takes to do this. If your HDD is constantly correcting errors at the oxide layer, it will be reeeeeallly sllllow.
You can try running SMART tests on it, though that’s not guaranteed to show the problem.
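Assuming smartmontools is installed, something along these lines per physical disk would be a reasonable start (the device name is just an example):

  # overall health self-assessment
  smartctl -H /dev/sda
  # start a long self-test; check the results later with smartctl -a
  smartctl -t long /dev/sda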
Got tested backups? :)
On Feb 3, 2016, at 16:13, Warren Young wrote:
A dying hard disk can do it. HDDs try to silently paper over I/O errors, but what they can’t hide is the time it takes to do this. If your HDD is constantly correcting errors at the oxide layer, it will be reeeeeallly sllllow.
You can try running SMART tests on it, though that’s not guaranteed to show the problem.
Well, it’s not “a” disk: it’s a HW RAID of about a dozen (server-grade) drives, with a VG/LV on top of that. Are there any logs I can check or tests I can run to verify the health of the underlying VG/LV?
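The only LVM-level checks I’m aware of are the standard reporting commands, which describe layout and state rather than media health, e.g.:

  vgs
  lvs -a -o +devices
  pvs
  dmsetup status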
Alfred
Alfred von Campe wrote:
On Feb 3, 2016, at 16:13, Warren Young wrote:
A dying hard disk can do it. HDDs try to silently paper over I/O errors, but what they can’t hide is the time it takes to do this. If your HDD is constantly correcting errors at the oxide layer, it will be reeeeeallly sllllow.
You can try running SMART tests on it, though that’s not guaranteed to show the problem.
Well, it’s not “a” disk: it’s a HW RAID of about a dozen (server-grade) drives, with a VG/LV on top of that. Are there any logs I can check or tests I can run to verify the health of the underlying VG/LV?
You don't mention what kind of h/w RAID. LSI-based controllers and HP-based ones both have utilities to check out the drives (MegaRAID and hpacucli, respectively); AC&NC JetStors have a web interface.
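On an LSI-based card, for example, something along these lines should dump the controller and per-drive state, assuming the MegaCli binary is installed (the name and path vary; e.g. MegaCli64 under /opt/MegaRAID/MegaCli):

  # controller summary
  MegaCli64 -AdpAllInfo -aALL
  # physical drives, including media/other error counters and predictive failure counts
  MegaCli64 -PDList -aALL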
mark
On Feb 3, 2016, at 2:26 PM, Alfred von Campe alfred@von-campe.com wrote:
On Feb 3, 2016, at 16:13, Warren Young wrote:
A dying hard disk can do it. HDDs try to silently paper over I/O errors, but what they can’t hide is the time it takes to do this. If your HDD is constantly correcting errors at the oxide layer, it will be reeeeeallly sllllow.
You can try running SMART tests on it, though that’s not guaranteed to show the problem.
Well, it’s not “a” disk: it’s a HW RAID of about a dozen (server-grade) drives
smartctl can see through several different types of RAID controller to the underlying physical disks via its -d option.
On Feb 3, 2016, at 17:10, Warren Young wrote:
smartctl can see through several different types of RAID controller to the underlying physical disks via its -d option.
This is what I have:
# smartctl --all /dev/sda
smartctl 5.43 2012-06-30 r3573 [i686-linux-2.6.32-573.12.1.el6.i686] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               IBM
Product:              ServeRAID M5110e
Revision:             3.19
User Capacity:        1,494,996,746,240 bytes [1.49 TB]
Logical block size:   512 bytes
Logical Unit id:      0x60050760408e81b018be601809efd11c
Serial number:        001cd1ef091860be18b0818e40600705
Device type:          disk
Local Time is:        Wed Feb 3 17:13:34 2016 EST
Device does not support SMART

Error Counter logging not supported
Device does not support Self Test logging
I guess I am stuck since it says it doesn’t support SMART. Or is there some way to get some status from this “disk” to see if it’s really the root cause of my performance issues? I think I would have seen something in /var/log/messages if there was a critical issue.
Alfred
On 3 Feb 2016 22:24, "Alfred von Campe" alfred@von-campe.com wrote:
On Feb 3, 2016, at 17:10, Warren Young wrote:
smartctl can see through several different types of RAID controller to
the underlying physical disks via its -d option.
This is what I have:
# smartctl --all /dev/sda
smartctl 5.43 2012-06-30 r3573 [i686-linux-2.6.32-573.12.1.el6.i686] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               IBM
Product:              ServeRAID M5110e
Revision:             3.19
User Capacity:        1,494,996,746,240 bytes [1.49 TB]
Logical block size:   512 bytes
Logical Unit id:      0x60050760408e81b018be601809efd11c
Serial number:        001cd1ef091860be18b0818e40600705
Device type:          disk
Local Time is:        Wed Feb 3 17:13:34 2016 EST
Device does not support SMART

Error Counter logging not supported
Device does not support Self Test logging
I guess I am stuck since it says it doesn’t support SMART. Or is there some way to get some status from this “disk” to see if it’s really the root cause of my performance issues? I think I would have seen something in /var/log/messages if there was a critical issue.
Severely degraded hardware RAID performance can often be caused by things like a failed cache battery.
There is usually some sort of tool to interrogate the device to check things like cache behaviour.
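On a MegaRAID-based controller like that ServeRAID, for instance, something along these lines would show the battery and cache policy state (assuming MegaCli is installed; the binary name varies):

  # BBU status -- a failed or missing battery usually forces the card into write-through mode
  MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL
  # current vs. configured cache policy on each logical drive
  MegaCli64 -LDInfo -Lall -aALL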
On Feb 3, 2016, at 3:23 PM, Alfred von Campe alfred@von-campe.com wrote:
On Feb 3, 2016, at 17:10, Warren Young wrote:
smartctl can see through several different types of RAID controller to the underlying physical disks via its -d option.
This is what I have:
# smartctl --all /dev/sda
smartctl 5.43 2012-06-30 r3573 [i686-linux-2.6.32-573.12.1.el6.i686] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               IBM
Product:              ServeRAID M5110e
A bit of Googling says that’s an LSI 2208-based card.
So, try smartctl -a -d megaraid,0 /dev/sda
If that works, you should be able to walk through each disk by incrementing that trailing number. Then, you can add -t flags to do active tests.
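In other words, something along these lines (the physical device IDs behind the controller aren’t always contiguous, so errors for a few of the IDs are harmless):

  # query each physical disk behind the controller; IDs 0-11 are just a guess for ~a dozen drives
  for i in $(seq 0 11); do
      smartctl -a -d megaraid,$i /dev/sda
  done
  # then kick off a short self-test on one of them
  smartctl -t short -d megaraid,0 /dev/sda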