[CentOS] IO causing major performance issues

Thu Nov 15 22:39:06 UTC 2007
redhat at mckerrs.net <redhat at mckerrs.net>

----- Original Message ----- 
From: "Antonio Varni" <avarni at estalea.com> 
To: centos at centos.org 
Sent: Friday, November 16, 2007 9:06:52 AM (GMT+1000) Australia/Brisbane 
Subject: [CentOS] IO causing major performance issues 

Hello everyone. 

I'm wondering what other people's experiences are WRT systems becoming 
unresponsive (unable to ssh in, etc) for brief periods of time when 
a large amount of IO is being performed. It's really starting to 
cause a problem for us. We're on Dell PowerEdge 1955 blades - but this same 
issue has caused us problems on PE1950, PE1850, PE1750 servers. 

We're running CentOS 4.5 right now. I know CentOS 5 includes ionice and more 
I/O scheduler/elevator selections like deadline, etc. Perhaps that would 
fix this issue. We're running the latest PERC firmware. 
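
For anyone who wants to check which elevator a disk is currently using before experimenting, here is a minimal sketch. It assumes a kernel that exposes the scheduler through sysfs at /sys/block/<device>/queue/scheduler, a Python 3 interpreter, and a device named "sda"; the function names are made up for illustration.

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse a sysfs scheduler line such as 'noop anticipatory deadline [cfq]'.

    Returns (active, available); the active scheduler is the bracketed name,
    or None if no name is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device="sda"):
    # Assumed path: only present on kernels that expose the elevator in sysfs.
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


if __name__ == "__main__":
    try:
        active, available = read_scheduler("sda")  # "sda" is an assumption
        print("active:", active, "available:", available)
    except OSError as exc:
        print("could not read scheduler:", exc)
```

If the file is missing, the kernel most likely only lets the elevator be chosen at boot time, and the script just reports the error.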

The specific issue I'm referring to at this point is on a system running 
MySQL. All MySQL data files are on a NetApp filer but MySQL's tmp directory 
is on local disk. Whenever a lot of temp tables are created (and thus 
written to and deleted from local disk quickly) we can't even log in to the 
machine - and our monitoring system gets all freaked out and we get 
lots of pages, etc... FYI this is two disks with hardware RAID 1. 
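
One way to see what the box is doing during such a burst is to sample the Dirty and Writeback counters from /proc/meminfo while the temp tables are being written. The sketch below is only a diagnostic aid: whether a backlog of dirty pages is actually behind the stalls is an assumption, not something established in this thread, and the function names are hypothetical.

```python
import time


def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Extract the selected fields (values in kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in fields:
            values[key] = int(rest.split()[0])
    return values


def watch(interval=1.0, samples=10):
    """Print the Dirty/Writeback counters once per interval (Linux only)."""
    for _ in range(samples):
        with open("/proc/meminfo") as fh:
            print(time.strftime("%H:%M:%S"), parse_meminfo(fh.read()))
        time.sleep(interval)
```

Calling watch() in a session opened before the burst starts gives a rough timeline to compare against the moments when ssh logins hang.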

Is it just me? Or is this specific to Dell systems, or is this just 
the state of the Linux kernel these days? Is there some magical patch 
I can apply to make this issue go away? :) 


Thanks in advance for any insight into this issue. 

Antonio 



-- 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
antonio varni 
[ technology ] 

ESTALEA, L.P. 
629 State Street #222 
Santa Barbara, CA 93101 
v 805.252.0115 
f 805.899.2697 
e avarni at estalea.com 
w www.estalea.com 
_______________________________________________ 
CentOS mailing list 
CentOS at centos.org 
http://lists.centos.org/mailman/listinfo/centos 

-- 
This message has been scanned for viruses and 
dangerous content by MailScanner, and is 
believed to be clean. 




I have noticed similar behaviour on all sorts of Linux systems (in particular, ssh into the box is really slow when it's doing I/O) and wondered why, but never really thought about investigating any further. 

Unfortunately, I do a lot of work with Solaris, and the funny thing is that I have *never* seen a Solaris kernel exhibit this sort of behaviour, even when it is installed on normal IDE/SATA disks and, in fact, even when it is installed on the exact same hardware. 


Now I'm curious, especially given that I'm right in the middle of pushing to get rid of Solaris in favour of RHEL. 











