[CentOS] IO causing major performance issues

Of course IO can swamp the file system. My point is that the kernel should
at least give enough time slices to the other processes (like sshd) that
we can still log in.  Being able to log in via ssh is not a lot to ask of
the kernel.

-- 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
antonio varni
[ technology ]

On Thu, 15 Nov 2007, Ross S. W. Walker wrote:

> redhat at mckerrs.net wrote:
> > Antonio Varni wrote:
> > > 
> > > Hello everyone.
> > > 
> > > I'm wondering what other people's experiences are WRT systems becoming
> > > unresponsive (unable to ssh in, etc) for brief periods of time when
> > > a large amount of IO is being performed.  It's really starting to
> > > cause a problem for us.  We're on Dell PowerEdge 1955 blades - but this
> > > same issue has caused us problems on PE1950, PE1850, PE1750 servers.
> > > 
> > > We're running CentOS 4.5 right now. I know CentOS 5 includes ionice and
> > > more IO scheduler/elevator selections like deadline, etc. Perhaps that
> > > would fix this issue.  We're running the latest PERC firmware.
> > > 
> > > The specific issue I'm referring to at this point is on a system
> > > running mysql. All mysql data files are on a netapp filer but mysql's
> > > tmp directory is on local disk.  Whenever a lot of temp tables are
> > > created (and thus written and deleted from local disk quickly) we
> > > can't even log in to the machine - and our monitoring system gets all
> > > freaked out and we get lots of pages, etc... FYI this is two disks
> > > with hardware raid 1.
> > > 
> > > Is it just me? Or is this specific to Dell systems, or is this just
> > > the state of the Linux kernel these days? Is there some magical patch
> > > I can apply to make this issue go away? :)
> > > 
> > > 
> > > Thanks in advance for any insight into this issue.
> > > 
> > > Antonio
> > 
> > I have noticed similar behaviour on all sorts of Linux systems (in
> > particular, ssh into the box is really slow when it's doing IO) and
> > wondered why, but never really thought about investigating any further.
> > 
> > Unfortunately, I do a lot of work with Solaris and the funny thing is
> > that I have *never* seen a Solaris kernel exhibit this sort of
> > behaviour, even when it is installed on normal IDE/SATA disks and, in
> > fact, even when installed on the exact same hardware.
> > 
> > 
> > Now I'm curious, especially given that I'm right in the middle of
> > pushing to get rid of Solaris in favour of RHEL.
> 
> It really depends what the system is doing, what services you are
> running and how you have it configured.
> 
> You had Solaris installed, what services was it running?
> 
> You had Linux installed, what services was it running?
> 
> Database temp tables and logs can generate an enormous amount of
> IO, which can swamp the file systems on any machine.
> 
> I have seen it on Windows and Linux, so I don't see why Solaris
> would be any different.
> 
> You could always try a different scheduler to see if that helps,
> for instance if you are using 'cfq' try 'deadline'.
> 
> -Ross
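
For anyone who wants to try the scheduler suggestion above, here is a
rough sketch of how the active elevator could be checked and switched.
It assumes a kernel that exposes /sys/block/<device>/queue/scheduler
(not every kernel does; on older ones the elevator= boot parameter may
be the only option), and the device name "sda" and the helper names are
just placeholders for illustration, not anything from this thread.

```python
from pathlib import Path


def parse_scheduler_line(text):
    """Split a sysfs scheduler line into (active, available).

    The kernel lists every elevator it knows about and puts square
    brackets around the active one, e.g. "noop deadline [cfq]".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def scheduler_path(device, sysfs_root="/sys/block"):
    """Build the sysfs path for a block device (device is an assumption)."""
    return Path(sysfs_root) / device / "queue" / "scheduler"


def read_scheduler(device, sysfs_root="/sys/block"):
    """Return (active, available) for the given block device."""
    return parse_scheduler_line(scheduler_path(device, sysfs_root).read_text())


def set_scheduler(device, name, sysfs_root="/sys/block"):
    """Switch the elevator for one device; needs root on a real system."""
    _, available = read_scheduler(device, sysfs_root)
    if name not in available:
        raise ValueError(f"{name!r} not offered by kernel: {available}")
    scheduler_path(device, sysfs_root).write_text(name)
```

Something like set_scheduler("sda", "deadline") would then be the
runtime equivalent of booting with elevator=deadline, but only for that
one device and only until the next reboot.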
> 
> 