Of course IO can swamp the file system. My point is that the kernel
should at least give enough time-slices to the other processes (like
sshd) so we can still log in. It's not asking a lot from the kernel -
to just log in via ssh really.

-- 
antonio varni [ technology ]
ESTALEA, L.P.
629 State Street #222
Santa Barbara, CA 93101
v 805.252.0115
f 805.899.2697
e avarni at estalea.com
w www.estalea.com

On Thu, 15 Nov 2007, Ross S. W. Walker wrote:

> redhat at mckerrs.net wrote:
> > Antonio Varni wrote:
> > >
> > > Hello everyone.
> > >
> > > I'm wondering what other people's experiences are WRT systems
> > > becoming unresponsive (unable to ssh in, etc) for brief periods
> > > of time when a large amount of IO is being performed. It's really
> > > starting to cause a problem for us. We're on Dell PowerEdge 1955
> > > blades - but this same issue has caused us problems on PE1950,
> > > PE1850, PE1750 servers.
> > >
> > > We're running CentOS 4.5 right now. I know CentOS 5 includes
> > > ionice, more io scheduler/elevator selections like deadline/etc.
> > > Perhaps that would fix this issue. We're running the latest PERC
> > > firmware.
> > >
> > > The specific issue I'm referring to at this point is on a system
> > > running mysql. All mysql data files are on a netapp filer but
> > > mysql's tmp directory is on local disk. Whenever a lot of temp
> > > tables are created (and thus written and deleted from local disk
> > > quickly) we can't even log in to the machine - and our monitoring
> > > system gets all freaked out and we get lots of pages, etc... FYI
> > > this is two disks with hardware raid 1.
> > >
> > > Is it just me? Or is this specific to Dell systems, or is this
> > > just the state of the Linux kernel these days? Is there some
> > > magical patch I can apply to make this issue go away :)
> > >
> > >
> > > Thanks in advance for any insight into this issue.
> > >
> > > Antonio
> >
> > I have noticed similar behaviour on all sorts of Linuxes (in
> > particular, ssh into the box is really slow when it's doing
> > IO) and wondered why, but never really thought about
> > investigating any further.
> >
> > Unfortunately, I do a lot of work with Solaris and the funny
> > thing is that I have *never* seen a Solaris kernel exhibit
> > this sort of behaviour. Even if it is installed on normal
> > IDE/SATA disks. And, in fact, even if installed on the exact
> > same hardware.
> >
> >
> > Now I'm curious.....especially given that I'm right in the
> > middle of pushing to get rid of Solaris in favour of RHEL.
>
> It really depends what the system is doing, what services you are
> running and how you have it configured.
>
> You had Solaris installed, what services was it running?
>
> You had Linux installed, what services was it running?
>
> Database temp tables and logs can generate an enormous amount of
> io which can swamp the file systems of any system.
>
> I have seen it on Windows and Linux, so I don't see why Solaris
> would be any different.
>
> You could always try a different scheduler to see if that helps,
> for instance if you are using 'cfq' try 'deadline'.
>
> -Ross
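
For anyone wanting to try the scheduler suggestion quoted above, here
is a minimal sketch (not from the thread) of how to see which elevator
a disk is currently using. It assumes a kernel that exposes the
per-device file /sys/block/<device>/queue/scheduler, where the active
scheduler is shown in square brackets. The device name "sda" and the
function names are placeholders chosen for illustration. Older kernels,
possibly including the one shipped with CentOS 4, may only allow the
choice at boot time via the elevator= kernel parameter, in which case
this file will not exist.

```python
from pathlib import Path
import sys


def parse_schedulers(text):
    """Parse the contents of a sysfs scheduler file.

    Returns (active, available), where active is the bracketed name
    (or None if nothing is bracketed) and available lists every name.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_schedulers(device="sda"):
    """Read the scheduler file for a block device (assumed sysfs path)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())


if __name__ == "__main__":
    # "sda" is only a default guess; pass the real device as an argument.
    device = sys.argv[1] if len(sys.argv) > 1 else "sda"
    try:
        active, available = read_schedulers(device)
    except OSError as exc:
        print("could not read scheduler for %s: %s" % (device, exc))
    else:
        print("%s: active=%s available=%s" % (device, active, ", ".join(available)))
```

On kernels that support runtime switching, writing a scheduler name
such as deadline to that same file as root changes the elevator for
that device until the next reboot.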