On 09/11/14 09:58, Dave Johansen wrote:
> On Mon, Sep 2, 2013 at 12:40 PM, Ron E <ron at questavolta.com> wrote:
>
>> Dear List,
>>
>> We have noticed a variety of reproducible conditions working with sparse
>> files on multiple servers under load with CentOS 6.4.
>>
>> The short story is that processes that read / write sparse files with
>> large "holes" can generate an IO storm. Oddly, this only happens with holes
>> and not with the sections of the files that contain data.
>>
>> We have seen extremely high IO load for example copying a 40 or 80gb
>> sparse file that only has a few gigs of data in it. Attempts to lower the
>> io priority and cpu priority of these processes do not make any measurable
>> difference. (ionice, nice) This has been observed with processes such as:
>>
>> cp
>> rsync
>> sha1sum
>>
>> The server does have to be under some load to reproduce the necessary
>> conditions. The cases we have seen involve servers running 10-30 guests
>> under kvm. Load is in acceptable norms when the processes are run, such as
>> load avg 5-15 on a 24 core (12 core with HT enabled) server. We also verify
>> before starting such a process that the spindle with the file we're working
>> on is not being unduly hammered by another process.
>>
>> These servers have one hardware raid controller each (Dell H700 controller
>> with write cache enabled) and multiple raid arrays (separate sets of
>> physical spindles). Interestingly, the IO storm is not limited to the array
>> / spindles where the sparse file resides but affects all IO on that server.
>>
>> We have looked extensively and not found any account of a similar issue.
>> We have seen this on configurations that are 'plain vanilla' enough to
>> think that this is not something specific to our environment.
>>
>> Wondering if anyone else has seen this and if any suggestions on gathering
>> more data / troubleshooting. We wonder if we've found either a raid
>> controller driver issue, an OS issue or some other such thing. What seems
>> to point in this direction is that even with ionice -c3 which should
>> prevent the process from using IO unless the storage is idle, an io storm
>> which appears to saturate the entire raid bus on a given server can occur.
>>
> Did you ever figure anything out from this? I've noticed a similar sort of
> issue on some of our machines, so I was curious if you found the cause of
> the issue or any way to improve the situation.
>
> Thanks,
> Dave

Are you sure the HDD is not too busy seeking around (investigate via iotop)?
To confirm you may like to test this on a free disk (not under load, like an
external USB disk).
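
If it helps to make that test repeatable, here is a minimal Python sketch (my own, not something from the original report) that creates a sparse file on the disk under test, prints its apparent versus allocated size, and times a plain sequential read of the kind cp or sha1sum would do. The function names, the file name sparse_test.img and the 1 GiB default size are illustrative assumptions; it also assumes a Unix-like system where st_blocks is available. Watch iotop in another terminal while it runs.

```python
import os
import sys
import tempfile
import time


def make_sparse_file(path, size_bytes, data=b"x" * 4096):
    """Create a file of size_bytes: one large hole followed by a small data block."""
    with open(path, "wb") as f:
        # Extending with truncate() writes nothing, so this region is a hole
        # on filesystems that support sparse files.
        f.truncate(size_bytes - len(data))
        f.seek(0, os.SEEK_END)
        f.write(data)


def sparse_report(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


def timed_read(path, chunk=1 << 20):
    """Read the whole file sequentially; return (bytes_read, seconds)."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    return total, time.monotonic() - start


if __name__ == "__main__":
    # Directory on the disk to test; falls back to the system temp dir
    # (an assumption, pass the mount point of the idle disk instead).
    target_dir = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    path = os.path.join(target_dir, "sparse_test.img")  # hypothetical file name
    try:
        make_sparse_file(path, 1 << 30)  # 1 GiB apparent size, 4 KiB of data
        apparent, allocated = sparse_report(path)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
        nbytes, secs = timed_read(path)
        print(f"read {nbytes} bytes in {secs:.2f} s")
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Run it once against the idle disk and once against a loaded array, and compare the read times and what iotop shows. If the idle disk stays quiet while the loaded server still stalls, that would point away from simple seek contention.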