On Mon, Sep 2, 2013 at 12:40 PM, Ron E <ron at questavolta.com> wrote:

> Dear List,
>
> We have noticed a variety of reproducible conditions working with sparse
> files on multiple servers under load with CentOS 6.4.
>
> The short story is that processes that read / write sparse files with
> large "holes" can generate an IO storm. Oddly, this only happens with holes
> and not with the sections of the files that contain data.
>
> We have seen extremely high IO load for example copying a 40 or 80gb
> sparse file that only has a few gigs of data in it. Attempts to lower the
> io priority and cpu priority of these processes do not make any measurable
> difference. (ionice, nice) This has been observed with processes such as:
>
> cp
> rsync
> sha1sum
>
> The server does have to be under some load to reproduce the necessary
> conditions. The cases we have seen involve servers running 10-30 guests
> under kvm. Load is in acceptable norms when the processes are run, such as
> load avg 5-15 on a 24 core (12 core with HT enabled) server. We also verify
> before starting such a process that the spindle with the file we're working
> on is not being unduly hammered by another process.
>
> These servers have one hardware raid controller each (Dell H700 controller
> with write cache enabled) and multiple raid arrays (separate sets of
> physical spindles). Interestingly, the IO storm is not limited to the array
> / spindles where the sparse file resides but affects all IO on that server.
>
> We have looked extensively and not found any account of a similar issue.
> We have seen this on configurations that are 'plain vanilla' enough to
> think that this is not something specific to our environment.
>
> Wondering if anyone else has seen this and if any suggestions on gathering
> more data / troubleshooting. We wonder if we've found either a raid
> controller driver issue, an OS issue or some other such thing. What seems
> to point in this direction is that even with ionice -c3 which should
> prevent the process from using IO unless the storage is idle, an io storm
> which appears to saturate the entire raid bus on a given server can occur.
>

Did you ever figure anything out from this? I've noticed a similar sort of
issue on some of our machines, so I was curious if you found the cause of the
issue or any way to improve the situation.

Thanks,
Dave
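
On the "gathering more data" question in the quoted message, one possible starting point is to confirm how sparse a given file really is and where its holes sit before running cp, rsync or sha1sum against it. The following is only a minimal sketch, not something taken from the thread: it assumes Python 3.3+ on a Linux kernel and filesystem that support `SEEK_DATA` / `SEEK_HOLE` (whether the stock CentOS 6.4 kernel does is not established here), and the helper names `make_sparse_file`, `data_extents` and `sparse_summary` are made up for illustration. On filesystems without hole reporting, the kernel treats the whole file as one data extent, so the numbers are an upper bound.

```python
import errno
import os
import tempfile


def make_sparse_file(path, size, payload, payload_offset):
    """Create a test file of `size` bytes holding `payload` at `payload_offset`.

    Everything that is never written is left as a hole (hypothetical helper).
    """
    with open(path, "wb") as f:
        f.seek(payload_offset)
        f.write(payload)
        f.truncate(size)  # extend to the full apparent size without writing


def data_extents(path):
    """Return (start, end) byte ranges the filesystem reports as data."""
    extents = []
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
            extents.append((start, end))
            offset = end
    return extents


def sparse_summary(path):
    """Compare apparent size, allocated size and reported data bytes."""
    st = os.stat(path)
    extents = data_extents(path)
    return {
        "apparent_bytes": st.st_size,
        "allocated_bytes": st.st_blocks * 512,  # st_blocks is in 512-byte units on Linux
        "data_bytes": sum(end - start for start, end in extents),
        "extents": extents,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.img")
        # 64 MiB apparent size, with a single 4 KiB write in the middle.
        make_sparse_file(path, 64 * 1024 * 1024, b"x" * 4096, 32 * 1024 * 1024)
        info = sparse_summary(path)
        print("apparent :", info["apparent_bytes"])
        print("allocated:", info["allocated_bytes"])
        print("data     :", info["data_bytes"])
        print("extents  :", info["extents"])
```

A tool that reads such a file sequentially still has to pass through every hole, which the kernel returns as zeros, so comparing `apparent_bytes` against `data_bytes` gives a rough idea of how much of a copy or checksum run is spent on holes rather than on stored data. That does not explain the IO storm by itself; it only helps narrow down which files and offsets to watch (for example with iostat) while reproducing it.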