[CentOS] heavy IO load when working with sparse files (centos 6.4)

Thu Sep 11 04:48:46 UTC 2014
dE <de.techno at gmail.com>

On 09/11/14 09:58, Dave Johansen wrote:
> On Mon, Sep 2, 2013 at 12:40 PM, Ron E <ron at questavolta.com> wrote:
>
>> Dear List,
>>
>> We have noticed a variety of reproducible problems when working with
>> sparse files on multiple servers under load with CentOS 6.4.
>>
>> The short story is that processes that read / write sparse files with
>> large "holes" can generate an IO storm. Oddly, this only happens with holes
>> and not with the sections of the files that contain data.
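A minimal sketch of what a sparse file with a large hole looks like, assuming Linux and Python 3. The helper names `make_sparse_file` and `sizes` are hypothetical. Seeking past end-of-file and writing a single byte leaves everything before it unallocated, so the apparent size (what `ls -l` shows) is much larger than the allocated size (what `du` shows):

```python
import os
import tempfile

def make_sparse_file(path, apparent_size, data=b"x"):
    """Create a file whose logical size is apparent_size, with `data`
    written only at the very end; everything before it is a hole."""
    with open(path, "wb") as f:
        f.seek(apparent_size - len(data))  # seeking past EOF leaves a hole
        f.write(data)

def sizes(path):
    """Return (apparent_size, allocated_bytes), i.e. `ls -l` vs `du`."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks counts 512-byte units

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.img")
        make_sparse_file(p, 64 * 1024 * 1024)  # 64 MiB logical size
        apparent, allocated = sizes(p)
        print(f"apparent={apparent} allocated={allocated}")
```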
>>
>> We have seen extremely high IO load, for example when copying a 40 or
>> 80 GB sparse file that only has a few GB of data in it. Lowering the IO
>> and CPU priority of these processes (ionice, nice) makes no measurable
>> difference. This has been observed with processes such as:
>>
>> cp
>> rsync
>> sha1sum
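Tools like cp, rsync and sha1sum that are not hole-aware read every logical byte of a sparse file; the kernel returns holes as zeros. The following is a rough sketch, assuming Python 3 on Linux, that mimics sha1sum's chunked read and reports how many bytes were read versus how many are actually allocated. `sha1_with_stats` and the 1 MiB chunk size are illustrative choices, not taken from sha1sum itself:

```python
import hashlib
import os

def sha1_with_stats(path, chunk_size=1024 * 1024):
    """Hash a file by reading every logical byte, as sha1sum does.
    Returns (hexdigest, bytes_read, allocated_bytes)."""
    h = hashlib.sha1()
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # holes come back as zero bytes
            if not chunk:
                break
            h.update(chunk)
            bytes_read += len(chunk)
    allocated = os.stat(path).st_blocks * 512  # 512-byte units on Linux
    return h.hexdigest(), bytes_read, allocated
```

For a 80 GB file with a few GB of data, `bytes_read` would be the full 80 GB even though `allocated` is only a few GB.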
>>
>> The server does have to be under some load to reproduce the problem. The
>> cases we have seen involve servers running 10-30 KVM guests. Load is
>> within acceptable norms when the processes are run, e.g. a load average
>> of 5-15 on a 24-core server (12 physical cores with HT enabled). Before
>> starting such a process, we also verify that the spindle holding the
>> file we're working on is not being unduly hammered by another process.
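One way to check that a spindle is idle before starting a copy is to sample the per-device counters in /proc/diskstats twice and diff them. This is a minimal sketch, assuming the standard Linux diskstats layout (field 6 = sectors read, field 10 = sectors written, field 13 = milliseconds spent doing IO, with sectors counted in 512-byte units). The device name passed to `sample` (e.g. "sdb") is a hypothetical example:

```python
import time

def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written, io_ms)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip malformed or short lines
        stats[f[2]] = (int(f[5]), int(f[9]), int(f[12]))
    return stats

def sample(device, interval=5.0, path="/proc/diskstats"):
    """Return (MiB read, MiB written, % busy) for `device` over `interval` s."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())[device]
    time.sleep(interval)
    with open(path) as fh:
        after = parse_diskstats(fh.read())[device]
    rd = (after[0] - before[0]) * 512 / 2**20  # diskstats sectors are 512 bytes
    wr = (after[1] - before[1]) * 512 / 2**20
    busy = 100.0 * (after[2] - before[2]) / (interval * 1000)
    return rd, wr, busy
```

A "% busy" close to 100 on the array holding the sparse file, or on other arrays behind the same controller, would be worth capturing while the problem occurs.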
>>
>> These servers have one hardware RAID controller each (Dell H700 with
>> write cache enabled) and multiple RAID arrays (separate sets of physical
>> spindles). Interestingly, the IO storm is not limited to the array /
>> spindles where the sparse file resides but affects all IO on that server.
>>
>> We have looked extensively and not found any account of a similar issue.
>> We have seen this on configurations that are 'plain vanilla' enough to
>> think that this is not something specific to our environment.
>>
>> We are wondering whether anyone else has seen this, and would welcome any
>> suggestions on gathering more data or troubleshooting. We suspect either a
>> RAID controller driver issue, an OS issue or something similar. What
>> points in this direction is that even with ionice -c3, which should
>> prevent the process from using IO unless the storage is idle, an IO storm
>> that appears to saturate the entire RAID bus on a given server can occur.
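ionice only has an effect if the block-layer IO scheduler honours IO priorities. As far as we understand, on CentOS 6 kernels that is the CFQ scheduler, while deadline and noop ignore them. So it may be worth checking which scheduler is active on the RAID volumes. This is a small sketch, assuming the usual /sys/block/<dev>/queue/scheduler file where the active scheduler is shown in brackets. The device name "sda" is a hypothetical example:

```python
def active_scheduler(text):
    """Return the bracketed (active) entry from a queue/scheduler file,
    e.g. 'noop anticipatory deadline [cfq]' -> 'cfq'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no bracketed entry found

def scheduler_for(device):
    # `device` is a block device name such as "sda" (hypothetical example)
    with open(f"/sys/block/{device}/queue/scheduler") as fh:
        return active_scheduler(fh.read())
```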
>>
> Did you ever figure anything out from this? I've noticed a similar sort of
> issue on some of our machines, so I was curious if you found the cause of
> the issue or any way to improve the situation.
>
> Thanks,
> Dave

Are you sure the HDD isn't simply busy seeking around? (You can investigate with iotop.)
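iotop shows per-process IO. Similar per-process counters are exposed in /proc/<pid>/io, where `rchar` counts bytes passed through read() and similar calls (including zeros returned for holes and data served from the page cache), and `read_bytes` counts bytes the process actually caused to be fetched from storage. Comparing the two for a cp or sha1sum of a sparse file shows how much real disk IO it generates. This is a minimal sketch, assuming a Linux kernel with per-task IO accounting enabled. Reading another user's process typically needs root:

```python
def parse_proc_io(text):
    """Parse /proc/<pid>/io ("key: value" per line) into a dict of ints."""
    out = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            out[key.strip()] = int(value)  # int() tolerates surrounding spaces
    return out

def proc_io(pid="self"):
    """Read the IO counters for `pid` (defaults to the current process)."""
    with open(f"/proc/{pid}/io") as fh:
        return parse_proc_io(fh.read())
```

For example, `proc_io(1234)["rchar"]` versus `proc_io(1234)["read_bytes"]` for a running sha1sum (PID 1234 is a placeholder).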

To confirm, you may want to test this on a disk that is otherwise idle
(not under load), such as an external USB disk.