[CentOS] HDD badblocks

Thu Jan 21 16:27:07 UTC 2016
Lamar Owen <lowen at pari.edu>

On 01/20/2016 01:43 PM, Chris Murphy wrote:
> On Wed, Jan 20, 2016, 7:17 AM Lamar Owen <lowen at pari.edu> wrote:
>
>> The standard Unix way of refreshing the disk contents is with 
>> badblocks' non-destructive read-write test (badblocks -n or as the 
>> -cc option to e2fsck, for ext2/3/4 filesystems). 
>
> This isn't applicable to RAID, which is what this thread is about. For
> RAID, use scrub, that's what it is for.

The badblocks read/write verification would need to be done on the RAID 
member devices, not the aggregate md device, for member device level 
remap.  It might need to be done with the md offline, not sure.  Scrub?  
There is a scrub command (and package) in CentOS, but it's meant for 
secure data erasure, and is not non-destructive.  Ah, you're 
talking about what md will do if 'check' or 'repair' is written to the 
sync_action file in sysfs for the md in question.  (This info is 
in the md(4) man page.)
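
As a rough sketch of that sysfs trigger: md(4) describes a sync_action 
file under the array's md directory in sysfs, and writing 'check' or 
'repair' to it starts the corresponding pass.  The function names and the 
sysfs_root parameter below are made up for illustration (the parameter 
only exists so the code can be tried against a scratch directory), and 
writing to the real file needs root.

```python
from pathlib import Path

# Values accepted here; md(4) lists others that the kernel reports on read.
VALID_ACTIONS = {"check", "repair", "idle"}


def md_sync_action_path(md_name, sysfs_root="/sys"):
    """Build the path to an array's sync_action file, e.g. for 'md0'."""
    return Path(sysfs_root) / "block" / md_name / "md" / "sync_action"


def request_md_action(md_name, action="check", sysfs_root="/sys"):
    """Ask md to start (or stop, with 'idle') a check/repair pass.

    Hypothetical helper: it only writes the keyword to sync_action and
    returns the path it wrote to. It does not wait for completion.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    path = md_sync_action_path(md_name, sysfs_root)
    path.write_text(action + "\n")
    return path
```

Calling request_md_action("md0", "check") as root would be the 
equivalent of echoing 'check' into /sys/block/md0/md/sync_action.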

> The badblocks method fixes nothing if the sector is persistently bad and
> the drive reports a read error.

The badblocks method will do a one-off read/write verification on a 
member device; no, it won't do it automatically, true enough.

> It fixes nothing if the command timeout is
> reached before the drive either recovers or reports a read error.

Very true.
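
For reference, on Linux the kernel's command timer for a SCSI or SATA 
disk is visible in sysfs as /sys/block/<disk>/device/timeout, in 
seconds.  A minimal sketch for reading it follows; the function name and 
the sysfs_root parameter are hypothetical, and how that value compares to 
the drive's own error recovery time depends on the drive and is not 
covered here.

```python
from pathlib import Path


def scsi_command_timeout(disk, sysfs_root="/sys"):
    """Return the kernel command timer for a disk such as 'sda', in seconds.

    Hypothetical helper; assumes the disk is handled by the SCSI layer,
    which is where the device/timeout attribute lives.
    """
    path = Path(sysfs_root) / "block" / disk / "device" / "timeout"
    return int(path.read_text().strip())
```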

> And even
> if it works, you're relying on ECC recovered data rather than reading a
> likely good copy from mirror or parity and writing that back to the bad
> block.

Yes, for the member drive this is true.  Since my storage here is 
primarily on EMC Clariion, I'm not sure what the equivalent to EMC's 
background verify would be for mdraid, since I've not needed that 
functionality from mdraid.  (I really don't like the term 'software 
RAID' since at some level all RAID is software RAID, whether on a 
storage processor or in the RAID controller's firmware).  It does 
appear that triggering a scrub from sysfs for a particular md might be 
similar functionality, and would do the remap if inconsistent data is 
found.  This is a bit different from the old Unix way, but these are 
newer times, and as such the way of doing things is different.
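
To see what such a pass found, md(4) also documents a mismatch_cnt file 
next to sync_action.  A small sketch, assuming that layout, of reading 
both; the function name and sysfs_root parameter are invented for 
illustration.

```python
from pathlib import Path


def md_scrub_status(md_name, sysfs_root="/sys"):
    """Return (current sync_action, mismatch_cnt) for an array like 'md0'.

    Hypothetical helper: sync_action reads back 'idle' when no pass is
    running, and mismatch_cnt holds the count from the latest pass.
    """
    md_dir = Path(sysfs_root) / "block" / md_name / "md"
    action = (md_dir / "sync_action").read_text().strip()
    mismatches = int((md_dir / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

A result of ('idle', 0) after a 'check' would mean the pass has finished 
and counted no mismatches.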

> But all of this still requires the proper configuration.

Yes, this is very true.