[CentOS] HDD badblocks

Wed Jan 20 14:16:33 UTC 2016
Lamar Owen <lowen at pari.edu>

On 01/19/2016 06:46 PM, Chris Murphy wrote:
> Hence, bad sectors accumulate. And the consequence of this often
> doesn't get figured out until a user looks at kernel messages and sees
> a bunch of hard link resets....

The standard Unix way of refreshing the disk contents is with badblocks' 
non-destructive read-write test (badblocks -n or as the -cc option to 
e2fsck, for ext2/3/4 filesystems).  The remap will happen on the 
writeback of the contents.  It's been this way with enterprise SCSI 
drives for as long as I can remember there being enterprise-class SCSI 
drives.  ATA drives caught up with the SCSI ones back in the early '90s 
with this feature.  But it's always been true, to the best of my 
recollection, that the remap always happens on a write.  The rationale 
is pretty simple: only on a write error does the drive know that it has 
the valid data in its buffer, and so that's the only safe time to put 
the data elsewhere.
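
The idea behind badblocks' non-destructive read-write pass is simple 
enough to sketch.  This is only a toy Python illustration of the 
read/test-pattern/writeback cycle against a plain file (the real tool 
works on raw block devices and is far more careful); the function and 
block size here are my own, not badblocks' internals:

```python
import os
import tempfile

BLOCK_SIZE = 4096
# Alternating bit patterns, as a stand-in for badblocks' test patterns.
PATTERNS = [b"\x55" * BLOCK_SIZE, b"\xaa" * BLOCK_SIZE]

def refresh_pass(path):
    """Non-destructive read-write pass over every block: save the
    original data, write test patterns and verify them, then write
    the original data back.  That final writeback is what gives the
    drive its chance to remap a weak sector.  Returns the list of
    block numbers that failed verification."""
    bad = []
    with open(path, "r+b", buffering=0) as f:
        size = os.fstat(f.fileno()).st_size
        for offset in range(0, size, BLOCK_SIZE):
            f.seek(offset)
            original = f.read(BLOCK_SIZE)
            ok = True
            for pattern in PATTERNS:
                p = pattern[:len(original)]
                f.seek(offset)
                f.write(p)
                f.seek(offset)
                if f.read(len(original)) != p:
                    ok = False
            f.seek(offset)
            f.write(original)          # the writeback that triggers a remap
            f.seek(offset)
            if f.read(len(original)) != original:
                ok = False
            if not ok:
                bad.append(offset // BLOCK_SIZE)
    return bad
```

On a healthy file this returns an empty list and leaves the contents 
byte-for-byte identical, which is the whole point of "non-destructive."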

> This problem affects all software raid, including btrfs raid1. The
> ideal scenario is you'll use 'smartctl -l scterc,70,70 /dev/sdX' in
> startup script, so the drive fails reads on marginally bad sectors
> with an error in 7 seconds maximum.
>
This is partly why enterprise arrays manage their own per-sector ECC and 
use 528-byte sector sizes.  The drives for these arrays make very poor 
workstation standalone drives, since the drive is no longer doing all 
the error recovery itself, but relying on the storage processor to do 
the work.  Now, the drive is still doing some basic ECC on the sector, 
but the storage processor is getting a much better idea of the health of 
each sector than when the drive's firmware manages the remapping on its 
own.  
Sophisticated enterprise arrays, like NetApp's, EMC's, and Nimble's, can 
do some very accurate predictions and proactive hotsparing when needed.  
That's part of what you pay for when you buy that sort of array.
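
Purely as a toy illustration of what those extra 16 bytes per sector buy 
(512 + 16 = 528): real arrays use T10-DIF-style protection fields, and 
the layout and names below are mine, not any vendor's, but the idea of 
carrying per-sector integrity metadata out of band looks roughly like 
this:

```python
import struct
import zlib

DATA_LEN = 512
META_LEN = 16                      # 512 + 16 = 528-byte formatted sector
SECTOR_LEN = DATA_LEN + META_LEN

def format_sector(data, lba):
    """Pack a 512-byte payload into a 528-byte sector: the array-level
    metadata (here just a CRC32 and the LBA, zero-padded to 16 bytes)
    rides out of band next to the data."""
    assert len(data) == DATA_LEN
    meta = struct.pack("<IQ", zlib.crc32(data), lba).ljust(META_LEN, b"\0")
    return data + meta

def check_sector(sector, lba):
    """Storage-processor-side check: recompute the CRC and confirm the
    sector really belongs to the LBA we asked for.  Catches both bit
    rot and misdirected writes, independent of the drive's own ECC."""
    data, meta = sector[:DATA_LEN], sector[DATA_LEN:]
    crc, stored_lba = struct.unpack("<IQ", meta[:12])
    return zlib.crc32(data) == crc and stored_lba == lba
```

Because the storage processor does this check on every read, it sees 
every marginal sector immediately instead of waiting for the drive's 
firmware to own up to one.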

But the other fact of life of modern consumer-level hard drives is that 
*errored sectors are expected* and not exceptions.  Why else would a 
drive have a TLER in the two minute range like many of the WD Green 
drives do?  And with a consumer-level drive I would be shocked if 
badblocks reported the same bad-block count on two successive passes.
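
For the record, Chris's scterc suggestion drops easily into a startup 
script.  A hedged sketch (assuming smartctl from smartmontools is on the 
PATH; the 70s are deciseconds, i.e. 7.0 seconds, per his note, and the 
wrapper function is my own convenience, not part of any tool):

```python
import subprocess

def set_erc(device, read_ds=70, write_ds=70, dry_run=False):
    """Build (and optionally run) the smartctl invocation from the
    quoted advice: cap the drive's error recovery at 7 seconds so a
    marginal sector fails fast instead of hanging the array."""
    cmd = ["smartctl", "-l", "scterc,%d,%d" % (read_ds, write_ds), device]
    if dry_run:
        return cmd                 # just show what would be executed
    return subprocess.run(cmd, check=True)
```

Run once per drive at boot; note that not every consumer drive honors 
SCT ERC at all, which is rather the point of this thread.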