On 01/19/2016 06:46 PM, Chris Murphy wrote:
> Hence, bad sectors accumulate. And the consequence of this often
> doesn't get figured out until a user looks at kernel messages and sees
> a bunch of hard link resets....

The standard Unix way of refreshing the disk contents is with badblocks' non-destructive read-write test (badblocks -n, or the -cc option to e2fsck for ext2/3/4 filesystems). The remap will happen on the writeback of the contents.

It's been this way with enterprise SCSI drives for as long as I can remember there being enterprise-class SCSI drives. ATA drives caught up with the SCSI ones back in the early 90's with this feature. But it's always been true, to the best of my recollection, that the remap happens on a write. The rationale is pretty simple: only on a write error does the drive know that it has valid data in its buffer, so that's the only safe time to put the data elsewhere.

> This problem affects all software raid, including btrfs raid1. The
> ideal scenario is you'll use 'smartctl -l scterc,70,70 /dev/sdX' in
> startup script, so the drive fails reads on marginally bad sectors
> with an error in 7 seconds maximum.

This is partly why enterprise arrays manage their own per-sector ECC and use 528-byte sector sizes. The drives for these arrays make very poor standalone workstation drives, since the drive is no longer doing all the error recovery itself but is relying on the storage processor to do the work. Now, the drive is still doing some basic ECC on the sector, but the storage processor gets a much better idea of the health of each sector than it would if the drive's firmware were managing the remap. Sophisticated enterprise arrays, like NetApp's, EMC's, and Nimble's, can make some very accurate predictions and do proactive hot-sparing when needed. That's part of what you pay for when you buy that sort of array.

But the other fact of life with modern consumer-level hard drives is that *errored sectors are expected*, not exceptions.
Why else would a drive have a TLER in the two-minute range, as many of the WD Green drives do? And with a consumer-level drive, I would be shocked if badblocks reported the same number of bad blocks on every run.
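The point about badblocks -n and remap-on-write can be illustrated with a toy model. This is a minimal sketch under a simplifying assumption: a suspect ("pending") sector is left alone by reads and is only reassigned to a spare when it is written. `ToyDrive` and `nondestructive_pass` are hypothetical names, not part of badblocks or any real drive interface, and actual firmware is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Toy model (an assumption, not real firmware): remap happens only on write."""
    sectors: list[bytes]
    pending: set[int] = field(default_factory=set)   # suspect sectors awaiting remap
    remapped: set[int] = field(default_factory=set)  # sectors moved to spares

    def read(self, lba: int) -> bytes:
        # Reads return data but never trigger a remap in this model.
        return self.sectors[lba]

    def write(self, lba: int, data: bytes) -> None:
        if lba in self.pending:
            # The drive now holds known-good data in its buffer, so it is
            # safe to put that data on a spare sector.
            self.pending.discard(lba)
            self.remapped.add(lba)
        self.sectors[lba] = data


def nondestructive_pass(drive: ToyDrive, pattern: bytes = b"\xaa") -> list[int]:
    """Sketch of a badblocks -n style pass: save, test-write, verify, restore."""
    mismatches = []
    for lba in range(len(drive.sectors)):
        original = drive.read(lba)
        test = pattern * len(original)
        drive.write(lba, test)
        if drive.read(lba) != test:
            mismatches.append(lba)
        drive.write(lba, original)  # put the user's data back
    return mismatches


drive = ToyDrive([b"abcd", b"efgh", b"ijkl"], pending={1})
for lba in range(len(drive.sectors)):
    drive.read(lba)                     # read-only scan: nothing is remapped
print(drive.pending)                    # {1}
print(nondestructive_pass(drive))       # []
print(drive.pending, drive.remapped)    # set() {1}
```

The read-only scan leaves sector 1 pending; only the writes in the save/test/verify/restore pass trigger the remap, and the original contents survive.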
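For the smartctl advice in the quoted text: scterc values are given in units of 100 milliseconds, so 70,70 means 7.0 seconds for both reads and writes. The sketch below builds that command line. `scterc_command` is a hypothetical helper, and the 30-second figure is an assumption based on the usual Linux default for a block device's command timer (/sys/block/<dev>/device/timeout). The idea is that the drive should give up and report the error before that timer expires and the kernel starts resetting the link.

```python
# Assumed value: the usual Linux default for /sys/block/<dev>/device/timeout.
KERNEL_CMD_TIMEOUT_S = 30


def scterc_command(device: str, read_s: float = 7.0, write_s: float = 7.0) -> list[str]:
    """Build a smartctl SCT ERC command; smartctl takes units of 100 ms."""
    # Keep the drive's error recovery shorter than the kernel's command timer,
    # so the drive reports the bad sector before the kernel resets the link.
    if max(read_s, write_s) >= KERNEL_CMD_TIMEOUT_S:
        raise ValueError("error recovery time must be below the kernel timeout")
    read_ds = round(read_s * 10)    # seconds -> deciseconds (7.0 -> 70)
    write_ds = round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


print(scterc_command("/dev/sdX"))  # ['smartctl', '-l', 'scterc,70,70', '/dev/sdX']
```

In a startup script the returned list could be passed to `subprocess.run(..., check=True)`. A two-minute recovery time, such as `scterc_command("/dev/sdX", 120)`, is rejected by the check.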
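The 528-byte sectors used by enterprise arrays leave 528 - 512 = 16 bytes per sector for the array's own use. The sketch below shows the general idea with a hypothetical trailer layout: a CRC32 of the 512-byte payload plus the LBA the block belongs to. This layout is an assumption for illustration only. Real arrays use their own formats, and a CRC only detects corruption; it does not correct it the way ECC does. The point is that the storage processor, not the drive firmware, sees every failed check.

```python
import struct
import zlib

SECTOR_SIZE = 528
PAYLOAD_SIZE = 512
TRAILER_SIZE = SECTOR_SIZE - PAYLOAD_SIZE  # 16 bytes per sector for the array

# Hypothetical trailer: big-endian CRC32 (4 bytes), LBA (8 bytes), 4 pad bytes.
TRAILER_FMT = ">IQ4x"
assert struct.calcsize(TRAILER_FMT) == TRAILER_SIZE


def format_sector(lba: int, payload: bytes) -> bytes:
    """Append the array-owned trailer to a 512-byte payload."""
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must be exactly 512 bytes")
    return payload + struct.pack(TRAILER_FMT, zlib.crc32(payload), lba)


def check_sector(lba: int, raw: bytes) -> bool:
    """Return True if the payload matches its CRC and belongs at this LBA."""
    if len(raw) != SECTOR_SIZE:
        return False
    payload, trailer = raw[:PAYLOAD_SIZE], raw[PAYLOAD_SIZE:]
    crc, stored_lba = struct.unpack(TRAILER_FMT, trailer)
    return crc == zlib.crc32(payload) and stored_lba == lba


sector = format_sector(7, bytes(PAYLOAD_SIZE))
corrupted = bytearray(sector)
corrupted[0] ^= 0x01  # flip one bit in the payload
print(check_sector(7, sector), check_sector(8, sector), check_sector(7, bytes(corrupted)))
# True False False
```

Checking the stored LBA as well as the CRC also lets the array catch a block that was written to the wrong place, which the drive itself would report as a perfectly good read.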
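One way to see the run-to-run variation on your own consumer drives is to save the output of two badblocks runs (for example with its -o option, which writes one block number per line) and compare the lists. `parse_badblocks_list` and `compare_runs` are hypothetical helpers, and the block numbers in the example are made up for illustration.

```python
def parse_badblocks_list(text: str) -> set[int]:
    """Parse badblocks output: one bad block number per line."""
    return {int(token) for token in text.split()}


def compare_runs(first: str, second: str) -> dict[str, list[int]]:
    """Split two badblocks result lists into shared and run-specific blocks."""
    a, b = parse_badblocks_list(first), parse_badblocks_list(second)
    return {
        "both": sorted(a & b),          # reported by both runs
        "only_first": sorted(a - b),    # not reported on the second run
        "only_second": sorted(b - a),   # newly reported on the second run
    }


# Illustrative (made-up) outputs from two runs over the same drive.
run1 = "1024\n2048\n"
run2 = "2048\n4096\n"
print(compare_runs(run1, run2))
# {'both': [2048], 'only_first': [1024], 'only_second': [4096]}
```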