On Tue, January 19, 2016 4:48 pm, John R Pierce wrote:
> On 1/19/2016 2:24 PM, Warren Young wrote:
>> It's dying.  Replace it now.
>
> agreed
>
>> On a modern hard disk, you should *never* see bad sectors, because the
>> drive is busy hiding all the bad sectors it does find, then telling you
>> everything is fine.
>
> thats not actually true.  the drive will report 'bad sector' if you
> try and read data that the drive simply can't read.  you wouldn't want
> it to return bad data and say its OK.  many(most?) drives won't
> actually remap to a bad sector until you write new data over that block
> number, since they don't want to copy bad data without any way of
> telling the OS the data is invalid.  these pending remaps are listed
> under smart parameter 197 Current_Pending_Sector
>

Apparently you know more about modern drives than I do, but as far as I
know the story is a bit longer when a bad block is discovered. Here it is.

Basically, bad blocks are discovered on a read operation, when the CRC
(cyclic redundancy check) sum does not match. (In fact it is a bit more
sophisticated than just a CRC, as modern high-density drives try to match
the analog signal they get from the read head against what was digitally
encoded when the data was recorded.) When this happens, the firmware
decides that this is a bad block and adds its new location to the bad
block reallocation table (a while ago, when I learned this, the
reallocation table was kept in non-volatile memory on the drive's
controller board). The firmware then puts all other tasks on hold and
tries to recover the information stored in the bad block. It re-reads the
block and superimposes the read results until the CRC matches, and then
writes the recovered data to the reallocated location; or it gives up
after some large number of attempts, writes whatever garbage it ended up
with to the reallocated location, and reports a fatal read error. This
recovery attempt very noticeably slows down IO on the device.
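To make the "re-read and superimpose until the CRC matches" step a bit more concrete, here is a toy sketch in Python. It is only an illustration of the idea as I described it, not how any real firmware works: the names noisy_read, superimpose and recover are made up, random bit flips are an assumed noise model, and zlib.crc32 stands in for whatever error-detecting code the drive actually uses.

```python
import random
import zlib
from typing import Callable, List, Tuple


def noisy_read(data: bytes, rng: random.Random, flip_prob: float) -> bytes:
    """Hypothetical stand-in for one raw read of a marginal sector:
    every bit is flipped independently with probability flip_prob."""
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < flip_prob:
                out[i] ^= 1 << bit
    return bytes(out)


def superimpose(reads: List[bytes]) -> bytes:
    """Combine several reads by a per-bit majority vote (ties resolve to 0)."""
    n = len(reads)
    out = bytearray(len(reads[0]))
    for i in range(len(out)):
        for bit in range(8):
            ones = sum((r[i] >> bit) & 1 for r in reads)
            if 2 * ones > n:
                out[i] |= 1 << bit
    return bytes(out)


def recover(read_once: Callable[[], bytes], stored_crc: int,
            max_attempts: int = 64) -> Tuple[bytes, int, bool]:
    """Re-read and superimpose until the CRC matches or attempts run out.

    Returns (data, attempts_used, ok). When ok is False, data is whatever
    the last superimposed result was, i.e. the "garbage" case.
    """
    reads: List[bytes] = []
    candidate = b""
    for attempt in range(1, max_attempts + 1):
        reads.append(read_once())
        candidate = superimpose(reads)
        if zlib.crc32(candidate) == stored_crc:
            return candidate, attempt, True
    return candidate, max_attempts, False


if __name__ == "__main__":
    rng = random.Random(2016)           # fixed seed so runs are repeatable
    sector = b"sector payload " * 4     # made-up sector contents
    crc = zlib.crc32(sector)            # checksum stored at write time
    data, attempts, ok = recover(lambda: noisy_read(sector, rng, 0.01), crc)
    print("recovered" if ok else "fatal read error", "after", attempts, "reads")
```

Every extra pass through the loop is one more read of the same block, which is why a drive stuck in this kind of retrying looks slow from the outside.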
So "freezing" on some IO when accessing files may be an indication that
multiple bad blocks are developing. Time to replace the drive.

The drive is still considered usable even after an unrecoverable (fatal)
read error; only when the bad block reallocation table fills up does the
drive start reporting that it is "out of spec".

On a side note: even if the CRC matches, that doesn't ensure the recovered
data is the same as the data originally written. This is why filesystems
that keep sophisticated checksums of files are getting popular (ZFS, to
name one).

Just my $0.02.

Valeri

>
>
> --
> john r pierce, recycling bits in santa cruz
>

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++