On 08/11/2017 12:16 PM, Chris Murphy wrote:
> On Fri, Aug 11, 2017 at 7:53 AM, Robert Nichols
> <rnicholsNOSPAM at comcast.net> wrote:
>> On 08/10/2017 11:06 AM, Chris Murphy wrote:
>>>
>>> On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz <rgm at htt-consult.com>
>>> wrote:
>>>
>>>> On 08/09/2017 10:46 AM, Chris Murphy wrote:
>>>>>
>>>>> If it's a bad sector problem, you'd write to sector 17066160 and see if the
>>>>> drive complies or spits back a write error. It looks like a bad sector in
>>>>> that the same LBA is reported each time but I've only ever seen this with
>>>>> both a read error and a UNC error. So I'm not sure it's a bad sector.
>>>>>
>>>>> What is DID_BAD_TARGET?
>>>>
>>>> I have no experience on how to force a write to a specific sector and
>>>> not cause other problems. I suspect that this sector is in the /
>>>> partition:
>>>>
>>>> Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
>>>> Units = sectors of 1 * 512 = 512 bytes
>>>> Sector size (logical/physical): 512 bytes / 512 bytes
>>>> I/O size (minimum/optimal): 512 bytes / 512 bytes
>>>> Disk label type: dos
>>>> Disk identifier: 0x0000c89d
>>>>
>>>>    Device Boot      Start         End      Blocks   Id  System
>>>> /dev/sda1            2048     2099199     1048576   83  Linux
>>>> /dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
>>>> /dev/sda3         4196352   468862127   232332888   83  Linux
>>>>
>>>
>>> LBA 17066160 would be on sda3.
>>>
>>> dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C
>>>
>>> That'll read that sector and display hex and ascii. If you recognize the
>>> contents, it's probably user data. Otherwise, it's file system metadata or
>>> a system binary.
>>>
>>> If you get nothing but an I/O error, then it's lost so it doesn't matter
>>> what it is, you can definitely overwrite it.
>>>
>>> dd if=/dev/zero of=/dev/sda seek=17066160 count=1
>>
>> You really don't want to do that without first finding out what file is
>> using that block. You will convert a detected I/O error into silent
>> corruption of that file, and that is a much worse situation.
>
> Yeah he'd want to do an fsck -f and see if repairs are made, and also
> rpm -Va. There *will* be legitimately modified files, so it's going to
> be tedious to exactly sort out the ones that are legitimately modified
> vs corrupt. If it's a configuration file, I'd say you could ignore it
> but any modified binaries other than permissions need to be replaced
> and is the likely culprit.
>
> The smartmontools page has hints on how to figure out what file is
> affected by a particular sector being corrupt but the more layers are
> involved the more difficult that gets. I'm not sure there's an easy to
> do this with LVM in between the physical device and file system.

fsck checks filesystem metadata, not the content of files. It is not
going to detect that a file has had 512 bytes replaced by zeros. If the
file is a non-configuration file installed from an RPM, then "rpm -Va"
should flag it.

LVM certainly makes the procedure harder. Figuring out what filesystem
block corresponds to that LBA is still possible, but you have to examine
the LV layout in /etc/lvm/backup/ and learn more than you probably
wanted to know about LVM.

--
Bob Nichols     "NOSPAM" is really part of my email address.
                Do NOT delete it.
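
For anyone following along, here is a rough sketch of the arithmetic involved in going from the reported LBA to a filesystem block. The partition start and end come from the fdisk output quoted above. Everything else is an assumption made for illustration: the 4096-byte filesystem block size, the function names, and in particular the LVM numbers (pe_start, extent size, and the two-LV segment layout), which are hypothetical and would have to be read from the real file under /etc/lvm/backup/. The LVM part handles linear segments only.

```python
SECTOR_SIZE = 512  # bytes per logical sector, per the quoted fdisk output


def lba_to_partition_sector(lba, part_start, part_end):
    """Return the offset of a whole-disk LBA inside a partition, in sectors."""
    if not part_start <= lba <= part_end:
        raise ValueError("LBA %d is outside the partition" % lba)
    return lba - part_start


def sector_to_fs_block(sector, fs_block_size=4096):
    """Map a device-relative sector to (filesystem block, sector within block).

    fs_block_size is an assumption; check the real filesystem's block size.
    """
    return divmod(sector, fs_block_size // SECTOR_SIZE)


def pv_sector_to_lv(sector, pe_start, extent_size, segments):
    """Locate a PV-relative sector inside a linear LV segment.

    pe_start and extent_size are in sectors. segments is a list of
    (lv_name, lv_start_extent, extent_count, pv_start_extent) tuples.
    Returns (lv_name, sector offset within the LV), or None if the sector
    is in the LVM metadata area or in unallocated extents.
    """
    if sector < pe_start:
        return None
    extent, within = divmod(sector - pe_start, extent_size)
    for lv_name, lv_start, count, pv_start in segments:
        if pv_start <= extent < pv_start + count:
            lv_extent = lv_start + (extent - pv_start)
            return lv_name, lv_extent * extent_size + within
    return None


# Case 1: the filesystem sits directly on /dev/sda3.
offset = lba_to_partition_sector(17066160, 4196352, 468862127)
block, within = sector_to_fs_block(offset)
print("sda3 sector", offset, "-> fs block", block, "sector-in-block", within)

# Case 2: /dev/sda3 is an LVM physical volume. HYPOTHETICAL layout:
# 1 MiB metadata area, 4 MiB extents, LV "root" followed by LV "home".
hypothetical_segments = [
    ("root", 0, 12800, 0),
    ("home", 0, 40000, 12800),
]
located = pv_sector_to_lv(offset, 2048, 8192, hypothetical_segments)
if located is not None:
    lv_name, lv_sector = located
    lv_block, lv_within = sector_to_fs_block(lv_sector)
    print("LV", lv_name, "sector", lv_sector, "-> fs block", lv_block)
```

With the block number in hand, the filesystem's own tools can take it from there. On ext2/3/4, for instance, debugfs has icheck (block to inode) and ncheck (inode to path); other filesystems need their own equivalents.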