[CentOS] Errors on an SSD drive

On Thu, Aug 10, 2017, 6:48 AM Robert Moskowitz <rgm at htt-consult.com> wrote:

>
>
> On 08/09/2017 10:46 AM, Chris Murphy wrote:
> > If it's a bad sector problem, you'd write to sector 17066160 and see if the
> > drive complies or spits back a write error. It looks like a bad sector in
> > that the same LBA is reported each time but I've only ever seen this with
> > both a read error and a UNC error. So I'm not sure it's a bad sector.
> >
> > What is DID_BAD_TARGET?
>
> I have no experience on how to force a write to a specific sector and
> not cause other problems.  I suspect that this sector is in the /
> partition:
>
> Disk /dev/sda: 240.1 GB, 240057409536 bytes, 468862128 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk label type: dos
> Disk identifier: 0x0000c89d
>
>     Device Boot      Start         End      Blocks   Id  System
> /dev/sda1            2048     2099199     1048576   83  Linux
> /dev/sda2         2099200     4196351     1048576   82  Linux swap / Solaris
> /dev/sda3         4196352   468862127   232332888   83  Linux
>

LBA 17066160 would be on sda3.
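That follows from the fdisk listing: 17066160 is past the start of sda3
(4196352) and well before its end (468862127). A minimal sketch of that
lookup, where lba_to_partition is a hypothetical helper, the table is copied
from the listing above, and the filesystem block number assumes a 4096-byte
block size (the thread doesn't say what the filesystem uses):

```shell
# Hypothetical helper: map an absolute LBA to the partition containing it.
# Prints: <partition> <sector relative to partition start> <4 KiB block number>
lba_to_partition() {
    lba=$1
    # name, start, end (512-byte sectors), copied from the fdisk output
    while read -r name start end; do
        if [ "$lba" -ge "$start" ] && [ "$lba" -le "$end" ]; then
            rel=$((lba - start))
            # ASSUMPTION: 4096-byte filesystem blocks; adjust if yours differ.
            echo "$name $rel $((rel * 512 / 4096))"
            return 0
        fi
    done <<EOF
sda1 2048 2099199
sda2 2099200 4196351
sda3 4196352 468862127
EOF
    echo "LBA $lba is not inside any listed partition" >&2
    return 1
}
```

For this disk, 'lba_to_partition 17066160' prints "sda3 12869808 1608726":
sector 12869808 counted from the start of sda3, or filesystem block 1608726
if the block size really is 4 KiB.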

dd if=/dev/sda skip=17066160 count=1 2>/dev/null | hexdump -C

That'll read that sector and display it as hex and ASCII. If you recognize
the contents, it's probably user data. Otherwise, it's file system metadata
or part of a system binary.
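The command above throws away dd's stderr, so an unreadable sector just shows
up as empty output. A minimal sketch that decides from dd's exit status
instead; check_sector is a hypothetical helper, and it assumes GNU dd and
hexdump as in the command above:

```shell
# Hypothetical helper: dump one 512-byte sector, or report that it can't be read.
# Usage: check_sector DEVICE LBA
check_sector() {
    dev=$1
    lba=$2
    tmp=$(mktemp) || return 2
    # dd exits non-zero on a read error; an empty file means nothing came back.
    if dd if="$dev" of="$tmp" bs=512 skip="$lba" count=1 2>/dev/null && [ -s "$tmp" ]; then
        hexdump -C "$tmp"
        status=0
    else
        echo "sector $lba on $dev could not be read (see dmesg)" >&2
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Run as 'check_sector /dev/sda 17066160'; a return status of 1 corresponds to
the I/O error case below.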

If you get nothing but an I/O error, the data is already lost, so it doesn't
matter what it was; you can definitely overwrite it.

dd if=/dev/zero of=/dev/sda seek=17066160 count=1
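A minimal sketch wrapping that write, with one hedged addition: GNU dd's
oflag=direct, a common precaution so the single sector goes straight to the
drive instead of sitting in the page cache. zero_sector is a hypothetical
helper; conv=notrunc only matters when practising on an image file (it stops
dd truncating it) and is harmless on a block device. Double-check the LBA
before pointing it at /dev/sda.

```shell
# Hypothetical helper: overwrite one 512-byte sector with zeros.
# Usage: zero_sector DEVICE LBA [OFLAG]
# OFLAG defaults to "direct" (O_DIRECT, bypassing the page cache);
# pass "" to disable it, e.g. for an image file on tmpfs, which rejects O_DIRECT.
zero_sector() {
    dev=$1
    lba=$2
    oflag=${3-direct}
    # conv=notrunc stops dd from truncating an image file after the written sector.
    if [ -n "$oflag" ]; then
        dd if=/dev/zero of="$dev" bs=512 seek="$lba" count=1 conv=notrunc oflag="$oflag"
    else
        dd if=/dev/zero of="$dev" bs=512 seek="$lba" count=1 conv=notrunc
    fi
}
```

For this drive that would be 'zero_sector /dev/sda 17066160', followed by
re-reading the sector with the dd | hexdump command; it should come back as
all zeros.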

If you want extra confirmation, first run 'smartctl -t long /dev/sda'. It
lists how long the test will take; after that time, check again with
'smartctl -a /dev/sda' and see whether the test completed, or whether the
self-test log section shows it failed and lists a number under the
LBA_of_first_error column.
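Reading that column can be scripted. A minimal sketch, assuming the usual
smartctl self-test log layout, where each entry line starts with "#" and
LBA_of_first_error is the last column ("-" when there was no error);
first_error_lba is a hypothetical helper:

```shell
# Hypothetical helper: print LBA_of_first_error for every self-test log entry
# that recorded one. Reads smartctl output on stdin.
# Usage: smartctl -l selftest /dev/sda | first_error_lba
first_error_lba() {
    # Entry lines look like "# 1  Extended offline  ...  <LBA or ->";
    # keep the last field only when it is a number.
    awk '/^#/ && $NF ~ /^[0-9]+$/ { print $NF }'
}
```

If the same LBA keeps showing up here and in the kernel errors, that points
at a single bad sector rather than something flakier.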

> But I don't know where it is in relation to the way the drive was
> formatted in my notebook.  I think it would have been in the / partition.
>

>
> > And what do you get for
> > smartctl -x <dev>
>
> About 17KB of output?

Can you attach it as a file to a message to the list? If the list won't
accept the attachment, put it up on fpaste.org, pastebin, or something like
that. MUAs tend to mangle the output, so don't paste it into an email.

> I don't know how to read what it is saying, but
> noted in the beginning:
>
> Write SCT (Get) XXX Error Recovery Control Command failed: scsi error
> badly formed scsi parameters
>
> Don't know what this means...
>
> BTW, the system is a Cubieboard2 armv7 SoC running Centos7-armv7hl. This
> is the first time I have used an SSD on a Cubie, but I know it is
> frequently done.  I would have to ask on the Cubie forum what others
> experience with SSDs have been.
>

It's very common. I think this is just an ordinary bad sector, if that LBA
value is consistent. If it's a new SSD, it's slightly concerning. You can
either keep an eye on it, or put a little pressure on the manufacturer or
place of purchase: tell them you have a bad sector and would like to swap
out the unit.

SSDs, and SD cards in particular (which you're not using; those show up as
/dev/mmcblk0...), store your data as a probabilistic representation, and
through a lot of magic (error correction done by the controller), the
probability of retrieving your data correctly from an SSD is made very high.
Almost deterministic.

The magic is in the firmware, so there's some possibility that any given SSD
problem is related to a firmware bug. It's worth comparing the firmware
version reported by smartctl with what the manufacturer has, and then
reading their changelog. Most have a way to update firmware without Windows,
but they don't have images that will boot an ARM board; funnily enough, the
"universal" updater is usually based on FreeDOS. You'd need to put the SSD in
an x86 computer to do this. Hilariously perverse: I did this with a Samsung
830 SSD a while back by sticking it into a MacBook Pro and burning the
firmware ISO onto a DVD-RW. It booted the Mac (using the Mac firmware's BIOS
compatibility support module) and updated the SSD's firmware without a
problem.
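For the comparison step, 'smartctl -i' prints the drive's identity,
including a "Firmware Version:" line. A minimal sketch that pulls out just
that value; firmware_version is a hypothetical helper, and it assumes the
line starts exactly with "Firmware Version:":

```shell
# Hypothetical helper: print the firmware revision from `smartctl -i` output on stdin.
# Usage: smartctl -i /dev/sda | firmware_version
firmware_version() {
    # Strip the label and following whitespace; print only matching lines.
    sed -n 's/^Firmware Version:[[:space:]]*//p'
}
```

The result is what to look up in the manufacturer's firmware changelog.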

Chris Murphy