[CentOS] Errors on an SSD drive

Fri Aug 11 11:41:04 UTC 2017
hw <hw at gc-24.de>

Chris Murphy wrote:
> On Wed, Aug 9, 2017, 11:55 AM Mark Haney <mark.haney at neonova.net> wrote:
>
>> To be honest, I'd not try a btrfs volume on a notebook SSD. I did that on a
>> couple of systems and it corrupted pretty quickly. I'd stick with xfs/ext4
>> if you manage to get the drive working again.
>
> Sounds like a hardware problem. Btrfs is explicitly optimized for SSD, the
> maintainers worked for FusionIO for several years of its development. If
> the drive is silently corrupting data, Btrfs will pretty much immediately
> start complaining where other filesystems will continue. Bad RAM can also
> result in scary warnings that you don't get with other filesystems. And I've
> been using it on numerous SSDs for years and NVMe for a year with zero
> problems.

That's one thing I've been wondering about:  When using btrfs RAID, do you
need to somehow monitor the disks to see if one has failed?
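
One thing that might help here: `btrfs device stats` prints per-device error
counters, so a cron job could poll it and complain when any counter is
non-zero.  A minimal sketch in Python, assuming the usual
`[device].counter  value` output format; the function name and the sample
text are made up for illustration:

```python
import re

# Matches lines such as "[/dev/sda].read_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text: str) -> dict:
    """Return {(device, counter): value} for every counter above zero."""
    bad = {}
    for line in stats_text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m and int(m.group("value")) > 0:
            bad[(m.group("dev"), m.group("name"))] = int(m.group("value"))
    return bad


# Made-up sample output, for illustration only.
SAMPLE = """\
[/dev/sda].write_io_errs    0
[/dev/sda].read_io_errs     3
[/dev/sda].flush_io_errs    0
[/dev/sda].corruption_errs  0
[/dev/sda].generation_errs  0
"""

print(nonzero_counters(SAMPLE))
```

In practice the text would come from running `btrfs device stats` against the
mounted file system instead of from a hard-coded string.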

> On CentOS though, I'd get a newer btrfs-progs RPM from Fedora, and use either
> an elrepo.org kernel, a Fedora kernel, or build my own latest long-term
> from kernel.org. There's just too much development that's happened since
> the tree found in RHEL/CentOS kernels.

I can't go with a more recent kernel version until NVIDIA has updated their
drivers to no longer need fence.h (or whatever it was).

And I thought stuff gets backported, especially things as important as file
systems.

> Also FWIW Red Hat is deprecating Btrfs, in the RHEL 7.4 announcement.
> Support will be removed probably in RHEL 8. I have no idea how it'll affect
> CentOS kernels though. It will remain in Fedora kernels.

That would suck badly, to the point where I'd have to look for yet another
distribution.  The only one remaining is Arch.

What do they suggest as a replacement?  The only other FS that comes close is
ZFS, and removing btrfs altogether would be taking living in the past several
steps too far.

> Anyway, blkdiscard can be used on an SSD, whole device or partition, to zero
> it out. And at least recent ext4 and XFS mkfs will do a blkdiscard, same as
> mkfs.btrfs.
>
>
> Chris Murphy
>>
>> On Wed, Aug 9, 2017 at 1:48 PM, hw <hw at gc-24.de> wrote:
>>
>>> Robert Moskowitz wrote:
>>>
>>>> I am building a new system using a Kingston 240GB SSD drive I pulled
>>>> from my notebook (when I had to upgrade to a 500GB SSD drive).  The CentOS
>>>> install went fine and ran for a couple of days, then I got errors on the
>>>> console.  Here is an example:
>>>>
>>>> [168176.995064] sd 0:0:0:0: [sda] tag#14 FAILED Result:
>>>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>>> [168177.004050] sd 0:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 01 04 68 b0
>>>> 00 00 08 00
>>>> [168177.011615] blk_update_request: I/O error, dev sda, sector 17066160
>>>> [168487.534510] sd 0:0:0:0: [sda] tag#17 FAILED Result:
>>>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>>> [168487.543576] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 01 04 68 b0
>>>> 00 00 08 00
>>>> [168487.551206] blk_update_request: I/O error, dev sda, sector 17066160
>>>> [168787.813941] sd 0:0:0:0: [sda] tag#20 FAILED Result:
>>>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>>> [168787.822951] sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 01 04 68 b0
>>>> 00 00 08 00
>>>> [168787.830544] blk_update_request: I/O error, dev sda, sector 17066160
>>>>
>>>> Eventually, I could not do anything on the system.  Not even a 'reboot'.
>>>> I had to do a cold power cycle to bring things back.
>>>>
>>>> Is there anything to do about this or trash the drive and start anew?
>>>>
>>>
>>> Make sure the cables and power supply are ok.  Try the drive in another
>>> machine that has a different controller to see if there is an
>>> incompatibility between the drive and the controller.
>>>
>>> You could make a btrfs file system on the whole device: that should say
>>> that a trim operation is performed for the whole device.  Maybe that helps.
>>>
>>> If the errors persist, replace the drive.  I'd use Intel SSDs because they
>>> seem to have the fewest problems with broken firmware.  Do not use SSDs
>>> with hardware RAID controllers unless the SSDs were designed for this
>>> application.
>>>
>>>
>>> _______________________________________________
>>> CentOS mailing list
>>> CentOS at centos.org
>>> https://lists.centos.org/mailman/listinfo/centos
>>>
>>>
>>
>>
>> --
>> Mark Haney
>> Network Engineer at NeoNova
>> 919-460-3330 (opt 1) • mark.haney at neonova.net
>> www.neonova.net
>>
>
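
For reference, the CDB bytes in the kernel messages quoted above can be
decoded by hand: in a READ(10) command, byte 0 is the opcode (0x28), bytes 2-5
are the big-endian logical block address, and bytes 7-8 are the transfer
length in blocks.  A minimal Python sketch (the function name is made up for
illustration) shows that all three failures are the same 8-block read at the
sector that blk_update_request reports:

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


# CDB copied from the log lines in the original report.
info = decode_read10("28 00 01 04 68 b0 00 00 08 00")
print(info)  # {'lba': 17066160, 'blocks': 8}
```

Assuming 512-byte logical blocks, that is a single 4 KiB read failing over and
over at the same location, roughly every five minutes according to the
timestamps.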