[CentOS] Btrfs going forward, was: Errors on an SSD drive

Fri Aug 11 22:14:35 UTC 2017
Chris Murphy <lists at colorremedies.com>

On Fri, Aug 11, 2017 at 11:17 AM, Mark Haney <mark.haney at neonova.net> wrote:
> On Fri, Aug 11, 2017 at 1:00 PM, Chris Murphy <lists at colorremedies.com>
> wrote:
>
>> Changing the subject since this is rather Btrfs specific now.
>>
>>
>>
>> >>
>> >> Sounds like a hardware problem. Btrfs is explicitly optimized for SSD,
>> the
>> >> maintainers worked for FusionIO for several years of its development. If
>> >> the drive is silently corrupting data, Btrfs will pretty much
>> immediately
>> >> start complaining where other filesystems will continue. Bad RAM can
>> also
>> >> result in scary warnings where you don't with other filesystems. And I've
>> >> been using it in numerous SSDs for years and NVMe for a year with zero
>> >> problems.
>> >
>> >
>>
>>
>> LMFAO. Trust me, I tried several SSDs with BTRFS over the last couple of
>> years and had trouble the entire time. I constantly had to scrub the drive,
>> had freezes under moderate load and general nastiness.  If that's
>> 'optimized for SSDs', then something is very wrong with the definition of
>> optimized.  Not to mention the fact that BTRFS is not production ready for
>> anything, and I'm done trying to use it and going with XFS or EXT4
>> depending on my need.


Could you get your quoting in proper order? The way you did this looks
like I wrote the above steaming pile rant.

Whoever did write it, it's ridiculous, meaning it's worthy of
ridicule. It runs from the provably unscientific and non-technical to
craptastically snotty writing: "not to mention the fact", and then
proceeding to mention it. That's just being an idiot, and then framing
it.

Where are your bug reports? That question is a trap if you haven't in
fact filed any bugs, in particular upstream.



> As for a hardware problem, the drives were ones purchased in Lenovo
> professional workstation laptops, and, while you do get lemons
> occasionally, I tried 4 different ones of the exact same model and had the
> exact same issues.  It's highly unlikely I'd get 4 of the same brand to have
> hardware issues.

In fact it's highly likely because a.) it's a non-scientific sample
and b.) the hardware is intentionally identical.

For SSDs all the sauce is in the firmware. If the model and firmware
were all the same, it is more likely to be a firmware bug than it is
to be a Btrfs bug. There are absolutely cases where Btrfs runs into
problems that other file systems don't, because Btrfs is designed to
detect them and others aren't. There's a reason why XFS and ext4 have
added metadata checksumming in recent versions. Hardware lies.
Firmware has bugs and it causes problems. And it can be months before
it materializes into a noticeable problem.

https://lwn.net/Articles/698090/

Btrfs tends to complain early and often when it encounters confusion.
It also will go read only sooner than other file systems in order to
avoid corrupting the file system. Almost always a normal mount will
automatically fall back to the most recent consistent state. Sometimes
it needs to be mounted with the -o usebackuproot option. And in still
fewer cases it will need to be mounted read only, where other file
systems won't even tolerate that in the same situation.
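
The escalation order described above (normal mount, then -o
usebackuproot, then read only) can be sketched in Python. This is only
an illustration, not a recovery procedure: the function names are
hypothetical, the device and mount point are placeholders supplied by
the caller, and using plain "-o ro" for the last step is an assumption
about what "mounted read only" means here.

```python
import subprocess


def mount_attempts(device, mountpoint):
    """Return mount command lines in the order the post describes.

    The helper name and the exact option strings for step 3 are
    assumptions made for illustration.
    """
    return [
        # 1. Normal mount; usually enough on its own.
        ["mount", device, mountpoint],
        # 2. Ask Btrfs to try a backup tree root.
        ["mount", "-o", "usebackuproot", device, mountpoint],
        # 3. Read-only mount, for the rarer cases.
        ["mount", "-o", "ro", device, mountpoint],
    ]


def try_mounts(device, mountpoint, run=subprocess.run):
    """Try each command in turn; return the first that succeeds, else None.

    `run` is injectable so the logic can be exercised without root
    privileges or a real block device.
    """
    for argv in mount_attempts(device, mountpoint):
        if run(argv).returncode == 0:
            return argv
    return None
```

Calling try_mounts("/dev/sdX1", "/mnt") would need root and a real
device. If it returns None, none of the three mounts worked, which is
the point at which asking on the upstream list makes sense.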

The top two complaints I have about Btrfs are a.) what to do when a
normal mount doesn't work, it's really non-obvious what you *should*
do and in what order because there are many specialized tools for
different problems, so if your file system doesn't mount normally you
are really best off going straight to the upstream list and asking for
help, which is sorta shitty but that's the reality; b.) there are
still some minority workloads where users have to micromanage the file
system with a filtered balance to avoid a particular variety of bogus
enospc. Most of the enospc problems are fixed with some changes in
kernel 4.1 and 4.8. The upstream expert users are discussing some sort
of one size fits all user space filtered (meaning partial) balance so
regular users don't have to micromanage. It's completely a legitimate
complaint that having to micromanage a file system is b.s. This has
been a particularly difficult problem, and it's been around for a long
enough time that I think a lot of normal workloads that would have run
into problems have been masked (no problem) because so many users have
gotten into the arguably bad habit of doing their own filtered
balances.
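
For readers who have not run one, a filtered balance is a balance
restricted to block groups matching a filter, most commonly a usage
percentage. A minimal Python sketch that only builds the command line
follows. The helper name is hypothetical and the threshold values are
illustrative assumptions, not recommendations from this thread.

```python
def filtered_balance_cmd(mountpoint, data_usage=None, metadata_usage=None):
    """Build a `btrfs balance start` command limited by usage filters.

    -dusage=N / -musage=N restrict the balance to data / metadata block
    groups that are at most N percent full. The helper itself is a
    hypothetical illustration.
    """
    argv = ["btrfs", "balance", "start"]
    for flag, value in (("-dusage", data_usage), ("-musage", metadata_usage)):
        if value is None:
            continue
        if not 0 <= value <= 100:
            raise ValueError(f"{flag} must be a percentage between 0 and 100")
        argv.append(f"{flag}={value}")
    argv.append(mountpoint)
    return argv
```

For example, filtered_balance_cmd("/mnt", data_usage=50) yields the
arguments for "btrfs balance start -dusage=50 /mnt", which rewrites
only data block groups that are no more than half full instead of the
whole file system.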

But as for Btrfs having some inherent flaw that results in corrupt
file systems, it's silly. There are thousands of users in many
production workloads using this file system and they'd have given up a
long time ago, including myself.


> Once I went back to ext4 on those systems I could run the
> devil out of them and not see any freezes under even heavy load, nor any
> other hardware related items.  In fact, the one I used at my last job was
> given to me on my way out and it's now being used by my daughter. It's been
> upgraded from Fedora 23 to 26 without a hitch.  On ext4.  Say what you
> want, BTRFS is a very bad filesystem in my experience.


Read this.
https://www.spinics.net/lists/linux-btrfs/msg67308.html

If there were some inherent problem with Btrfs and SSDs, as you've
asserted, that wouldn't be possible. And that's an example with quota
support enabled, that's my big surprise. There are some performance
implications with Btrfs quotas, and it's a relatively new feature, but
that is a very good report.

-- 
Chris Murphy