[CentOS] Looking for a life-save LVM Guru

Sat Feb 28 23:29:21 UTC 2015
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Sat, February 28, 2015 4:22 pm, Chris Murphy wrote:
> On Sat, Feb 28, 2015 at 1:26 PM, Valeri Galtsev
> <galtsev at kicp.uchicago.edu> wrote:
>> Indeed. That is why there is no LVM in my server room, not even software
>> RAID. Software RAID relies on the system itself to fulfill its RAID
>> function; what if the kernel panics before software RAID does its job?
>> Hardware RAID (for huge filesystems I cannot afford to back up) is the
>> only thing that makes sense for me. A RAID controller has dedicated
>> processors and a dedicated, simple system that does one simple task: RAID.
>
> The biggest problem is that the myriad defaults aren't very well suited
> for multiple-device configurations. There are a lot of knobs in Linux,
> on the drives, and in hardware RAID cards. None of this is that simple.
>
> Drives, and hardware RAID cards are subject to firmware bugs, just as
> we have software bugs in the kernel. We know firmware bugs cause
> corruption.

Speaking of which: the only hardware cards I would use are good ones, and
the same goes for external RAID boxes. Over the last decade and a half I
have never had trouble due to RAID firmware bugs. What I use is:

1. 3ware (mostly)
2. LSI MegaRAID (a few; I don't like their user interface or their poor
notification abilities)
3. Areca (also a few; better UI than LSI's)

External RAID boxes: Infortrend

I will never go for cheap fake RAID (Adaptec is one that comes to mind).
Also, though it was not my choice, I have had to deal with, hm... not so
good external RAID boxes: ones by Promise and by Raid.com, to mention two.

You are implying that the firmware of hardware RAID cards is somehow
buggier than the software of software RAID plus the Linux kernel (sorry if
I misinterpreted your point). I disagree: the embedded system of a RAID
card and the RAID function it has to fulfill are much simpler than
everything involved in software RAID. Therefore, with the same effort
invested, the firmware of (good) hardware is less buggy. And again, a Linux
kernel is more likely to panic than the trivial embedded system of a
hardware RAID card or box. At least my experience over a decade and a half
confirms that.

I have heard horror stories from people who used the same good hardware I
mentioned (3ware). However, when I dug into the details in each case, I
discovered that they simply didn't have everything necessary set up
correctly, which is trivial as a matter of fact. The common mistake in all
cases was not setting up a scheduled RAID verify task (it is set at the
RAID configuration level). I have my RAIDs verified once a week. If you
don't verify them for a year, here is what happens: you don't discover
individual drive degradation until it is too late, and more drives than
the level of redundancy allows get kicked out because of fatal failures.
Even then, once the array is no longer redundant, 3ware doesn't kick out
newly failing drives; it just makes the RAID read-only, so you can still
salvage something. Anyway, these horror stories were purely the result of
poor sysadmin work, IMHO.
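To make the redundancy argument concrete, here is a toy model, a sketch only. It assumes RAID5 tolerates one drive failure and RAID6 two, and it simply assumes that regular verification catches degrading drives early enough to replace them. The names `REDUNDANCY`, `array_state` and `failures_seen` are hypothetical, and this is not how any controller's firmware actually works.

```python
"""Toy model of why skipping RAID verification is dangerous (illustrative only)."""

# Assumed fault tolerance per RAID level: how many whole-drive failures
# the array survives. Simplified; real controllers are more nuanced.
REDUNDANCY = {"RAID5": 1, "RAID6": 2}


def array_state(level: str, failed_drives: int) -> str:
    """Classify an array by how many member drives have been kicked out."""
    tolerance = REDUNDANCY[level]
    if failed_drives == 0:
        return "optimal"
    if failed_drives < tolerance:
        return "degraded"
    if failed_drives == tolerance:
        return "degraded, no redundancy left"
    return "failed"


def failures_seen(hard_failures: int, latent_failures: int, verified: bool) -> int:
    """Drives counted as failed once trouble finally surfaces.

    Assumption: with regular verification, degrading (latent) drives were
    found and replaced earlier, so only the new hard failure counts.
    Without verification they all surface together, e.g. during a rebuild.
    """
    return hard_failures if verified else hard_failures + latent_failures


if __name__ == "__main__":
    for verified in (True, False):
        n = failures_seen(hard_failures=1, latent_failures=2, verified=verified)
        print(f"verified={verified}: {n} failed -> {array_state('RAID6', n)}")
```

With one hard failure and two unnoticed degrading drives, the weekly-verified RAID6 in this model is merely degraded, while the never-verified one has lost more drives than its redundancy covers.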

> Not all hardware RAID cards are the same, some are total
> junk. Many others get you vendor lock-in due to proprietary metadata
> written to the drives. You can't get your data off if the card dies,
> you have to buy a similar model card sometimes with the same firmware
> version in order to regain access.

I would not consider that a disadvantage. I have yet to see a dead 3ware
card (yes, you can burn one if you plug it into a slot with gross
misalignment, like a tilt). And with 3ware, a later model will accept
drives that originally made up a RAID on an older model; it will only make
the RAID read-only, so you can salvage your data first and then re-create
the RAID with the new card's metadata standard. I guess I may have a
different philosophy than you do. If I use a RAID card, I choose a really
good one. Once I use a good one, I feel no need to move drives to a card
made by a different manufacturer. And last, yet important: if you have to
use these drives with a different card (even just a different model by
the same manufacturer), you had better re-create the RAID from scratch on
the new card. If you value your data...

Just my $0.02

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++