[CentOS] HP ProLiant DL380 G5

Valeri Galtsev galtsev at kicp.uchicago.edu
Thu Aug 21 23:35:02 UTC 2014


On Thu, August 21, 2014 5:32 pm, GKH wrote:
> Valeri,
>
> I hope you realize that your arguments for hardware RAID
> all depend on everything working just right.
>
> If something goes wrong with a disk (on HW RAID)
> you can't just simply take out the disk, move it to another
> computer and maybe do some forensics.

If a drive that is a member of a hardware RAID fails (say, it times out
while reallocating a bad block), it is kicked out of the RAID. I get a
notification from the 3ware daemon, hot-unplug the drive, hot-plug a
replacement (of the same or larger size), and start the RAID rebuild through
the firmware utility or the GUI interface, or the controller starts the
rebuild automatically if it is configured to do so. The operating system on
the machine has no idea about any of this and keeps running happily.
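For what it's worth, here is a minimal sketch of how one could poll the
controller from a cron job to get that kind of notification. The controller
name /c0, the presence of tw_cli in the PATH, and the exact status strings
are assumptions on my part, so treat it as an illustration, not a recipe:

#!/usr/bin/env python3
# Minimal sketch: poll a 3ware controller's status with tw_cli and flag
# anything that looks unhealthy. /c0 and the status strings are assumed.
import subprocess

def unit_report(controller="/c0"):
    """Return the raw 'tw_cli /c0 show' output (units, ports, BBU)."""
    out = subprocess.run(["tw_cli", controller, "show"],
                         capture_output=True, text=True, check=True)
    return out.stdout

if __name__ == "__main__":
    report = unit_report()
    print(report)
    # DEGRADED means a drive was kicked out of the unit; REBUILDING means
    # a rebuild is in progress after a replacement drive was plugged in.
    for flag in ("DEGRADED", "REBUILDING"):
        if flag in report:
            print(f"*** controller reports {flag} -- check the units ***")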

I do no forensics on failed drives. I run the manufacturer's drive fitness
test (whatever the particular manufacturer calls it); if the drive passes, I
usually reuse it, and if it fails the test, I send it to the manufacturer for
warranty replacement, or toss it if it is out of warranty.
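If you want to script that fitness-test step with generic tools instead of
the vendor utility, something along these lines works with smartmontools; the
device name /dev/sdb is hypothetical and the PASSED check is a crude stand-in
for the vendor's verdict:

#!/usr/bin/env python3
# Rough sketch: start a long SMART self-test on a pulled drive and, once the
# test has finished, read back the overall health verdict.
import subprocess, sys

DEV = "/dev/sdb"   # hypothetical device name for the drive under test

# Kick off the extended self-test; it runs inside the drive's own firmware.
subprocess.run(["smartctl", "-t", "long", DEV], check=True)

# ...hours later, after the self-test has completed, check the result:
health = subprocess.run(["smartctl", "-H", DEV],
                        capture_output=True, text=True)
print(health.stdout)
sys.exit(0 if "PASSED" in health.stdout else 1)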

>
> The formatting of disks on HW RAID is transparent to Linux.
> Therefore my disks are all RAID or not.
>
> What if I wanted to mix and match?

With a 3ware RAID controller you can export one or more of the attached
drives directly to the system; those drives then cannot participate in
[hardware] RAID.

> Maybe I don't want my swap
> RAID for performance.

Speaking of swap: RAM is large and cheap these days, and I do not use swap on
machines with 32 GB of RAM or more. On a multitasking system the kernel
switches between processes more often than every millisecond. Now imagine
having to swap a sizable fraction of 32 GB in or out during some of those
switches: your system will be on its knees from that alone. The exception
would be a very special block device that is almost as fast as RAM, but then
you would be better off adding it to the RAM address space (from the kernel's
point of view), so we are not talking about such devices as if they were hard
drives.
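To put a rough number on it, here is a back-of-the-envelope figure; the
150 MB/s sustained throughput is an assumed, optimistic sequential number for
a single spinning disk, and swap traffic is largely random, so reality is
worse:

# Back-of-the-envelope: time to push 32 GB through one spinning disk.
ram_gb = 32
throughput_mb_s = 150.0          # assumed optimistic sequential rate

seconds = ram_gb * 1024 / throughput_mb_s
print(f"{seconds:.0f} s (~{seconds/60:.1f} min) to move {ram_gb} GB once")
# -> roughly 218 s, i.e. well over three minutes, for a single pass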

>
> The idea of taking my data (which is controlled by an OSS
> Operating System, Linux) and putting it behind a closed source
> and closed system RAID controller is appalling to me.

Me too. Yet all of us use them. Example: hard drive firmware, which does a
bit more than just take data and write tracks (or the other way around): it
detects bad blocks, reallocates them, and recovers as much as possible of the
information that was inside the bad block. Another example: a video card,
say an NVIDIA one. It has processors inside that run software flashed into
its non-volatile memory. And with NVIDIA cards you sometimes have to use the
proprietary driver (if you have a more or less sophisticated display
arrangement), which is a closed-source binary driver; what you compile when
you install it is just an interface between that closed-source driver and
your particular kernel version, and that code runs not on the card itself but
under your system. The list goes on, not to mention our cell phones...

The only time I was mad about the firmware of some controller was one
particular version of a VIA PATA controller that had a bug leading to drive
corruption...

>
> It comes down to this: Linux knows where and when to position
> the heads of disks in order to max performance. If a
> RAID controller is in the middle, whatever algorithm
> Linux is using is no longer valid. The RAID controller
> is the one who makes the I/O decisions.

Yes, but... (some of what I'll say was already mentioned in this thread).
You can tune the RAID controller to align its stripe size with the drives'
optimal data chunks. Furthermore, you can have a battery-backed cache, which
increases device speed tremendously (a factor of 30 in one particular case I
saw, just off the top of my head). Of course, RAM is used as a cache for
software RAID too, but in contrast to the RAID controller's cache, the RAM
contents vanish on power loss. You do not seem to be someone who has had to
recover after a [software] RAID cache loss, and I definitely would not like
to be the one. Hence I use hardware RAID, with an optimized stripe size, an
optimized filesystem block size, and a battery-backed cache on the RAID
controller. If you beat me in I/O with software RAID, I will live with that,
as I do not like to give up reliability, not at as small an extra cost as I
pay for hardware RAID. (And I do not count as hardware RAID those fake
"raid" cards that rely on a "driver"; those are really software RAID cards.
3ware never fell that low as to make or sell such junk, and somebody who
knows LSI better than I do will probably say the same about them.)
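The stripe and filesystem block alignment is just arithmetic. Here is a
hypothetical example, assuming a 64 KiB per-disk stripe and RAID-5 over four
disks; the numbers and the device name are made up, not a recommendation for
any particular controller:

# Hypothetical example of aligning ext4 to a hardware RAID stripe.
stripe_kib   = 64     # controller stripe (chunk) size per disk, assumed
data_disks   = 3      # RAID-5 over 4 disks -> 3 disks carry data
fs_block_kib = 4      # ext4 block size

stride       = stripe_kib // fs_block_kib     # filesystem blocks per chunk
stripe_width = stride * data_disks            # blocks per full stripe
print(f"mkfs.ext4 -b 4096 -E stride={stride},stripe-width={stripe_width} /dev/sdX")
# -> mkfs.ext4 -b 4096 -E stride=16,stripe-width=48 /dev/sdX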

Again, just my 2c.

Valeri

>
> Sorry, this is not something I want to live with.
>
> GKH
>
>
>>
>> On Thu, August 21, 2014 3:54 pm, Matt wrote:
>>>> Hate to change the conversation here but that's why I hate hardware
>>>> RAID.
>>
>> I love hardware RAID, 3ware more than others. In the case of hardware
>> RAID it is a tiny specialized system (firmware) that performs the RAID
>> function, on a specialized processor inside the hardware RAID controller,
>> independent of the rest of the computer and needing only power to keep
>> going. It is a tiny piece of code with a very simple function, so it is
>> really hard to introduce bugs into it, and you are therefore unlikely to
>> have a problem at the device level. To find the status of the device and
>> its components (the physical drives) you can always use the utility that
>> comes from the hardware vendor; with 3ware you can even have a web
>> interface.
>>
>>>> If it was software RAID, Linux would always tell you what's going on.
>>
>> It does. And so does a hardware RAID device. And most of them (3ware in
>> particular) do not do an offline check/rebuild (i.e. one that delays
>> boot); they do it online (they stay operational in a degraded state and do
>> the necessary rebuild while I/O is present on the device, just exporting
>> themselves to the Linux kernel during boot with a warning that the RAID is
>> degraded).
>>
>> Software RAID, however, has a disadvantage (more knowledgeable people
>> will correct me wherever necessary). The software RAID function is
>> executed by the main CPU, under a very sophisticated system (the Linux
>> kernel), as one of its processes (even if it is a real-time process), on a
>> system that is constantly switching between processes. Therefore the RAID
>> task of software RAID lives in a much more dangerous environment. Now, if
>> it never finishes (say, the kernel panics due to something else), you get
>> an inconsistent device (the software RAID one), and it is a much, much
>> harder task to bring that back to a reasonably consistent state than,
>> e.g., to bring back a dirty filesystem that lives on a sane device. This
>> is why we still pay for hardware RAID devices. I do.
>>
>> Just my 2c.
>>
>> Valeri
>>
>> ++++++++++++++++++++++++++++++++++++++++
>> Valeri Galtsev
>> Sr System Administrator
>> Department of Astronomy and Astrophysics
>> Kavli Institute for Cosmological Physics
>> University of Chicago
>> Phone: 773-702-4247
>> ++++++++++++++++++++++++++++++++++++++++
>
>


++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++


