I'm setting up this huge RAID 6 box. I've always thought of hot spares, but I'm reading things that are comparing RAID 5 with a hot spare to RAID 6, implying that the latter doesn't need one. I *certainly* have enough drives to spare in this RAID box: 42 of 'em, so two questions: should I assign one or more hot spares, and, if so, how many?
mark
From: "m.roth@5-cent.us" m.roth@5-cent.us
To: CentOS mailing list centos@centos.org
Sent: Thursday, April 11, 2013 8:36 AM
Subject: [CentOS] RAID 6 - opinions
I'm setting up this huge RAID 6 box. I've always thought of hot spares, but I'm reading things that are comparing RAID 5 with a hot spare to RAID 6, implying that the latter doesn't need one. I *certainly* have enough drives to spare in this RAID box: 42 of 'em, so two questions: should I assign one or more hot spares, and, if so, how many?
A RAID5 with a hot spare isn't really the same as a RAID6. For those not familiar with this, a RAID5 in degraded mode (after it lost a disk) will suffer a performance hit, as well as while it rebuilds from a hot spare. A RAID6 after losing a disk will not suffer. So, depending on your need for performance, you'll need to decide. As far as having a spare disk on a RAID6, I'd say it's not necessary. As long as you have some mechanism in place to inform you if/when a disk fails, you'll not suffer any performance hit.
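For reference, one such notification mechanism on a Linux md setup (only a minimal sketch; the mail address is an example, and a hardware RAID box will have its own tools) is mdadm's monitor mode plus smartd:

# /etc/mdadm.conf -- mail on fail/degraded events (address is an example)
MAILADDR root@example.com
# run the monitor as a daemon (CentOS ships this as the mdmonitor service)
mdadm --monitor --scan --daemonise
# /etc/smartd.conf -- have smartd watch SMART health and mail on trouble
DEVICESCAN -H -m root@example.com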
From: Joseph Spenner joseph85750@yahoo.com
A RAID5 with a hot spare isn't really the same as a RAID6. For those not familiar with this, a RAID5 in degraded mode (after it lost a disk) will suffer a performance hit, as well as while it rebuilds from a hot spare. A RAID6 after losing a disk will not suffer. So, depending on your need for performance, you'll need to decide. As far as having a spare disk on a RAID6, I'd say it's not necessary. As long as you have some mechanism in place to inform you if/when a disk fails, you'll not suffer any performance hit.
Also, if you lose a disk, the RAID6 can lose a second disk anytime without problem. The RAID5 cannot until the hot spare has fully replaced the dead disk (which can take a while). And I believe the RAID6 algorithm might be (a little) more demanding/slower than RAID5. Check also RAID50 and 60 if your controller permits it...
JD
On 2013-04-11, Joseph Spenner joseph85750@yahoo.com wrote:
From: "m.roth@5-cent.us" m.roth@5-cent.us
To: CentOS mailing list centos@centos.org
Sent: Thursday, April 11, 2013 8:36 AM
Subject: [CentOS] RAID 6 - opinions
I'm setting up this huge RAID 6 box. I've always thought of hot spares, but I'm reading things that are comparing RAID 5 with a hot spare to RAID 6, implying that the latter doesn't need one. I *certainly* have enough drives to spare in this RAID box: 42 of 'em, so two questions: should I assign one or more hot spares, and, if so, how many?
As another poster mentioned, I'd even break this up into multiple RAID6 arrays. One big honking 42 drive array, if they're large disks, will take forever to rebuild after a failure.
As far as having a spare disk on a RAID6, I'd say it's not necessary. As long as you have some mechanism in place to inform you if/when a disk fails, you'll not suffer any performance hit.
With this many drives, I'd designate at least one as a global spare anyway. Yes, you lose some capacity, but you have even more cushion if, say, you're out of town for a week, a drive fails, and your backup person is sick. One possible configuration is to create three RAID6 arrays with 11 drives each (or one or two with 12 instead), and group them using LVM. You could also simply create one RAID6 with the capacity you need for the next few months, then create new arrays and add them to your volume group as you need them. This has the added bonus that you look like a genius for deploying new capacity so quickly. :) Recently I acquired a half-empty storage array, so that I can add larger drives as they become available instead of being tied to drive sizes of today.
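For concreteness, a minimal sketch of that layering, assuming Linux md rather than the box's own RAID firmware (device names and sizes are purely illustrative):

# three 11-drive RAID6 arrays
mdadm --create /dev/md0 --level=6 --raid-devices=11 /dev/sd[b-l]
mdadm --create /dev/md1 --level=6 --raid-devices=11 /dev/sd[m-w]
mdadm --create /dev/md2 --level=6 --raid-devices=11 /dev/sdx /dev/sdy /dev/sdz /dev/sda[a-h]
# pool them with LVM; more arrays can be added to the volume group later
pvcreate /dev/md0 /dev/md1 /dev/md2
vgcreate vg_storage /dev/md0 /dev/md1 /dev/md2
lvcreate -n lv_data -l 50%VG vg_storage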
A RAID5 with a hot spare isn't really the same as a RAID6. For those not familiar with this, a RAID5 in degraded mode (after it lost a disk) will suffer a performance hit, as well as while it rebuilds from a hot spare. A RAID6 after losing a disk will not suffer.
I seem to remember reading on the Linux RAID mailing list that, at least for Linux md RAID6 (which the OP may not be using), performance on a RAID6 with one missing drive is slightly worse than an optimal RAID5. I could be wrong, however, and perhaps a hardware RAID controller doesn't have this deficiency.
--keith
On 04/11/2013 11:36 AM, m.roth@5-cent.us wrote:
I'm setting up this huge RAID 6 box. I've always thought of hot spares, but I'm reading things that are comparing RAID 5 with a hot spare to RAID 6, implying that the latter doesn't need one. I *certainly* have enough drives to spare in this RAID box: 42 of 'em, so two questions: should I assign one or more hot spares, and, if so, how many?
mark
I was building a home NAS over the holidays and had the same question (well, not hot spare, but 5 vs. 6). A good friend of mine pointed me to the following article:
http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162
I was using 6x 3 TB drives, so I decided to opt for RAID 6. About a month ago, a drive cacked out and I was *very* relieved to know that I was covered until I replaced the disk and it finished rebuilding.
If you have 42 disks, I'd not even think twice and I would use RAID level 6. In fact, with such a large number, I'd almost be tempted to break it into two separate RAID level 6 arrays and use something like LVM to pool their space, just to hedge my bets.
On 4/11/2013 8:36 AM, m.roth@5-cent.us wrote:
I'm setting up this huge RAID 6 box. I've always thought of hot spares, but I'm reading things that are comparing RAID 5 with a hot spare to RAID 6, implying that the latter doesn't need one. I *certainly* have enough drives to spare in this RAID box: 42 of 'em, so two questions: should I assign one or more hot spares, and, if so, how many?
John's First Rule of RAID: when a drive fails 2-3 years downstream, replacements will be unavailable. If you had bought cold spares and stored them, odds are too high they will be lost when you need them.
John's Second Rule of RAID: no single RAID should be much over 10-12 disks, or the rebuild times become truly hellacious.
John's Third Rule of RAID: allow 5-10% hot spares.
So, with 42 disks, 10% would be ~4 spares, which leaves 38; 5% would be 2 spares, allowing 40 disks.
40 divided by 4 == 10, so you could format that as 10 RAID6s, stripe those (aka RAID6+0 or RAID60), and use 2 hot spares. Alternately, 3*13 == 39, leaving three hot spares, so another option is 3 stripes of 13 disks with 3 hot spares.
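On Linux md (the original poster may well be on a hardware controller instead), a rough sketch of that kind of RAID6+0 layering, shown with two 10-disk legs for brevity; the same pattern extends to more legs and disks:

# two 10-disk RAID6 legs (device names illustrative)
mdadm --create /dev/md1 --level=6 --raid-devices=10 /dev/sd[b-k]
mdadm --create /dev/md2 --level=6 --raid-devices=10 /dev/sd[l-u]
# stripe the legs together (RAID6+0, i.e. RAID60)
mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/md1 /dev/md2
# hot spares: one per leg here; a shared spare-group line in /etc/mdadm.conf
# lets mdadm --monitor move spares to whichever leg loses a disk
mdadm /dev/md1 --add /dev/sdv
mdadm /dev/md2 --add /dev/sdw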
I did some testing of very large raids using LSI Logic 9261-8i MegaRAID SAS2 cards driving 36 3TB SATA3 disks. With 3 x 11 disk RAID6 (and 3 hot spares), a failed disk took about 12 hours to restripe with the rebuilding set to medium priority, and the raid essentially idle.
If you're using XFS on this very large file system (which I *would* recommend), do be sure to use a LOT of RAM, like 48GB... while regular operations might not need it, XFS's repair tool (xfs_repair, its equivalent of fsck) is fairly memory intensive on a very large volume with millions of files.
John R Pierce wrote:
On 4/11/2013 8:36 AM, m.roth@5-cent.us wrote:
I'm setting up this huge RAID 6 box. I've always thought of hot spares, but I'm reading things that are comparing RAID 5 with a hot spare to RAID 6, implying that the latter doesn't need one. I *certainly* have enough drives to spare in this RAID box: 42 of 'em, so two questions: should I assign one or more hot spares, and, if so, how many?
John's First Rule of RAID: when a drive fails 2-3 years downstream, replacements will be unavailable. If you had bought cold spares and stored them, odds are too high they will be lost when you need them.
John's Second Rule of RAID: no single RAID should be much over 10-12 disks, or the rebuild times become truly hellacious.
John's Third Rule of RAID: allow 5-10% hot spares.
So, with 42 disks, 10% would be ~4 spares, which leaves 38; 5% would be 2 spares, allowing 40 disks.
<snip>
I did some testing of very large raids using LSI Logic 9261-8i MegaRAID SAS2 cards driving 36 3TB SATA3 disks. With 3 x 11 disk RAID6 (and 3 hot spares), a failed disk took about 12 hours to restripe with the rebuilding set to medium priority, and the raid essentially idle.
If you're using XFS on this very large file system (which I *would* recommend), do be sure to use a LOT of RAM, like 48GB... while regular operations might not need it, XFS's repair tool (xfs_repair, its equivalent of fsck) is fairly memory intensive on a very large volume with millions of files.
Ok, listening to all of this, I've also been in touch with a tech from the vendor*, who had a couple of suggestions: first, two RAID sets with two global hot spares.
I've just spoken with my manager, and we're going with that. One of the tech's other suggestions was three volume sets on top of the two RAID sets, so we'll have what look like three drives/LUNs of about 13+TB each.
All your comments were much appreciated and gave me a lot more confidence in this setup. We will be using ext4, btw - I don't get to try out XFS on this $$$$$$ baby.
mark
* Unpaid plug: we bought this from AC&NC; their own price was cheaper than either of the two resellers I spoke to (three quotes required), they seem pretty hungry (but have been around a while, given the number of old boxes we have), and they respond *very* quickly to support problems.
m.roth@5-cent.us wrote: <snip>
Ok, listening to all of this, I've also been in touch with a tech from the vendor*, who had a couple of suggestions: first, two RAID sets with two global hot spares.
I've just spoken with my manager, and we're going with that. One of the tech's other suggestions was three volume sets on top of the two RAID sets, so we'll have what look like three drives/LUNs of about 13+TB each.
<snip> Followup comment: I created the two RAID sets, then started to create the volume sets... and realized I didn't know if it was *possible*, much less desirable, to have a volume set that spanned two RAID sets. Talked it over with my manager, and I redid it as three RAID sets, one volume set each.
Maybe the initialization will be done tomorrow.... <g>
mark
On 4/11/2013 1:20 PM, m.roth@5-cent.us wrote:
Followup comment: I created the two RAID sets, then started to create the volume sets... and realized I didn't know if it was *possible*, much less desirable, to have a volume set that spanned two RAID sets. Talked it over with my manager, and I redid it as three RAID sets, one volume set each.
Sure. Throw all the RAIDs into a single LVM volume group, and then stripe 3 logical volumes across that volume group.
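In LVM terms, that might look something like the following (LUN names, stripe size, and LV size are placeholders):

# the three RAID sets appear to the OS as three LUNs
pvcreate /dev/sdb /dev/sdc /dev/sdd
vgcreate vg_raid /dev/sdb /dev/sdc /dev/sdd
# one logical volume striped across all three physical volumes
lvcreate -n lv1 -i 3 -I 256 -L 13T vg_raid
mkfs.ext4 /dev/vg_raid/lv1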
On 4/11/2013 12:30 PM, m.roth@5-cent.us wrote:
Ok, listening to all of this, I've also been in touch with a tech from the vendor*, who had a couple of suggestions: first, two RAID sets with two global hot spares.
I would test how long a drive rebuild takes on a 20 disk RAID6. I suspect, very long, like over 24 hours, assuming a fast controller and sufficient channel bandwidth.
From: John R Pierce pierce@hogranch.com
To: centos@centos.org
Sent: Thursday, April 11, 2013 1:24 PM
Subject: Re: [CentOS] RAID 6 - opinions
On 4/11/2013 12:30 PM, m.roth@5-cent.us wrote:
Ok, listening to all of this, I've also been in touch with a tech from the vendor*, who had a couple of suggestions: first, two RAID sets with two global hot spares.
I would test how long a drive rebuild takes on a 20 disk RAID6. I suspect, very long, like over 24 hours, assuming a fast controller and sufficient channel bandwidth.
----
But isn't that one of the benefits of RAID6? (not much degraded/latency effect during a rebuild, less impact on performance during rebuild, so longer times are acceptable?)
______________________________________________________________________ If life gives you lemons, keep them-- because hey.. free lemons. "♥ Sticker" fixer: http://microflush.org/stuff/stickers/heartFix.html
On 4/11/2013 1:36 PM, Joseph Spenner wrote:
But isn't that one of the benefits of RAID6? (not much degraded/latency effect during a rebuild, less impact on performance during rebuild, so longer times are acceptable?)
trouble comes in 3s.
----- Original Message -----
From: "Joseph Spenner" joseph85750@yahoo.com To: "CentOS mailing list" centos@centos.org Sent: Thursday, April 11, 2013 1:36:29 PM Subject: Re: [CentOS] RAID 6 - opinions
From: John R Pierce pierce@hogranch.com
To: centos@centos.org
Sent: Thursday, April 11, 2013 1:24 PM
Subject: Re: [CentOS] RAID 6 - opinions
On 4/11/2013 12:30 PM, m.roth@5-cent.us wrote:
Ok, listening to all of this, I've also been in touch with a tech from the vendor*, who had a couple of suggestions: first, two RAID sets with two global hot spares.
I would test how long a drive rebuild takes on a 20 disk RAID6. I suspect, very long, like over 24 hours, assuming a fast controller and sufficient channel bandwidth.
Just for reference, I have a 24 x 2TB SATA III array using CentOS 6.4 Linux md RAID6, with two of those 24 disks as hot spares. The drives are in a Supermicro external SAS/SATA box connected to another Supermicro 1U computer with an i3-2125 CPU @ 3.30GHz and 16GB of RAM. The connection is via a 6Gbit mini SAS cable to an LSI 9200 HBA. Before I deployed it into production I tested how long it would take to rebuild the raid from one of the hot spares, and it took a little over 9 hours. I have two 15TB LVM logical volumes on it formatted ext4, with the rest used for LVM snapshot space if needed. Using dd to write a large file to one of the partitions I see about 480MB/s. If I rsync from one partition to another I get just under 200MB/s.
dd if=/dev/zero of=/backup/5GB.img count=5000 bs=1M
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 10.8293 s, 484 MB/s
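For anyone wanting to run the same kind of rebuild test on an md array, a minimal sketch (array and device names are assumptions):

# fail one member; a configured hot spare should kick in and start rebuilding
mdadm /dev/md0 --fail /dev/sdq
# watch progress and the estimated finish time
watch -n 60 cat /proc/mdstat
# afterwards, remove the failed member and re-add it as the new spare
mdadm /dev/md0 --remove /dev/sdq
mdadm /dev/md0 --add /dev/sdq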
David.
On 2013-04-11, David C. Miller millerdc@fusion.gat.com wrote:
Just for reference, I have a 24 x 2TB SATA III array using CentOS 6.4 Linux md RAID6, with two of those 24 disks as hot spares. The drives are in a Supermicro external SAS/SATA box connected to another Supermicro 1U computer with an i3-2125 CPU @ 3.30GHz and 16GB of RAM. The connection is via a 6Gbit mini SAS cable to an LSI 9200 HBA. Before I deployed it into production I tested how long it would take to rebuild the raid from one of the hot spares, and it took a little over 9 hours.
I did a similar test on a 3ware controller. Apparently those cards have a feature that allows the controller to remember which sectors on the disks it has written, so that on a rebuild it only reexamines those sectors. This greatly reduces rebuild time on a mostly empty array, but it means that a good test would almost fill the array, then attempt a rebuild. I definitely saw a difference in rebuild times as I filled the array. (In 3ware/LSI world this is sometimes called "rapid RAID recovery".)
In checking my archives, it looks like a rebuild on an almost full 50TB array (24 disks) took about 16 hours. That's still pretty respectable. I didn't repeat the experiment, unfortunately.
I don't know if your LSI controller has a similar feature, but it's worth investigating.
--keith
----- Original Message -----
From: "Keith Keller" kkeller@wombat.san-francisco.ca.us To: centos@centos.org Sent: Thursday, April 11, 2013 4:34:20 PM Subject: Re: [CentOS] RAID 6 - opinions
On 2013-04-11, David C. Miller millerdc@fusion.gat.com wrote:
Just for reference, I have a 24 x 2TB SATA III array using CentOS 6.4 Linux md RAID6, with two of those 24 disks as hot spares. The drives are in a Supermicro external SAS/SATA box connected to another Supermicro 1U computer with an i3-2125 CPU @ 3.30GHz and 16GB of RAM. The connection is via a 6Gbit mini SAS cable to an LSI 9200 HBA. Before I deployed it into production I tested how long it would take to rebuild the raid from one of the hot spares, and it took a little over 9 hours.
I did a similar test on a 3ware controller. Apparently those cards have a feature that allows the controller to remember which sectors on the disks it has written, so that on a rebuild it only reexamines those sectors. This greatly reduces rebuild time on a mostly empty array, but it means that a good test would almost fill the array, then attempt a rebuild. I definitely saw a difference in rebuild times as I filled the array. (In 3ware/LSI world this is sometimes called "rapid RAID recovery".)
In checking my archives, it looks like a rebuild on an almost full 50TB array (24 disks) took about 16 hours. That's still pretty respectable. I didn't repeat the experiment, unfortunately.
I don't know if your LSI controller has a similar feature, but it's worth investigating.
--keith
The LSI 9200's I use are nothing more than a dumb $300 host bus adapter. No RAID levels or special features. I prefer to NOT use hardware RAID controllers when I can. With a generic HBA the hard drives are seen raw to the OS. You can use smartctl to poll and test the drives just like they were connected to a generic SATA bus on the motherboard. The tools built into Linux(smartd & md) are better suited and more flexible at reporting problems and handling every level of RAID. It also makes migrating the array to another system trivial. I don't have to worry about finding the exact same RAID controller. Just a no frills SAS/SATA HBA will do.
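As an illustration of that kind of routine (device and array names are assumptions):

# SMART identity (including serial number) and overall health for one drive
smartctl -i /dev/sdb
smartctl -H /dev/sdb
# kick off a background long self-test
smartctl -t long /dev/sdb
# md's view of the arrays and any failed members
cat /proc/mdstat
mdadm --detail /dev/md0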
David.
On 4/11/2013 5:04 PM, David C. Miller wrote:
The LSI 9200's I use are nothing more than a dumb $300 host bus adapter. No RAID levels or special features. I prefer to NOT use hardware RAID controllers when I can. With a generic HBA the hard drives are seen raw to the OS. You can use smartctl to poll and test the drives just like they were connected to a generic SATA bus on the motherboard. The tools built into Linux(smartd & md) are better suited and more flexible at reporting problems and handling every level of RAID. It also makes migrating the array to another system trivial. I don't have to worry about finding the exact same RAID controller. Just a no frills SAS/SATA HBA will do.
yeah, until a disk fails on a 40-disk array and the chassis LEDs on the backplane don't light up to indicate which disk it is, and your operations monkey pulls the wrong one and crashes the whole raid.
have fun with that!
if you can figure out how to get the drive backplane status LEDs to work on Linux with a 'dumb' controller plugged into a drive backplane, PLEASE WRITE IT UP ON A WIKI SOMEWHERE!!! everything I've seen leaves this gnarly task as an exercise to the reader. With a card like a 9261-8i, it just works automatically.
also, hardware raid controllers WITH battery backed (or flash backed) cache can greatly speed up small block write operations like directory entry creates, database writes, etc.
On Apr 11, 2013, at 5:25 PM, John R Pierce pierce@hogranch.com wrote:
On 4/11/2013 5:04 PM, David C. Miller wrote:
The LSI 9200's I use are nothing more than a dumb $300 host bus adapter. No RAID levels or special features. I prefer to NOT use hardware RAID controllers when I can. With a generic HBA the hard drives are seen raw to the OS. You can use smartctl to poll and test the drives just like they were connected to a generic SATA bus on the motherboard. The tools built into Linux(smartd & md) are better suited and more flexible at reporting problems and handling every level of RAID. It also makes migrating the array to another system trivial. I don't have to worry about finding the exact same RAID controller. Just a no frills SAS/SATA HBA will do.
yeah, until a disk fails on a 40-disk array and the chassis LEDs on the backplane don't light up to indicate which disk it is, and your operations monkey pulls the wrong one and crashes the whole raid.
have fun with that!
if you can figure out how to get the drive backplane status LEDs to work on Linux with a 'dumb' controller plugged into a drive backplane, PLEASE WRITE IT UP ON A WIKI SOMEWHERE!!! everything I've seen leaves this gnarly task as an exercise to the reader. With a card like a 9261-8i, it just works automatically.
also, hardware raid controllers WITH battery backed (or flash backed) cache can greatly speed up small block write operations like directory entry creates, database writes, etc.
You simply match up the Linux /dev/sdX designation with the drive's serial number using smartctl. When I first bring the array online I have a script that greps out the drives' serial numbers from smartctl and creates a neat text file with the mappings. When either smartd or md complains about a drive, I remove the drive from the RAID using mdadm and then pull the drive based on the mapping file. Drive 0 in those SuperMicro SAS/SATA arrays is always the lowest drive letter, and it goes up from there. If a drive is replaced I just update the text file accordingly. You can also print out the drive serial numbers and put them on the front of the removable drive cages. It is not as elegant as a blinking LED, but it works just as well. I have been doing it like this for 6-plus years now with a few dozen SuperMicro arrays. I have never pulled a wrong drive.
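A rough sketch of such a mapping script (paths and the exact smartctl output format are assumptions):

#!/bin/sh
# map each /dev/sdX block device to its drive serial number
for d in /dev/sd[a-z] /dev/sd[a-z][a-z]; do
    [ -b "$d" ] || continue   # skip missing devices / unexpanded globs
    sn=$(smartctl -i "$d" | awk -F': *' '/Serial [Nn]umber/ {print $2}')
    printf '%s %s\n' "$d" "$sn"
done > /root/drive-serial-map.txt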
David.
On 2013-04-12, David Miller millerdc@fusion.gat.com wrote:
On Apr 11, 2013, at 5:25 PM, John R Pierce pierce@hogranch.com wrote:
yeah, until a disk fails on a 40-disk array and the chassis LEDs on the backplane don't light up to indicate which disk it is, and your operations monkey pulls the wrong one and crashes the whole raid.
[snip]
You simply match up the Linux /dev/sdX designation with the drive's serial number using smartctl. When I first bring the array online I have a script that greps out the drives' serial numbers from smartctl and creates a neat text file with the mappings. When either smartd or md complains about a drive, I remove the drive from the RAID using mdadm and then pull the drive based on the mapping file. Drive 0 in those SuperMicro SAS/SATA arrays is always the lowest drive letter, and it goes up from there. If a drive is replaced I just update the text file accordingly. You can also print out the drive serial numbers and put them on the front of the removable drive cages. It is not as elegant as a blinking LED, but it works just as well. I have been doing it like this for 6-plus years now with a few dozen SuperMicro arrays. I have never pulled a wrong drive.
I think that there is at least one potential problem, and possibly more, with your method.
1) It only takes once forgetting to update the mapping file to screw things up for yourself. Some people are the type who will never forget to do that. I'm (unfortunately) not. (Actually, I guess it takes twice, since if you have only one slot not up to date, you could use the serial numbers to map all but the one drive, and that's the suspect drive. I wouldn't want to trust that process.)
2) Drive assignments can be dynamic. If you pull the tray in port 0, which was sda (for example), you're not necessarily guaranteed that the replacement drive will be sda. It might be assigned the next available sdX. I have seen this in certain failure situations. (As an aside, how does the kernel handle more than 26 hard drive devices? sdaa? sdA?)
1a and 2a) Printing serial numbers and taping them to the tray is much less error-prone, but also more time consuming. If you have a label printer that certainly makes things easier.
3) If you have someone else pulling drives for you, they may not have access to the mapping file, and/or may not be willing or under contract to print a new tray label and replace it. It's way less error-prone to tell an "operations monkey" to pull the blinky drive than to hope you read the mapping file correctly, and relay the correct location to the monkey. (The ops monkey may not have login rights on your server, so you also can't rely on him being able to look at the mapping file himself.) If you're the only person who will ever pull drives, this isn't such a huge problem.
That's not to say that your methods can't work--obviously they can if you haven't had any mistakes in many years. But the combination of a BBU-backed write cache and an identify blink makes a dedicated hardware RAID controller a big win for me. (I do also use md RAID, even on hardware RAID controllers, where flexibility and portability are more important than performance.)
--keith
On 4/11/2013 10:48 PM, Keith Keller wrote:
(As an aside, how does the kernel handle more than 26 hard drive devices? sdaa? sdA?)
sdaa, sdab, sdac, ... sdba, sdbb, sdbc.... etc etc.
and yes, if you have 40+ disks as JBOD, it's a bloody mess, especially if Linux udev starts getting creative.
Many of the systems I design get deployed in remote DCs and are installed, managed, and operated by local personnel whose skill levels I have no clue about, so it's in my best interest to make the procedures as simple and failsafe as possible. When faced with a wall of 20 storage servers, each with 48 disks, good luck finding that 20-digit alphanumeric serial number "3KT190V20000754280ED"... uh HUH, and that's assuming all 960 disks got just the right sticker put on the caddies. 'Replace the drives with the red blinking lights' is much simpler than 'figure out what /dev/sdac is on server 12'.
On 04/12/2013 02:11 AM, John R Pierce wrote:
Many of the systems I design get deployed in remote DCs and are installed, managed, and operated by local personnel whose skill levels I have no clue about, so it's in my best interest to make the procedures as simple and failsafe as possible. When faced with a wall of 20 storage servers, each with 48 disks, good luck finding that 20-digit alphanumeric serial number "3KT190V20000754280ED"... uh HUH, and that's assuming all 960 disks got just the right sticker put on the caddies. 'Replace the drives with the red blinking lights' is much simpler than 'figure out what /dev/sdac is on server 12'.
This is what I love about real RAID controllers or real storage array systems, like NetApp, EMC, and others. Not only does the faulted drive light up amber, but the shelf/DAE also lights up amber.
I told an EMC VP a week or so ago that 'anybody can throw a bunch of drives together, but that's not what really makes an array work.' The software that alerts you and does the automatic hotsparing (even across RAID groups (using EMC terminology)) is where the real value is. A bunch of big drives all lopped together can be a pain to troubleshoot indeed.
I've done arrays with a bunch of COTS drives, and I've done EMC. Capex is easier to justify than opex in a grant-funded situation; that's why in 2007 we bought our first EMC Clariions (44TB worth, not a lot by today's standards), since the grant would fund the capex but not the opex, and I've not regretted it once. One of those Clariion CX3-10c's has been continuously available since placed into service in 2007, even through OS (EMC FLARE) upgrades/updates and a couple of drive faults.
On 04/12/2013 01:01 AM, David Miller wrote:
You simply match up the Linux /dev/sdX designation with the drive's serial number using smartctl. When I first bring the array online I have a script that greps out the drives' serial numbers from smartctl and creates a neat text file with the mappings. When either smartd or md complains about a drive, I remove the drive from the RAID using mdadm and then pull the drive based on the mapping file. Drive 0 in those SuperMicro SAS/SATA arrays is always the lowest drive letter, and it goes up from there. If a drive is replaced I just update the text file accordingly. You can also print out the drive serial numbers and put them on the front of the removable drive cages. It is not as elegant as a blinking LED, but it works just as well. I have been doing it like this for 6-plus years now with a few dozen SuperMicro arrays. I have never pulled a wrong drive.
It's great that the Supermicro controllers can do this, but I know from experience that in the general case, with multiple controllers and on CentOS 6, this will not work. Just a quick caveat on that...
hi,
yeah, until a disk fails on a 40-disk array and the chassis LEDs on the backplane don't light up to indicate which disk it is, and your operations monkey pulls the wrong one and crashes the whole raid.
that is why I put a label on every drive tray that is visible without pulling the disk. That label carries the serial number, so that the monkey can double check the disk serial before pulling it. In fact, I was the silly monkey once, so I am careful now :-)
best regards --- Michael
On 2013/04/11 10:36 AM, Joseph Spenner wrote:
From: John R Pierce pierce@hogranch.com
To: centos@centos.org
Sent: Thursday, April 11, 2013 1:24 PM
Subject: Re: [CentOS] RAID 6 - opinions
On 4/11/2013 12:30 PM, m.roth@5-cent.us wrote:
Ok, listening to all of this, I've also been in touch with a tech from the vendor*, who had a couple of suggestions: first, two RAID sets with two global hot spares.
I would test how long a drive rebuild takes on a 20 disk RAID6. I suspect, very long, like over 24 hours, assuming a fast controller and sufficient channel bandwidth.
But isn't that one of the benefits of RAID6? (not much degraded/latency effect during a rebuild, less impact on performance during rebuild, so longer times are acceptable?)
If life gives you lemons, keep them-- because hey.. free lemons. "♥ Sticker" fixer: http://microflush.org/stuff/stickers/heartFix.html
Besides performance, the longer your rebuild takes, the more vulnerable you are to an additional disk failure taking out your array. We've lost arrays that way in the past, pre-RAID6: we lost two disks within a 6-hour period, and there went the array, since the rebuild wasn't complete. RAID6 means you can handle 2 disk failures, but the third one will drop your array, if I'm remembering correctly. And the larger the number of disks, the higher the chance that you'll have disk failures...
Thanks! Miranda
On 2013-04-12, Miranda Hawarden-Ogata hawarden@ifa.hawaii.edu wrote:
RAID6 means you can handle 2 disk failures, but the third one will drop your array, if I'm remembering correctly. And the larger the number of disks, the higher the chance that you'll have disk failures...
Yes, and yes. But different configurations of other RAID levels will give you different levels of protection--not "better" or "worse", because that needs to be evaluated in context.
For example, as has been noted, RAID6 can lose up to two drives, and the third lost drive loses the array [0]. A 12-drive RAID10, with six two-drive RAID1 components, can lose up to six drives, but only the right six drives--losing both drives of one RAID1 loses the entire array. On the other side of things, rebuilding a 12-drive RAID6 will take much longer than rebuilding one RAID1 component of a RAID10. And as one more example, a 12-drive RAID50, with three four-drive RAID5 components, can lose up to three drives, one from each component, but two drives from one RAID5 loses the array. Rebuild times will be longer than RAID10 but shorter than RAID6. (There are also performance questions, which I know little about.)
RAID6 is certainly the most efficient way, space-wise, to allocate drives such that you can lose up to two drives before losing the array. So if maximizing storage space is the primary concern, greater than performance, RAID6 is likely the best choice. But, as is often repeated here, on the md RAID list, and elsewhere, ***RAID IS NOT A BACKUP SOLUTION!!!*** If you care about your data you need to back it up elsewhere. Do *not* rely solely on RAID to keep your data safe! All sorts of bad things can happen: a flaky controller can cause filesystem problems, and a badly defective controller can completely destroy the array. RAID allows you to tolerate some failure, but it can't save your data from catastrophe.
--keith
[0] "loses the array" here means that it won't be mountable without some sort of expensive drive recovery process.
On 04/11/2013 06:36 PM, m.roth@5-cent.us wrote:
I'm setting up this huge RAID 6 box. I've always thought of hot spares, but I'm reading things that are comparing RAID 5 with a hot spare to RAID 6, implying that the latter doesn't need one. I *certainly* have enough drives to spare in this RAID box: 42 of 'em, so two questions: should I
We use several boxes of this kind (but with 45 trays), and our experience was that the optimum volume size was 12 HDDs (3 x 12 + 9), which reduces the 45 disks to an effective 37 disks of capacity once RAID6 parity is taken out (a 12-disk volume is 40 TB in size... in the event of a broken HDD it takes 1 day to recover; more than 12 disks and I don't (want to) know how long it would take), and we don't use hot spares.
HTH, Adrian