[CentOS] [OT] RAID 6 - opinions

Tue Apr 23 15:43:32 UTC 2013
Lamar Owen <lowen at pari.edu>

On 04/12/2013 02:11 AM, John R Pierce wrote:
> many of the systems I design get deployed in remote DCs and are 
> installed, managed, and operated by local personnel whose skill 
> levels I have no clue about, so it's in my best interest to make 
> the procedures as simple and failsafe as possible. When faced 
> with a wall of 20 storage servers, each with 48 disks, good luck 
> finding that 20-digit alphanumeric serial number 
> "3KT190V20000754280ED" ... uh HUH, that's assuming all 960 disks 
> got just the right sticker put on the caddies. 'Replace the 
> drives with the red blinking lights' is much simpler than 'figure 
> out what /dev/sdac is on server 12'.
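
On a plain Linux box without those amber lights, the 
serial-to-device mapping at least can be scripted. A minimal 
sketch in Python, assuming smartmontools is installed (the device 
list below is just a placeholder):

    import re
    import subprocess

    def drive_serial(dev):
        # "smartctl -i" prints a "Serial Number:" line for most drives
        out = subprocess.run(["smartctl", "-i", dev],
                             capture_output=True, text=True).stdout
        m = re.search(r"^Serial [Nn]umber:\s*(\S+)", out, re.MULTILINE)
        return m.group(1) if m else None

    # Walk the device nodes and report which one carries the
    # serial quoted in the failure alert.
    target = "3KT190V20000754280ED"
    for dev in ["/dev/sd%s" % letter for letter in "abcdefgh"]:
        if drive_serial(dev) == target:
            print("%s is the drive with serial %s" % (dev, target))

On hardware with SES/SGPIO support, the ledmon package's 'ledctl 
locate=/dev/sdX' can then blink that drive's own light, enclosure 
permitting.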

This is what I love about real RAID controllers and real storage 
array systems like NetApp, EMC, and others.  Not only does the 
faulted drive light up amber, the shelf/DAE holding it does too.

I told an EMC VP a week or so ago that 'anybody can throw a bunch 
of drives together, but that's not what really makes an array 
work.' The software that alerts you and does the automatic 
hot-sparing, even across RAID groups (to use EMC terminology), is 
where the real value is.  A big pile of drives simply lumped 
together can be a real pain to troubleshoot.
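
On the COTS side, Linux md can at least approximate that 
cross-group hot-sparing: mdadm's monitor mode will move a spare 
between arrays that share a spare-group. A minimal /etc/mdadm.conf 
sketch (the UUIDs are elided placeholders):

    # Two arrays pooling their hot spares; when a drive in one
    # fails, mdadm --monitor steals a spare from the other.
    ARRAY /dev/md0 metadata=1.2 spare-group=pool1 UUID=<uuid-of-md0>
    ARRAY /dev/md1 metadata=1.2 spare-group=pool1 UUID=<uuid-of-md1>
    MAILADDR root

That only works while 'mdadm --monitor --scan' (the mdmonitor 
service on CentOS) is running, and it still won't page anyone or 
light an LED, which is rather the point about where the value is.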

I've done arrays with a bunch of COTS drives, and I've done EMC. 
In a grant-funded situation, capex is easier to justify than opex; 
that's why in 2007 we bought our first EMC Clariions (44 TB worth, 
not a lot by today's standards): the grant would fund the capex 
but not the opex. I haven't regretted it once. One of those 
Clariion CX3-10c's has been continuously available since it was 
placed into service in 2007, even through OS (EMC FLARE) 
upgrades/updates and a couple of drive faults.