On 04/12/2013 02:11 AM, John R Pierce wrote:
> Many of the systems I design get deployed in remote DCs and are
> installed, managed, and operated by local personnel whose skill
> levels I have no clue about, so it's in my best interest to make
> the procedures as simple and failsafe as possible. When faced with
> a wall of 20 storage servers, each with 48 disks, good luck finding
> that 20-character alphanumeric serial number "3KT190V20000754280ED"
> ... uh HUH, that's assuming all 960 disks got just the right
> sticker put on the caddies. 'Replace the drives with the red
> blinking lights' is much simpler than 'figure out what /dev/sdac
> is on server 12'.

This is what I love about real RAID controllers and real storage
array systems from the likes of NetApp and EMC: not only does the
faulted drive light up amber, the shelf/DAE lights up amber too. I
told an EMC VP a week or so ago that 'anybody can throw a bunch of
drives together, but that's not what really makes an array work.'
The software that alerts you and does the automatic hot-sparing
(even across RAID groups, to use EMC terminology) is where the real
value is. A bunch of big drives lumped together can indeed be a
pain to troubleshoot.

I've built arrays from a bunch of COTS drives, and I've done EMC.
Capex is easier to justify than opex in a grant-funded situation;
that's why in 2007 we bought our first EMC Clariions (44 TB worth,
not a lot by today's standards), since the grant would fund the
capex but not the opex. I've not regretted it once. One of those
Clariion CX3-10cs has been continuously available since it was
placed into service in 2007, even through FLARE (the array OS)
upgrades and a couple of drive faults.
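
As an aside, for John's 'figure out what /dev/sdac is' problem you
can sometimes get the blinking-light effect even without a vendor
array, as long as the backplane speaks SES and the kernel's
'enclosure' driver is loaded. A rough Python sketch (the
/sys/class/enclosure paths and the 'locate' attribute are standard
sysfs, but slot layout varies by backplane vendor, so treat this as
an illustration, not a drop-in tool):

#!/usr/bin/env python3
# Sketch: turn on the enclosure locate LED for a block device, so
# the remote tech sees a blinking light instead of hunting for a
# 20-character serial number on a caddy sticker.
import glob
import os
import sys

def find_slot(dev_name):
    # Each populated SES slot exposes device/block/<sdX>; match
    # ours and walk back up three levels to the slot directory.
    for path in glob.glob("/sys/class/enclosure/*/*/device/block/*"):
        if os.path.basename(path) == dev_name:
            return os.path.dirname(os.path.dirname(os.path.dirname(path)))
    return None

def set_locate(dev_name, on=True):
    slot = find_slot(dev_name)
    if slot is None:
        sys.exit("%s: not found in any enclosure slot" % dev_name)
    with open(os.path.join(slot, "locate"), "w") as f:
        f.write("1" if on else "0")
    print("%s: locate LED %s via %s" % (dev_name, "on" if on else "off", slot))

if __name__ == "__main__":
    # usage: sudo ./locate_led.py sdac [off]
    set_locate(sys.argv[1], on=(len(sys.argv) < 3 or sys.argv[2] != "off"))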
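
And on the 'software that alerts you' point: even the poor man's
version of that is just a loop watching for failed members. A
minimal sketch against Linux md, which marks failed members in
/proc/mdstat with an '(F)' flag (a real controller, or mdadm
--monitor with spare-group sharing, does all of this far better):

#!/usr/bin/env python3
# Sketch: the 'alerting' half of what array software buys you,
# using Linux md as a stand-in for real controller firmware.
import re
import time

FAILED = re.compile(r"(\w+)\[\d+\]\(F\)")

def failed_members(path="/proc/mdstat"):
    # Member entries look like: sdac1[5](F) when a drive has faulted.
    with open(path) as f:
        return FAILED.findall(f.read())

if __name__ == "__main__":
    while True:
        for dev in failed_members():
            # A real monitor would page someone and promote a spare;
            # mdadm --monitor with spare-group does both already.
            print("ALERT: md member %s marked failed" % dev)
        time.sleep(60)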