Les Mikesell wrote:
> Somewhere along the line they switch from a CentOS base to rpath for
> better package management, but I haven't followed them since.

Yeah, the version I had at the time was based on rPath. I think they changed to something else yet again in the past year or so.

> trusted it and it was only used for backups. So, I no longer believe
> that paying a lot for a device that is supposed to have a good
> reputation is a sure thing - or that having a support phone number is
> going to make things better. Everyone has different war stories, I guess...

Oh absolutely, nothing is a sure thing. On two separate occasions last year we had a disk failure take out an entire storage array (I speculate that fiber errors flooded the bus and took the controllers offline); this was on low-end crap storage. One of our vendors OEMs low-end IBM storage for some of their customers, and they've reported similar events on that stuff.

In 2004 the company I was at had a *massive* outage on our EMC array (CX600), with some pretty significant data loss (~60 hours of downtime in the first week alone). In the end it was traced to administrator error (wasn't me at the time): a misconfiguration of the system allowed both controllers to go down simultaneously. Such an error is not possible to make on more modern systems (phew). I don't know what the specific configuration was, but the admin fessed up to it a couple of years later.

That's why most vendors will try to push for a 2nd array and some sort of replication. The only systems I know of where the vendor puts its money behind 100% uptime are the multi-million-dollar systems from Hitachi, and they claim they've never had to pay out on a claim. Most other array makers don't design their systems for more than 99.999% uptime on the high end (about five minutes of downtime a year), and probably 99.99% on the midrange (under an hour a year). BUT under most circumstances a good storage array provides far better availability than anything someone can build on their own for most applications, where "good" typically means a system that starts at north of $50k.

I like my own storage array because it can have up to 4 controllers running in active-active mode (right now it has 2, with another 2 being installed in a few weeks). A recent software update allows the system to re-mirror its cache to another controller (or controllers) in the event of a controller failure. Normally in a dual-controller system, if a controller goes down the system drops into write-through mode to ensure data integrity, which can destroy performance. With this feature that doesn't happen, and the system still ensures data integrity by making sure all data is written to two locations before the write is acknowledged to the host.

It goes well beyond that, though: it automatically lays data out so that it can survive a full shelf (up to 40 drives) failing without skipping a beat. RAID rebuilds are very fast (up to 10x faster than other systems), the drives are connected to a switched backplane, there are no fiber loops in the system, and every shelf of disks is directly connected to the controllers via two fiber ports. In the event of a power failure there is an internal disk in each controller that the system writes its cache out to, so no worries about a power outage lasting longer than the batteries (typically 48-72 hours).
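To make that write path a little more concrete, here's some rough pseudocode for how I understand the cache mirroring works. This is just my own illustration in Python, not the vendor's firmware; the names and the fall-back-to-write-through behavior are my assumptions from what the release notes describe.

# Illustration only -- a toy model of "ack once two copies exist",
# not the actual controller firmware.

class Controller:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.cache = []          # battery-backed write cache

def handle_write(block, owner, partners, disks):
    """Acknowledge a host write only after two copies of it exist."""
    owner.cache.append(block)                     # copy #1: local cache
    mirror = next((p for p in partners if p.healthy), None)
    if mirror is not None:
        mirror.cache.append(block)                # copy #2: partner cache
        return "ack (write-back, mirrored cache)"
    # No healthy partner left: drop to write-through so a single
    # controller failure can't lose an acknowledged write.
    disks.append(block)                           # copy #2: on disk
    return "ack (write-through, much slower)"

With four controllers, losing one still leaves a healthy partner to re-mirror to, which is why this box can stay in fast write-back mode where a plain dual-controller array has to fall back to write-through.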
And of course, since everything is written twice, when the power goes out the system stores two copies of that cache on the internal disks, in case one disk happens to fail (hopefully both don't) at precisely the wrong moment. The drives themselves sit in vibration-absorbing sleds; vibration is the #1 cause of disk failure according to a report I read from Seagate.

http://portal.aphroland.org/~aphro/chassis-architecture.png
http://www.techopsguys.com/2009/11/20/enterprise-sata-disk-reliability/

I have had two soft failures on the system since we got it: one time a fiber channel port had a sort of core dump, and another time a system process crashed. Both were recovered automatically without user intervention and with no noticeable impact other than the email alerts to me. No guarantees it won't burst into flames one day, but I do sleep a lot better at night with this system vs the last one.

My vendor also recently introduced an interesting solution for replication which involves 3 arrays providing synchronous long-distance replication. It works like this (all arrays must be from the same vendor, but they do not need to be identical in any way):

Array 1 sits in facility A
Array 2 sits in facility B (up to ~130 miles away, or 1.3ms RTT)
Array 3 sits in facility C (up to 3000 miles away, or 150ms RTT)

Array 1 synchronously replicates to facility B (hence the distance limitation) and asynchronously replicates to facility C at defined intervals. In the event facility A or Array 1 blows up, Array 3 in facility C automatically connects to Array 2 and has it send all of the data up to the point Array 1 went down; I think you can get to within something like a few milliseconds of the disaster that took out Array 1 and still get all of the data to Array 3. (There's a rough sketch of that failover sequence at the bottom of this message.) Setting it up takes about 30 minutes, and it's all automatic. Prior to this, setting up such a solution would cost way more, as you'd only find it in the most high-end systems.

It's going to be many times cheaper to get a 2nd array and replicate than it is to try to design/build a single system that offers 100% uptime. Entry-level pricing of this particular array starts at maybe $130k and can probably go as high as $2-3M if you load it up with software (more than half the cost can be software add-ons). So it's not in the same league as most NetApp, EqualLogic, or even EMC/HDS gear; their low-end stuff starts at probably $70k.

nate
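P.S. Here's the rough sketch I mentioned of how I understand the 3-site failover works. It's just my own illustration in Python; the site names, sequence numbers, and the catch-up-from-B step are how I picture it, not anything from the vendor's actual software.

# Toy model of the 3-site replication described above.  Illustration
# only -- names and mechanics are my own assumptions, not vendor code.

class Array:
    def __init__(self, site):
        self.site = site
        self.blocks = {}        # write sequence number -> data
        self.last_seq = 0

    def write(self, seq, data):
        self.blocks[seq] = data
        self.last_seq = max(self.last_seq, seq)

a = Array("A")    # production array
b = Array("B")    # synchronous copy, ~130 miles away
c = Array("C")    # asynchronous copy, ~3000 miles away

def host_write(seq, data):
    a.write(seq, data)
    b.write(seq, data)          # sync: host ack waits for B (adds the RTT)
    return "ack"

def async_interval():
    # A -> C runs periodically, so C can lag by up to one interval
    for seq in range(c.last_seq + 1, a.last_seq + 1):
        c.write(seq, a.blocks[seq])

def facility_a_lost():
    # C pulls the tail it is missing from B, which holds every
    # acknowledged write, so C ends up only moments behind where A was.
    for seq in range(c.last_seq + 1, b.last_seq + 1):
        c.write(seq, b.blocks[seq])

The point is that C never has to have been synchronous with A; it only needs B (which was) to fill in the tail after the failure.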