On Wed, Dec 12, 2012 at 1:52 PM, Matt Garman <matthew.garman at gmail.com> wrote:
> I agree with all that. Problem is, there is a higher risk of storage
> failure with RAID-10 compared to RAID-6.

Does someone have the real odds here? I think the big risks are always
that you have unnoticed bad sectors on the remaining mirror/parity
drive when you lose a disk, or that you keep running long enough to
develop them before replacing it. (There's a crude back-of-the-envelope
model at the end of this message.)

> We do have good, reliable *data* backups, but no real hardware
> backup. Our current service contract on the hardware is next business
> day. That's too much downtime to tolerate with this particular
> system.
>
> As I typed that, I realized we technically do have a hardware
> backup---the other server I mentioned. But even the time to restore
> from backup would make a lot of people extremely unhappy.
>
> How do most people handle this kind of scenario, i.e. can't afford to
> have a hardware failure for any significant length of time? Have a
> whole redundant system in place? I would have to "sell" the idea to
> management, and for that, I'd need to precisely quantify our
> situation (i.e. my initial question).

The simple-minded approach is to keep a spare chassis and some spare
drives that match your critical boxes. The most likely things to go
are the drives, so usually all you have to do is rebuild the RAID (see
the sketch at the end). In the less likely event of a chassis failure,
you can swap the drives into the spare a lot faster than you could
copy the data. You only need a few spares to cover the likely failures
across many production boxes, though storage servers might be a
special case with a different chassis type.

You are still going to have some downtime with this approach, and it
works best where you have operations staff on site to do the swaps.
Also, you need to test it ahead of time so you understand what you
have to change to make the system come up with new NICs, etc. (see the
config check at the end).

--
Les Mikesell
lesmikesell at gmail.com
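For the odds question, here's the crude back-of-the-envelope model I
mentioned. Every number in it is a made-up-but-plausible assumption
(spec-sheet URE rate, guessed drive failure rate and rebuild window),
it treats failures as independent, and it counts any URE during a
RAID-10 rebuild as "loss" -- so it sketches the shape of the problem,
not the real odds:

import math

# All numbers below are assumptions for illustration, not
# measurements -- tune them to your own hardware.
DRIVES      = 8            # drives in the array
CAP_BITS    = 2e12 * 8     # 2 TB drives, expressed in bits
URE_RATE    = 1e-14        # unrecoverable read errors per bit read
AFR         = 0.03         # annual failure rate per drive
REBUILD_HRS = 12.0         # length of the rebuild window

# Chance one particular surviving drive dies during the rebuild.
p_die = AFR * REBUILD_HRS / (24 * 365)

# Chance of at least one URE while reading a drive end to end.
p_ure = 1 - math.exp(-CAP_BITS * URE_RATE)

# RAID-10: after losing a drive you read its mirror partner in full;
# the partner dying or throwing a URE during that read loses data.
p_raid10 = p_die + (1 - p_die) * p_ure

# RAID-6: after one loss the second parity still covers a further
# death *or* a URE, so (roughly) you need two more faults at once.
p_second = 1 - (1 - p_die) ** (DRIVES - 1)   # another drive dies
p_raid6  = p_second * p_ure                  # ...plus a URE on top

print("P(loss during rebuild), RAID-10: %.4f" % p_raid10)
print("P(loss during rebuild), RAID-6 : %.2e" % p_raid6)

With those inputs the URE term dominates and RAID-10 does come out
worse, which matches the claim above -- but real failures are
correlated (same batch, same backplane, same power event), so the true
gap is anybody's guess.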
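On the rebuild-the-RAID step: assuming Linux software RAID (md) and
made-up device names, the whole drive swap is just a fail/remove/add,
spelled out here in Python only to show the sequence:

#!/usr/bin/env python
# Sketch of the drive-swap path with Linux software RAID (md).
# Device names are made-up examples -- substitute your own, and
# partition the replacement to match before adding it.
import subprocess

MD_DEV   = '/dev/md0'    # the degraded array
BAD_DISK = '/dev/sdb1'   # the failed member
NEW_DISK = '/dev/sdc1'   # the spare going in

def mdadm(*args):
    cmd = ('mdadm', '--manage', MD_DEV) + args
    print(' '.join(cmd))
    subprocess.check_call(cmd)

mdadm('--fail', BAD_DISK)     # mark the dead member failed
mdadm('--remove', BAD_DISK)   # pull it out of the array
mdadm('--add', NEW_DISK)      # add the spare; md rebuilds on its own

# Watch the rebuild progress.
with open('/proc/mdstat') as f:
    print(f.read())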
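And on the NIC point: a stock CentOS install pins interfaces to MAC
addresses in the ifcfg scripts (HWADDR=...), and CentOS 6 keeps a
second map in /etc/udev/rules.d/70-persistent-net.rules, so drives
moved into a new chassis tend to come up with no eth0 until those are
fixed. Something like this (stock paths assumed) will flag stale pins
before you're doing it during an outage:

#!/usr/bin/env python
# Flag ifcfg files whose HWADDR no longer matches a NIC that is
# actually present in this chassis.  Paths are the stock CentOS
# ones; adjust for your layout.
import glob
import re

def current_macs():
    """MACs of the NICs present right now, per sysfs."""
    macs = set()
    for path in glob.glob('/sys/class/net/*/address'):
        with open(path) as f:
            macs.add(f.read().strip().lower())
    return macs

def pinned_macs():
    """(config file, HWADDR value) pairs from the ifcfg scripts."""
    pairs = []
    for cfg in glob.glob('/etc/sysconfig/network-scripts/ifcfg-*'):
        with open(cfg) as f:
            for line in f:
                m = re.match(r'\s*HWADDR\s*=\s*"?([0-9A-Fa-f:]+)', line)
                if m:
                    pairs.append((cfg, m.group(1).lower()))
    return pairs

present = current_macs()
for cfg, mac in pinned_macs():
    if mac not in present:
        print("STALE: %s pins HWADDR=%s" % (cfg, mac))

The udev rules file can usually just be deleted; it gets regenerated
from the hardware actually present on the next boot.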