On Wed, Dec 12, 2012 at 1:52 PM, Matt Garman <matthew.garman at gmail.com> wrote:
> I agree with all that. Problem is, there is a higher risk of storage
> failure with RAID-10 compared to RAID-6.

Does someone have the real odds here? I think the big risks are always
that you have unnoticed bad sectors on the remaining mirror/parity
drive when you lose a disk, or that you keep running long enough to
develop them before replacing it. (There's a crude back-of-the-envelope
model at the end of this message.)

> We do have good, reliable *data* backups, but no real hardware
> backup. Our current service contract on the hardware is next business
> day. That's too much downtime to tolerate with this particular
> system.
>
> As I typed that, I realized we technically do have a hardware
> backup---the other server I mentioned. But even the time to restore
> from backup would make a lot of people extremely unhappy.
>
> How do most people handle this kind of scenario, i.e. can't afford to
> have a hardware failure for any significant length of time? Have a
> whole redundant system in place? I would have to "sell" the idea to
> management, and for that, I'd need to precisely quantify our
> situation (i.e. my initial question).

The simple-minded approach is to keep a spare chassis and some spare
drives that match your critical boxes. The most likely things to go
are the drives, so usually all you have to do is rebuild the RAID (see
the sketch at the end). In the less likely event of a chassis failure,
you can swap the drives into the spare a lot faster than you could
copy the data. You only need a few spares to cover the likely failures
across many production boxes, though storage servers might be a
special case with a different chassis type.

You are still going to have some downtime with this approach, and it
works best where you have operations staff on site to do the swaps.
Also, you need to test it ahead of time so you understand what you
have to change to make the system come up with new NICs, etc. (see the
config check at the end).

--
Les Mikesell
lesmikesell at gmail.com
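For the odds question, here's the crude back-of-the-envelope model I
mentioned. Every number in it is a made-up-but-plausible assumption
(spec-sheet URE rate, guessed drive failure rate and rebuild window),
it treats failures as independent, and it counts any URE during a
RAID-10 rebuild as "loss" -- so it sketches the shape of the problem,
not the real odds:

import math

# All numbers below are assumptions for illustration, not
# measurements -- tune them to your own hardware.
DRIVES      = 8            # drives in the array
CAP_BITS    = 2e12 * 8     # 2 TB drives, expressed in bits
URE_RATE    = 1e-14        # unrecoverable read errors per bit read
AFR         = 0.03         # annual failure rate per drive
REBUILD_HRS = 12.0         # length of the rebuild window

# Chance one particular surviving drive dies during the rebuild.
p_die = AFR * REBUILD_HRS / (24 * 365)

# Chance of at least one URE while reading a drive end to end.
p_ure = 1 - math.exp(-CAP_BITS * URE_RATE)

# RAID-10: after losing a drive you read its mirror partner in full;
# the partner dying or throwing a URE during that read loses data.
p_raid10 = p_die + (1 - p_die) * p_ure

# RAID-6: after one loss the second parity still covers a further
# death *or* a URE, so (roughly) you need two more faults at once.
p_second = 1 - (1 - p_die) ** (DRIVES - 1)   # another drive dies
p_raid6  = p_second * p_ure                  # ...plus a URE on top

print("P(loss during rebuild), RAID-10: %.4f" % p_raid10)
print("P(loss during rebuild), RAID-6 : %.2e" % p_raid6)

With those inputs the URE term dominates and RAID-10 does come out
worse, which matches the claim above -- but real failures are
correlated (same batch, same backplane, same power event), so the true
gap is anybody's guess.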
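On the rebuild-the-RAID step: assuming Linux software RAID (md) and
made-up device names, the whole drive swap is just a fail/remove/add,
spelled out here in Python only to show the sequence:

#!/usr/bin/env python
# Sketch of the drive-swap path with Linux software RAID (md).
# Device names are made-up examples -- substitute your own, and
# partition the replacement to match before adding it.
import subprocess

MD_DEV   = '/dev/md0'    # the degraded array
BAD_DISK = '/dev/sdb1'   # the failed member
NEW_DISK = '/dev/sdc1'   # the spare going in

def mdadm(*args):
    cmd = ('mdadm', '--manage', MD_DEV) + args
    print(' '.join(cmd))
    subprocess.check_call(cmd)

mdadm('--fail', BAD_DISK)     # mark the dead member failed
mdadm('--remove', BAD_DISK)   # pull it out of the array
mdadm('--add', NEW_DISK)      # add the spare; md rebuilds on its own

# Watch the rebuild progress.
with open('/proc/mdstat') as f:
    print(f.read())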
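And on the NIC point: a stock CentOS install pins interfaces to MAC
addresses in the ifcfg scripts (HWADDR=...), and CentOS 6 keeps a
second map in /etc/udev/rules.d/70-persistent-net.rules, so drives
moved into a new chassis tend to come up with no eth0 until those are
fixed. Something like this (stock paths assumed) will flag stale pins
before you're doing it during an outage:

#!/usr/bin/env python
# Flag ifcfg files whose HWADDR no longer matches a NIC that is
# actually present in this chassis.  Paths are the stock CentOS
# ones; adjust for your layout.
import glob
import re

def current_macs():
    """MACs of the NICs present right now, per sysfs."""
    macs = set()
    for path in glob.glob('/sys/class/net/*/address'):
        with open(path) as f:
            macs.add(f.read().strip().lower())
    return macs

def pinned_macs():
    """(config file, HWADDR value) pairs from the ifcfg scripts."""
    pairs = []
    for cfg in glob.glob('/etc/sysconfig/network-scripts/ifcfg-*'):
        with open(cfg) as f:
            for line in f:
                m = re.match(r'\s*HWADDR\s*=\s*"?([0-9A-Fa-f:]+)', line)
                if m:
                    pairs.append((cfg, m.group(1).lower()))
    return pairs

present = current_macs()
for cfg, mac in pinned_macs():
    if mac not in present:
        print("STALE: %s pins HWADDR=%s" % (cfg, mac))

The udev rules file can usually just be deleted; it gets regenerated
from the hardware actually present on the next boot.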