[CentOS] Clustering

Fri Feb 5 22:15:11 UTC 2010
nate <centos at linuxpowered.net>

Les Mikesell wrote:

> Somewhere along the line they switch from a CentOS base to rpath for
> better package management, but I haven't followed them since.

Yeah, the version I had at the time was based on rPath. I think
they changed to something else yet again in the past year or
so.

> trusted it and it was only used for backups.   So, I no longer believe
> that paying a lot for a device that is supposed to have a good
> reputation is a sure thing - or that having a support phone number is
> going to make things better.  Everyone has different war stories, I guess...

Oh absolutely, nothing is a sure thing. On two separate occasions
last year we had a disk failure take out an entire storage array
(I speculate that fiber errors flooded the bus and took the
controllers offline); this was on low end crap storage. One of
our vendors OEMs low end IBM storage for some of their customers,
and they reported similar events on that stuff.

In 2004 the company I was at had a *massive* outage on our EMC
array (CX600) with some pretty significant data loss (~60 hours
of downtime in the first week alone). In the end it was traced
to administrator error (wasn't me at the time): a
misconfiguration of the system allowed both controllers to go
down simultaneously. Such an error isn't possible to make on
more modern systems (phew). I don't know what the specific
configuration was, but the admin fessed up to it a couple of
years later.

Which is why most vendors will push for a 2nd array and some sort
of replication. There's only one vendor in the world that I know
of that puts their money behind 100% uptime, and that is the
multi-million dollar systems from Hitachi. They claim they've
never had to pay out on any claims.

Most other array makers don't design their systems to handle more
than 99.999% uptime on the high end, and probably 99.99% on the
mid range.
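
For perspective, a quick back-of-the-envelope in Python of what
those figures allow per year (roughly 5 minutes of downtime for
five nines, about 53 minutes for four):

  # downtime budget implied by an uptime percentage
  minutes_per_year = 365.25 * 24 * 60
  for label, uptime in (("99.999%", 0.99999), ("99.99%", 0.9999)):
      downtime = (1 - uptime) * minutes_per_year
      print(f"{label} -> about {downtime:.0f} min downtime/year")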

BUT for most applications a good storage array provides far
better availability than anything someone can build on their
own, where "good" typically means a system that sells starting
at north of $50k.

I like my own storage array because it can have up to 4 controllers
running in active-active mode (right now it has 2, with another 2
being installed in a few weeks). Recently a software update was
installed that allows the system to re-mirror its cache to the
remaining controller(s) in the event of a controller failure.

Normally in a dual-controller system, if a controller goes down
the system goes into write-through mode to ensure data integrity,
which can destroy performance. With this feature that doesn't
happen, and the system still ensures data integrity by making
sure all data is written to two locations before the write is
acknowledged to the host.
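
To illustrate the idea, here's a rough Python sketch (class and
method names are made up, nothing vendor-specific): a write is
acknowledged once it sits in two caches, and if only one cache is
left the system falls back to committing to disk before the ack:

  class Controller:
      def __init__(self, name):
          self.name = name
          self.cache = []        # battery-backed write cache
          self.alive = True

  class Array:
      def __init__(self, controllers):
          self.controllers = controllers
          self.disk = []         # backing store

      def write(self, block):
          live = [c for c in self.controllers if c.alive]
          if len(live) >= 2:
              # write-back: mirror into two caches, then ack
              for c in live[:2]:
                  c.cache.append(block)
              return "ack (mirrored write-back)"
          # only one cache copy possible, so commit to disk
          # before acknowledging (write-through, much slower)
          self.disk.append(block)
          return "ack (write-through)"

  array = Array([Controller("A"), Controller("B")])
  print(array.write("block0"))          # mirrored write-back
  array.controllers[1].alive = False
  print(array.write("block1"))          # degrades to write-through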

It goes well beyond that though: it automatically lays data out
so that it can survive a full shelf (up to 40 drives) failing
without skipping a beat. RAID rebuilds are very fast (up to 10x
faster than other systems), the drives are connected to a
switched backplane, and there are no fiber loops on the system;
every shelf of disks is directly connected to the controllers via
two fiber ports. In the event of a power failure there is an
internal disk in each controller that the system writes its cache
out to, so no worries about a power outage lasting longer than
the batteries (typically 48-72 hours). And of course, since
everything is written twice, when the power goes out you store
two copies of that cache on the internal disks, in case one disk
happens to fail (hopefully both don't) at precisely the wrong
moment.
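
A rough sketch of that cache-vault behavior (again, the names are
made up for illustration): each controller dumps its copy of the
cache to its own internal disk, so recovery only needs one of the
two copies to survive:

  class VaultDisk:
      def __init__(self, name, healthy=True):
          self.name, self.healthy, self.image = name, healthy, None

      def write_image(self, cache):
          if not self.healthy:
              raise IOError(f"{self.name} failed")
          self.image = list(cache)      # persist a copy of the cache

  def vault_on_power_loss(cache, disks):
      # dump the cache to every internal vault disk we can reach;
      # recovery after power returns needs at least one good copy
      good = []
      for d in disks:
          try:
              d.write_image(cache)
              good.append(d.name)
          except IOError:
              pass                      # tolerate a bad vault disk
      return good

  cache = ["block0", "block1"]
  disks = [VaultDisk("ctrl-A"), VaultDisk("ctrl-B", healthy=False)]
  print(vault_on_power_loss(cache, disks))   # ['ctrl-A']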

The drives themselves are in vibration-absorbing sleds; vibration
is the #1 cause of disk failure according to a report I read from
Seagate.

http://portal.aphroland.org/~aphro/chassis-architecture.png
http://www.techopsguys.com/2009/11/20/enterprise-sata-disk-reliability/

I have had two soft failures on the system since we got it: one
time a fiber channel port had a sort of core dump, and another
time a system process crashed. Both were recovered automatically
without user intervention and with no noticeable impact other
than the email alerts to me.

No guarantees it won't burst into flames one day, but I do sleep
a lot better at night with this system vs the last one.

My vendor also recently introduced an interesting replication
solution that uses 3 arrays to provide synchronous long-distance
replication. It works like this:

(while all arrays must be from the same vendor, they do not need
to be identical in any way)

Array 1 sits in facility A
Array 2 sits in facility B (up to ~130 miles away, or 1.3ms RTT)
Array 3 sits in facility C (up to 3000 miles away, or 150ms RTT)

Array 1 replicates synchronously to Array 2 in facility B (hence
the distance limitation), and asynchronously to Array 3 in
facility C at defined intervals. In the event facility A or
Array 1 blows up, Array 3 in facility C automatically connects to
Array 2 and has it send all of the data up to the point Array 1
went down. I think you can get to within something like a few
milliseconds of the disaster that took out Array 1 and still get
all of the data to Array 3.

Setting it up takes about 30 minutes, and it's all automatic.
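
A toy Python model of how that failover works (my own
simplification, not the vendor's implementation): the host ack
waits on the synchronous copy to B, C is topped up
asynchronously, and if A dies C catches up on the missing tail
from B:

  class ReplicaArray:
      def __init__(self, name):
          self.name = name
          self.writes = []          # ordered log of replicated writes

      def last_seq(self):
          return len(self.writes)

  a, b, c = ReplicaArray("A"), ReplicaArray("B"), ReplicaArray("C")

  def host_write(data):
      # synchronous leg: the host ack waits until B has the write too
      a.writes.append(data)
      b.writes.append(data)

  def async_cycle():
      # asynchronous leg: periodically ship whatever C is missing
      c.writes.extend(a.writes[c.last_seq():])

  for i in range(5):
      host_write(f"block{i}")
  async_cycle()                     # C now has blocks 0-4
  host_write("block5")              # not yet shipped to C
  # facility A blows up here; C catches up from B instead of A
  c.writes.extend(b.writes[c.last_seq():])
  print(c.writes == b.writes)       # True, no data lost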

Prior to this, setting up such a solution would cost waaaaaaaaaay
more, as you'd only find it in the most high-end systems.

It's going to be many times cheaper to get a 2nd array and replicate
than it is to try to design/build a single system that offers 100%
uptime.

Entry-level pricing of this particular array starts at maybe $130k
and can probably go as high as $2-3M if you load it up with
software (more than half the cost can be software add-ons). So
it's not in the same league as most NetApp, Equallogic, or even
EMC/HDS gear. Their low end stuff starts at probably $70k.

nate