--On Friday, February 10, 2012 01:49:05 PM -0600 Les Mikesell lesmikesell@gmail.com wrote:
I suppose it is possible for a NIC to fail, but I can't recall actually ever seeing it. I've seen lots of complicated failover schemes introduce new problems and their own failure modes [...]
+1.
Redundancy is cool. Redundancy, when needed and properly implemented, can work and can save your bacon. However, it is expensive and time-consuming, and it significantly increases both the complexity of a system and the skill needed to analyze problems (or, for that matter, to predict them and plan mitigation strategies). It also needs to be exercised on a regular basis or, when you need it, you'll find that someone has made a bad configuration change that prevents failover.
I also have not seen a properly tested NIC fail in quite a few years. (I'm discounting bad NIC models that don't pass evaluation.) Of course, just because I've not seen it doesn't mean it can't happen, but I also don't usually worry about having a redundant SERIAL back-channel for cluster heartbeat operations, which used to be considered the only reasonable way to do things.
I do have clusters where bonding is in use, but there it has helped less with surviving NIC failures than with keeping the machines running while the network team brings down part of the redundant switch network for maintenance (or replaces a failed switch, or when some fool decides to unplug a network cable briefly to move other cables around).
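
For what it's worth, here is a rough sketch of the kind of check that can tell you a bond has quietly lost a link before you need it. It assumes the Linux bonding driver and its status file under /proc/net/bonding/; the bond name bond0 and the function names are just placeholders for illustration, and the parsing only looks at the "Currently Active Slave", "Slave Interface" and "MII Status" lines.

```python
import sys


def parse_bond_status(text):
    """Return (active_slave, {slave_name: mii_status}) from bonding status text."""
    active = None
    slaves = {}
    current = None  # slave section being read; None while in the bond header
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" comes before any slave section
            # and is skipped by the check on `current`.
            slaves[current] = value
    return active, slaves


def degraded_slaves(text):
    """Return the sorted names of slaves whose MII status is not 'up'."""
    _, slaves = parse_bond_status(text)
    return sorted(name for name, status in slaves.items() if status != "up")


if __name__ == "__main__":
    # Assumption: the bond is called bond0; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/net/bonding/bond0"
    with open(path) as f:
        status = f.read()
    active, slaves = parse_bond_status(status)
    print("active slave:", active)
    for name in sorted(slaves):
        print(name, slaves[name])
    # Non-zero exit so cron or a monitoring wrapper can alert on it.
    sys.exit(1 if degraded_slaves(status) else 0)
```

Run from cron or a monitoring system, something like this only reports the current state; it is not a substitute for actually pulling a cable or failing a switch port during a maintenance window to confirm that failover works.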
Devin