[CentOS] How to update CentOS 5.4 to 5.6?

Fri Sep 23 15:09:11 UTC 2011
Lamar Owen <lowen at pari.edu>

On Thursday, September 22, 2011 07:20:15 AM Rudi Ahlers wrote:
> surely a few versions of the OS won't take up that much space? 1TB &
> 2TB HDD's these day cost a few dollars so I don't think that's the
> real reason. And it can't be bandwidth either since the files are
> mirrored to many other servers around the globe.

You have the two pieces to the puzzle; put them together.  "Many other servers" times "a few dollars" equals "many more than a few dollars."

Enterprise-grade 1TB disks are not a few dollars.  And I for one don't want the master copy of CentOS sitting on cheap consumer drives.  This being the Community *ENTERPRISE* Operating System, after all....  Of course, the following is mildly off-topic, but if being *enterprise* is important....

To make things even halfway reliable with 1TB drives you need RAID 6; that requires a minimum of 4 drives to be worthwhile.  Better is a RAID 6 with 8 500GB drives, as rebuild time will be shorter when a drive faults (I don't say "if" for a failure; it is almost surely "when").
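
Back-of-the-envelope, in Python (the raw capacities and the fixed two-drive parity overhead are the only assumptions here; real formatted capacity is a bit lower):

    # Rough RAID6 comparison: 4 x 1TB vs 8 x 500GB (raw TB, assumed figures).
    # RAID6 spends two drives' worth of capacity on parity either way.

    def raid6_usable(drives, size_tb):
        return (drives - 2) * size_tb

    for drives, size_tb in [(4, 1.0), (8, 0.5)]:
        usable = raid6_usable(drives, size_tb)
        # On rebuild, roughly one failed drive's worth of data has to be
        # reconstructed by reading the surviving drives; smaller drives mean
        # less data per rebuild and a shorter exposure window.
        print(f"{drives} x {size_tb}TB RAID6: {usable:.1f}TB usable, "
              f"{size_tb:.1f}TB to reconstruct per failed drive")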

Double faults on RAID 5 become a serious issue with larger drives; RAID 6 survives the loss of any two drives (RAID 1/0 can survive some double faults, but not all).  There are a number of online articles about this.  Triple parity will be required soon enough, and arrays that can operate in degraded mode with no performance hit or data loss will just about have to become the norm.
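
To make the "some, but not all" point concrete, here's a quick sketch that enumerates every two-drive failure in a four-drive RAID 1/0 (the two-mirrored-pairs layout is assumed for illustration):

    from itertools import combinations

    # 4-drive RAID 1/0: two mirrored pairs, striped together (assumed layout).
    mirror_pairs = [("A", "B"), ("C", "D")]
    drives = [d for pair in mirror_pairs for d in pair]

    all_faults = list(combinations(drives, 2))
    survivable = 0
    for failed in all_faults:
        # The array is lost only if both members of the same mirror fail.
        dead = any(set(pair) <= set(failed) for pair in mirror_pairs)
        if not dead:
            survivable += 1
        print(f"fail {failed}: {'array lost' if dead else 'survives'}")

    print(f"RAID 1/0 survives {survivable} of {len(all_faults)} double faults; "
          f"RAID 6 survives all {len(all_faults)}.")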

RAID5/1 can work fairly well, but you need six disks to make that worthwhile.  
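
Roughly why six is the floor, treating RAID 5/1 here as a mirror of two three-disk RAID5 sets (layouts vary by controller, so take this as a sketch):

    # Six disks as RAID 5/1, assumed here to be a mirror of two 3-disk RAID5
    # sets.  Capacities are in whole-drive units.
    disks, per_set = 6, 3
    raid5_set_usable = per_set - 1      # one disk's worth of parity per RAID5 set
    raid51_usable = raid5_set_usable    # mirroring the two sets halves the total
    raid6_usable = disks - 2            # the same six disks as a single RAID6

    print(f"RAID 5/1: {raid51_usable} of {disks} disks' capacity usable")
    print(f"RAID 6:   {raid6_usable} of {disks} disks' capacity usable")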

Hot-sparing is a necessity with arrays of this size and larger.  Sparing out a drive when it shows signs of impending fault is much less wearing on the non-faulted drives of the array (which may themselves fault during a rebuild, which is a Bad Thing), and it is faster to copy from the soon-to-fault drive to a hot spare than it is to rebuild from parity.
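
A very rough feel for the difference, with the transfer rates being purely assumed numbers (real rebuilds are usually throttled so the array stays usable):

    # Copying a still-readable drive to a hot spare vs. reconstructing it from
    # parity.  The rates are assumptions; a parity rebuild also has to read
    # every surviving drive, which is where the extra wear comes from.
    capacity_gb = 1000        # size of the faulting drive (assumed)
    copy_rate_mb_s = 100      # near-sequential copy to the hot spare (assumed)
    rebuild_rate_mb_s = 30    # throttled parity reconstruction (assumed)

    def hours(gb, mb_per_s):
        return gb * 1024 / mb_per_s / 3600

    print(f"copy to hot spare: {hours(capacity_gb, copy_rate_mb_s):.1f} hours")
    print(f"parity rebuild   : {hours(capacity_gb, rebuild_rate_mb_s):.1f} hours")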

Reliable fault prediction without many false positives requires specialized firmware on the drive to do right; that's part of what you pay for when you buy a drive from a vendor such as EMC or Network Appliance.  And that's just part of the reason that a drive in the 1TB range from one of those vendors is typically over $1,000 (typical fibre-channel costs are $2,500 for the current middle-of-the-road drives; the newer SAS drives being used aren't that much less expensive).  With an enterprise array you also get background verify (scrubbing) that keeps a check on the health of the system and ferrets out unrecoverable errors more reliably than consumer-hardware PC-based systems do.
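
For what it's worth, plain Linux md can do a poor man's version of that background verify; a minimal sketch, assuming /dev/md0 and the standard md sysfs interface (run as root):

    # Trigger a background verify ("scrub") of a Linux md array and report the
    # mismatch count when it finishes.
    import time

    MD = "/sys/block/md0/md"

    with open(MD + "/sync_action", "w") as f:
        f.write("check\n")                  # start a read/compare pass

    while True:
        with open(MD + "/sync_action") as f:
            if f.read().strip() == "idle":  # the check has finished
                break
        time.sleep(60)

    with open(MD + "/mismatch_cnt") as f:
        print("mismatch_cnt:", f.read().strip())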

The dirty little secret of hard drives is that errors are occurring in drive reads all the time; that's why there is ECC on every sector processed by the drive (enterprise arrays typically do this ECC on the controller and not on the drive, using 520- or 522-byte sector drives).  Many sectors on the drive will error on reads, and ECC catches the vast majority.  It's when the ECC fails that you get a retry; the TLER (time-limited error recovery) value caps how long the drive spends on retries and waits, and when those all fail you get an unrecoverable error (a failure on write will cause a remap).
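
To put a number on the unrecoverable part, using the nonrecoverable-read-error rates vendors commonly quote on datasheets (the 1-per-1e14 and 1-per-1e15 figures are assumptions from typical spec sheets, not measurements):

    # Expected unrecoverable read errors per full pass over a 1TB drive.
    DRIVE_TB = 1.0
    bits_read = DRIVE_TB * 1e12 * 8    # one full read of the drive

    for label, rate in [("consumer   (1 per 1e14 bits)", 1e-14),
                        ("enterprise (1 per 1e15 bits)", 1e-15)]:
        print(f"{label}: ~{bits_read * rate:.2f} expected UREs per full read")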

Consumer drives won't necessarily report those correctable errors, and they will try far longer to read the data than an enterprise drive designed for array use will.  Enterprise drives are expected to report sector health completely and accurately to the controller, which then makes the decision to remap or to fault; consumer drives will present an 'I'm totally perfect' face while hiding the errors from the OS (some even hide errors from the SMART data; I'll not mention a vendor, but I have seen drives that reported only a few remaps yet, when surface-tested, had many thousands of UREs).
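
If you want to see what your own drives admit to, something like this works (assumes smartmontools and /dev/sda; attribute names vary a bit by vendor, and as noted some consumer drives under-report here anyway):

    # Print the remap-related SMART attributes for one drive.
    import subprocess

    out = subprocess.run(["smartctl", "-A", "/dev/sda"],
                         capture_output=True, text=True).stdout

    for line in out.splitlines():
        if any(attr in line for attr in ("Reallocated_Sector_Ct",
                                         "Current_Pending_Sector",
                                         "Offline_Uncorrectable")):
            print(line)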

Solid State Drives are more reliable, especially in a read-mostly situation, but 1TB worth of SSD is quite expensive, and SSDs have their own problems.

Adding a terabyte or two to an existing enterprise-class array is far more than a few dollars.  A few years ago, when I purchased some 750GB drives for an array, I spent $2,500 per drive for five drives ($12.5K); these were added to an existing five drives that had been in RAID5, and the set was expanded to RAID6.  That added roughly 2.5TB to the array.  A 750GB drive will not hold 750GB of data, of course; that's the raw capacity, and the actual data capacity is in the 690GB range.  Converting from RAID5 to RAID6 effectively spent one full drive on the second parity that makes RAID6 do its thing, so I added 4*690GB or so of storage.  That's a wonderful ~$5,000 per terabyte of actual usable storage that is many times more reliable than a single $100 1TB drive would be.
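
The arithmetic behind that figure, for the curious (the ~690GB formatted capacity is the rough number quoted above; it comes out around $4,500, call it ~$5,000, per usable TB):

    drives_added   = 5
    cost_per_drive = 2500      # dollars
    formatted_gb   = 690       # usable data capacity of a "750GB" drive, roughly

    # Moving from RAID5 to RAID6 spends one drive's worth of space on the
    # second parity, so only four of the five added drives show up as capacity.
    usable_tb  = (drives_added - 1) * formatted_gb / 1000.0
    total_cost = drives_added * cost_per_drive

    print(f"added ~{usable_tb:.2f}TB usable for ${total_cost:,}"
          f" => roughly ${total_cost / usable_tb:,.0f} per usable TB")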

Now, on to the other issue.  If you want a 5.6 that stays at 5.6 (and gets security-only updates without moving to the next point release), you really should go to Scientific Linux, since they do exactly that.  That's one of the differences between SL and CentOS, so you do have a choice.  Both are quality EL rebuilds with different philosophies about several things; I like having the choice, and I like that SL and CentOS are different.  Different is not a bad thing.