[CentOS] HP ProLiant DL380 G5

Fri Aug 22 18:04:07 UTC 2014
Les Mikesell <lesmikesell at gmail.com>

On Thu, Aug 21, 2014 at 6:17 PM, John R Pierce <pierce at hogranch.com> wrote:

>>> >Yes, but try a software RAID when you have intermittently bad RAM.
>>> >I've been there.  Mirrored disks that were almost, but not quite,
>>> >mirrors.
>> try any file system when you've got flaky RAM.  data that's not quite
>> what you wanted, oh boy.

Yes, but if you fix the RAM, fsck the disk, and rewrite the data, you
sort of expect it to work again.  In this case, with the mirrors
randomly mismatching but both marked as good, fsck would read the good
copy in some spots when checking, but later the system would read the
bad one.  In hindsight the reason is obvious, but it took me a while
to see why the box still crashed every few weeks.
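
For anyone wanting to look for this kind of silent mismatch on Linux
md, here is a minimal sketch.  It assumes the kernel exposes
/sys/block/mdN/md/mismatch_cnt, and that a check pass has already been
requested (for example by writing "check" to
/sys/block/md0/md/sync_action as root), since the counter is only
updated by a check or repair run.  The function name and its
sys_block parameter are made up for illustration.

```python
from pathlib import Path


def md_mismatch_counts(sys_block="/sys/block"):
    """Return {array_name: mismatch_cnt} for each md array under sys_block.

    sys_block is a parameter only so the function can be pointed at a
    test directory; on a real system the default is the one to use.
    """
    counts = {}
    for path in sorted(Path(sys_block).glob("md*/md/mismatch_cnt")):
        array_name = path.parent.parent.name  # e.g. "md0"
        try:
            counts[array_name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or unexpected content: skip rather than guess.
            continue
    return counts


if __name__ == "__main__":
    for name, count in md_mismatch_counts().items():
        status = "ok" if count == 0 else "copies differ"
        print(f"{name}: mismatch_cnt={count} ({status})")
```

A nonzero count only says the copies differ somewhere; md has no way
of telling which copy is the correct one, which is the situation
described above.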

> which, btw, is why I insist on ECC for servers.  and really prefer ZFS
> where each block of each part of a raid is checksummed and timestamped,
> so when scrub finds mismatching blocks, it can know which one is correct.

I thought this was supposed to be ECC with 1-bit correction - and I
thought that was supposed to mean that if it couldn't correct an error
it would just stop, but it didn't.  It took about 3 days of a
memtest86 run to hit the problem and show that it was RAM - and the
box has run for many years since I swapped all of the RAM.  But the
only reason that box is still around is that it is an enormous tower
case and the only thing I had with enough drive bays for what I was
doing then.
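
On a current Linux kernel, one way to see whether ECC is actually
reporting anything is to read the EDAC counters.  This is only a
sketch: it assumes an EDAC driver for the memory controller is loaded
and that it exposes ce_count (corrected) and ue_count (uncorrected)
under /sys/devices/system/edac/mc/mcN/.  I can't say whether that was
true of the box above.  The function name and its edac_root parameter
are made up for illustration.

```python
from pathlib import Path


def edac_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"corrected": n, "uncorrected": n}} from EDAC sysfs.

    edac_root is a parameter only so the function can be pointed at a
    test directory; the default is the usual sysfs location.
    """
    counts = {}
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Counter missing or unreadable: skip this controller.
            continue
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts


if __name__ == "__main__":
    results = edac_error_counts()
    if not results:
        print("no EDAC memory controllers found (driver not loaded?)")
    for name, c in results.items():
        print(f"{name}: corrected={c['corrected']} uncorrected={c['uncorrected']}")
```

An empty result means nothing is being reported at all, which is not
the same thing as the RAM being healthy.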

-- 
  Les Mikesell
     lesmikesell at gmail.com