Re: [CentOS] debugging RAM issues

14 Mar 2012


On Wed, Mar 14, 2012 at 2:35 PM, John R Pierce <pierce@hogranch.com> wrote:
> ...
> On 03/14/12 12:16 PM, Les Mikesell wrote:
>> ...
>> If you were running software RAID1 on that box, don't trust anything
>> on the drives now.  Maybe even if you weren't, but it is especially
>> weird when alternate reads randomly revive bad data that you thought
>> had been fixed already.
>
> and the worst part is, even if you found mismatching blocks on the
> mirrors, there's no way to know which one is the 'good' one, as there's
> no block checksumming or anything like that with conventional RAID.
>
> this is a major reason I *insist* on ECC for any sort of server other
> than a lightweight home system.  ECC memory will detect bit failures so
> you KNOW something is funky.

I _thought_ the server where I had this problem was supposed to have
had 1-bit error correction and I also thought that if the error
couldn't be corrected with ECC it was supposed to crash instead of
continuing.  But maybe it had the wrong kind of RAM installed or
something that disabled the ECC.
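
One way to check whether ECC reporting is actually active on a Linux box is to read the kernel's EDAC counters from sysfs. The following is only a rough sketch, assuming an EDAC driver for the chipset is loaded and exposes memory controllers under /sys/devices/system/edac/mc (the function name read_edac_counts is made up for illustration). If no controllers show up, that suggests, but does not prove, that ECC is disabled or unsupported on that hardware.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} for each mc* directory.

    Hypothetical helper: assumes the standard EDAC sysfs layout, where each
    memory controller directory contains ce_count and ue_count files.
    """
    root = Path(edac_root)
    counts = {}
    if not root.is_dir():
        # No EDAC directory at all: driver not loaded or ECC not reported.
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found; ECC reporting may be inactive.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A nonzero corrected count means ECC is catching single-bit errors; a nonzero uncorrected count means errors got through that ECC could not fix, which is the case where you would expect the machine to halt or at least log loudly.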
-- 
   Les Mikesell
     lesmikesell@gmail.com
