Thank you Lance,

We will change the memory to see if that resolves the problem.
That storage only has a basic Linux kernel, which unfortunately does not
carry memtest86.

Lance Davis wrote:
> On Thu, 11 Oct 2007, Centos wrote:
>
>> Do you think replacing the RAM will solve our problem?
>
> Assuming it is RAM gone faulty and not some other issue, then it should.
>
>> How can I make sure it is the RAM?
>
> memtest86 ??
>
> Regards
> Lance
>
>>
>> Lance Davis wrote:
>>> On Thu, 11 Oct 2007, Centos wrote:
>>>
>>> > We only get the ECC errors when I am transferring data from other
>>> > storage to this one.
>>> > It only happens when data is being written to it.
>>>
>>> Well, that is when it is detected ...
>>>
>>> As I said, ECC RAM errors are concerned with an error in storage -
>>> not an error in transmission.
>>>
>>> Regards
>>> Lance
>>>
>>> > Lance Davis wrote:
>>> > > On Thu, 11 Oct 2007, Centos wrote:
>>> > >
>>> > > > I was wondering if it is safe to use the device until we
>>> > > > receive the RAM.
>>> > > > That device is our main storage.
>>> > > >
>>> > > > Is data retransmitted when ECC errors happen?
>>> > > > I don't want to have data corruption.
>>> > >
>>> > > You are not talking about data transmission - but storage.
>>> > >
>>> > > If two or more bit errors occur then ECC is not able to correct
>>> > > them and you are likely to get data corruption.
>>> > >
>>> > > Regards
>>> > > Lance
>>> > >
>>> > > > Matthew Miller wrote:
>>> > > > > On Thu, Oct 11, 2007 at 09:57:12AM -0300, Centos wrote:
>>> > > > >
>>> > > > > > Does anyone have any experience with ECC RAM errors?
>>> > > > > > We are seeing ECC fault errors, but I am not sure if they
>>> > > > > > are related to the RAM itself or to a bad connection and
>>> > > > > > noise.
>>> > > > > > Please let me know if you have a good document regarding
>>> > > > > > ECC errors; especially I want to know if data will be
>>> > > > > > retransmitted when an error happens.
>>> > > > > >
>>> > > > > > 02:00:31, Thursday, 10/11/2007
>>> > > > > > : EXCEPTION: ECC Error Interrupt (Two or more Bit Error)
>>> > > > > > 0C18:00020001 0C68:00000000 Lcause:74630001 Lerr:1C855F82
>>> > > > >
>>> > > > > Change the memory; see if the errors persist.
>>> > > > >
>>> > > > _______________________________________________
>>> > > > CentOS mailing list
>>> > > > CentOS at centos.org
>>> > > > http://lists.centos.org/mailman/listinfo/centos