on 10/15/2007 5:16 AM Centos spake the following:
> Thanks everyone for the help and responses.
>
> I just noticed that these errors might be soft errors, because they only
> happen when I overload the storage by copying large files simultaneously
> on the same port and SCSI controller, so I was thinking the cause might be
> the speed of the ECC parity calculation or a RAM shortage.
>
> The hardware is supposed to take care of ECC errors, and the device should
> panic or hang on seeing these errors, but it just keeps going.
>
> What do you think?
>

I have had systems so overloaded that I couldn't log in over an ssh
session, but when the load cleared, there weren't any ECC errors.

I still think you have a hardware problem. Just because it takes a high
load to trigger the errors now doesn't mean the hardware is OK. A faulty
timing capacitor on the motherboard can cause all sorts of corruption in
memory, and it will probably deteriorate over time.

You need to test the memory methodically: run memory tests, then move the
RAM to different slots and test again. Or replace the hardware if it is
mission critical.
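If it helps with the "test, move RAM, test again" routine, here is a rough sketch for recording the ECC counters before and after each load run. It assumes a Linux kernel with an EDAC driver loaded for your chipset, which exposes per-controller ce_count (corrected) and ue_count (uncorrected) files under /sys/devices/system/edac/mc. Nothing in this thread confirms that your box has that, so treat the path as an assumption. The function names read_edac_counts and new_errors are made up for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} for each mc* dir.

    The default root is an assumption: it only exists when an EDAC driver
    is loaded for the memory controller.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC data available on this system
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / fname).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # file missing or unreadable
        counts[mc.name] = entry
    return counts


def new_errors(before, after):
    """Return only the controllers whose counters grew between snapshots."""
    grew = {}
    for name, now in after.items():
        prev = before.get(name, {})
        delta = {}
        for key in ("ce", "ue"):
            old, new = prev.get(key), now.get(key)
            if old is not None and new is not None and new > old:
                delta[key] = new - old
        if delta:
            grew[name] = delta
    return grew
```

Take one snapshot with read_edac_counts() before starting the big copies and another afterwards, then pass both to new_errors(). If the same controller keeps showing new corrected errors after you swap DIMMs between slots, that points away from a single bad module and toward the board or controller. This is no substitute for a proper offline memory test.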