Thanks, guys!
Your advice helped me fix the problem.
Yes, it was the motherboard that was the issue. I updated the firmware, which must have included some microcode fixes to support my CPU (John mentioned that the memory controller is in the CPU on the Xeon 5500).
Now, after rebooting into 2.6.18-194.3.1.el5, mcelog reports no errors.
I will do some further testing, but I think I'm in the clear.
Thank you so much! I spent hours googling for a solution and couldn't find this error reported anywhere else. I'm glad to have some people I can turn to for advice.
All the best, eric
Tsuyoshi Nagata wrote:
Hi Eric!
(2010/06/22 13:11), Eric Deis wrote:
Transaction: Address/Command error
It's a motherboard (memory controller) problem, *not* a DIMM problem (memtest can't detect this error). Your data transfers (reads/writes) sometimes hit bit errors, and this is the Nehalem CPU's error-detecting feature (MCE) reporting them.
Try a new motherboard. If your board still reports this error with the latest kernel, it's time to buy hardware from a certified vendor.
Supermicro's board is not certified hardware, but this error just indicates a hardware problem.
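For anyone else who hits this: a line like "Transaction: Address/Command error" is decoded from the MCA error code in the low 16 bits of the machine-check bank's status register. Below is a minimal sketch, assuming the memory-controller compound error code layout described in Intel's Software Developer's Manual (000F 0000 1MMM CCCC, where MMM is the transaction type and CCCC the channel). The function name decode_memory_mcacod is hypothetical, and this is not how mcelog itself is implemented.

```python
# Hypothetical helper, assuming Intel's memory-controller compound
# error code layout: 000F 0000 1MMM CCCC in IA32_MCi_STATUS[15:0].
MEM_TRANSACTIONS = {
    0b000: "Generic undefined request",
    0b001: "Memory read error",
    0b010: "Memory write error",
    0b011: "Address/Command error",
    0b100: "Memory scrubbing error",
}


def decode_memory_mcacod(status: int):
    """Return (transaction, channel) for a memory-controller error, else None.

    channel is None when CCCC == 0b1111 (channel not specified).
    """
    mcacod = status & 0xFFFF
    # Bits 15-13 and 11-8 must be 0 and bit 7 must be 1;
    # bit 12 (F, the correction-report filter flag) is ignored.
    if (mcacod & 0xEF80) != 0x0080:
        return None
    mmm = (mcacod >> 4) & 0b111
    cccc = mcacod & 0xF
    transaction = MEM_TRANSACTIONS.get(mmm, "Reserved")
    channel = None if cccc == 0xF else cccc
    return transaction, channel


if __name__ == "__main__":
    # Example: MMM = 011 (Address/Command), channel 1, with the VAL bit set.
    print(decode_memory_mcacod((1 << 63) | 0x00B1))
```

Masking the status value first means the high flag bits (VAL, UC, and so on) never affect the decode.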