Vladimir Budnev wrote:
Hello community.
We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF with 2x Intel Xeon E5630 and 8x Kingston KVR1333D3D4R9S/4G.
For some time we have had lots of MCEs in mcelog and we can't find out the reason.
The only thing that shows there (when it shows, since sometimes it doesn't seem to) is a hardware error. You *WILL* be replacing hardware, sometime soon, like yesterday.
"Normal" is not: *ANYTHING* here is Bad News. First, you've got DIMMs failing. CPU 53, assuming this system doesn't have 53+ physical CPUs, means that you have x-core systems, so you need to divide by x, so that if it's a 12-core system with 6 physical chips, that would make it DIMM 8 associated with that physical CPU. <snip>
One more interesting thing is the following output:
[root@zuno]# cat /var/log/mcelog |grep CPU|sort|awk '{print $2}'|uniq
32
33
34
35
50
51
52
53
Those numbers are always the same.
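If it helps, here is a rough sketch of a variation on that pipeline that also shows how often each CPU number turns up, which may make it easier to see whether one CPU dominates. The function name mce_cpu_counts is made up for illustration, and it assumes (as the awk '{print $2}' above implies) that the relevant lines have "CPU" as the first field and the CPU number as the second.

```shell
#!/bin/sh
# Hypothetical helper, not part of mcelog.
# ASSUMPTION: the log has lines whose first field is exactly "CPU" and
# whose second field is the CPU number.
# Prints one line per CPU number: "<count> <cpu>", sorted by CPU number.
mce_cpu_counts() {
    awk '$1 == "CPU" { print $2 }' "$1" | sort -n | uniq -c
}
```

It could be run as `mce_cpu_counts /var/log/mcelog`. Matching on the first field instead of a plain grep avoids picking up other lines that merely contain the string "CPU".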
Bad news: you have *two* DIMMs failing, one associated with the physical CPU that has core 53, and another associated with the physical CPU that has cores 32-35.
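To make the "divide by x" idea concrete, here is a minimal sketch. It assumes logical CPUs are numbered contiguously within each physical package, which is not guaranteed on every board or kernel, so treat the "physical id" field in /proc/cpuinfo as the real answer. The function name socket_of and the per-package count passed to it are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a logical CPU number to a physical-package index.
# ASSUMPTION: logical CPUs are numbered contiguously within each package,
# so integer division by the logical CPUs per package gives the package.
# $1 = logical CPU number, $2 = logical CPUs per package
socket_of() {
    echo $(( $1 / $2 ))
}

# Example with a made-up value of 16 logical CPUs per package:
for c in 32 33 34 35 50 51 52 53; do
    echo "CPU $c -> package $(socket_of "$c" 16)"
done
```

With that made-up value, 32-35 land on one package and 50-53 on another, which is the same kind of grouping as described above. The actual numbers depend on how this board enumerates its CPUs.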
Talk to your OEM support to help identify which banks need replacing, and/or find a motherboard diagram.
mark, who has to deal *again* with one machine with the same problem....