Vladimir Budnev wrote:
> 2011/3/21 <m.roth at 5-cent.us>
>> Vladimir Budnev wrote:
>> > Hello community.
>> >
>> > We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF with 2x Intel
>> > Xeon E5630 and 8x Kingston KVR1333D3D4R9S/4G.
>> >
>> > For some time we have had lots of MCEs in mcelog and we can't find out
>> > the reason.
>>
>> The only thing that shows there (when it shows, since sometimes it
>> doesn't seem to) is a hardware error. You *WILL* be replacing hardware
>> sometime soon, like yesterday.
<snip>
>> Bad news: you have *two* DIMMs failing, one associated with the physical
>> CPU that has core 53, and another associated with the physical CPU that
>> has cores 32-35.
<snip>
> Last night we did some research to find out which RAM modules were faulty.
>
> Note that we have 8 modules, 4G each.
<snip>
> Finally we placed the last 2 modules...and no errors. It should be noted
> that at that step we had exactly the same module placement as before the
> experiment.
>
> It sounds strange, but at first glance it looks like something was wrong
> with the module placement. But we can't understand why the problem didn't
> show for the first days, even months, of the server running. No one
> touched the server hardware, so I have no idea what that was.
>
> Now we are just waiting to see whether there will be errors again.

I'm sure there will. Reseating the memory may have done something, but
there will, I'll wager.

Here's a question out of left field: who was the manufacturer of the 4G
DIMMs? Not Supermicro, but the DIMMs themselves?

        mark