Vladimir Budnev wrote:
> 2011/3/22 <m.roth at 5-cent.us>
>> Vladimir Budnev wrote:
>> > 2011/3/21 <m.roth at 5-cent.us>
>> >> Vladimir Budnev wrote:
>> >> >
>> >> > We are running, Centos 4.8 on SuperMicro SYS-6026T-3RF with 2xIntel
>> >> > Xeon E5630 and 8xKingston KVR1333D3D4R9S/4G
>> >> >
>> >> > For some time we have lots of MCE in mcelog and we can't find out
>> >> > the reason.
>> >>
>> >> The only thing that shows there (when it shows, since sometimes it
>> >> doesn't seem to) is a hardware error. You *WILL* be replacing
>> >> hardware, sometime soon, like yesterday.
>> <snip>
>> >> Bad news: you have *two* DIMMs failing, one associated with the
>> >> physical CPU that has core 53, and another associated with the
>> >> physical CPU that has cores 32-35.
>> <snip, memory reseating>
>> > Now we are just waiting will there be errors again.
>>
>> I'm sure there will. Reseating the memory may have done something, but
>> there will, I'll wager.
>
> mark, you are absolutely right :) Approximately 1h ago errors appeared.
> They appeared only once since reboot, but they r back. Hi there :(
>
> The good idea is that CPU numbers changed, so now we have cpu 1,2,3 and
> 18,19,20,21. We definitely moved "broken" modules to another slots.
> Anyway bad dimm is really a good news for us instead of e.g. motherboard.
<snip>
> Is it possible to determine which physical dimms correspond to those cpus
> noticed in mce messages? We have two rows of slots (6 slots for each row),
> one for cpu1 and second for cpu2. Used slots marked as
> cpu1-a1,cpu1-a2,cpu1-a3,cpu1-b1 and cpu2-a1,cpu2-a2,cpu2-a3,cpu2-b1.
>
> I remember that you advised to divide cpu number on physical core count.
> We have 2 quad core proc, so 8 cpu. 1/8=0 Is it cpu-a1 slot or depends on
> situation? I hope we will find those bustards ourselves but hint would be
> great.
>
> And one more thing i can't understand ... if there is, say, 8 "cpu
> numbers" per each memory module (in our situation), why we see only 4
> numbers and not 8 e.g. 0,1,2,3,4,5,6,7 ?

I'm now confused about a lot: originally, you mentioned 53 - 57, was it?
That doesn't add up, since you say you have 2 quad core processors, for a
total of 8 cpus, and each of those processors has 6 banks, which would
mean each processor should only see six (directly). Where I'm confused is
how you could have cores 32-35, or 53-whatsit, when you only have 8 cores
in two processors.
>
>> Here's a question out of left field: who was the manufacturer of the 4G
>> DIMMs? Not Supermicro, but the DIMMs themselves?
>>
> This is Kingston KVR1333D3D4R9S/4G if i got the question

Oh, ok. I was wondering if they were Hynix - I've seen a good number of bad
4G and 8G DIMMs from them recently, and that across three different OEMs
and model DIMMs.

       mark
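
One way to take some of the guesswork out of which CPU numbers belong to which physical processor is to read the "processor" and "physical id" fields from /proc/cpuinfo instead of dividing by an assumed core count. The following is only a rough sketch, assuming Python 3 is available (or that a copy of /proc/cpuinfo is taken to a machine that has it); the function name socket_map and the SAMPLE text are made up for illustration and are not output from the machine in this thread. It will not name the failing DIMM slot, only the socket each logical CPU reports.

```python
# Hypothetical two-CPU excerpt in /proc/cpuinfo layout; NOT taken from
# the server discussed in this thread.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 1
core id\t\t: 0
"""


def socket_map(cpuinfo_text):
    """Return {logical CPU number: physical id} parsed from cpuinfo text."""
    mapping = {}
    current = None  # logical CPU number of the stanza being read
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between stanzas
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "physical id" and current is not None:
            mapping[current] = int(value)
    return mapping


# On the real machine one would pass open("/proc/cpuinfo").read() instead.
for cpu, sock in sorted(socket_map(SAMPLE).items()):
    print("cpu %d -> socket %d" % (cpu, sock))
```

If the CPU numbers in the MCE lines do not appear among the "processor" values at all, that would suggest the numbers in the log are not plain logical CPU indexes.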