>> >> > We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF with 2x Intel
>> >> > Xeon E5630 and 8x Kingston KVR1333D3D4R9S/4G
>> >> >
>> >> > For some time we have had lots of MCEs in mcelog and we can't find
>> >> > out the reason.
>> >>
>> >> The only thing that shows there (when it shows, since sometimes it
>> >> doesn't seem to) is a hardware error. You *WILL* be replacing
>> >> hardware, sometime soon, like yesterday.
>> <snip>
>> >> Bad news: you have *two* DIMMs failing, one associated with the
>> >> physical CPU that has core 53, and another associated with the
>> >> physical CPU that has cores 32-35.
>> <snip, memory reseating>
>> > Now we are just waiting to see whether the errors come back.
>>
>> I'm sure there will. Reseating the memory may have done something, but
>> there will, I'll wager.
>
> mark, you are absolutely right :) Approximately 1h ago the errors appeared.
> They have appeared only once since the reboot, but they are back. Hi there :(
>
> The good thing is that the CPU numbers changed, so now we have cpu 1,2,3 and
> 18,19,20,21. We definitely moved the "broken" modules to other slots.
> Anyway, a bad DIMM is really good news for us compared with e.g. a bad
> motherboard.
<snip>
> Is it possible to determine which physical DIMMs correspond to the CPUs
> reported in the MCE messages? We have two rows of slots (6 slots per row),
> one for cpu1 and the second for cpu2. The used slots are marked
> cpu1-a1,cpu1-a2,cpu1-a3,cpu1-b1 and cpu2-a1,cpu2-a2,cpu2-a3,cpu2-b1.
>
> I remember that you advised dividing the cpu number by the physical core
> count. We have 2 quad-core processors, so 8 cpus. 1/8=0. Is it the cpu-a1
> slot, or does it depend on the situation? I hope we will find those bastards
> ourselves, but a hint would be great.
>
> And one more thing I can't understand ... if there are, say, 8 "cpu numbers"
> per memory module (in our situation), why do we see only 4 numbers and
> not 8, e.g. 0,1,2,3,4,5,6,7?
I'm now confused about a lot: originally, you mentioned 53 - 57, was it?
That doesn't add up, since you say you have 2 quad core processors, for a
total of 8 cpus, and each of those processors has 6 banks, which would
mean each processor should only see six (directly). Where I'm confused is
how you could have cores 32-35, or 53-whatsit, when you only have 8 cores
in two processors.
2 CPUs, each showing 8 cores with HT support, so 16 at most, I think. Is
that consistent with those numbers?
I've really lost the thread with those CPU-to-memory-bank mappings...
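
Since the core counts keep getting tangled, one way to see how the kernel itself numbers the CPUs is to read the "processor" and "physical id" fields out of /proc/cpuinfo. Below is a minimal sketch, assuming an x86 Linux box that exposes those two fields, and assuming the CPU numbers in the MCE records are the kernel's logical CPU numbers. The function names package_by_cpu and cpus_by_package are made up for illustration. It only tells you which socket a logical CPU belongs to, not which DIMM slot is at fault.

```python
def package_by_cpu(cpuinfo_text):
    """Map each logical CPU number to its physical package (socket) id.

    Expects text in the /proc/cpuinfo format, where each CPU stanza has a
    "processor" line and (on x86) a "physical id" line.
    """
    mapping = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between stanzas, or no "key: value" pair
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "physical id" and current is not None:
            mapping[current] = int(value)
    return mapping


def cpus_by_package(mapping):
    """Group logical CPU numbers by package id, each list sorted ascending."""
    grouped = {}
    for cpu, package in sorted(mapping.items()):
        grouped.setdefault(package, []).append(cpu)
    return grouped
```

To try it on the affected machine, pass it the contents of the file, e.g. cpus_by_package(package_by_cpu(open("/proc/cpuinfo").read())), and compare the groups with the CPU numbers mcelog reports. With two sockets you would expect two groups. If the MCE numbers fall outside both groups, they are probably not logical CPU numbers at all.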