Vladimir Budnev wrote:
2011/3/22 m.roth@5-cent.us
Vladimir Budnev wrote:
2011/3/21 m.roth@5-cent.us
Vladimir Budnev wrote:
> > We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF with
> > 2x Intel Xeon E5630 and 8x Kingston KVR1333D3D4R9S/4G.
> >
> > For some time we have had lots of MCEs in mcelog, and we can't find out
> > the reason.
The only thing that shows there (when it shows, since sometimes it doesn't seem to) is a hardware error. You *WILL* be replacing hardware, sometime soon, like yesterday.
<snip>
We have 2 quad-core processors, so 8 CPUs. 1/8 = 0. Is that the cpu-a1 slot, or does it depend on the situation? I hope we will find those bastards ourselves, but a hint would be great.
And one more thing I can't understand: if there are, say, 8 "cpu numbers" per memory module (in our situation), why do we see only 4 numbers and not 8, e.g. 0,1,2,3,4,5,6,7?
I'm now confused about a lot: originally, you mentioned 53-57, was it? That doesn't add up, since you say you have 2 quad-core processors, for a total of 8 CPUs, and each of those processors has 6 banks, which would mean each processor should only see six (directly). Where I'm confused is how you could have cores 32-35, or 53-whatsit, when you only have 8 cores in two processors.
2 CPUs, each with 8 cores and HT support. So 16 at most, I think. Is it OK counted that way?
Huh? Above, you say "2 quad core proc" - that's 8 cores over two processor chips. HT support doesn't figure into it; if you use dmidecode or lshw, I believe it will show you 8 cores, not 16.
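
If dmidecode or lshw aren't handy, the same question can be checked from /proc/cpuinfo. What follows is only a rough sketch: it assumes the kernel exposes the "physical id" and "core id" fields there (older kernels may not), and the function name count_cpus is made up for illustration. It counts sockets, distinct physical cores, and logical CPUs separately, so hyperthreads don't get mistaken for cores.

```python
def count_cpus(cpuinfo_text):
    """Return (sockets, physical_cores, logical_cpus) from /proc/cpuinfo text.

    Assumes each "processor" stanza lists "physical id" before "core id".
    """
    sockets = set()      # distinct "physical id" values
    cores = set()        # distinct (physical id, core id) pairs
    logical = 0          # number of "processor" stanzas
    physical_id = None   # socket of the stanza currently being read

    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between stanzas
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
            physical_id = None
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            cores.add((physical_id, value))

    return len(sockets), len(cores), logical
```

Call it as count_cpus(open("/proc/cpuinfo").read()). If the logical count comes out at twice the physical core count, the extra "CPUs" are hyperthreads rather than cores.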
I've really lost the thread with those CPU-to-memory-bank mappings...
Each processor directly sees the DIMMs associated with it, so the banks associated with each processor are what directly affect its cores. So, if you see something like

Mar 20 05:01:35 <system name> kernel: Northbridge Error, node 0, core: 5

(these processors are 8-core), it means that one of the DIMMs in bank 0, 0-3, is bad. You should see

 __
|_0| 0 1 2 3
 __
|_1| 0 1 2 3

or whatever on the m/b, so one of the top ones there is affected. Is that any clearer?
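
For a log with many such entries, a rough sketch like the one below can tally which node the errors point at. It assumes every entry looks like the single "Northbridge Error, node N, core: M" line quoted above; other kernels and mcelog itself format things differently. The names parse_nb_error and errors_per_node are made up for illustration.

```python
import re
from collections import Counter

# Pattern taken from the one log line quoted above; an assumption, not a
# general parser for machine check output.
NB_ERROR = re.compile(r"Northbridge Error, node (\d+), core: (\d+)")


def parse_nb_error(line):
    """Return (node, core) as ints, or None if the line does not match."""
    match = NB_ERROR.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def errors_per_node(lines):
    """Count matching error lines per node number."""
    counts = Counter()
    for line in lines:
        parsed = parse_nb_error(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

If nearly all the hits land on one node, the DIMMs in that node's bank are the first ones to swap out.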
mark