2011/3/22 <m.roth at 5-cent.us>

> Vladimir Budnev wrote:
> > 2011/3/22 <m.roth at 5-cent.us>
> >> Vladimir Budnev wrote:
> >> > 2011/3/22 <m.roth at 5-cent.us>
> >> >> Vladimir Budnev wrote:
> >> >> > 2011/3/21 <m.roth at 5-cent.us>
> >> >> >> Vladimir Budnev wrote:
> >> >> >> >
> >> >> >> > We are running, Centos 4.8 on SuperMicro SYS-6026T-3RF with
> >> >> >> > 2xIntel Xeon E5630 and 8xKingston KVR1333D3D4R9S/4G
> >> >> >> >
> >> >> >> > For some time we have lots of MCE in mcelog and we cant find out
> >> >> >> > the reason.
> >> >> >>
> >> >> >> The only thing that shows there (when it shows, since sometimes it
> >> >> >> doesn't seem to) is a hardware error. You *WILL* be replacing
> >> >> >> hardware, sometime soon, like yesterday.
> >> >> <snip>
> >> > We have 2 quad core proc, so 8 cpu. 1/8=0 Is it cpu-a1 slot or depends on
> >> > situation? I hope we will find those bustards ourselvs but hint would
> >> > be great.
> >> >
> >> > And one more thing i cant funderstand ... if there is,say, 8 "cpu
> >> > numbers" per each memory module(in our situation), why we see only 4 numbers
> >> > and not 8 e.g. 0,1,2,3,4,5,6,7 ?
> >>
> >> I'm now confused about a lot: originally, you mentioned 53 - 57, was it?
> >> That doesn't add up, since you say you have 2 quad core processors, for
> >> a total of 8 cpus, and each of those processors have 6 banks, which would
> >> mean each processor should only see six (directly). Where I'm confused
> >> is how you could have cores 32-35, or 53-whatsit, when you only have 8
> >> cores in two processors.
> >
> > 2 cpu each 8 cores and HT support. So 16 at max i think. for such way is
> > it ok?
>
> Huh? Above, you say "2 quad core proc" - that's 8 cores over two processor
> chips. HT support doesn't figure into it; if you use dmidecode or lshw, I
> believe it will show you 8 cores, not 16.
>

That was a typo, sorry. There are 2 CPUs with 4 cores each, so 8 cores in total.

> > I really lost the idea line with those cpu to memory bank mappings...
>
> Each processor will directly see the DIMMs associate with it, so that the
> banks associated with each processor will be what directly affects the
> cores. So, if you see something like
> Mar 20 05:01:35 <system name> kernel: Northbridge Error, node 0, core: 5
> (these processors are 8-core), it means that one of the DIMMs in bank 0,
> 0-3, is bad.
> You should see
>  __
> |_0| 0 1 2 3
>  __
> |_1| 0 1 2 3
>
> or whatever on the m/b, so one of the top ones there is affected. Is that
> any clearer?

First of all, big thanks for helping, Mark. In your example everything is clear, but I am lost with what we actually have.

Previously we received messages like the one I posted in the first mail:

CPU 51 BANK 8 TSC 8511e3ca77dc
MISC 274d587f00006141 ADDR 807044840
STATUS cc0055000001009f MCGSTATU

And there were always the same CPU numbers. I really don't know why mcelog shows such numbers, but that's what we have: always BANK 8, and the CPU field showed 32,33,34,45 and 50,51,52,53.

You convinced us that it is a DIMM problem, so we decided to do a little research, which I described earlier in the thread. During that we swapped DIMM modules between slots, so now we get BANK 8 with CPU 1,2,3 and 18,29,20,21.

It really seems that those numbers are somehow connected with the RAM modules. But...
as I said, we have the following slots:

CPU1
   cpu1-a1 cpu1-a2 cpu1-a3 cpu1-b1 cpu1-b2 cpu1-b3
CPU2
   cpu2-a1 cpu2-a2 cpu2-a3 cpu2-b1 cpu2-b2 cpu2-b3

We have the modules placed like this (V = populated):

+------+---------+---------+---------+---------+---------+---------+
|      |    V    |    V    |    V    |    V    |  free   |  free   |
+------+---------+---------+---------+---------+---------+---------+
| CPU1 | cpu1-a1 | cpu1-a2 | cpu1-a3 | cpu1-b1 | cpu1-b2 | cpu1-b3 |
+------+---------+---------+---------+---------+---------+---------+

+------+---------+---------+---------+---------+---------+---------+
|      |    V    |    V    |    V    |    V    |  free   |  free   |
+------+---------+---------+---------+---------+---------+---------+
| CPU2 | cpu2-a1 | cpu2-a2 | cpu2-a3 | cpu2-b1 | cpu2-b2 | cpu2-b3 |
+------+---------+---------+---------+---------+---------+---------+

There is definitely something going on with the memory banks, because swapping the modules changed the MCE messages. But what exactly? Or have I interpreted it all wrong?
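
To compare the mcelog output before and after moving modules, it may help to tally which CPU and BANK values actually show up. Below is a minimal sketch in Python, assuming each record starts with a "CPU <n> BANK <m>" header like the one quoted in this message. The names tally_mce, RECORD_RE and SAMPLE are made up for this example and are not part of mcelog.

```python
import re
from collections import Counter

# Matches the "CPU <n> BANK <m>" header that starts each mcelog record
# (format assumed from the record quoted in this message).
RECORD_RE = re.compile(r"\bCPU\s+(\d+)\s+BANK\s+(\d+)")


def tally_mce(text):
    """Count how often each (cpu, bank) pair appears in mcelog output."""
    return Counter((int(cpu), int(bank)) for cpu, bank in RECORD_RE.findall(text))


# The first record is the one quoted in this message. The second line is
# a made-up header, added only so the demo has more than one entry.
SAMPLE = """\
CPU 51 BANK 8 TSC 8511e3ca77dc
MISC 274d587f00006141 ADDR 807044840
STATUS cc0055000001009f MCGSTATU
CPU 50 BANK 8 TSC 0
"""

if __name__ == "__main__":
    for (cpu, bank), count in sorted(tally_mce(SAMPLE).items()):
        print(f"cpu={cpu} bank={bank} count={count}")
```

Running this on a saved copy of the log from before a DIMM swap and again on one from after it should show whether the reported CPU numbers move together with the modules.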