2011/3/22 <m.roth at 5-cent.us>

> Vladimir Budnev wrote:
> > 2011/3/22 <m.roth at 5-cent.us>
> >> Vladimir Budnev wrote:
> >> > 2011/3/21 <m.roth at 5-cent.us>
> >> >> Vladimir Budnev wrote:
> >> >> >
> >> >> > We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF with 2x Intel
> >> >> > Xeon E5630 and 8x Kingston KVR1333D3D4R9S/4G.
> >> >> >
> >> >> > For some time we have had lots of MCEs in mcelog and we can't find out
> >> >> > the reason.
> >> >>
> >> >> The only thing that shows there (when it shows, since sometimes it
> >> >> doesn't seem to) is a hardware error. You *WILL* be replacing
> >> >> hardware, sometime soon, like yesterday.
> >> <snip>
> >> >> Bad news: you have *two* DIMMs failing, one associated with the
> >> >> physical CPU that has core 53, and another associated with the
> >> >> physical CPU that has cores 32-35.
> >> <snip, memory reseating>
> >> > Now we are just waiting to see whether the errors come back.
> >>
> >> I'm sure there will. Reseating the memory may have done something, but
> >> there will, I'll wager.
> >
> > Mark, you are absolutely right :) Approximately 1h ago the errors appeared.
> > They have appeared only once since the reboot, but they're back. Hi there :(
> >
> > The good thing is that the CPU numbers changed, so now we have CPUs 1,2,3
> > and 18,19,20,21. We definitely moved the "broken" modules to other slots.
> > Anyway, a bad DIMM is really good news for us compared with, e.g., the
> > motherboard.
> <snip>
> > Is it possible to determine which physical DIMMs correspond to the CPUs
> > mentioned in the MCE messages? We have two rows of slots (6 slots per row),
> > one for cpu1 and the second for cpu2. The used slots are marked
> > cpu1-a1, cpu1-a2, cpu1-a3, cpu1-b1 and cpu2-a1, cpu2-a2, cpu2-a3, cpu2-b1.
> >
> > I remember that you advised dividing the CPU number by the physical core
> > count. We have 2 quad-core processors, so 8 CPUs. 1/8=0: is that the
> > cpu-a1 slot, or does it depend on the situation? I hope we will find those
> > bastards ourselves, but a hint would be great.
> >
> > And one more thing I can't understand... if there are, say, 8 "CPU numbers"
> > per memory module (in our situation), why do we see only 4 numbers and
> > not 8, e.g. 0,1,2,3,4,5,6,7?
>
> I'm now confused about a lot: originally, you mentioned 53 - 57, was it?
> That doesn't add up, since you say you have 2 quad core processors, for a
> total of 8 cpus, and each of those processors has 6 banks, which would
> mean each processor should only see six (directly). Where I'm confused is
> how you could have cores 32-35, or 53-whatsit, when you only have 8 cores
> in two processors.
>

2 CPUs, each with 8 cores and HT support, so 16 at most, I think. Is that
right? I've really lost the thread with these CPU-to-memory-bank mappings...

> >
> >> Here's a question out of left field: who was the manufacturer of the 4G
> >> DIMMs? Not Supermicro, but the DIMMs themselves?
> >>
> > This is Kingston KVR1333D3D4R9S/4G, if I understood the question.
>
> Oh, ok. I was wondering if they were Hynix - I've seen a good number of
> bad 4G and 8G DIMMs from them recently, and that across three different
> OEMs and model DIMMs.
>
> mark
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
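A logical CPU number from an MCE report can be traced to its physical package (socket) by reading /proc/cpuinfo instead of dividing it by a core count, since the kernel may interleave its CPU numbering across sockets. The following is a minimal, hypothetical Python 3 sketch (not a CentOS 4 tool; `parse_cpuinfo`, `cpus_by_socket` and `threads_by_core` are names made up for illustration) that groups logical CPUs by the "physical id" and "core id" fields, assuming the kernel reports them. At best it narrows an error down to a socket, i.e. to the cpu1-* or cpu2-* row of slots; which socket id matches which slot label, and which DIMM in that row is at fault, still has to be checked against the board manual or the bank/address details in the mcelog output.

```python
import os
from collections import defaultdict

# Only these /proc/cpuinfo fields matter for locating a logical CPU.
WANTED = ("processor", "physical id", "core id")


def parse_cpuinfo(text):
    """Return one dict per logical CPU from /proc/cpuinfo-style text."""
    cpus, current = [], {}
    for line in text.splitlines() + [""]:  # trailing "" flushes the last block
        if not line.strip():
            # A blank line ends one processor's block.
            if "processor" in current:
                cpus.append(current)
            current = {}
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() in WANTED:
            current[key.strip()] = int(value.strip())
    return cpus


def cpus_by_socket(cpus):
    """Map socket ('physical id') -> sorted logical CPU numbers.
    Assumption: kernels that omit 'physical id' are treated as socket 0."""
    sockets = defaultdict(list)
    for cpu in cpus:
        sockets[cpu.get("physical id", 0)].append(cpu["processor"])
    return {sock: sorted(nums) for sock, nums in sockets.items()}


def threads_by_core(cpus):
    """Map (socket, core id) -> logical CPUs sharing that physical core.
    With Hyper-Threading enabled each core should list two logical CPUs."""
    cores = defaultdict(list)
    for cpu in cpus:
        key = (cpu.get("physical id", 0), cpu.get("core id", 0))
        cores[key].append(cpu["processor"])
    return {key: sorted(nums) for key, nums in cores.items()}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        cpus = parse_cpuinfo(f.read())
    # One line per socket, listing the logical CPU numbers the kernel uses.
    for sock, nums in sorted(cpus_by_socket(cpus).items()):
        print("socket %d: logical CPUs %s" % (sock, nums))
```

Run on the affected machine, it prints one line per socket. With Hyper-Threading enabled, `threads_by_core` should show two logical CPUs per physical core, which is how two quad-core sockets can appear as 16 logical CPUs.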