[CentOS] Cant find out MCE reason (CPU 35 BANK 8)

Tue Mar 22 15:40:05 UTC 2011
Vladimir Budnev <vladimir.budnev at gmail.com>

2011/3/22 <m.roth at 5-cent.us>

> Vladimir Budnev wrote:
> > 2011/3/22 <m.roth at 5-cent.us>
> >
> >> Vladimir Budnev wrote:
> >> > 2011/3/22 <m.roth at 5-cent.us>
> >> >> Vladimir Budnev wrote:
> >> >> > 2011/3/22 <m.roth at 5-cent.us>
> >> >> >> Vladimir Budnev wrote:
> >> >> >> > 2011/3/22 <m.roth at 5-cent.us>
> >> >> >> >> Vladimir Budnev wrote:
> >> >> >> >> > 2011/3/21 <m.roth at 5-cent.us>
> >> >> >> >> >> Vladimir Budnev wrote:
> >> >> >> >> >> >
> >> >> >> >> >> > We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF with
> >> >> >> >> >> > 2x Intel Xeon E5630 and 8x Kingston KVR1333D3D4R9S/4G
> >> >> >> >> >> >
> >> The next thing you should do, if you don't have them, is go to
> >> <http://www.supermicro.com/support/manuals/> and d/l the manual, and see
> >> what it says about DIMMs.
> >
> > If you meant checking whether those DIMM modules are compatible with the
> > motherboard, that's OK. Kingston KVR1333D3D4R9S is in the tested list:
> > http://www.supermicro.com/support/resources/memory/display.cfm?sz=4.0&mspd=1.333&mtyp=33&id=89A8A9B9E45453813BB99586F1BAE93F
> >
> No, what you need to see is a) whether what you did was valid (for the
> Supermicro m/b on the server I'm working on right now, the manual says the
> a-banks must *ALWAYS* be populated...), and b) you might find some
> troubleshooting info to help you identify which DIMMs are the problem.
>

Roger that. Our bad :(


> > And can you say something about the wild CPU numbers and about determining
> > which DIMMs are bad? Didn't you say a few posts ago that on an x-core system
> > we must divide the CPU value by the number of cores to get the DIMM slot,
> > e.g. CPU 32 / 8 cores -> slot 4?
>
> Nope. From your original post:
> > > > One more interesting thing is the following output:
> > > [root@zuno]# cat /var/log/mcelog |grep CPU|sort|awk '{print $2}'|uniq
> > > 32
> > > 33
> > > 34
> > > 35
> > > 50
> > > 51
> > > 52
> > > 53
>
> So with 2 4-core Xeons, I don't understand how you can get 3x and 5x.
> Could you post some raw messages, either from /var/log/messages or from
> /var/log/mcelog?
>

Sure, here they are, from before the "night party":
MCE 24
CPU 52 BANK 8 TSC 372a290717a
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 25
CPU 32 BANK 8 TSC 372a29073cb
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 26
CPU 50 BANK 8 TSC 372a29064ca
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 27
CPU 33 BANK 8 TSC 372a2907e5c
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 28
CPU 35 BANK 8 TSC 372a29088f1
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 29
CPU 53 BANK 8 TSC 372a2908e82
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 30
CPU 51 BANK 8 TSC 372a290899f
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 31
CPU 34 BANK 8 TSC 423243c7aa5
MISC 2275a96d0000098f ADDR 7e7540ac0
STATUS cc001f000001009f MCGSTATUS 0
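
In the dump above, MCE 24 through 30 all carry the same ADDR and STATUS and differ only in the CPU number, which may mean a single event is being logged once per logical CPU. A minimal Python sketch for grouping records by ADDR follows. It assumes the four-line record layout shown here (MCE / CPU ... / MISC ... ADDR ... / STATUS ...); the function names and the shortened SAMPLE are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Shortened sample in the same layout as the raw mcelog output above.
SAMPLE = """\
MCE 24
CPU 52 BANK 8 TSC 372a290717a
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 25
CPU 32 BANK 8 TSC 372a29073cb
MISC 68651f800001186 ADDR 7dd2ad840
STATUS cc0002800001009f MCGSTATUS 0
MCE 31
CPU 34 BANK 8 TSC 423243c7aa5
MISC 2275a96d0000098f ADDR 7e7540ac0
STATUS cc001f000001009f MCGSTATUS 0
"""


def parse_mce_records(text):
    """Split raw mcelog text into one dict per 'MCE n' record."""
    records = []
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "MCE" and len(tokens) == 2:
            # A new record starts at every 'MCE <n>' line.
            current = {"MCE": tokens[1]}
            records.append(current)
        elif current is not None:
            # The remaining lines are whitespace-separated KEY VALUE pairs.
            for key, value in zip(tokens[0::2], tokens[1::2]):
                current[key] = value
    return records


def cpus_by_address(records):
    """Map each reported ADDR to the sorted list of CPU numbers that logged it."""
    groups = defaultdict(list)
    for rec in records:
        if "ADDR" in rec and "CPU" in rec:
            groups[rec["ADDR"]].append(int(rec["CPU"]))
    return {addr: sorted(cpus) for addr, cpus in groups.items()}


if __name__ == "__main__":
    for addr, cpus in cpus_by_address(parse_mce_records(SAMPLE)).items():
        print(addr, cpus)
```

Run against the full /var/log/mcelog, this shows how many distinct addresses sit behind the long list of CPU numbers.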


And here they are after:

MCE 0
CPU 18 BANK 8 TSC 608709adcc62
MISC c6673a0400001181 ADDR 2f4cf4f40
STATUS cc0000800001009f MCGSTATUS 0
MCE 1
CPU 2 BANK 8 TSC 608709adcbcb
MISC c6673a0400001181 ADDR 2f4cf4f40
STATUS cc0000800001009f MCGSTATUS 0
MCE 2
CPU 20 BANK 8 TSC 608709adcb59
MISC c6673a0400001181 ADDR 2f4cf4f40
STATUS cc0000800001009f MCGSTATUS 0
MCE 3
CPU 1 BANK 8 TSC 608709add9b0
MISC c6673a0400001181 ADDR 2f4cf4f40
STATUS cc0000800001009f MCGSTATUS 0
MCE 4
CPU 3 BANK 8 TSC 608709ade3ab
MISC c6673a0400001181 ADDR 2f4cf4f40
STATUS cc0000800001009f MCGSTATUS 0
MCE 5
CPU 19 BANK 8 TSC 608709ade850
MISC c6673a0400001181 ADDR 2f4cf4f40
STATUS cc0000800001009f MCGSTATUS 0
MCE 6
CPU 21 BANK 8 TSC 608709ade4ea
MISC c6673a0400001181 ADDR 2f4cf4f40
STATUS cc0000800001009f MCGSTATUS 0
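
The STATUS values can be partly decoded by hand. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS layout described in the Intel Software Developer's Manual (flag bits 63 down to 57, MCA error code in bits 15:0, and the memory-controller code pattern 0000 0000 1MMM CCCC). Model-specific fields are left undecoded, and the function name is hypothetical.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM).
FLAG_BITS = {
    "VAL": 63,    # register contains valid data
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was NOT corrected
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register is valid
    "ADDRV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context corrupt
}

# MMM field of a memory-controller error code (0000 0000 1MMM CCCC).
MEMORY_TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}


def decode_mci_status(status_hex):
    """Decode the architectural part of a STATUS value given as a hex string."""
    status = int(status_hex, 16)
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    mca_code = status & 0xFFFF
    decoded["mca_code"] = mca_code
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    if mca_code & 0xFF80 == 0x0080:
        # Memory-controller error: bits 6:4 = transaction, bits 3:0 = channel.
        mmm = (mca_code >> 4) & 0x7
        channel = mca_code & 0xF
        decoded["memory_transaction"] = MEMORY_TRANSACTIONS.get(mmm, "reserved")
        # 0b1111 means the channel is not specified.
        decoded["channel"] = None if channel == 0xF else channel
    return decoded


if __name__ == "__main__":
    for key, value in decode_mci_status("cc0000800001009f").items():
        print(key, value)
```

Under those assumptions, the STATUS values in both dumps decode as valid, corrected (UC clear) memory-controller read errors with no channel specified, so the STATUS field alone does not point at a particular DIMM.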