[CentOS] Can't find out MCE reason (CPU 35 BANK 8)

Tue Mar 22 15:50:35 UTC 2011
m.roth at 5-cent.us <m.roth at 5-cent.us>

Vladimir Budnev wrote:
> 2011/3/22 <m.roth at 5-cent.us>
>> Vladimir Budnev wrote:
>> > 2011/3/22 <m.roth at 5-cent.us>
>> >> Vladimir Budnev wrote:
>> >> > 2011/3/22 <m.roth at 5-cent.us>
>> >> >> Vladimir Budnev wrote:
>> >> >> > 2011/3/22 <m.roth at 5-cent.us>
>> >> >> >> Vladimir Budnev wrote:
>> >> >> >> > 2011/3/22 <m.roth at 5-cent.us>
>> >> >> >> >> Vladimir Budnev wrote:
>> >> >> >> >> > 2011/3/21 <m.roth at 5-cent.us>
>> >> >> >> >> >> Vladimir Budnev wrote:
>> >> >> >> >> >> >
>> >> >> >> >> >> > We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF
>> >> >> >> >> >> > with 2x Intel Xeon E5630 and 8x Kingston KVR1333D3D4R9S/4G
>> >> >> >> >> >> >
>> >> The next thing you should do, if you don't have them, is go to
>> >> <http://www.supermicro.com/support/manuals/> and d/l the manual, and
>> >> see what it says about DIMMs.
>> >
>> > If you meant checking whether those DIMM modules are compatible with the
>> > motherboard, it's OK. Kingston KVR1333D3D4R9S is on the tested list:
>> >
>> http://www.supermicro.com/support/resources/memory/display.cfm?sz=4.0&mspd=1.333&mtyp=33&id=89A8A9B9E45453813BB99586F1BAE93F
>> >
>> No, what you need to see is a) whether what you did was valid (for the
>> Supermicro m/b on the server I'm working on right now, the manual says
>> the a-banks must *ALWAYS* be populated...), and b) you might find some
>> troubleshooting info to help you identify which DIMMs are the problem.
>
> Roger that. Our bad :(

Std. sysadmin reply: RTFM! <g>
>
>> > And can you say something about the wild CPU numbers and how to determine
>> > which DIMMs are bad? Didn't you say a few posts ago that on an x-core
>> > system we must divide the CPU value by the number of cores to get the DIMM
>> > slot? E.g. CPU 32 / 8 cores -> slot 4?
<snip>
>> So with 2 4-core Xeons, I don't understand how you can get 3x and 5x.
>> Could you post some raw messages, either from /var/log/messages or from
>> /var/log/mcelog?
>>
>
> Sure, here they are, from before the "night party":
> MCE 24
> CPU 52 BANK 8 TSC 372a290717a
> MISC 68651f800001186 ADDR 7dd2ad840
> STATUS cc0002800001009f MCGSTATUS 0
> MCE 25
<snip>
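The STATUS word in these records can be unpacked by hand using the IA32_MCi_STATUS layout from the Intel SDM (Vol. 3B, machine-check architecture chapter). The sketch below is an illustrative decoder, not mcelog's own logic; the function names are made up for this example, and treating bits 52:38 as a corrected-error count assumes the CPU implements that field. If my reading of the layout is right, the MCE 24 value comes out as a corrected (UC=0) memory-controller read error with the channel not specified, OVER set, and a corrected-error count of 10.

```python
"""Illustrative decoder for IA32_MCi_STATUS values as printed by mcelog."""

# MMM (memory transaction) encodings for memory-controller compound
# error codes, following the Intel SDM's compound error code table.
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    mca_code = status & 0xFFFF                      # bits 15:0, MCA error code
    decoded = {
        "VAL": bit(status, 63),                     # register contents valid
        "OVER": bit(status, 62),                    # earlier error overwritten
        "UC": bit(status, 61),                      # uncorrected error
        "EN": bit(status, 60),                      # error reporting enabled
        "MISCV": bit(status, 59),                   # MCi_MISC holds extra info
        "ADDRV": bit(status, 58),                   # MCi_ADDR holds an address
        "PCC": bit(status, 57),                     # processor context corrupt
        "mca_code": mca_code,
        "model_specific": (status >> 16) & 0xFFFF,  # bits 31:16
        # Bits 52:38: corrected-error count. Assumption: only meaningful
        # on CPUs that implement this field.
        "corrected_count": (status >> 38) & 0x7FFF,
    }
    # Memory-controller compound codes have the form 000F 0000 1MMM CCCC.
    if mca_code & 0xEF80 == 0x0080:
        mmm = (mca_code >> 4) & 0b111
        channel = mca_code & 0xF
        decoded["memory_transaction"] = MEM_TRANSACTIONS.get(mmm, "reserved")
        decoded["channel"] = None if channel == 0xF else channel  # 1111 = unspecified
    return decoded


if __name__ == "__main__":
    # STATUS value copied from the MCE 24 record above.
    for key, value in decode_mci_status(0xCC0002800001009F).items():
        print(f"{key}: {value}")
```

Note that the decoded fields say what kind of error it was, not which DIMM; mapping the ADDR value to a slot still depends on the board's memory layout.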
At this point, I throw up my hands. I have *no* idea how they could get
numbers like CPU 52, unless something's wrong in the o/s - I mean, you are
running 64 bit, right?
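One way to see where a number like CPU 52 could come from is to list the machine's logical CPUs next to their socket, core and APIC IDs. The E5630 is a 4-core, Hyper-Threaded part, so two of them should show up as 16 logical CPUs (0-15). A hypothesis I have not confirmed for this kernel and mcelog version is that the reported number is an APIC ID rather than a logical CPU index; APIC IDs on multi-socket Xeons are not necessarily contiguous, so they can run well above the CPU count. The sketch below parses /proc/cpuinfo (assumed layout: blank-line-separated blocks of `key : value` lines) and prints `?` for any field the kernel does not export; older kernels such as the one in CentOS 4 may lack `apicid`. The function names are hypothetical.

```python
"""Sketch: map logical CPU numbers to socket/core/APIC ID via /proc/cpuinfo."""

import sys


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if "processor" in entry:
            cpus.append(entry)
    return cpus


def cpu_table(cpus):
    """Return (logical CPU, socket, core, APIC ID) tuples; '?' if absent."""
    return [
        (
            c["processor"],
            c.get("physical id", "?"),
            c.get("core id", "?"),
            c.get("apicid", "?"),  # may be missing on older kernels
        )
        for c in cpus
    ]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/cpuinfo"
    with open(path) as f:
        for row in cpu_table(parse_cpuinfo(f.read())):
            print("cpu %s  socket %s  core %s  apicid %s" % row)
```

If the `apicid` column contains values like 3x and 5x, and they all sit on one socket, that would point at that socket's memory rather than at any "CPU number divided by cores" arithmetic.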

          mark