On 6/20/08, Alwin Roosen <alwin.roosen@webline.be> wrote:
Hi,


CentOS release 5 (Final)
Kernel 2.6.18-53.1.21.el5 on an i686

ws174 login: CPU 1: Machine Check Exception: 0000000000000005
CPU 0: Machine Check Exception: 0000000000000004
Bank 3: f62000020002010a at 0000000032c93500
Bank 5: f20000300c000e0f
Kernel panic - not syncing: CPU context corrupt
Bank 3: f62000020002010a



Alwin -->

I would be very, very "surprised" *IF* this wasn't hardware
related.

Dave Jones wrote a nice little program to help decode this:

$ parsemce -b 3 -s f62000020002010a -e 5 -a 0000000032c93500
Status: (5) Machine Check in progress.
Restart IP valid.
parsebank(3): f62000020002010a @ 32c93500
        External tag parity error
        CPU state corrupt. Restart not possible
        Address in addr register valid
        Error enabled in control register
        Error not corrected.
        Error overflow
        Memory hierarchy error
        Request: Generic error
        Transaction type : Generic
        Memory/IO : I/O

and:

$ parsemce -b 5 -s f20000300c000e0f -e 4 -a 0
Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(5): f20000300c000e0f @ 0
        External tag parity error
        CPU state corrupt. Restart not possible
        Error enabled in control register
        Error not corrected.
        Error overflow
        Bus and interconnect error
        Participation: Generic
        Timeout: Request did not timeout
        Request: Generic error
        Transaction type : Invalid
        Memory/IO : Other
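
As a rough cross-check of the flag lines in the two decodes above, here is a minimal Python sketch. It assumes the architectural bit layout of the IA32_MCG_STATUS and IA32_MCi_STATUS registers as documented in Intel's SDM. It is not how parsemce itself is implemented, the function and table names are made up for illustration, and it does not interpret the low 16-bit error code beyond extracting it.

```python
# Sketch only: decode the architectural flag bits of x86 machine-check words.
# Bit positions are assumed from the Intel SDM register layouts; the names
# MCG_FLAGS, MCI_FLAGS, decode_flags and decode_bank are hypothetical.

# Global status word (the value printed after "Machine Check Exception:").
MCG_FLAGS = [
    (2, "MCIP"),   # machine check in progress
    (1, "EIPV"),   # error IP valid
    (0, "RIPV"),   # restart IP valid
]

# Per-bank status word (the value printed after "Bank N:").
MCI_FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # error not corrected
    (60, "EN"),     # error enabled in control register
    (59, "MISCV"),  # misc register valid
    (58, "ADDRV"),  # address register valid
    (57, "PCC"),    # processor context corrupt
]


def decode_flags(value, table):
    """Return the names of all flags in `table` whose bit is set in `value`."""
    return [name for bit, name in table if (value >> bit) & 1]


def decode_bank(status):
    """Split a bank status word into flags and its two 16-bit code fields."""
    return {
        "flags": decode_flags(status, MCI_FLAGS),
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    for mcg in (0x5, 0x4):
        print(f"MCG status {mcg:#x}: {decode_flags(mcg, MCG_FLAGS)}")
    for bank, status in ((3, 0xF62000020002010A), (5, 0xF20000300C000E0F)):
        info = decode_bank(status)
        print(f"Bank {bank}: {info['flags']} "
              f"mca={info['mca_error_code']:#06x} "
              f"model={info['model_specific_code']:#06x}")
```

Both banks come back with UC and PCC set, which lines up with the "Error not corrected" and "CPU state corrupt" lines from parsemce; only bank 3 has ADDRV set, which is why only that bank reports an address.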


Dag's repo has the new memtest86+ 2.01 RPM.  I'd pull it and
let it run overnight.  While memtest86+ is good, I've recently had
cases where it didn't find (obvious) memory errors.

I've also seen things like SATA disk drives cause MCEs.

This one looks like you're taking memory parity errors somewhere
in the path to the CPU.  In your BIOS, check the event log for
any "interesting" entries, too.

Hope this helps ...

   -rak-