On 6/20/08, Alwin Roosen <alwin.roosen@webline.be> wrote:
Hi,
CentOS release 5 (Final)
Kernel 2.6.18-53.1.21.el5 on an i686
ws174 login: CPU 1: Machine Check Exception: 0000000000000005
CPU 0: Machine Check Exception: 0000000000000004
Bank 3: f62000020002010a at 0000000032c93500
Bank 5: f20000300c000e0f
Kernel panic - not syncing: CPU context corrupt
Bank 3: f62000020002010a
Alwin -->
I would be very, very "surprised" *IF* this wasn't hardware
related.
Dave Jones wrote a nice little program to help decode this:
$ parsemce -b 3 -s f62000020002010a -e 5 -a 0000000032c93500
Status: (5) Machine Check in progress.
Restart IP valid.
parsebank(3): f62000020002010a @ 32c93500
External tag parity error
CPU state corrupt. Restart not possible
Address in addr register valid
Error enabled in control register
Error not corrected.
Error overflow
Memory hierarchy error
Request: Generic error
Transaction type : Generic
Memory/IO : I/O
and:
$ parsemce -b 5 -s f20000300c000e0f -e 4 -a 0
Status: (4) Machine Check in progress.
Restart IP invalid.
parsebank(5): f20000300c000e0f @ 0
External tag parity error
CPU state corrupt. Restart not possible
Error enabled in control register
Error not corrected.
Error overflow
Bus and interconnect error
Participation: Generic
Timeout: Request did not timeout
Request: Generic error
Transaction type : Invalid
Memory/IO : Other
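If you want to check the flag lines in the parsemce output by hand, here is a minimal Python sketch. It is not parsemce's actual code. It decodes only the architectural flag bits, using the bit positions the Intel manuals give for IA32_MCG_STATUS and IA32_MCi_STATUS. The table and function names are made up for illustration.

```python
# Sketch only: decode the architectural flag bits of the values printed in
# the panic. Table/function names are hypothetical, not taken from parsemce.

# IA32_MCG_STATUS (the value after "Machine Check Exception:")
MCG_STATUS_BITS = [
    (2, "MCIP"),   # machine check in progress
    (1, "EIPV"),   # error IP valid
    (0, "RIPV"),   # restart IP valid
]

# IA32_MCi_STATUS (the value after "Bank N:")
MCI_STATUS_BITS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # error not corrected
    (60, "EN"),     # error enabled in control register
    (59, "MISCV"),  # misc register valid
    (58, "ADDRV"),  # address register valid
    (57, "PCC"),    # processor context corrupt
]


def decode_flags(value, table):
    """Return the names of the flags in `table` that are set in `value`."""
    return [name for bit, name in table if (value >> bit) & 1]


def mca_error_code(status):
    """Low 16 bits of a bank status: the architectural MCA error code."""
    return status & 0xFFFF


def model_specific_code(status):
    """Bits 16..31 of a bank status: the model-specific error code."""
    return (status >> 16) & 0xFFFF


if __name__ == "__main__":
    for mcg in (0x5, 0x4):
        print(f"MCG_STATUS {mcg:#x}:", decode_flags(mcg, MCG_STATUS_BITS))
    for bank, status in ((3, 0xF62000020002010A), (5, 0xF20000300C000E0F)):
        print(f"bank {bank}:", decode_flags(status, MCI_STATUS_BITS),
              f"mca={mca_error_code(status):#06x}",
              f"model={model_specific_code(status):#06x}")
```

For the two bank values above this gives VAL, OVER, UC, EN, ADDRV, PCC for bank 3 and the same set minus ADDRV for bank 5, which agrees with what parsemce printed.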
Dag's Repo has the new memtest86+ 2.01 RPM. I'd pull it and
let it run overnight. While memtest86+ is good, I've recently had
cases where it didn't find (obvious) memory errors.
I've also seen things like SATA disk drives cause MCEs.
This one looks like you're taking memory parity errors somewhere
in the path to the CPU. In your BIOS, check the event log for
any "interesting" entries, too.
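The "somewhere in the path to the CPU" reading comes from the low 16 bits of each bank status, the MCA error code. Below is a rough Python sketch of how those codes split into classes. It assumes the compound error code layouts described in the Intel manuals. The function name and the returned dictionary keys are hypothetical, and only the fields relevant here are pulled out.

```python
# Sketch only: classify the architectural MCA error code (low 16 bits of a
# bank status). Names and dictionary keys are hypothetical.

LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
PARTICIPATION = {
    0: "local processor originated request",
    1: "local processor responded to request",
    2: "local processor observed error as third party",
    3: "generic",
}


def classify_mca_code(status):
    """Classify the MCA error code found in the low 16 bits of `status`."""
    code = status & 0xFFFF
    code &= ~(1 << 12) & 0xFFFF  # ignore bit 12, a filter flag on newer CPUs

    if (code & 0xF800) == 0x0800:          # 0000 1PPT RRRR IILL
        return {
            "class": "bus and interconnect",
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "level": LEVEL[code & 0x3],
        }
    if (code & 0xFF00) == 0x0100:          # 0000 0001 RRRR TTLL
        return {"class": "cache hierarchy", "level": LEVEL[code & 0x3]}
    if (code & 0xFF80) == 0x0080:          # 0000 0000 1MMM CCCC
        return {"class": "memory controller"}
    if (code & 0xFFF0) == 0x0010:          # 0000 0000 0001 TTLL
        return {"class": "TLB", "level": LEVEL[code & 0x3]}
    if (code & 0xFFFC) == 0x000C:          # 0000 0000 0000 11LL
        return {"class": "generic cache hierarchy", "level": LEVEL[code & 0x3]}
    return {"class": "simple or unrecognized"}


if __name__ == "__main__":
    for bank, status in ((3, 0xF62000020002010A), (5, 0xF20000300C000E0F)):
        print(f"bank {bank}:", classify_mca_code(status))
```

Bank 3 (code 0x010a) lands in the cache hierarchy class, which parsemce labels "Memory hierarchy error", and bank 5 (code 0x0e0f) lands in the bus and interconnect class with no timeout.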
Hope this helps ...
-rak-