On 6/20/08, Alwin Roosen alwin.roosen@webline.be wrote:
Hi,
CentOS release 5 (Final)
Kernel 2.6.18-53.1.21.el5 on an i686

ws174 login: CPU 1: Machine Check Exception: 0000000000000005
CPU 0: Machine Check Exception: 0000000000000004
Bank 3: f62000020002010a at 0000000032c93500
Bank 5: f20000300c000e0f
Kernel panic - not syncing: CPU context corrupt
Bank 3: f62000020002010a
Alwin
I would be very, very "surprised" *IF* this wasn't hardware related.
Dave Jones wrote a nice little program to help decode this:
$ parsemce -b 3 -s f62000020002010a -e 5 -a 0000000032c93500
Status: (5) Machine Check in progress. Restart IP valid.
parsebank(3): f62000020002010a @ 32c93500
        External tag parity error
        CPU state corrupt. Restart not possible
        Address in addr register valid
        Error enabled in control register
        Error not corrected.
        Error overflow
        Memory hierarchy error
        Request: Generic error
        Transaction type : Generic
        Memory/IO : I/O
and:
$ parsemce -b 5 -s f20000300c000e0f -e 4 -a 0
Status: (4) Machine Check in progress. Restart IP invalid.
parsebank(5): f20000300c000e0f @ 0
        External tag parity error
        CPU state corrupt. Restart not possible
        Error enabled in control register
        Error not corrected.
        Error overflow
        Bus and interconnect error
        Participation: Generic
        Timeout: Request did not timeout
        Request: Generic error
        Transaction type : Invalid
        Memory/IO : Other
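If you don't have parsemce handy, the flag lines above can be cross-checked by hand. What follows is only a rough sketch in Python, not parsemce's actual code: it assumes the architectural bit layout of the bank status register (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57) and of the global status value (RIPV, EIPV, MCIP in bits 0 to 2). The function and table names are made up for illustration.

```python
# Rough sketch, not parsemce itself. Bit positions are the architectural
# machine-check flag bits; all names below are illustrative only.

# Flag bits of a bank status value (the "Bank N: ..." numbers).
BANK_FLAGS = [
    (63, "VAL"),    # status register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # error not corrected
    (60, "EN"),     # error enabled in control register
    (59, "MISCV"),  # misc register valid
    (58, "ADDRV"),  # address register valid
    (57, "PCC"),    # processor context corrupt
]

# Flag bits of the global status value (the "Machine Check Exception: ..." number).
GLOBAL_FLAGS = [
    (0, "RIPV"),  # restart IP valid
    (1, "EIPV"),  # error IP valid
    (2, "MCIP"),  # machine check in progress
]


def set_flags(value, table):
    """Return the names of all flags in `table` whose bit is set in `value`."""
    return [name for bit, name in table if (value >> bit) & 1]


def decode_bank(status_hex):
    """Split a bank status value into its flag names and low 16-bit error code."""
    status = int(status_hex, 16)
    return {
        "flags": set_flags(status, BANK_FLAGS),
        "mca_error_code": status & 0xFFFF,
    }


def decode_global(status_hex):
    """Return the flag names set in a global status value."""
    return set_flags(int(status_hex, 16), GLOBAL_FLAGS)


if __name__ == "__main__":
    for bank, value in (("3", "f62000020002010a"), ("5", "f20000300c000e0f")):
        info = decode_bank(value)
        print("Bank %s: %s code=0x%04x"
              % (bank, " ".join(info["flags"]), info["mca_error_code"]))
    for value in ("0000000000000005", "0000000000000004"):
        print("Global %s: %s" % (value, " ".join(decode_global(value))))
```

The sketch leaves the low 16 bits (the error code) undecoded, since interpreting them depends on the code class and partly on the CPU model. The flag bits alone already agree with the parsemce output: both banks show uncorrected, overflowed errors with the CPU context corrupt, and only bank 3 has a valid address.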
Dag's repo has the new memtest86+ 2.01 RPM. I'd pull it and let it run overnight. While memtest86+ is good, I've recently had cases where it didn't find (obvious) memory errors.
I've also seen things like SATA disk drives cause MCEs.
This one looks like you're taking memory parity errors somewhere in the path to the CPU. In your BIOS, check the event log for any "interesting" entries, too.
Hope this helps ...
-rak-