[CentOS] Server spontaneously rebooting under RHEL-4

Wed Mar 29 12:08:16 UTC 2006
Benjamin J. Weiss <benjamin at birdvet.org>

James Olin Oden wrote:

>On 3/28/06, BRUCE STANLEY <bruce.stanley at prodigy.net> wrote:
><snip>
>>There could even be a simpler reason for this problem.
>>We had a server do this very thing under RHEL-3, and it
>>turned out to be hardware related.
>>
>>The service technicians came in, reseated the memory and CPU,
>>replaced the CPU fan, and reset the BIOS.
>>
>One thing that I have seen more often with 2.6 kernels is the
>catching of MCEs (Machine Check Exceptions).  An MCE is the
>processor's way of reporting that it has detected something seriously
>wrong with the hardware.  This typically causes a panic, but not a
>reboot.  OTOH, if your hardware also supports a watchdog, a reboot
>would follow shortly after the panic.
>
>I'm not saying that this is what is actually happening, just that,
>along the lines of what has been said thus far, it would make sense.
>If this is the case, the panic output may be in
>/var/log/messages.
>
>Cheers...james
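For anyone following James's suggestion, here is a minimal sketch of searching the log for MCE or panic messages. It assumes a modern Python 3 (not the stock Python shipped with RHEL-4), root access to read /var/log/messages, and that the kernel message contains the phrases "machine check", "MCE", or "kernel panic"; the exact wording varies between kernel versions, so the pattern is a guess to adjust, and a panic that hangs the box may never reach the log at all.

```python
import re
import sys
from typing import Iterable, List

# Assumed wording: typical kernel phrasing for machine checks and panics.
# Adjust the pattern for your kernel version; the text is not guaranteed.
SUSPECT = re.compile(r"machine check|\bMCE\b|kernel panic", re.IGNORECASE)


def find_suspect_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a machine check or a kernel panic."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


# Only scan a file when a path is given, e.g.:
#   python scan_log.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for hit in find_suspect_lines(log):
            print(hit)
```

Rotated logs (messages.1 and so on) can be passed the same way if the reboots happened a while ago.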
Well, so far it looks like something is wrong with our memory
subsystem.  I updated all the BIOSes and ran Smart Disk diagnostics.  I'm
getting an ECC error on module 4, whether or not there is RAM in the slot!

We're calling HP support; I'm sure we'll be able to get it fixed.

Thanks, all!

Ben