[CentOS] how to debug hardware lockups?

Sat Nov 15 19:59:54 UTC 2008
Rudi Ahlers <rudiahlers at gmail.com>

On Sat, Nov 15, 2008 at 8:17 PM, nate <centos at linuxpowered.net> wrote:
> Rudi Ahlers wrote:
>
>> Unfortunately, I can't leave a monitor attached to the server all the
>> time. The server is in a shared cabinet @ a 3rd party ISP, and they
>> lock the cabinets once we're done working with it. The last lockup was
>> about 6 days ago, and the previous one about 8 days ago. There's no
>> consistency.
>>
>> How can I redirect all console output to a file instead?
>
> Configure a serial console, connect the console to another
> system and use something like minicom to log the console to a file.
> You can't really log to the local system in this situation, as
> you likely won't capture the event (if you did, you would have
> seen the error in the system logs).
>
> In my experience most of these kinds of problems are related
> to bad ram.
>
> If you're running CentOS 4.x, configure netdump to send the kernel
> dumps to another server; if you're using CentOS 5.x, configure
> diskdump(?) to store the dump to local disk.
>
> Run memtest86 on the system for a few days; replace the system
> with a known working one so you can take the broken system off
> site from the ISP for diagnostics.
>
> I like running cerberus http://sourceforge.net/projects/va-ctcs/
> as a burn-in tool. If the system can survive that running for
> a couple of days, it should be good. In running it against a hundred
> or so systems, I don't recall it taking longer than a few hours
> to crash the system if there was a problem.
>
> nate

That machine doesn't have a serial port (why do vendors think serial
ports are obsolete????), so is there any other way to send the logs to
a different machine then?
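
One possibility, offered as an assumption rather than something tested
on this box: the kernel's netconsole module can send console messages
as UDP packets to another host. A minimal receiver on the second
machine might look like the sketch below. The port 6666 (netconsole's
usual default target port), the file name console.log, and the function
names open_listener and log_datagrams are illustrative choices, not
anything from this thread.

```python
import socket
import time


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is assumed to be the netconsole target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_datagrams(sock, path, max_packets=None):
    """Append each received datagram to `path`, one timestamped line each.

    Runs forever when max_packets is None; otherwise stops after that
    many datagrams and returns the number written.
    """
    count = 0
    with open(path, "a") as log:
        while max_packets is None or count < max_packets:
            data, (addr, _port) = sock.recvfrom(65535)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            log.write(f"{stamp} {addr} {text}\n")
            # Flush after every message so the last lines before a
            # lockup are on disk on the receiving machine.
            log.flush()
            count += 1
    return count


if __name__ == "__main__":
    log_datagrams(open_listener(), "console.log")
```

Because the log lives on the receiving machine, whatever the kernel
manages to print before the lockup should survive, provided the network
stack is still working at that point.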

-- 

Kind Regards
Rudi Ahlers