[CentOS] Re: how to debug hardware lockups?

Tue Nov 18 00:50:00 UTC 2008
Scott Silva <ssilva at sgvwater.com>

on 11-15-2008 11:59 AM Rudi Ahlers spake the following:
> On Sat, Nov 15, 2008 at 8:17 PM, nate <centos-T6AQWPvKiI1cRAk/VAjCeQ at public.gmane.org> wrote:
>> Rudi Ahlers wrote:
>>
>>> Unfortunately, I can't leave a monitor attached to the server all the
>>> time. The server is in a shared cabinet @ a 3rd party ISP, and they
>>> lock the cabinets once we're done working with it. The last lockup was
>>> about 6 days ago, and previous one about 8 days ago. There's no
>>> consistency.
>>>
>>> How can I redirect all console output to a file instead?
>> Configure a serial console, connect the console to another
>> system and use something like minicom to log the console to a file.
>> You can't really log to the local system in this situation, as
>> you likely won't capture the event (if you did, you would have
>> seen the error in the system logs).
>>
>> In my experience most of these kinds of problems are related
>> to bad RAM.
>>
>> If you're running CentOS 4.x, configure netdump to send the kernel
>> dumps to another server; if you're using CentOS 5.x, configure
>> diskdump(?) to store the dump to local disk.
>>
>> Run memtest86 on the system for a few days. Replace the system
>> with a known working one so you can take the broken system off
>> site from the ISP for diagnostics.
>>
>> I like running cerberus http://sourceforge.net/projects/va-ctcs/
>> as a burn-in tool. If the system can survive that running for
>> a couple of days, it should be good. In running it against a hundred
>> or so systems, I don't recall it taking longer than a few hours
>> to crash the system if there was a problem.
>>
>> nate
>>
> 
> That machine doesn't have a serial port (why do vendors think serial
> ports are obsolete????), so is there any other way to send the logs to
> a different machine then?
> 
Does it have any out-of-band management like Dell's DRAC or HP's iLO?
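
For the logging half of nate's suggestion quoted above (capturing console
output to a file on a second machine), here is a minimal Python sketch. It
assumes the console text already arrives on the logging machine as a byte
stream from some source; how it gets there is not covered. The function name
log_console and its parameters are hypothetical, not part of any existing
tool. Each line gets a UTC timestamp, since the lockups are days apart and
the log is only useful if you can tell when the last message arrived.

```python
import time


def log_console(stream, out, clock=time.time):
    """Copy console lines from `stream` to `out`, one timestamp per line.

    `stream`: binary file-like object (assumed source of console bytes).
    `out`:    text file-like object, e.g. a file opened in append mode.
    `clock`:  callable returning seconds since the epoch (injectable so
              the function can be exercised without real time).
    Returns the number of lines written.
    """
    count = 0
    # readline() returns b"" at end of stream, which stops the loop.
    for raw in iter(stream.readline, b""):
        # Output just before a lockup may be garbled, so replace
        # undecodable bytes instead of raising an exception.
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(clock()))
        out.write(f"{stamp} {text}\n")
        # Flush per line so nothing sits in a Python-side buffer
        # while waiting days for the next message.
        out.flush()
        count += 1
    return count
```

With a real source this might be called as
log_console(open(device_path, "rb"), open("console.log", "a")), where
device_path is whatever device or pipe delivers the console text on the
logging machine; that name is a placeholder, not a recommendation.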


