[CentOS] Kernel Panic on HP/Compaq ProLiant G7

Thu Mar 24 17:44:04 UTC 2011
Alain Péan <alain.pean at lpp.polytechnique.fr>

On 24/03/2011 18:30, Dave Windsor wrote:
> On 3/24/2011 12:37 PM, Alain Péan wrote:
>> On 24/03/2011 16:03, Windsor Dave L. (AdP/TEF7.1) wrote:
>>> <snipped>
>>> Code: 00 00 00 00 00 00 00 00 70 4d 4f 9d 00 81 ff ff 98 e4 4b dc
>>> RIP  [<ffff8100dc435cf0>]
>>>    RSP<ffff81001529fd18>
>>> CR2: ffff8100dc435cf0
>>>    <0>Kernel panic - not syncing: Fatal exception
>>>
>>> <snipped>
>>> I am trying to determine whether this points to a hardware or a software issue.  Some of the Google results suggested using a CentOSPlus kernel; is this a good idea?
>>>
>>> The server is an HP DL380 G7 with 4 GB RAM (1 DIMM, 1333 MHz), one 4-core CPU (2133 MHz), 4 built-in Broadcom "NetXtreme II BCM5709 Gigabit Ethernet" NICs, and a P410 Smart Array Controller.  The P410 firmware and the system BIOS have both been updated to the latest levels to see if that fixes the crashes, with no change.
>>>
>>> Any idea where I should look next?
>>>
>>> Thanks for any help anyone can provide!
>>>
>> The fact that it appears after two weeks or so reminds me of a bug I
>> saw on the linux-poweredge mailing list: the "blocked for more than 120
>> seconds" timeout bug.
>> I don't know if your problem is related, but if it is the case you
>> should see the message in your logs.
>>
>> Is there any high I/O load on your server, at least at times?
>>
>> See :
>> http://lists.us.dell.com/pipermail/linux-poweredge/2011-March/044515.html
>>
>> If that is the case, using a newer kernel does seem like a good idea.
>>
>> See if that helps...
>>
>> Alain
> Alain,
>
> Currently, there are no high I/O loads.  This server was intended to
> replace two older HP-UX servers.  I had just begun to migrate the
> workload to the new server when the crashes began to occur.  There are
> some minor, sporadic I/O loads but nothing that I would think could
> trigger the bug discussed in your link.  However, I haven't measured the
> workload closely yet, so there could be spikes.
>
> Best Regards,
>
> *Dave Windsor*
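Dave notes he has not yet measured the workload closely, so short I/O spikes could go unnoticed. One low-tech way to catch them is to sample /proc/diskstats at intervals and compute per-device rates. The following is a minimal sketch, assuming a Linux kernel that exposes the standard 14-field per-disk lines (on older 2.6-era kernels, partition lines carry fewer fields and are simply skipped). The function names are hypothetical; iostat from the sysstat package reports the same counters if it is installed.

```python
"""Sketch: sample /proc/diskstats twice and report per-device I/O rates.

Assumes the standard per-disk layout: major, minor, name, then
reads completed (field 4), reads merged, sectors read (field 6),
time reading, writes completed (field 8), writes merged,
sectors written (field 10), and four more counters.
"""
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (reads, sectors_read, writes, sectors_written)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            # Partition lines on older kernels (and malformed lines)
            # have fewer fields; skip them.
            continue
        name = fields[2]
        reads, sectors_read = int(fields[3]), int(fields[5])
        writes, sectors_written = int(fields[7]), int(fields[9])
        stats[name] = (reads, sectors_read, writes, sectors_written)
    return stats


def io_rates(before, after, interval):
    """Per-device (read IOPS, write IOPS, read KiB/s, write KiB/s)."""
    rates = {}
    for dev, (r1, sr1, w1, sw1) in after.items():
        if dev not in before:
            continue  # device appeared between samples; no baseline
        r0, sr0, w0, sw0 = before[dev]
        rates[dev] = (
            (r1 - r0) / interval,
            (w1 - w0) / interval,
            (sr1 - sr0) * SECTOR_BYTES / 1024 / interval,
            (sw1 - sw0) * SECTOR_BYTES / 1024 / interval,
        )
    return rates


def sample(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return io_rates(before, after, interval)
```

Calling sample(60.0) periodically (for example from cron) and logging the result would give a rough history to line up against the times of the crashes.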

Your error message, "Kernel panic - not syncing: Fatal exception", is too
generic to give any clue. Do you see other error messages in your logs?
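To look for other error messages, one option is to scan /var/log/messages for warnings that commonly appear before a crash: the "blocked for more than 120 seconds" hung-task message mentioned earlier in the thread, oops traces, and machine-check events. This is only a sketch; the pattern list is an assumption and should be adjusted to what the kernel actually logs. The panic itself often never reaches the disk, which is why a serial console or netconsole is frequently used to capture it.

```python
"""Sketch: scan a syslog file for kernel warnings that often precede a panic."""
import re

# Assumed patterns; extend or adjust to match what your kernel logs.
PATTERNS = {
    "hung_task": re.compile(r"blocked for more than \d+ seconds"),
    "oops": re.compile(r"\bOops\b|BUG: unable to handle kernel"),
    "mce": re.compile(r"machine check|\bmce\b", re.IGNORECASE),
}


def scan_lines(lines):
    """Return {pattern name: [(line number, line text), ...]}."""
    hits = {name: [] for name in PATTERNS}
    for lineno, line in enumerate(lines, 1):
        for name, rx in PATTERNS.items():
            if rx.search(line):
                hits[name].append((lineno, line.rstrip("\n")))
    return hits


def scan_file(path="/var/log/messages"):
    """Scan a log file, replacing undecodable bytes instead of failing."""
    with open(path, errors="replace") as f:
        return scan_lines(f)
```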

Have you run any hardware diagnostics (Dell ships such utilities on DVD,
and I think HP has them too) to check whether some hardware, for
example the RAM, is failing?
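Besides offline tests such as memtest86+, a running system may expose memory-error counters through the kernel's EDAC subsystem, under /sys/devices/system/edac/mc/mcN/ (files ce_count and ue_count for corrected and uncorrected errors). Whether an EDAC driver for this memory controller is loaded on a given CentOS kernel is an assumption to verify; on ProLiant servers the Integrated Management Log (IML) is another place where DIMM errors are recorded. A minimal sketch with hypothetical function names:

```python
"""Sketch: read EDAC memory-controller error counters, if present."""
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)}, or {} if EDAC is absent."""
    base = Path(root)
    counts = {}
    if not base.is_dir():
        return counts  # no EDAC driver loaded (or not a Linux host)
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected contents; skip this controller
        counts[mc.name] = (ce, ue)
    return counts


def flag_errors(counts):
    """List controllers reporting any corrected or uncorrected errors."""
    return [name for name, (ce, ue) in sorted(counts.items()) if ce or ue]
```

A non-zero ue_count in particular would point toward failing RAM rather than a kernel bug.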

Alain

-- 
==========================================================
Alain Péan - LPP/CNRS
System/Network Administrator
Laboratoire de Physique des Plasmas - UMR 7648
Observatoire de Saint-Maur
4, av de Neptune, Bat. A
94100 Saint-Maur des Fossés
Tel: 01-45-11-42-39 - Fax: 01-48-89-44-33
==========================================================