[CentOS] Kernel Panic on HP/Compaq ProLiant G7

Thu Mar 24 17:30:01 UTC 2011
Dave Windsor <Dave.Windsor at us.bosch.com>

On 3/24/2011 12:37 PM, Alain Péan wrote:
> On 24/03/2011 16:03, Windsor Dave L. (AdP/TEF7.1) wrote:
>> <snipped>
>> Code: 00 00 00 00 00 00 00 00 70 4d 4f 9d 00 81 ff ff 98 e4 4b dc
>> RIP  [<ffff8100dc435cf0>]
>>   RSP<ffff81001529fd18>
>> CR2: ffff8100dc435cf0
>>   <0>Kernel panic - not syncing: Fatal exception
>>
>> <snipped>
>> I am trying to determine if this is pointing to a hardware or software issue.  Some of the Google results suggested using a Centosplus kernel - is this a good idea?
>>
>> The server is an HP DL380 G7 Server with 4 GB RAM (1 DIMM 1333 MHz), one 4-core CPU (2133 MHz), 4 built-in Broadcom "NetXtreme II BCM5709 II Gigabit Ethernet" NICs, and a P410 Smart Array Controller.  The P410 and the system BIOS have both been updated to the latest levels to see if that fixes the crashes, with no change.
>>
>> Any idea where I should look next?
>>
>> Thanks for any help anyone can provide!
>>
>
> The fact that it appears after two weeks or so reminds me of a bug I 
> saw on the Linux-PowerEdge mailing list, the "blocked for more than 120 
> seconds" timeout bug.
> I don't know if your problem is related, but if it is, you 
> should see that message in your logs.
>
> Do you have any high I/O load on your server, at least at some moments?
>
> See:
> http://lists.us.dell.com/pipermail/linux-poweredge/2011-March/044515.html
>
> In that case, using a newer kernel does indeed seem to be a good idea.
>
> See if it can help...
>
> Alain
> -- 
> ==========================================================
> Alain Péan - LPP/CNRS
> Administrateur Système/Réseau
> Laboratoire de Physique des Plasmas - UMR 7648
> Observatoire de Saint-Maur
> 4, av de Neptune, Bat. A
> 94100 Saint-Maur des Fossés
> Tel : 01-45-11-42-39 - Fax : 01-48-89-44-33
> ==========================================================
Alain,

There are no high I/O loads at present.  This server was intended to 
replace two older HP-UX servers.  I had just begun to migrate the 
workload to the new server when the crashes began to occur.  There are 
some minor, sporadic I/O loads but nothing that I would think could 
trigger the bug discussed in your link.  However, I haven't measured the 
workload closely yet, so there could be spikes.
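
To rule the linked bug in or out, the logs can be searched for the hung-task warning that Alain mentions. The following is only a minimal sketch, not a tested diagnostic: it assumes the warning has the usual form "INFO: task NAME:PID blocked for more than N seconds", and the function name find_hung_tasks is made up for illustration.

```python
import re

# Assumed message format (the kernel's hung-task warning), for example:
#   kernel: INFO: task kjournald:1234 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task warning found.

    `lines` is any iterable of log lines, such as an open log file.
    Hypothetical helper, named here for illustration only.
    """
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits
```

Passing it an open syslog file (on CentOS this would typically be /var/log/messages, which is an assumption about the local setup) lists any tasks that were reported as blocked. An empty result suggests, but does not prove, that the panic is unrelated to that bug.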

Best Regards,

Dave Windsor

Robert Bosch LLC
Team Leader, MES Database Infrastructure Group (AdP/TEF7.1)
4421 Highway 81 North
Anderson, SC 29621 USA
www.bosch.us <http://www.bosch.us>

Tel:  1 (864) 260-8459
Fax: 1 (864) 260-8422
Dave.Windsor at us.bosch.com <mailto:Dave.Windsor at us.bosch.com>