On 24/03/2011 18:30, Dave Windsor wrote:
> On 3/24/2011 12:37 PM, Alain Péan wrote:
>> On 24/03/2011 16:03, Windsor Dave L. (AdP/TEF7.1) wrote:
>>> <snipped>
>>> Code: 00 00 00 00 00 00 00 00 70 4d 4f 9d 00 81 ff ff 98 e4 4b dc
>>> RIP [<ffff8100dc435cf0>]
>>> RSP <ffff81001529fd18>
>>> CR2: ffff8100dc435cf0
>>> <0>Kernel panic - not syncing: Fatal exception
>>>
>>> <snipped>
>>> I am trying to determine if this is pointing to a hardware or software issue. Some of the Google results suggested using a Centosplus kernel - is this a good idea?
>>>
>>> The server is a HP DL380 G7 Server with 4 GB RAM (1 DIMM 1333 MHz), one 4-core CPU (2133 MHz), 4 built-in Broadcom "NetXtreme II BCM5709 II Gigabit Ethernet" NICs, and a P410 Smart Array Controller. The P410 and the system BIOS have both been updated to the latest levels to see if that fixes the crashes, with no change.
>>>
>>> Any idea where I should look next?
>>>
>>> Thanks for any help anyone can provide!
>>>
>> The fact that it appears after two weeks or so reminds me of a bug I
>> saw on the linux-poweredge mailing list, the "blocked for more than 120
>> seconds" timeout bug.
>> I don't know if your problem is related, but if it is, you
>> should see the message in your logs.
>>
>> Do you have any high I/O load, at least at some moments, on your server?
>>
>> See:
>> http://lists.us.dell.com/pipermail/linux-poweredge/2011-March/044515.html
>>
>> In this case, using a newer kernel does indeed seem like a good idea.
>>
>> See if it can help...
>>
>> Alain
>>
> Alain,
>
> Today, there are no high I/O loads. This server was intended to
> replace two older HP-UX servers. I had just begun to migrate the
> workload to the new server when the crashes began to occur. There are
> some minor, sporadic I/O loads but nothing that I would think could
> trigger the bug discussed in your link. However, I haven't measured the
> workload closely yet, so there could be spikes.
>
> Best Regards,
>
> *Dave Windsor*

Your error message, "Kernel panic - not syncing: Fatal exception", is too generic to give any clue on its own. Do you see other error messages in your logs?

Did you run any hardware tests to see whether some hardware, for example the RAM, is failing? With Dell you have such utilities on DVD; I think they also exist for HP.

Alain

--
==========================================================
Alain Péan - LPP/CNRS
System/Network Administrator
Laboratoire de Physique des Plasmas - UMR 7648
Observatoire de Saint-Maur
4, av de Neptune, Bat. A
94100 Saint-Maur des Fossés
Tel: 01-45-11-42-39 - Fax: 01-48-89-44-33
==========================================================