on 16:56 Thu 24 Mar, Windsor Dave L. (AdP/TEF7) (Dave.Windsor@us.bosch.com) wrote:
On 3/24/2011 4:38 PM, Dr. Ed Morbius wrote:
Dave:
on 16:03 Thu 24 Mar, Windsor Dave L. (AdP/TEF7.1) (Dave.Windsor@us.bosch.com) wrote:
Hello Everyone,
Code: 00 00 00 00 00 00 00 00 70 4d 4f 9d 00 81 ff ff 98 e4 4b dc RIP [<ffff8100dc435cf0>] RSP<ffff81001529fd18> CR2: ffff8100dc435cf0 <0>Kernel panic - not syncing: Fatal exception
This suggests that something happened in a Samba process.
<...>
- If you haven't, configure the netconsole kernel module for kernel-enabled network logging of panics.
This is a great idea. I will work on that soonest.
It really is about four times as cool as it sounds. Getting the actual panic is hugely useful.
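For reference, a minimal sketch of how the netconsole module parameter is put together. The ports, IP addresses, interface name, and MAC address below are hypothetical placeholders, not values from this system, and `netconsole_param` is only an illustrative helper name. The receiving host needs something listening on the chosen UDP port (a netcat listener or a syslog daemon, for example).

```shell
# netconsole_param: build the netconsole= module parameter string.
# Format: netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
# All values passed in are placeholders chosen by the admin (assumptions here).
netconsole_param() {
    # args: src_port src_ip device tgt_port tgt_ip tgt_mac
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example use (as root, on the machine that panics):
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 \
#                           6666 192.168.1.20 00:11:22:33:44:55)"
```

The target MAC should be that of the log host if it sits on the same segment, or of the gateway if it does not.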
Call HP and find out what the latest recommended BIOS and firmware upgrades for your system are. C-STATE has been a particular issue with Dell, and it's been disabled entirely in recent BIOS versions. I see below you've updated BIOS.
Scan logs for other messages, particularly panics and/or ECC issues.
I haven't seen anything ominous, although I have noticed a long
time gap between the last entry in /var/log/messages and the actual crash. Such a gap in entries is very unusual.
You can create a "timestamp" cron job. Just a
*/10 * * * * root logger "--- TIMESTAMP ---"
... entry. At least you'll see any long dry periods.
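Once regular entries are being written, long silences can be picked out mechanically. The following is a rough sketch, assuming the traditional syslog line format ("Mon DD HH:MM:SS host ...") and a log that does not cross a year boundary or a leap day. `find_log_gaps` and its default threshold of 1200 seconds are illustrative choices, not anything from the original setup.

```shell
# find_log_gaps FILE [THRESHOLD]
# Print each log line that follows a silence longer than THRESHOLD seconds
# (default 1200, i.e. two missed 10-minute timestamps).
find_log_gaps() {
    awk -v threshold="${2:-1200}" '
    BEGIN {
        # Days elapsed before each month in a non-leap year (assumption).
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        split("0 31 59 90 120 151 181 212 243 273 304 334", offs, " ")
        for (i = 1; i <= 12; i++) before[names[i]] = offs[i]
    }
    # Only consider lines that start with "Mon DD HH:MM:SS".
    ($1 in before) && split($3, t, ":") == 3 {
        now = ((before[$1] + $2) * 24 + t[1]) * 3600 + t[2] * 60 + t[3]
        if (seen && now - last > threshold)
            printf "gap of %d s before: %s\n", now - last, $0
        last = now
        seen = 1
    }' "$1"
}

# Example use:
#   find_log_gaps /var/log/messages 1200
```

A gap reported right before the first post-reboot line gives an upper bound on how long the box was silent before it went down.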
sar is also a useful utility to look at. It should be recording and reporting system state and resource utilization levels prior to the crash.
If you can stand the downtime, run memtest86+ at least overnight on your RAM. A reboot indicates a failed test.
Otherwise: try running with half your RAM swapped out.
Check/reseat all DIMMs, sockets, and cables. Some folks caution against this on the basis of connector wear, but if you've got a problem, this may help resolve it, and I've seen boxes shipped with components poorly or even un-cabled.
We have one DIMM of 4 GB RAM, so I can't swap it out or run with
half. I have reseated it and inspected the contacts, and it looks OK. I will look at anything else with connectors.
Actually, you can. Setting 'mem=2G' at your boot prompt will cue the kernel to use only half the RAM. Now, you can't specify an offset to use the high half, unfortunately. You could also swap the DIMM with another system if you've got it and see if you still have the problems in this one (or start seeing them in the other).