On 3/24/2011 4:38 PM, Dr. Ed Morbius wrote:
> Dave:
>
> on 16:03 Thu 24 Mar, Windsor Dave L. (AdP/TEF7.1) (Dave.Windsor at us.bosch.com) wrote:
>> Hello Everyone,
>>
>> Code: 00 00 00 00 00 00 00 00 70 4d 4f 9d 00 81 ff ff 98 e4 4b dc
>> RIP [<ffff8100dc435cf0>]
>> RSP<ffff81001529fd18>
>> CR2: ffff8100dc435cf0
>> <0>Kernel panic - not syncing: Fatal exception
>>
>>
>> This suggests that something happened in a Samba process.
>
> Correct.
>
> If this is regularly happening in Samba, that would point to a problem
> with your samba config (either on that host, something remotely stuffing
> bad packets at you, or likely in that case, both, as bad data shouldn't
> crash the host).

I can have a network analyst monitor the ports for unusual bursts of
traffic, although that might not catch small amounts of strange data.

>
> If this is happening in different programs over time, then the problem
> is likely /not/ software, but hardware/firmware.
>
> The LKML may be able to help you on your panic; please read their bug
> posting guidelines /BEFORE/ posting.
>
>> I have the Samba3x packages installed since we are beginning to
>> introduce Win7 clients into our environment.
>
> What happens if you take the Win7 clients away?
>
>> Googling "Kernel panic - not syncing: Fatal exception" and "CentOS"
>
> That is the generic kernel panic message. It's going to be
> spectacularly unspecific.
>
>> produced many hits, but nothing that seemed to exactly match my
>> problem. Since this is the only G7 server I have here right now, I
>> can't reproduce the problem on another machine. The G6s I have
>> running the identical version of CentOS have no problems.
>>
>> I am trying to determine if this is pointing to a hardware or software
>> issue. Some of the Google results suggested using a Centosplus kernel
>> - is this a good idea?
>
> Dell have had numerous issues with recent server editions, it's possible
> HP are as well:
>
> - If you haven't, configure the netconsole kernel module for
> kernel-enabled network logging of panics.

This is a great idea. I will work on that soonest.

>
> - Call HP and find out what the latest recommended BIOS and firmware
> upgrades for your system are. C-STATE has been a particular issue
> with Dell, and it's been disabled entirely in recent BIOS versions.
> I see below you've updated BIOS.
>
> - Scan logs for other messages, particularly panics and/or ECC issues.

I haven't seen anything ominous, although I have noticed a long time
gap between the last entry in /var/log/messages and the actual crash.
Such a gap in entries is very unusual.

>
> - If you can stand the downtime, run memtest86+ at least overnight on
> your RAM. A reboot indicates a failed test.
>
> - Otherwise: try running with half your RAM swapped.
>
> - Check/reseat all DIMMs, sockets, and cables. Some folks caution
> against this on the basis of connector wear, but if you've got a
> problem, this may help resolve it, and I've seen boxes shipped with
> components poorly or even un-cabled.

We have one DIMM of 4 GB RAM, so I can't swap it out or run with half.
I have reseated it and inspected the contacts, and it looks OK. I will
look at anything else with connectors.

>
> - Does a similarly equipped system exhibit the same problems?
>
>> The server is an HP DL380 G7 Server with 4 GB RAM (1 DIMM 1333 MHz),
>> one 4-core CPU (2133 MHz), 4 built-in Broadcom "NetExtreme II BCM5709
>> II Gigabit Ethernet" NICs, and a P410 Smart Array Controller. The
>> P410 and the system BIOS have both been updated to the latest levels
>> to see if that fixes the crashes, with no change.
>
> Ugh. Broadcom's gotten better but I prefer Intel NICs. Can't speak to
> the others. And OK, you've updated BIOS.
>

Thanks for your help!
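To put an actual number on that silence in /var/log/messages, here is a minimal Python sketch (the function names are my own, not from any existing tool). It assumes the classic syslog timestamp prefix ("Mar 24 16:03:12 host ..."), and because that format carries no year it assumes 2011; lines without a parseable timestamp are skipped.

```python
import sys
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Classic syslog lines omit the year; assume one so parsing is unambiguous.
ASSUMED_YEAR = 2011


def parse_syslog_time(line: str) -> Optional[datetime]:
    """Parse the 15-character 'Mon DD HH:MM:SS' prefix of a syslog line."""
    try:
        return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # continuation line, truncated line, or other format


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[timedelta, str, str]]:
    """Return (gap, line_before, line_after) for the longest silence, or None."""
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        ts = parse_syslog_time(line)
        if ts is None:
            continue
        if prev_time is not None:
            gap = ts - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = ts, line
    return best


def main(argv: list) -> None:
    path = argv[1] if len(argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as f:
            result = largest_gap(f)
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    if result is None:
        print("fewer than two timestamped lines found")
        return
    gap, before, after = result
    print(f"Longest gap: {gap}")
    print("  before:", before.rstrip())
    print("  after: ", after.rstrip())


if __name__ == "__main__":
    main(sys.argv)
```

Run it against /var/log/messages (or a rotated copy) as root. If the longest gap is the one just before the crash, that at least confirms the box went quiet well before the panic, which is where netconsole output would help most.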
Best Regards,

Dave Windsor
Robert Bosch LLC
Team Leader, MES Database Infrastructure Group (AdP/TEF7.1)
4421 Highway 81 North
Anderson, SC 29621
USA
www.bosch.us

Tel: 1 (864) 260-8459
Fax: 1 (864) 260-8422
Dave.Windsor at us.bosch.com