On Jun 8, 2009, at 9:18 AM, Peter Hopfgartner <peter.hopfgartner at r3-gis.com> wrote:

> Scott Silva wrote:
>> on 6-3-2009 2:27 AM Peter Hopfgartner spake the following:
>>
>>> Epilogue:
>>>
>>> I've tried to disable TSO (ethtool -K eth0 tso off), as was suggested on
>>> the poweredge list. This did not help.
>>>
>>> I've configured the machine to start with the 5.2 kernel in
>>> /boot/grub/grub.conf, changing the default. It has been running for 6
>>> 1/2 days, now. I would say that this helped and is what I would suggest
>>> to others experiencing the same problem, right now.
>>>
>>> Thus, current running kernel is 2.6.18-92.1.10.el5xen.
>>>
>>> Regards and thanks for all replies,
>>>
>>> Peter
>>>
>> That sure points to a machine/kernel conflict. You could try getting the
>> source and rebuilding to see if that solves it, or maybe a diff of the two
>> kernel configs to see if something is different there. Maybe something is
>> added or turned on in the new kernel that your system doesn't like.
>>
>> Also, make sure your system's BIOSes are up to date. Not just motherboard,
>> but any other cards that have firmware that might have an update, like
>> RAID card/SAS controllers or ???
>>
> Dear Scott,
>
> Unfortunately the machine is in production. Any downtime is really a
> problem since it is seen directly by our customers. I would really like
> to make an active effort to isolate the problem, but my boss would cut
> my head off if I had to stop the machine. The firmware is not current,
> but according to Dell's web site I should stop almost every running
> service on the machine before upgrading the firmware, and in this case I
> would again have to watch out for my head.
>
> I do really care to provide accurate bug reports to OS projects that I
> use (I would guess that 90% of my reports lead to a quick fix), but in
> this case I do have to make an exception and keep the machine running.

Do what works for now, and think about a test box or VM setup for the
future, where you can test newer kernels before they go into production.

-Ross
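
P.S. The kernel config comparison Scott suggested needs no downtime, since it only reads two text files. Something along these lines might do. It is only a sketch under a few assumptions: Python 3 is available, the configs are in the usual "CONFIG_X=value" / "# CONFIG_X is not set" format, and the helper names parse_config and diff_configs are made up for illustration.

```python
import re

# "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ.

    An option absent from one of the configs is reported as None.
    """
    old = parse_config(old_text)
    new = parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

To use it, read the two files and pass their contents to diff_configs. On CentOS the configs normally sit in /boot as config-<version>, so the working one would be /boot/config-2.6.18-92.1.10.el5xen; the path for the newer kernel depends on which version is installed.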