Scott Silva wrote:
on 6-3-2009 2:27 AM Peter Hopfgartner spake the following:
Epilogue:
I've tried to disable TSO (ethtool -K eth0 tso off), as was suggested on the poweredge list. This did not help.
I've configured the machine to boot the 5.2 kernel by changing the default in /boot/grub/grub.conf. It has now been running for 6 1/2 days. I would say that this helped, and for now it is what I would suggest to others experiencing the same problem.
Thus, the currently running kernel is 2.6.18-92.1.10.el5xen.
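For anyone who wants to script the grub.conf change described above rather than edit the file by hand, here is a minimal sketch in Python. It assumes a legacy GRUB grub.conf with one "title" line per boot entry and a zero-based "default=" line; the function name set_default_kernel is made up for illustration and is not part of any tool.

```python
import re


def set_default_kernel(grub_conf_text, kernel_version):
    """Return grub.conf text whose default= points at the entry for kernel_version.

    Assumes legacy GRUB syntax: entries start with 'title' and are counted from 0.
    """
    # Collect boot entries in file order; commented-out titles start with '#'
    # and are therefore skipped.
    titles = [line for line in grub_conf_text.splitlines()
              if line.strip().startswith("title")]
    for index, title in enumerate(titles):
        if kernel_version in title:
            break
    else:
        raise ValueError("no boot entry mentions " + kernel_version)

    # Rewrite the existing default= line; '.' does not match newlines,
    # so only that one line is replaced.
    new_text, count = re.subn(r"(?m)^default=.*$", f"default={index}",
                              grub_conf_text)
    if count == 0:
        raise ValueError("no default= line found")
    return new_text
```

Calling it with the contents of /boot/grub/grub.conf and "2.6.18-92.1.10.el5xen" returns the modified text. Writing it back (after making a backup) is left out on purpose, since a broken grub.conf on a production box is not something to automate blindly.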
Regards and thanks for all replies,
Peter
That sure points to a machine/kernel conflict. You could try getting the source and rebuilding to see if that solves it, or maybe diff the two kernel configs to see if something is different there. Maybe something is added or turned on in the new kernel that your system doesn't like.
Also, make sure your system's BIOSes are up to date. Not just the motherboard, but any other cards with firmware that might have an update, like RAID card/SAS controllers or ???
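The config comparison suggested above could be sketched roughly like this in Python. It assumes the two configs are the usual /boot/config-<version> files that CentOS installs next to each kernel; the function names are made up for illustration. Options written as "# CONFIG_X is not set" are treated as "n" so that a switch from unset to y or m shows up as a change.

```python
import sys


def parse_kernel_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading '# ' and the trailing ' is not set'.
            options[line[2:-len(suffix)]] = "n"
    return options


def diff_kernel_configs(old_text, new_text):
    """Return sorted (name, old_value, new_value) tuples for options that differ."""
    old = parse_kernel_config(old_text)
    new = parse_kernel_config(new_text)
    changes = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            changes.append((name, before, after))
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: pass the old and the new config file paths as the two arguments.
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for name, before, after in diff_kernel_configs(f_old.read(), f_new.read()):
            print(f"{name}: {before} -> {after}")
```

Since this only reads two files under /boot, it can be run on the production machine without any downtime.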
Dear Scott,
Unfortunately, the machine is in production. Any downtime is a real problem, since our customers see it directly. I would really like to make an active effort to isolate the problem, but my boss would cut my head off if I had to stop the machine. The firmware is not current, but according to Dell's web site I should stop almost every running service on the machine before upgrading the firmware, and in that case I would again have to watch out for my head. I do really care about providing accurate bug reports to the OS projects that I use (I would guess that 90% of my reports lead to a quick fix), but in this case I have to make an exception and keep the machine running.
Thanks,
Peter