[CentOS] Random server reboot after update to CentOS 5.3

Mon Jun 8 15:26:44 UTC 2009

On Jun 8, 2009, at 9:18 AM, Peter Hopfgartner <peter.hopfgartner at r3-gis.com> wrote:

> Scott Silva wrote:
>> on 6-3-2009 2:27 AM Peter Hopfgartner spake the following:
>>
>>> Epilogue:
>>>
>>> I've tried to disable TSO (ethtool -K eth0 tso off), as was suggested
>>> on the poweredge list. This did not help.
>>>
>>> I've configured the machine to start with the 5.2 kernel by changing
>>> the default in /boot/grub/grub.conf. It has now been running for
>>> 6 1/2 days. I would say that this helped, and it is what I would
>>> suggest to others experiencing the same problem right now.
>>>
>>> Thus, the currently running kernel is 2.6.18-92.1.10.el5xen.
>>>
>>> Regards and thanks for all replies,
>>>
>>> Peter
>>>
>>>
>> That sure points to a machine/kernel conflict. You could try getting
>> the source and rebuilding to see if that solves it, or maybe diff the
>> two kernel configs to see if something is different there. Maybe
>> something is added or turned on in the new kernel that your system
>> doesn't like.
>>
>> Also, make sure your system's BIOSes are up to date. Not just the
>> motherboard, but any other cards with firmware that might have an
>> update, like RAID card/SAS controllers or ???
>>
> Dear Scott,
>
> Unfortunately the machine is in production. Any downtime is really a
> problem since it is seen directly by our customers. I would really
> like to make an active effort to isolate the problem, but my boss
> would cut my head off if I had to stop the machine. The firmware is
> not current, but according to Dell's web site I should stop almost
> every running service on the machine before upgrading the firmware,
> and in this case I would again have to watch out for my head. I really
> do care about providing accurate bug reports to the OS projects that I
> use (I would guess that 90% of my reports lead to a quick fix), but in
> this case I have to make an exception and keep the machine running.

Do what works for now and think about a test box or VM setup for the  
future where you can test newer kernels before they go into production.
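
One piece of the earlier advice needs no downtime: the diff of the two
kernel configs that Scott suggested only reads files, so it should be
safe on the production box. A minimal sketch, assuming both kernels
left their build config under /boot (the usual place on CentOS). The
function name kconfig_diff is made up for illustration, and the path of
the 5.3 config is a placeholder, since the thread does not name that
kernel version.

```shell
# Print only the kernel config options that differ between two config files.
# kconfig_diff is a hypothetical helper name, not an existing tool.
kconfig_diff() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # Keep "CONFIG_X=..." and "# CONFIG_X is not set" lines, drop other comments,
    # and sort so that ordering changes between versions do not show up as noise.
    grep -E '^(CONFIG_|# CONFIG_)' "$1" | LC_ALL=C sort > "$old_sorted"
    grep -E '^(CONFIG_|# CONFIG_)' "$2" | LC_ALL=C sort > "$new_sorted"
    # -U 0 suppresses context lines, leaving only added (+) and removed (-) options.
    diff -U 0 "$old_sorted" "$new_sorted"
    rc=$?
    rm -f "$old_sorted" "$new_sorted"
    return $rc   # 0 = identical, 1 = differences found (diff's own convention)
}

# Example call (second path is a placeholder for the 5.3 kernel's config):
# kconfig_diff /boot/config-2.6.18-92.1.10.el5xen /boot/config-NEWER_VERSION
```

Lines starting with "-" are options as set in the first (older) config,
lines starting with "+" are as set in the second. That at least narrows
down what to look at once a test box is available.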

-Ross