On Fri, May 22, 2009 at 1:17 PM, Peter Hopfgartner peter.hopfgartner@r3-gis.com wrote:
JohnS wrote:
Now why in the world would you want to do that??? You're running 5.3 as per your earlier post, and your uname shows you running the Xen kernel. Always run the newest kernel *unless* there are very good reasons not to, and I don't see one for your situation. Use the latest 5.3 non-Xen kernel to test it with.
A random kernel reboot on a production machine is a good reason, at least from my POV. It ran fine for months with 5.2 and now has problems running with 5.3. If it is not able to run Xen, then I have to trash the whole thing, since the ASP services hosted on the machine are within Xen guests. No Xen - no business. And it DID run fine before the update.
My suggestion is to unplug everything hooked to it but the power and network cabling. Open it up while it is running and shake the cables lightly (don't jerk on them). External disk array? Unplug it also. USB floppies and CD drives, unplug them all.
Is it under a heavy load? High CPU usage? Sometimes when a power supply is on the verge of dying you don't really know it until disk I/O climbs real high, pulling loads of wattage. Pentium 4 and up CPUs are bad about this too.
No heavy load; it crashes even at times when there is almost no load at all. The power supplies are redundant, and hardware monitoring tells me they are both fine, as is the rest of the hardware of the machine.
Run memtest86 for a few hours, not just a minute or two before saying "ahh, it's OK". It takes time. Are there gaps in your log files, like white space?
No gaps. The machine simply restarts at a given moment. No shutdown, no traces of a kernel panic.
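One quick way to confirm that from wtmp (a rough example only; the exact output varies with the version of last):

  # list reboot and shutdown records from wtmp; a reboot line with no
  # shutdown entry just before it means the box went down hard
  last -x reboot shutdown | head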
Hardware RAID controller updated to the latest firmware release?
Indeed, updating the firmware and maybe some drivers from Dell's support site will be the next actions.
OK, I guess others can tack onto my list here as well. I wouldn't get too discouraged, because sometimes it can take days to find the problem.
Something I have been meaning to try is to see whether LVM can be leveraged to perform something like Solaris' Live Upgrade (of course, without ZFS it won't be as efficient): pin each release to its respective sub-release version (5.0, 5.1, 5.2, etc.), then clone the root LV, put in a new grub entry for the new sub-release, boot into that cloned LV, increment the version in the repo file, and yum upgrade it to that version.
I suppose a new initrd will also need to be generated, but a script could automate the whole thing; call it something like 'sysupgrade'. It would clone the root LV, mount it, update the repo file, create a new initrd, then add a grub entry. Something like the rough sketch below.
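Completely untested; the VG/LV names, mount points, and kernel version are made up for illustration, and the clone's fstab (plus any LABEL= entries, since the clone carries the same filesystem label) would need fixing too:

  # clone the current root LV (better done from a snapshot or rescue
  # media, since the source is mounted read-write here)
  lvcreate -L 8G -n root_53 VolGroup00
  dd if=/dev/VolGroup00/root_52 of=/dev/VolGroup00/root_53 bs=4M

  # mount the clone and prepare a chroot
  mkdir -p /mnt/next
  mount /dev/VolGroup00/root_53 /mnt/next
  mount --bind /dev /mnt/next/dev
  mount -t proc proc /mnt/next/proc

  # edit /mnt/next/etc/fstab to point at the clone, then pin the
  # clone's repo file to the next sub-release and upgrade; installing
  # the new kernel package should build its initrd as a side effect
  vi /mnt/next/etc/fstab
  vi /mnt/next/etc/yum.repos.d/CentOS-Base.repo
  chroot /mnt/next yum -y upgrade
  # if not, rebuild it by hand:
  #   chroot /mnt/next mkinitrd /boot/initrd-<version>.img <version>

  # finally, add a grub stanza booting the cloned LV, e.g.:
  #   title CentOS 5.3 (cloned root)
  #       root (hd0,0)
  #       kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/root_53
  #       initrd /initrd-2.6.18-128.el5.img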
This way, if an upgrade doesn't work well for your application, you can back out for a little while until whatever is broken is fixed, then switch back to it.
Keep the root LV comparatively small, say 8 GB, and just keep the prior version around. You definitely want to keep /home on a separate LV, and possibly /var, depending on what apps you run.
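For example (sizes and names are only placeholders):

  lvcreate -L 8G   -n root VolGroup00   # small root, cheap to clone
  lvcreate -L 100G -n home VolGroup00   # user data survives upgrades untouched
  lvcreate -L 20G  -n var  VolGroup00   # optional, depending on the apps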
Of course this doesn't mean one shouldn't fully test each update before rolling it into production. If your app is mission critical, buy two systems instead of one, so the second can be used for redundancy and testing. If management balks at that, just say fine, then don't complain when the production systems are down due to inadequately tested software updates.
-Ross