[CentOS] how long to reboot server ?

Stephen Harris lists at spuddy.org
Fri Sep 3 10:59:24 UTC 2010


On Fri, Sep 03, 2010 at 08:28:57AM +0300, kalinix wrote:
> On Thu, 2010-09-02 at 16:39 -0400, Stephen Harris wrote:

> > You never upgrade the application?  The database?  Make config changes?
> > Wow... to live in such a static world :-)
> > 
> > Most of our problems aren't OS related, they're app or config
> > related... "change shared memory parameters for oracle", "start this at
> > boot time", "add new network interface"...  these all may prevent the
> > server from booting cleanly and aren't the OS's fault.  You don't want to
> > find that out during a crisis scenario!

> For this kind of issues there are testing servers and testing
> environment.

Which is fine for the testing servers... but how do you verify the change
was properly implemented in production?

> Gee people, Linux ain't windows, to get rebooted every day. Most of the
> problems you mentioned can be set on the fly, except of course hw

The problem _isn't_ the "on the fly" changes.  In fact it's because most
of this stuff can be done on the fly that implementation issues don't get
noticed until reboot time.

Here's a great example that I came across 10 years ago...

The sybase rc script would su to the sybase user to pick up the required
environment variables, then start all the databases.  Fine, no problem.
Except that sometime in the preceding 3 years some new Sybase DBA had
decided to modify the .profile used by the sybase user so that it would
ask which version of Sybase to use.  So when the DBAs su'd to sybase
they'd get their variables set.  Indeed the DBAs would source this file
into their own .profile and they were all happy.  This mistake went
unnoticed for years because the machines didn't reboot... until one day
there was a failure requiring a reboot... and the machine didn't complete
booting.  Why?  Because the rc script was sitting at the console waiting
for someone to select the Sybase version to use.
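
A minimal sketch of how such a .profile could guard its prompt, so that
a non-interactive start falls back to a default instead of waiting for
input.  This is not the original file: the function name, the default
version "12.5" and the /opt/sybase-<version> layout are all assumptions
made up for illustration.

```shell
# Hypothetical fragment of the sybase user's .profile.
# The default version and the install path layout are placeholders.
SYBASE_DEFAULT_VERSION="12.5"

choose_sybase_version() {
    # Only ask when stdin is a terminal; otherwise nobody can answer.
    if [ -t 0 ]; then
        printf 'Sybase version [%s]: ' "$SYBASE_DEFAULT_VERSION" >&2
        read -r answer
        echo "${answer:-$SYBASE_DEFAULT_VERSION}"
    else
        echo "$SYBASE_DEFAULT_VERSION"
    fi
}

SYBASE_VERSION=$(choose_sybase_version)
SYBASE="/opt/sybase-$SYBASE_VERSION"
export SYBASE_VERSION SYBASE
```

The guard alone may not be enough at boot, since stdin can still be the
console there.  The rc script would also need to run its su with stdin
redirected, along the lines of "su - sybase -c '...' </dev/null", and
the same command run by hand is one way to check the change without a
reboot.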

> servers. Did you people heard about change management in the first
> place? What kind of enterprise environment is that where changes are
> made without any change process? What if such an update breaks the core

I'm glad you have perfect people who never make mistakes.  I wish we did
at my place! No amount of paperwork (and, wow, we have lots of that!)
will prevent mistakes :-(

> Anyway, what I'm worried about is seeing the "windows
> philosophy" (rebooting for cleaning memory leak - instead of killing the
> process which generates that leak, rebooting in order to update your
> applications - instead of restart only that particular application aso)
> becoming dominant in the linux world. And this is not good.

You're not seeing this.  You're seeing contingency planning and
verification that services _will_ restart after an outage with minimum
disruption.

Prior to this policy my server had been up 1300+ days and was stable.  It
didn't require patching: I'd removed all unnecessary packages, none of
the security alerts had any impact on my machine, and we hadn't
encountered any OS bugs needing fixing.

I've been a Unix "geek" for 20+ years now; I don't like a 90 day reboot
policy; I just pointed out what we have, and a rationale for it.
However, I don't get to tell the CIO of a Fortune 100 (Fortune 50;
Fortune 10?) company that his policy is... questionable :-)

-- 

rgds
Stephen


