On Thu, Sep 02, 2010 at 10:29:35PM +0200, Rudi Ahlers wrote:
On 2010/09/02 07:39 PM, Stephen Harris wrote:
Indeed. At my place we reboot production machines every 90 days. Or
are meant to; I don't think management have worked out that rebooting
10,000 machines every 90 days means a lot of reboot activity!!
(The idea being to verify that services will come up after some form
of DC-wide outage; last thing we want in a "business contingency" situation
is a few hundred servers not working properly 'cos the rc scripts are
broken)
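For a rough sense of scale, the figures above (10,000 machines, one reboot each per 90 days) can be turned into a daily rate. This is only a back-of-the-envelope sketch; the 4-hour nightly maintenance window is an assumption added for illustration and is not from the post.

```python
# Back-of-the-envelope reboot load, using the figures quoted above.
machines = 10_000      # fleet size mentioned in the post
cycle_days = 90        # each machine is rebooted once per cycle

reboots_per_day = machines / cycle_days

# Assumption (not from the post): reboots are confined to a
# 4-hour nightly maintenance window.
window_hours = 4
reboots_per_hour = reboots_per_day / window_hours

print(f"{reboots_per_day:.0f} reboots per day")
print(f"{reboots_per_hour:.0f} reboots per hour in a {window_hours}h window")
```

Even spread perfectly evenly, that is over a hundred reboots every single day, which is the "lot of reboot activity" being described.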
Interesting..... This generally won't happen on a rock solid OS like
CentOS, unless someone really screwed up badly or it's a super-custom
build which can't be updated using normal CentOS repositories.
We don't reboot servers (CentOS at least), unless we really really need
to. For minor kernel updates that don't give us much more than what we
need, we don't reboot either. Only for more critical / major / highly
important kernel updates, or hardware upgrades do we reboot.
You never upgrade the application? The database? Make config changes?
Wow... to live in such a static world :-)
Most of our problems aren't OS related, they're app or config
related... "change shared memory parameters for oracle", "start this at
boot time", "add new network interface"... these all may prevent the
server from booting cleanly and aren't the OS's fault. You don't want to
find that out during a crisis scenario!
We do shared webhosting mainly so only really use Apache, Exim,
MySQL, PostgreSQL, etc. So I guess it's not as "enterprise" as your
situation but with hundreds of thousands of files on every server,
being updated on a regular basis I do think that our servers fall in
the same category. But then again we only use STABLE release
software where possible. And I honestly haven't come across an issue
where an rc script doesn't work properly after reboot. I've had
cases where a kernel didn't work as expected though, but we don't
reboot a server every 2 months to see if the kernel might have
failed.