On Thu, Sep 02, 2010 at 01:27:22PM -0400, Brian Mathis wrote:
> Uptime is no longer a badge of honor. Typically there will have been
> some kernel updates that require a reboot, so a long uptime means they
> haven't been applied. Also, it is a good idea to reboot periodically
> to catch anything that was not set up to start on boot correctly. A
> server should always cleanly start up with all services it needs
> without the need for human intervention.
Indeed. At my place we reboot production machines every 90 days. Or
are meant to; I don't think management have worked out that rebooting
10,000 machines every 90 days means a lot of reboot activity!!
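A quick back-of-the-envelope check of that reboot load. This assumes the reboots are spread evenly across the 90-day cycle, around the clock; the post doesn't say how they are actually scheduled.

```python
# Assumption: reboots are spread evenly over the whole cycle, 24 hours a day.
MACHINES = 10_000
CYCLE_DAYS = 90

per_day = MACHINES / CYCLE_DAYS   # about 111 machines rebooted every day
per_hour = per_day / 24           # roughly 4 to 5 machines every hour

print(f"{per_day:.1f} reboots/day, {per_hour:.1f} reboots/hour")
```

Confining the reboots to weekday maintenance windows would push the per-window figure considerably higher.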
(The idea being to verify that services will come up after some form
of DC-wide outage; last thing we want in a "business contingency" situation
is a few hundred servers not working properly 'cos the rc scripts are
broken)