On Thu, 2010-09-02 at 16:39 -0400, Stephen Harris wrote:
On Thu, Sep 02, 2010 at 10:29:35PM +0200, Rudi Ahlers wrote:
On 2010/09/02 07:39 PM, Stephen Harris wrote:
Indeed. At my place we reboot production machines every 90 days. Or are meant to; I don't think management have worked out that rebooting 10,000 machines every 90 days means a lot of reboot activity!!
(The idea being to verify that services will come up after some form of DC-wide outage; last thing we want in a "business contingency" situation is a few hundred servers not working properly 'cos the rc scripts are broken)
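To put a rough number on that reboot activity, here is a minimal sketch. It assumes reboots are spread evenly across the 90-day cycle, which the thread does not say; the function name `reboots_per_day` is made up for illustration.

```python
import math


def reboots_per_day(machines: int, cycle_days: int) -> int:
    """Minimum reboots per day so every machine is rebooted once per cycle.

    Assumes reboots are spread evenly over the cycle (an assumption,
    not something stated in the thread).
    """
    return math.ceil(machines / cycle_days)


# Figures from the message above: 10,000 machines, 90-day cycle.
print(reboots_per_day(10_000, 90))  # 112
```

That is over a hundred reboots every single day, weekends included, before allowing for change freezes or maintenance windows.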
Interesting..... This generally won't happen on a rock solid OS like CentOS, unless someone really screwed up badly or it's a super-custom build which can't be updated using normal CentOS repositories.
We don't reboot servers (CentOS at least), unless we really really need to. For minor kernel updates that don't give much more than what we need, we don't reboot either. Only for more critical / major / highly important kernel updates, or hardware upgrades, do we reboot.
You never upgrade the application? The database? Make config changes? Wow... to live in such a static world :-)
Most of our problems aren't OS related, they're app or config related... "change shared memory parameters for oracle", "start this at boot time", "add new network interface"... these all may prevent the server from booting cleanly and aren't the OS's fault. You don't want to find that out during a crisis scenario!
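One way to catch this class of problem without a reboot is to compare what is persisted for the next boot against what is actually running. The following is only a sketch under stated assumptions: it handles kernel parameters such as the Oracle shared memory settings, assumes they are persisted in `/etc/sysctl.conf` in plain `key = value` form, and uses hypothetical helper names (`parse_sysctl_conf`, `read_running`, `find_drift`). It does nothing for rc scripts or network interface config.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and # or ; comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Collapse runs of whitespace so '250  32000' matches '250 32000'.
            settings[key.strip()] = " ".join(value.split())
    return settings


def read_running(key):
    """Read a live value from /proc/sys (Linux only).

    Assumes the usual mapping kernel.shmmax -> /proc/sys/kernel/shmmax.
    """
    path = Path("/proc/sys") / key.replace(".", "/")
    return " ".join(path.read_text().split())


def find_drift(persisted, running):
    """Return {key: (persisted_value, running_value)} where the two differ.

    A persisted value of None means the setting was changed on the fly
    but never written to the config file.
    """
    return {
        key: (persisted.get(key), value)
        for key, value in running.items()
        if persisted.get(key) != value
    }
```

A usage example with made-up values:

```python
persisted = parse_sysctl_conf("kernel.shmmax = 68719476736\n")
running = {"kernel.shmmax": "137438953472"}  # or read_running("kernel.shmmax")
print(find_drift(persisted, running))
```

Any key that shows up here will come back with a different value after the next boot.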
For these kinds of issues there are testing servers and a testing environment. Gee people, Linux ain't Windows, to get rebooted every day. Most of the problems you mentioned can be set on the fly, except of course hardware changes, although there do exist servers whose hardware configuration you can change while they are running - yes, Linux supports that also.

On the other hand, rebooting every 'n' days because someone says so means either the consultant or the sysadmin is overpaid for their skills.

And never, ever, ever use an automatic update tool on production servers. Have you people heard about change management in the first place? What kind of enterprise environment is that where changes are made without any change process? What if such an update breaks the core application of that company? Would you spend several hours, maybe days, to get the server back to a stable state?

Anyway, what I'm worried about is seeing the "Windows philosophy" (rebooting to clean up a memory leak instead of killing the process which generates that leak, rebooting in order to update your applications instead of restarting only that particular application, and so on) becoming dominant in the Linux world. And this is not good.