We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
----- "mcclnx mcc" mcclnx@yahoo.com.tw wrote:
We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
If you're running a Windows server, yes, a periodic reboot is necessary to 'clean it out'. However, in Linux land, this is not typically necessary as a 'rule'. You could certainly be running applications with memory leaks or other special circumstances that warrant a clean boot.
I have several Linux boxes running a variety of flavors including CentOS, Debian, and even Red Hat (think old 8.x/9.x days) with uptimes ranging from 13 months to over two years. They're running perfectly without the 'yearly reboot'.
--Tim
Unless you have zombie processes or are upgrading the kernel, IMHO there is no reason to reboot.
-Hal
On Thu, Sep 2, 2010 at 1:18 PM, Tim Nelson tnelson@rockbochs.com wrote:
----- "mcclnx mcc" mcclnx@yahoo.com.tw wrote:
We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
If you're running a Windows server, yes, a periodic reboot is necessary to 'clean it out'. However, in Linux land, this is not typically necessary as a 'rule'. You could certainly be running applications with memory leaks or other special circumstances that warrant a clean boot.
I have several Linux boxes running a variety of flavors including CentOS, Debian, and even Red Hat (think old 8.x/9.x days) with uptimes ranging from 13 months to over two years. They're running perfectly without the 'yearly reboot'.
--Tim
Uptime is no longer a badge of honor. Typically there will have been some kernel updates that require a reboot, so a long uptime means they haven't been applied. Also, it is a good idea to reboot periodically to catch anything that was not set up to start on boot correctly. A server should always cleanly start up with all services it needs without the need for human intervention.
As for "memory junk", yes and no. This would again be related to updates. If there are long running processes that have since had updates or updates to shared libraries, they may not be using the updated libraries. It would also reset anything that might have a memory leak. However, the idea of "junk" collecting in RAM That needs to be cleaned is not really true.
On Thu, Sep 2, 2010 at 1:17 PM, mcclnx mcc mcclnx@yahoo.com.tw wrote:
We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
On Thu, 2 Sep 2010, Brian Mathis wrote:
Uptime is no longer a badge of honor. Typically there will have been some kernel updates that require a reboot, so a long uptime means they haven't been applied. Also, it is a good idea to reboot periodically to catch anything that was not set up to start on boot correctly. A server should always cleanly start up with all services it needs without the need for human intervention.
+1
The longer you go between restarts, the harder it is to identify changes on the running system that might interfere with a clean boot process. So I reboot my servers regularly.
That said, I've had workstation uptimes of 800 to 900 days... :-)
On Thu, Sep 02, 2010 at 01:27:22PM -0400, Brian Mathis wrote:
Uptime is no longer a badge of honor. Typically there will have been some kernel updates that require a reboot, so a long uptime means they haven't been applied. Also, it is a good idea to reboot periodically to catch anything that was not set up to start on boot correctly. A server should always cleanly start up with all services it needs without the need for human intervention.
Indeed. At my place we reboot production machines every 90 days. Or are meant to; I don't think management have worked out that rebooting 10,000 machines every 90 days means a lot of reboot activity!!
(The idea being to verify that services will come up after some form of DC-wide outage; the last thing we want in a "business contingency" situation is a few hundred servers not working properly 'cos the rc scripts are broken)
On 2010/09/02 07:39 PM, Stephen Harris wrote:
On Thu, Sep 02, 2010 at 01:27:22PM -0400, Brian Mathis wrote:
Uptime is no longer a badge of honor. Typically there will have been some kernel updates that require a reboot, so a long uptime means they haven't been applied. Also, it is a good idea to reboot periodically to catch anything that was not set up to start on boot correctly. A server should always cleanly start up with all services it needs without the need for human intervention.
Indeed. At my place we reboot production machines every 90 days. Or are meant to; I don't think management have worked out that rebooting 10,000 machines every 90 days means a lot of reboot activity!!
(The idea being to verify that services will come up after some form of DC-wide outage; the last thing we want in a "business contingency" situation is a few hundred servers not working properly 'cos the rc scripts are broken)
Interesting..... This generally won't happen on a rock solid OS like CentOS, unless someone really screwed up badly or it's a super-custom build which can't be updated using normal CentOS repositories.
We don't reboot servers (CentOS at least) unless we really, really need to. For minor kernel updates that don't give us much more than we already have, we don't reboot either. Only for more critical / major / highly important kernel updates, or hardware upgrades, do we reboot.
On Thu, Sep 02, 2010 at 10:29:35PM +0200, Rudi Ahlers wrote:
On 2010/09/02 07:39 PM, Stephen Harris wrote:
Indeed. At my place we reboot production machines every 90 days. Or are meant to; I don't think management have worked out that rebooting 10,000 machines every 90 days means a lot of reboot activity!!
(The idea being to verify that services will come up after some form of DC-wide outage; the last thing we want in a "business contingency" situation is a few hundred servers not working properly 'cos the rc scripts are broken)
Interesting..... This generally won't happen on a rock solid OS like CentOS, unless someone really screwed up badly or it's a super-custom build which can't be updated using normal CentOS repositories.
We don't reboot servers (CentOS at least) unless we really, really need to. For minor kernel updates that don't give us much more than we already have, we don't reboot either. Only for more critical / major / highly important kernel updates, or hardware upgrades, do we reboot.
You never upgrade the application? The database? Make config changes? Wow... to live in such a static world :-)
Most of our problems aren't OS related, they're app or config related... "change shared memory parameters for oracle", "start this at boot time", "add new network interface"... these all may prevent the server from booting cleanly and aren't the OS's fault. You don't want to find that out during a crisis scenario!
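A cheap partial check, short of an actual reboot, is at least confirming that the services you depend on are registered with init; the service names below are only examples:

    # which of these are set to start at boot?
    chkconfig --list | grep -E '^(httpd|mysqld|sshd)'
    # register one that was only ever started by hand
    chkconfig httpd on

It won't catch a broken rc script, but it does catch the "we started it manually and never added it to the boot sequence" case.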
On 2010/09/02 10:39 PM, Stephen Harris wrote:
On Thu, Sep 02, 2010 at 10:29:35PM +0200, Rudi Ahlers wrote:
On 2010/09/02 07:39 PM, Stephen Harris wrote:
Indeed. At my place we reboot production machines every 90 days. Or are meant to; I don't think management have worked out that rebooting 10,000 machines every 90 days means a lot of reboot activity!!
(The idea being to verify that services will come up after some form of DC-wide outage; the last thing we want in a "business contingency" situation is a few hundred servers not working properly 'cos the rc scripts are broken)
Interesting..... This generally won't happen on a rock solid OS like CentOS, unless someone really screwed up badly or it's a super-custom build which can't be updated using normal CentOS repositories.
We don't reboot servers (CentOS at least) unless we really, really need to. For minor kernel updates that don't give us much more than we already have, we don't reboot either. Only for more critical / major / highly important kernel updates, or hardware upgrades, do we reboot.
You never upgrade the application? The database? Make config changes? Wow... to live in such a static world :-)
Most of our problems aren't OS related, they're app or config related... "change shared memory parameters for oracle", "start this at boot time", "add new network interface"... these all may prevent the server from booting cleanly and aren't the OS's fault. You don't want to find that out during a crisis scenario!
We do shared webhosting mainly, so we only really use Apache, Exim, MySQL, PostgreSQL, etc. So I guess it's not as "enterprise" as your situation, but with hundreds of thousands of files on every server being updated on a regular basis, I do think that our servers fall in the same category. But then again we only use STABLE release software where possible. And I honestly haven't come across an issue where an rc script doesn't work properly after a reboot. I've had cases where a kernel didn't work as expected though, but we don't reboot a server every 2 months to see if the kernel might have failed.
Rudi Ahlers wrote, On 09/02/2010 04:49 PM:
<SNIP>
I've had cases where a kernel didn't work as expected though, but we don't reboot a server every 2 months to see if the kernel might have failed.
Surprised I have not seen anyone mention the other two things which can conspire to cause reboot trouble (with the kernel) with long uptimes:
1) automatic updates by yum-updatesd
2) small (only 3) installonly_limit
If you are not careful, the last known working kernel is gone when you go to reboot. :(
I usually am mindful of both of these settings.
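For reference, both are quick to check on a CentOS 5 box (a sketch; it assumes yum-updatesd is even installed, and if the installonly_limit line is absent yum falls back to its built-in default):

    grep installonly_limit /etc/yum.conf
    chkconfig --list yum-updatesd
    service yum-updatesd status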
On Sep 2, 2010, at 5:26 PM, Todd Denniston Todd.Denniston@tsb.cranrdte.navy.mil wrote:
Rudi Ahlers wrote, On 09/02/2010 04:49 PM:
<SNIP> I've had cases where a kernel didn't work as expected though, but we don't reboot a server every 2 months to see if the kernel might have failed.
Surprised I have not seen anyone mention the other two things which can conspire to cause reboot trouble (with the kernel) with long uptimes:
- automatic updates by yum-updatesd
- small (only 3) installonly_limit
If you are not careful, the last known working kernel is gone when you go to reboot. :(
I usually am mindful of both of these settings.
I would seriously advise against using yum-updatesd on a server deployment. It will screw up at some point and that point will most likely be when you can't afford to have it screw up.
Having said that, yum-updatesd on a desktop is fine, and probably a good canary for the server updates (but I wouldn't have it go on a large desktop deployment).
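For what it's worth, turning it off on a server is just (assuming the daemon is installed at all):

    service yum-updatesd stop
    chkconfig yum-updatesd off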
-Ross
On Thu, 2 Sep 2010, Todd Denniston wrote:
Rudi Ahlers wrote, On 09/02/2010 04:49 PM:
<SNIP> I've had cases where a kernel didn't work as expected though, but we don't reboot a server every 2 months to see if the kernel might have failed.
Surprised I have not seen anyone mention the other two things which can conspire to cause reboot trouble (with the kernel) with long uptimes:
- automatic updates by yum-updatesd
- small (only 3) installonly_limit
If you are not careful, the last known working kernel is gone when you go to reboot. :(
My reboot times are regular, (still on F12 on this machine) but I always copy the kernel files into a subdir 'tmp-backups' so I can get them back if needed, even if yum deletes them.
Keith
I usually am mindful of both of these settings.
My reboot times are regular, (still on F12 on this machine) but I always copy the kernel files into a subdir 'tmp-backups' so I can get them back if needed, even if yum deletes them.
Huh, ok... What do you do with *just* the kernel? Let me know how that works if you ever want to boot from it? Possibly the rpm might make more sense?
On Fri, 3 Sep 2010, Joseph L. Casale wrote:
My reboot times are regular, (still on F12 on this machine) but I always copy the kernel files into a subdir 'tmp-backups' so I can get them back if needed, even if yum deletes them.
Huh, ok... What do you do with *just* the kernel? Let me know how that works if you ever want to boot from it? Possibly the rpm might make more sense?
Yes, considering the number of *.ko modules that are built against a particular kernel version :)
Kind Regards,
Keith
On 9/3/2010 10:07 AM, Keith Roberts wrote:
On Fri, 3 Sep 2010, Joseph L. Casale wrote:
My reboot times are regular, (still on F12 on this machine) but I always copy the kernel files into a subdir 'tmp-backups' so I can get them back if needed, even if yum deletes them.
Huh, ok... What do you do with *just* the kernel? Let me know how that works if you ever want to boot from it? Possibly the rpm might make more sense?
Yes, considering the number of *.ko modules that are built against a particular kernel version :)
Don't they get their own directory that you can preserve in a copy? I've never had yum remove the running kernel, so never had to deal with it, but always assumed that you'd be able to boot the install disk in rescue mode, let it mount the filesystems, chroot, and then be able to tell yum to install the kernel version you need. Shouldn't that work?
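It should. Roughly, the steps would look like this; a sketch only, the kernel version string is hypothetical, and it assumes networking (or a local copy of the rpm) is available in the rescue environment:

    # boot the CentOS install media, type "linux rescue" at the boot prompt,
    # and let it mount the installed system under /mnt/sysimage, then:
    chroot /mnt/sysimage
    yum install kernel-2.6.18-194.el5   # whatever version you need back
    # the kernel rpm's post-install scripts should re-add the grub entry;
    # double-check /boot/grub/grub.conf before rebooting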
On Friday, September 03, 2010 16:23:31 Les Mikesell wrote:
On 9/3/2010 10:07 AM, Keith Roberts wrote:
On Fri, 3 Sep 2010, Joseph L. Casale wrote:
My reboot times are regular, (still on F12 on this machine) but I always copy the kernel files into a subdir 'tmp-backups' so I can get them back if needed, even if yum deletes them.
Huh, ok... What do you do with *just* the kernel? Let me know how that works if you ever want to boot from it? Possibly the rpm might make more sense?
Yes, considering the number of *.ko modules that are built against a particular kernel version :)
Don't they get their own directory that you can preserve in a copy? I've never had yum remove the running kernel, so never had to deal with it, but always assumed that you'd be able to boot the install disk in rescue mode, let it mount the filesystems, chroot, and then be able to tell yum to install the kernel version you need. Shouldn't that work?
AFAIK yum never removes the currently running kernel, at least not in default configuration.
HTH, :-) Marko
On 9/3/2010 12:09 PM, Marko Vojinovic wrote:
On Friday, September 03, 2010 16:23:31 Les Mikesell wrote:
On 9/3/2010 10:07 AM, Keith Roberts wrote:
On Fri, 3 Sep 2010, Joseph L. Casale wrote:
My reboot times are regular, (still on F12 on this machine) but I always copy the kernel files into a subdir 'tmp-backups' so I can get them back if needed, even if yum deletes them.
Huh, ok... What do you do with *just* the kernel? Let me know how that works if you ever want to boot from it? Possibly the rpm might make more sense?
Yes, considering the number of *.ko modules that are built against a particular kernel version :)
Don't they get their own directory that you can preserve in a copy? I've never had yum remove the running kernel, so never had to deal with it, but always assumed that you'd be able to boot the install disk in rescue mode, let it mount the filesystems, chroot, and then be able to tell yum to install the kernel version you need. Shouldn't that work?
AFAIK yum never removes the currently running kernel, at least not in default configuration.
Does anyone know if this is special-cased or some config setting? I recall in FC5 having an IBM 225 that ran OK with the initial kernels but at some update would not boot the new one and many subsequent versions. I think there were more failing kernels than the number configured to keep, but I was always able to recover by selecting the old working version in the grub boot menu, so it looked like it was a special case. Eventually I did a BIOS update on the machine which let the new kernels run but broke the older ones.
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
recall in FC5 having an IBM 225 that ran OK with the initial kernels but at some update would not boot the new one and many subsequent versions. I think there were more failing kernels than the number configured to keep, but I was always able to recover by selecting the old working version in the grub boot menu, so it looked like it was a special case. Eventually I did a BIOS update on the machine which let the new kernels run but broke the older ones.
You can configure the number to keep to be very large, if you want.
On 9/3/2010 12:34 PM, Matthew Miller wrote:
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
recall in FC5 having an IBM 225 that ran OK with the initial kernels but at some update would not boot the new one and many subsequent versions. I think there were more failing kernels than the number configured to keep, but I was always able to recover by selecting the old working version in the grub boot menu, so it looked like it was a special case. Eventually I did a BIOS update on the machine which let the new kernels run but broke the older ones.
You can configure the number to keep to be very large, if you want.
I didn't particularly care about keeping the new ones that didn't boot, so special-casing would be the right thing - it's just harder to be sure the default behavior hasn't changed as you update than it would be with a config option.
On Fri, 3 Sep 2010, Matthew Miller wrote:
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
recall in FC5 having an IBM 225 that ran OK with the initial kernels but at some update would not boot the new one and many subsequent versions. I think there were more failing kernels than the number configured to keep, but I was always able to recover by selecting the old working version in the grub boot menu, so it looked like it was a special case. Eventually I did a BIOS update on the machine which let the new kernels run but broke the older ones.
You can configure the number to keep to be very large, if you want.
How do you do that then Matt?
Regards,
Keith
On Fri, Sep 03, 2010 at 07:10:57PM +0100, Keith Roberts wrote:
You can configure the number to keep to be very large, if you want.
How do you do that then Matt?
Set the (admittedly confusingly-named) "installonly_limit" parameter in /etc/yum.conf to something big.
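i.e. something along these lines (the value 10 is only an example):

    # /etc/yum.conf
    [main]
    installonly_limit=10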
On Friday, September 03, 2010 18:34:51 Matthew Miller wrote:
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
I remember the discussion on the Fedora-list about this a very long time ago, and the bottom line is roughly the following:
* when a yum update installs a new kernel, it checks if the total number of installed kernels exceeds the installonly_limit parameter
* if not, everything is ok
* if yes, the oldest *non-running* kernel is removed and the remaining number of kernels is checked again against installonly_limit, and the removal step is repeated if they still don't match up.
This was done precisely because it was understood that a currently running kernel can be assumed to be stable and bootable. So if you have several kernels, run a yum update while the oldest one is running, get a new kernel, the extra kernels that will get removed are those "in between". This ensures that with any multiple-kernel configuration of yum, there will be at least one kernel known to work, as a failsafe.
I believe CentOS just inherited this behavior of yum. Though I might be wrong, it seems unlikely that anyone would remove this feature from yum on purpose.
So all in all, you should never be afraid that yum will leave you only with untested kernels while updating.
HTH, :-) Marko
Marko Vojinovic wrote, On 09/03/2010 04:10 PM:
On Friday, September 03, 2010 18:34:51 Matthew Miller wrote:
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
So all in all, you should never be afraid that yum will leave you only with untested kernels while updating.
Thank you for your description of what is supposed to happen, I was not aware of this safety provision previously.
I will however probably pick a test machine and try it, just to have that reassuring feeling of having seen it in action myself, as so far I have only come to that point where the NEXT update would be the one which got me. :)
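A low-risk way to watch it happen on a test box might be:

    uname -r               # the currently running kernel; yum should leave this alone
    rpm -q kernel          # kernels installed before the update
    yum -y update kernel
    rpm -q kernel          # afterwards the running kernel should still be listed,
                           # even if it was the oldest of the bunch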
On 9/3/2010 3:47 PM, Todd Denniston wrote:
Marko Vojinovic wrote, On 09/03/2010 04:10 PM:
On Friday, September 03, 2010 18:34:51 Matthew Miller wrote:
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
So all in all, you should never be afraid that yum will leave you only with untested kernels while updating.
Thank you for your description of what is supposed to happen, I was not aware of this safety provision previously.
I will however probably pick a test machine and try it, just to have that reassuring feeling of having seen it in action myself, as so far I have only come to that point where the NEXT update would be the one which got me. :)
Just try booting your oldest existing kernel from the grub menu before doing an update that includes a kernel, and note that the one you are running isn't removed like it would be otherwise.
On Sep 3, 2010, at 4:10 PM, Marko Vojinovic vvmarko@gmail.com wrote:
On Friday, September 03, 2010 18:34:51 Matthew Miller wrote:
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
I remember the discussion on the Fedora-list about this a very long time ago, and the bottom line is roughly the following:
- when a yum update installs a new kernel, it checks if the total number of installed kernels exceeds the installonly_limit parameter
- if not, everything is ok
- if yes, the oldest *non-running* kernel is removed and the remaining number of kernels is checked again against installonly_limit, and the removal step is repeated if they still don't match up.
This was done precisely because it was understood that a currently running kernel can be assumed to be stable and bootable. So if you have several kernels, run a yum update while the oldest one is running, get a new kernel, the extra kernels that will get removed are those "in between". This ensures that with any multiple-kernel configuration of yum, there will be at least one kernel known to work, as a failsafe.
I believe CentOS just inherited this behavior of yum. Though I might be wrong, it seems unlikely that anyone would remove this feature from yum on purpose.
So all in all, you should never be afraid that yum will leave you only with untested kernels while updating.
This is good info!
What I am wondering is if there is a way to prevent new kernels from becoming the default by... default?
That way one won't be "pleasantly" surprised that, after a long uptime and several updates, on the next reboot their applications stop working because of a kernel update that hadn't been tested yet.
A way where the admin must manually choose the default kernel.
-Ross
On 09/03/2010 02:15 PM, Ross Walker wrote:
This is good info!
What I am wondering is if there is a way to prevent new kernels from becoming the default by... default?
That way one won't be "pleasantly" surprised that, after a long uptime and several updates, on the next reboot their applications stop working because of a kernel update that hadn't been tested yet.
A way where the admin must manually choose the default kernel.
Look at /etc/sysconfig/kernel
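On CentOS 5 that file contains roughly the following; setting UPDATEDEFAULT to no should stop newly installed kernels from becoming the default grub entry, so the admin has to promote a new kernel by hand:

    # /etc/sysconfig/kernel
    # UPDATEDEFAULT specifies if new-kernel-pkg should make new kernels the default
    UPDATEDEFAULT=no
    # DEFAULTKERNEL specifies the default kernel package type
    DEFAULTKERNEL=kernel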
On Fri, 3 Sep 2010, Ross Walker wrote:
On Sep 3, 2010, at 4:10 PM, Marko Vojinovic vvmarko@gmail.com wrote:
On Friday, September 03, 2010 18:34:51 Matthew Miller wrote:
On Fri, Sep 03, 2010 at 12:17:37PM -0500, Les Mikesell wrote:
Does anyone know if this is special-cased or some config setting? I
It's special-cased.
I remember the discussion on the Fedora-list about this a very long time ago, and the bottom line is roughly the following:
- when a yum update installs a new kernel, it checks if the total number of installed kernels exceeds the installonly_limit parameter
- if not, everything is ok
- if yes, the oldest *non-running* kernel is removed and the remaining number of kernels is checked again against installonly_limit, and the removal step is repeated if they still don't match up.
This was done precisely because it was understood that a currently running kernel can be assumed to be stable and bootable. So if you have several kernels, run a yum update while the oldest one is running, get a new kernel, the extra kernels that will get removed are those "in between". This ensures that with any multiple-kernel configuration of yum, there will be at least one kernel known to work, as a failsafe.
I believe CentOS just inherited this behavior of yum. Though I might be wrong, it seems unlikely that anyone would remove this feature from yum on purpose.
So all in all, you should never be afraid that yum will leave you only with untested kernels while updating.
This is good info!
What I am wondering is if there is a way to prevent new kernels from becoming the default by... default?
That way one won't be "pleasantly" surprised that, after a long uptime and several updates, on the next reboot their applications stop working because of a kernel update that hadn't been tested yet.
A way where the admin must manually choose the default kernel.
-Ross
I have 2 root partitions labelled Fedora-8-root and Fedora-12-root, and have installed GRUB to a separate boot partition, labelled GrubBoot. ~20MB should be plenty for a boot partition, maybe less than that. GrubBoot is a logical partition in the extended partition.
There are no symlinks to grub.conf on my active root partition. The /boot/grub/ directory only contains the splash.xpm.gz image (which probably doesn't need to be there anyway).
Whenever there is a kernel upgrade, yum installs the kernel files to the active root partition. As the GrubBoot partition is not mounted, the kernel rpm scripts cannot find the grub.conf file to modify it, so that remains the same.
I have had this problem on SuSE when I had to compile the kernel with the Nvidia graphics driver after the kernel was upgraded. IIRC there was only one kernel installed at a time, so if it did not boot - hard luck!
There have been some recent kernel upgrades on CentOS 5.5 and F12, but I have not even bothered to edit my grub.conf file and boot them yet!
See the man page for grub-install for how to do this. It's really easy and worth the hassle to set up a separate boot partition.
The other bonus is that once you have a separate boot partition you can do a fresh install of Linux to a root partition, and if you select the 'Do not install GRUB bootloader' option, you will still have a working GRUB installation to boot from on the GRUB boot partition. That saves some hassle as well.
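For anyone wanting to try the same layout, the rough shape of it is something like this (device names are purely illustrative; check the grub-install man page before running anything):

    # assume /dev/sda5 is the dedicated GrubBoot partition on the boot disk /dev/sda
    mkdir -p /mnt/grubboot
    mount /dev/sda5 /mnt/grubboot
    # put GRUB's stage files under /mnt/grubboot/boot/grub and point the MBR at them
    grub-install --root-directory=/mnt/grubboot /dev/sda
    # grub.conf then lives in /mnt/grubboot/boot/grub/ and is only edited by hand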
This is on bare hardware, as I do not do VM's ATM.
HTH
Keith Roberts
On Fri, 3 Sep 2010, Joseph L. Casale wrote:
My reboot times are regular, (still on F12 on this machine) but I always copy the kernel files into a subdir 'tmp-backups' so I can get them back if needed, even if yum deletes them.
Huh, ok... What do you do with *just* the kernel? Let me know how that works if you ever want to boot from it? Possibly the rpm might make more sense?
A long time ago now. Maybe I was falling back to a previous kernel version that had not been uninstalled by yum, or was it YaST on SuSE Linux?
Perhaps keeping the rpm for a kernel that you know works, tucked away on the system would be a good move?
Also, if you downgrade the kernel version, would this have any effect on other packages on the system?
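One way to tuck a known-good kernel package away for later (a sketch; assumes yum-utils is installed and that the version is still available in the repositories):

    mkdir -p /root/kernel-backups
    # download the rpm for the currently running kernel without installing anything
    yumdownloader --destdir=/root/kernel-backups kernel-$(uname -r)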
Kind Regards,
Keith Roberts
On Thu, 2010-09-02 at 16:39 -0400, Stephen Harris wrote:
On Thu, Sep 02, 2010 at 10:29:35PM +0200, Rudi Ahlers wrote:
On 2010/09/02 07:39 PM, Stephen Harris wrote:
Indeed. At my place we reboot production machines every 90 days. Or are meant to; I don't think management have worked out that rebooting 10,000 machines every 90 days means a lot of reboot activity!!
(The idea being to verify that services will come up after some form of DC-wide outage; the last thing we want in a "business contingency" situation is a few hundred servers not working properly 'cos the rc scripts are broken)
Interesting..... This generally won't happen on a rock solid OS like CentOS, unless someone really screwed up badly or it's a super-custom build which can't be updated using normal CentOS repositories.
We don't reboot servers (CentOS at least) unless we really, really need to. For minor kernel updates that don't give us much more than we already have, we don't reboot either. Only for more critical / major / highly important kernel updates, or hardware upgrades, do we reboot.
You never upgrade the application? The database? Make config changes? Wow... to live in such a static world :-)
Most of our problems aren't OS related, they're app or config related... "change shared memory parameters for oracle", "start this at boot time", "add new network interface"... these all may prevent the server from booting cleanly and aren't the OS's fault. You don't want to find that out during a crisis scenario!
For this kind of issue there are testing servers and a testing environment. Gee people, Linux isn't Windows, to be rebooted every day. Most of the problems you mentioned can be handled on the fly, except of course hardware changes, although there are servers where you can change hardware in the running state - yes, Linux supports that too. On the other hand, rebooting every 'n' days because someone says so means either the consultant or the sysadmin is overpaid for their skills. And never, ever, ever use an automatic update tool on production servers. Have you people heard of change management in the first place? What kind of enterprise environment is it where changes are made without any change process? What if such an update breaks the core application of that company? Would you spend several hours, maybe days, getting the server back to a stable state? Anyway, what I'm worried about is seeing the "Windows philosophy" (rebooting to clean up a memory leak instead of killing the process that generates the leak, rebooting to update your applications instead of restarting only that particular application, and so on) becoming dominant in the Linux world. And this is not good.
On Fri, Sep 03, 2010 at 08:28:57AM +0300, kalinix wrote:
On Thu, 2010-09-02 at 16:39 -0400, Stephen Harris wrote:
You never upgrade the application? The database? Make config changes? Wow... to live in such a static world :-)
Most of our problems aren't OS related, they're app or config related... "change shared memory parameters for oracle", "start this at boot time", "add new network interface"... these all may prevent the server from booting cleanly and aren't the OS's fault. You don't want to find that out during a crisis scenario!
For this kind of issue there are testing servers and a testing environment.
Which are fine for the testing servers...but how do you verify the change was properly implemented into production?
Gee people, Linux isn't Windows, to be rebooted every day. Most of the problems you mentioned can be handled on the fly, except of course hardware
The problem _isn't_ the "on the fly" changes. In fact it's because most of this stuff can be done on the fly that implementation issues don't get noticed until reboot time.
Here's a great example that I came across 10 years ago...
The sybase rc script would su to the sybase user to pick up the required environment variables, then start all the databases. Fine, no problem. Except sometime in the past 3 years some new Sybase DBA decided to modify the .profile used by the sybase user so that it would ask what version of sybase to use. So when the DBAs su'd to sybase they'd get their variables set. Indeed the DBAs would source this file into their own .profile and they were all happy. This mistake went unnoticed for years because the machines didn't reboot... until one day there was a failure requiring a reboot... and the machine didn't complete booting. Why? Because the console was waiting for someone to select the sybase version to use.
servers. Have you people heard of change management in the first place? What kind of enterprise environment is it where changes are made without any change process? What if such an update breaks the core
I'm glad you have perfect people who never make mistakes. I wish we did at my place! No amount of paperwork (and, wow, we have lots of that!) will prevent mistakes :-(
Anyway, what I'm worried about is seeing the "Windows philosophy" (rebooting to clean up a memory leak instead of killing the process that generates the leak, rebooting to update your applications instead of restarting only that particular application, and so on) becoming dominant in the Linux world. And this is not good.
You're not seeing this. You're seeing contingency planning and verification that services _will_ restart after an outage with minimum disruption.
Prior to this policy my server had been up 1300+ days and was stable. It didn't require patching because I'd removed all unnecessary packages and none of the security alerts had any impact on my machine and we hadn't encountered any OS bugs needing fixing.
I've been a Unix "geek" for 20+ years now; I don't like a 90 day reboot policy; I just pointed out what we have, and a rationale for it. However I don't get to tell the CIO of a fortune 100 (fortune 50; fortune 10?) company that his policy is... questionable :-)
On Fri, 2010-09-03 at 06:59 -0400, Stephen Harris wrote:
On Fri, Sep 03, 2010 at 08:28:57AM +0300, kalinix wrote:
On Thu, 2010-09-02 at 16:39 -0400, Stephen Harris wrote:
You never upgrade the application? The database? Make config changes? Wow... to live in such a static world :-)
Most of our problems aren't OS related, they're app or config related... "change shared memory parameters for oracle", "start this at boot time", "add new network interface"... these all may prevent the server from booting cleanly and aren't the OS's fault. You don't want to find that out during a crisis scenario!
For this kind of issue there are testing servers and a testing environment.
Which are fine for the testing servers...but how do you verify the change was properly implemented into production?
You're kidding, right? You mean you restart the production servers just to test whether your application works??? IMHO this should be part of the testing scenario.
Gee people, Linux isn't Windows, to be rebooted every day. Most of the problems you mentioned can be handled on the fly, except of course hardware
The problem _isn't_ the "on the fly" changes. In fact it's because most of this stuff can be done on the fly that implementation issues don't get noticed until reboot time.
Here's a great example that I came across 10 years ago...
The sybase rc script would su to the sybase user to pick up the required environment variables, then start all the databases. Fine, no problem. Except sometime in the past 3 years some new Sybase DBA decided to modify the .profile used by the sybase user so that it would ask what version of sybase to use. So when the DBAs su'd to sybase they'd get their variables set. Indeed the DBAs would source this file into their own .profile and they were all happy. This mistake went unnoticed for years because the machines didn't reboot... until one day there was a failure requiring a reboot... and the machine didn't complete booting. Why? Because the console was waiting for someone to select the sybase version to use.
Typical example of a BOFH. Sorry, BDBAFH :). In this case, at least the sysadmin should be consulted (if not asked for permission) to perform such a change on a production server. If, let's say, a web application designer one day decides that the application needs to run PHP with low security settings, does he just lower the security of the whole system without asking anyone if he can do that?
servers. Have you people heard of change management in the first place? What kind of enterprise environment is it where changes are made without any change process? What if such an update breaks the core
I'm glad you have perfect people who never make mistakes. I wish we did at my place! No amount of paperwork (and, wow, we have lots of that!) will prevent mistakes :-(
It's not about the paperwork. It's about the change process, which should be very well implemented and tested, re-tested and tested again. And when you think it's done, you should re-test once more. I remember once, on a W2k3 box (alas), when the first SP had just come out. We had a development team which deployed a Java portal (don't ask). Anyway, I asked them to test whether we could deploy the SP on the production server, as it had several important fixes. Of course they said it was tested and it was OK to deploy it. Which I did. And of course the portal was scrambled. In the end, it turned out they hadn't tested the SP.
Anyway, what I'm worried about is seeing the "Windows philosophy" (rebooting to clean up a memory leak instead of killing the process that generates the leak, rebooting to update your applications instead of restarting only that particular application, and so on) becoming dominant in the Linux world. And this is not good.
You're not seeing this. You're seeing contingency planning and verification that services _will_ restart after an outage with minimum disruption.
Prior to this policy my server had been up 1300+ days and was stable. It didn't require patching because I'd removed all unnecessary packages and none of the security alerts had any impact on my machine and we hadn't encountered any OS bugs needing fixing.
I've been a Unix "geek" for 20+ years now; I don't like a 90 day reboot policy; I just pointed out what we have, and a rationale for it. However I don't get to tell the CIO of a fortune 100 (fortune 50; fortune 10?) company that his policy is... questionable :-)
I know exactly what you mean. These days managers only look at how many colors their Excel sheets have. Anyway, I stand up for my principles, proving they are right. One of them is never to reboot a Linux box unless you change the kernel or the hardware (never both at the same time).
Memory leaks, DBA issues, testing: all this should be fixed in either the development or the testing environment, and only after extensive testing deployed in production.
On 3 Sep 2010, at 19:56, kalinix calin.kalinix.cosma@gmail.com wrote:
It's not about the paperwork. It's about the change process, which should be very well implemented and tested, re-tested and tested again. And when you think it's done, you should re-test once more.
Sounds like they should adopt ITIL! I'm upsetting a few people by bringing in change management but I won't have working servers messed about with unnecessarily!
I remember once, on a W2k3 box (alas), when the first SP had just come out. We had a development team which deployed a Java portal (don't ask).
We had something similar in my old job - had the green light to upgrade the Win2k3 Citrix boxes to the latest version of .net and it completely killed the application! Turned out development had only tested it on XP!
Thankfully we didn't trust them and caught the problem after just one server!
Take care,
Ben
On Thu, Sep 2, 2010 at 1:17 PM, mcclnx mcc mcclnx@yahoo.com.tw wrote:
We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
As someone else mentioned, uptime is no reason to brag... That said, I manage one Linux system that's been running continuously for 1,400 days. It will be upgraded in the near future, but it's nonsense that you'd need to reboot to "clean out memory".
At Fri, 3 Sep 2010 01:17:15 +0800 (CST) CentOS mailing list centos@centos.org wrote:
We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
You only need to reboot when you do kernel upgrades.
We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
maybe i missed it, but did anyone mention the old adage...
"if it isn't broke, don't fix it?"
that would seem to apply in many scenarios...
certain exceptions noted of course, yet any good linux admin should be keeping a very close eye on the change logs etc anyways...
- rh
On 9/4/10 12:42 AM, R-Elists wrote:
We have CentOS 5 on Dell servers. Some servers have not been rebooted in over a year. Our consultant suggests we need to reboot at least once a year to clean out memory junk.
What is your opinion?
maybe i missed it, but did anyone mention the old adage...
"if it isn't broke, don't fix it?"
that would seem to apply in many scenarios...
certain exceptions noted of course, yet any good linux admin should be keeping a very close eye on the change logs etc anyways...
It's a safe assumption that all software is always broke. Read through the changelogs of any large project if you want to see how depressingly true that is or how unlikely it would be for something to be perfect. So the question becomes one of whether you trust the people generating the updates to know whether they are needed or not. With systems like fedora where the distribution's goal is to push out new/different software in the hope of eventually advancing the state of the art, you probably shouldn't expect updates to be any more stable than what you have installed. But with RHEL/Centos where the goal of the distribution is stability the updates generally are just to fix things that need to be fixed. The only reasons to avoid them would be if you have unusual circumstances like weird hardware or you think you know more about linux than the team building and testing the updates.