reboot long uptimes?

D Ivago

13 Feb 2007

11:06 a.m.

Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, and top shows no zombie processes or the like, but maybe it's better for the hardware (ext3 disk checks, for example) to reboot every six months...

By the way, this uptime really confirms for me how stable CentOS 4.x is, so I wonder how long some people's uptimes on the list are ;)

rmc
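
For anyone who wants to compare uptimes against the 180-day figure mentioned above, here is a minimal sketch in Python. It assumes a Linux host where `/proc/uptime` holds two numbers (seconds since boot, then aggregate idle seconds); the function names and the 180-day default threshold are illustrative, not from the original post.

```python
def parse_uptime_days(proc_uptime_text: str) -> float:
    """Convert the text of /proc/uptime into days of uptime.

    The first whitespace-separated field is seconds since boot;
    the second (idle time) is ignored here.
    """
    uptime_seconds = float(proc_uptime_text.split()[0])
    return uptime_seconds / 86400.0  # 86400 seconds per day


def exceeds_threshold(days: float, threshold_days: float = 180.0) -> bool:
    """Return True when uptime is strictly above the (hypothetical) threshold."""
    return days > threshold_days
```

On a live machine the input would come from reading the file, for example `parse_uptime_days(open("/proc/uptime").read())`.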

Johnny Hughes

13 Feb

11:29 a.m.

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...

Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

I only reboot on kernel upgrades, which usually happen more often than every 6 months. But if you don't need to reboot for that reason, I would not reboot at all.

...

btw this uptime really confirms me how stable Centos 4.x really is and so I wonder how long some people's uptimes on the list are ;)

rmc

You should consider upgrading your kernels when security updates come out ... just to be safe. Especially for machines touching the internet.

I usually upgrade my kernels because I like to use LVM snapshots for backups and that has only really started working semi-well since 4.3 and even better in 4.4 ... so most of my machines get rebooted every new kernel, which is at least 2-3 times a year (sometimes more often).

That being said, I do have a non-internet-facing machine that has not been rebooted since it was installed with CentOS-4.0 on March 1, 2005. It is an internal router on my employer's infrastructure, and has been up for almost 2 years (and was installed on the day before CentOS-4 was officially released).

Thanks, Johnny Hughes

security

11:40 a.m.

Johnny Hughes wrote:

...

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...
Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

I only reboot on kernel upgrades, that is usually more often than 6 months. But if you don't need to reboot for that reason, I would not reboot at all.

kernel and glibc.

...

...
btw this uptime really confirms me how stable Centos 4.x really is and so I wonder how long some people's uptimes on the list are ;)

rmc

You should consider upgrading your kernels when security updates come out ... just to be safe. Especially for machines touching the internet.

I usually upgrade my kernels because I like to use LVM snapshots for backups and that has only really started working semi-well since 4.3 and even better in 4.4 ... so most of my machines get rebooted every new kernel, which is at least 2-3 times a year (sometimes more often).

That being said, I do have a non internet facing machine that has not been rebooted since it was installed with CentOS-4.0 on it one March 1, 2005. It is an internal router on my employer's infrastructure, and has been up for almost 2 years (and was installed on the day before CentOS-4 was officially released).

Thanks, Johnny Hughes

CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos

Drew Weaver

3:03 p.m.

-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: Tuesday, February 13, 2007 6:30 AM To: CentOS ML Subject: Re: [CentOS] reboot long uptimes?

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...

Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

I only reboot on kernel upgrades, that is usually more often than 6 months. But if you don't need to reboot for that reason, I would not reboot at all.

...

btw this uptime really confirms me how stable Centos 4.x really is and so I wonder how long some people's uptimes on the list are ;)

rmc

You should consider upgrading your kernels when security updates come out ... just to be safe. Especially for machines touching the internet.

Thanks, Johnny Hughes -------------

The uptime on some of our boxes is pretty bad. We have roughly 250 CentOS 4.x boxes here, and I'd say probably 25% of them initially suffer from some sort of bug with cpuspeed that causes kernel panics (until we disable cpuspeed). Then there is this other curious thing that happens with the filesystem, where they will occasionally start spamming an ext3-fs "Journal has aborted" message until we reboot the boxes (nothing is wrong with the hardware in any of these cases).

Other than those 75 or so issues, no problems at all.

-Drew

John Hinton

3:41 p.m.

Drew Weaver wrote:

...

-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: Tuesday, February 13, 2007 6:30 AM To: CentOS ML Subject: Re: [CentOS] reboot long uptimes?

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...
Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

About the only other reason I can think of is just to make sure it will restart when an emergency arises.

For instance, fans, drives, etc.....

Some servers will balk if a fan doesn't run. Some servers balk if a hard drive isn't up to speed. These types of things only show up during a reboot. In the case of SCSI RAIDs with hot-swap drives, if a drive goes bad, some equipment will require some action for the boot to continue and some won't.

For instance, considering RAID5 hot swappable....

If it's one drive on a RAID, no biggie. If it's two and you don't have hot spares, that is a bigger issue. 'Scheduled' reboots, like when a new kernel comes out and you have time to be there and do something (or have someone there if needed), are a good time to be sure the self checks done by the server pass.

Basically, the longer the time between reboots, the more likely an error will occur. And it would be really bad if three or four of your drives suddenly didn't have enough strength to get up to speed... better that it is only one, which can be easily swapped out.

Best, John Hinton

Drew Weaver

3:59 p.m.

-----Original Message-----

...

From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: Tuesday, February 13, 2007 6:30 AM To: CentOS ML Subject: Re: [CentOS] reboot long uptimes?

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...
Hi,

I was just wondering if I should reboot some servers that are running

...

...
over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3

...

...
disk checks f.e.) to reboot every six months...

About the only other reason I can think of is just to make sure it will restart when an emergency arises.

For instance, fans, drives, etc.....

For instance, considering RAID5 hot swappable....

That's not really statistically accurate.

Event X occurring or not occurring has no probable impact on whether random event Y occurs.

Where X = rebooting, and Y = 'something funky'.

Something funky could happen 5 minutes after the system starts, or 5 years.

-Drew

MrKiwi

14 Feb

9:59 p.m.

Drew Weaver wrote:

...

-----Original Message-----

...
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: Tuesday, February 13, 2007 6:30 AM To: CentOS ML Subject: Re: [CentOS] reboot long uptimes?

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...
Hi,

I was just wondering if I should reboot some servers that are running

...
...
over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3

...
...
disk checks f.e.) to reboot every six months...

About the only other reason I can think of is just to make sure it will restart when an emergency arises.

For instance, fans, drives, etc.....

Some servers will balk if a fan doesn't run. Some servers balk if a hard drive isn't up to speed. These types of things only show up during a reboot. In the case of scsi raids, hot swap drives... if a drive goes bad some equipment will require some action for the boot up to continue.. some don't.

For instance, considering RAID5 hot swappable....

If it's one drive on a raid, no biggie.. if it's two and you don't have hot spares.. that is a bigger issue. 'Scheduled' reboots, like when a new kernel comes out and you have time to be there and do something or have someone there if needed... it is a good time to be sure the self checks done by the server pass.

Basically, the longer the time before reboots, the more likely a error will occur. And it would be really bad if three or four of your drives suddenly didn't have enough strength to get up to speed... better that it is only one which can be easily swapped out.

--

That's not really statistically accurate.

X event occuring or not occuring has no probable impact on whether random event Y occurs.

Where X = rebooting, and y = 'something funky'.

Something funky could happen 5 minutes after the system starts, or 5 years.

-Drew

Drew - I don't think you are correct about those events being independent.

'X' isn't *rebooting* , it is the number of days *between* rebooting.

To define/confirm some of the terms: "drive error" means errors which don't kill the drive immediately, but lurk until the next bounce. "Reboot period" means the number of days between each reboot.

If the probability of a drive failing in any one 24h period is 'p', then the probability of a drive failing in a reboot period of 7 days (i.e. for a Windows Vista server ;) ) is approximately 7p (for small p). If the reboot period is one year, the probability is approximately 365p.

One point John is making (i think!) is that, particularly with raid arrays, dealing with drive errors one at a time is easier than waiting until there are multiple.

The point in question: how does a long reboot period contribute to the probability of more than one drive error occurring at any boot event?

Statistically, I believe the following is true. If (as in the above example) the probability of 1 drive failing is

    p(1 drive failing) = 365p

then, assuming independent probabilities, the probability of 2 drives failing is

    p(2 drives failing) = 365p * 365p = days^2 * p^2

Compare this to a 1-day reboot period (i.e. an MS Exchange box?):

    p(1 drive failing) = p
    p(2 drives failing) = p^2

So the probability of 'problems' (i.e. one drive failing) is linear with respect to reboot period (days times p). The probability of 'disaster' (i.e. two drives failing) is massively higher with long reboot periods: about 133,000 times higher (365^2 = 133,225) for 365 days than for 1 day. Of course, 'p' is a very low number, we hope!
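
The arithmetic above can be checked with a small sketch. It rests on the same assumptions as the message (independent days, independent drives) and uses a made-up daily failure probability `p_daily = 1e-5` purely for illustration. The function names are hypothetical. Note that `days * p` is the small-p approximation of the exact value `1 - (1 - p)^days`, so the exact ratio comes out slightly below 365^2.

```python
def p_fail_linear(p_daily: float, days: int) -> float:
    """Approximate probability that one drive fails within `days` days.

    This is the `days * p` figure used in the message; it is only
    accurate when days * p_daily is much smaller than 1.
    """
    return days * p_daily


def p_fail_exact(p_daily: float, days: int) -> float:
    """Exact probability of at least one failure in `days` independent days."""
    return 1.0 - (1.0 - p_daily) ** days


def p_two_drives(p_one_drive: float) -> float:
    """Probability that two independent drives have both failed."""
    return p_one_drive * p_one_drive


if __name__ == "__main__":
    p_daily = 1e-5  # hypothetical value, not a measured failure rate

    approx_ratio = (p_two_drives(p_fail_linear(p_daily, 365))
                    / p_two_drives(p_fail_linear(p_daily, 1)))
    exact_ratio = (p_two_drives(p_fail_exact(p_daily, 365))
                   / p_two_drives(p_fail_exact(p_daily, 1)))

    print("approximate ratio (365 days vs 1 day):", approx_ratio)
    print("exact ratio (365 days vs 1 day):", exact_ratio)
```

With any small `p_daily`, the approximate ratio is 365^2 regardless of `p_daily`, which is where the "133,000 times" figure comes from.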

These are the same calcs as for failure in RAID arrays - as non-intuitive as it may be, more drives in your array means a *greater* risk of a (any) drive failure - however you can of course mitigate the *effect* of this easily with hot-spares.

Food for thought: how do we mitigate the effect of multiple failures of this type? Imagine the situation where a box has been running for 10 years. We have to expect that the box will not keep BIOS time during a cold reboot, which is not a problem with ntp. But what about the BIOS on the motherboard, video cards, RAID controllers, etc.? Can NVRAM be trusted to be 'NV' after 10 years of being hot? Obviously data is safe because of our meticulous backups???

Regards,

MrKiwi.

John Summerfield

13 Feb

10:23 p.m.

John Hinton wrote:

...

Drew Weaver wrote:

...
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: Tuesday, February 13, 2007 6:30 AM To: CentOS ML Subject: Re: [CentOS] reboot long uptimes?

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...
Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

About the only other reason I can think of is just to make sure it will restart when an emergency arises.

For instance, fans, drives, etc.....

Some servers will balk if a fan doesn't run. Some servers balk if a hard drive isn't up to speed. These types of things only show up during a reboot. In the case of scsi raids, hot swap drives... if a drive goes bad some equipment will require some action for the boot up to continue.. some don't.

How many spares do you carry?

If you want to do scheduled hardware maintenance, that's one thing. Doing hardware maintenance because something broke during software maintenance is something entirely different.

If I schedule a reboot at 18:00 when hardly anyone's around, and the system fails because the hardware's suddenly broken, we're in trouble.

-- Cheers John -- spambait 1aaaaaaa@coco.merseine.nu Z1aaaaaaa@coco.merseine.nu Please do not reply off-list

Scott Silva

4:56 p.m.

Drew Weaver spake the following on 2/13/2007 7:03 AM:

...

-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: Tuesday, February 13, 2007 6:30 AM To: CentOS ML Subject: Re: [CentOS] reboot long uptimes?

On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:

...
Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

I only reboot on kernel upgrades, that is usually more often than 6 months. But if you don't need to reboot for that reason, I would not reboot at all.

...
btw this uptime really confirms me how stable Centos 4.x really is and so I wonder how long some people's uptimes on the list are ;)

rmc

You should consider upgrading your kernels when security updates come out ... just to be safe. Especially for machines touching the internet.

I usually upgrade my kernels because I like to use LVM snapshots for backups and that has only really started working semi-well since 4.3 and even better in 4.4 ... so most of my machines get rebooted every new kernel, which is at least 2-3 times a year (sometimes more often).

That being said, I do have a non internet facing machine that has not been rebooted since it was installed with CentOS-4.0 on it one March 1, 2005. It is an internal router on my employer's infrastructure, and has been up for almost 2 years (and was installed on the day before CentOS-4 was officially released).

Thanks, Johnny Hughes

My uptime on some of our boxes are pretty bad, we have roughly 250 CentOS 4.x boxes here I'd say probably 25% of them initially suffer from some sort of bug with cpuspeed which causes kernel panics (until we disable cpuspeed), and then we have this other curious thing that happens with the filesystem where they will occasionally start spamming this "ext3-fs "Journal Has aborted" message until we reboot the boxes (nothing is wrong with the hardware in any of the cases).

Other than those 75 or so issues no problems at all.

-Drew

I have been seeing the ext3 errors also. I think it has something to do with the crappy Adaptec raid card in the server. I'm going to replace them with 3ware 9550's as soon as I can work out the migration.

-- MailScanner is like deodorant... You hope everybody uses it, and you notice quickly if they don't!!!!

Tom Brown

12:13 p.m.

...

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

btw this uptime really confirms me how stable Centos 4.x really is and so I wonder how long some people's uptimes on the list are ;)

Personally, no. I never reboot unless I have to, and I have some Linux boxes running with 4 years of uptime or more.

One gotcha, though: be careful to tune the fs to not automagically fsck the filesystem on boot. When you have a large fileserver or mail store, even a scheduled reboot can cause hours of downtime as it runs its 'required' fsck!
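
As a rough illustration of that tuning, the sketch below builds (and optionally runs) a `tune2fs` command that turns off both the mount-count-based and the time-based periodic checks on an ext2/ext3 filesystem. The `-c 0` and `-i 0` options are standard `tune2fs` flags; the function names, the `dry_run` parameter, and the device path `/dev/sdX1` are hypothetical placeholders. This is a sketch under those assumptions: with the automatic checks disabled, any fsck has to be scheduled by hand.

```python
import subprocess
from typing import List


def build_disable_fsck_command(device: str) -> List[str]:
    """Build a tune2fs command that disables periodic boot-time fsck.

    -c 0 : ignore the maximum mount count
    -i 0 : disable the time-based check interval
    """
    return ["tune2fs", "-c", "0", "-i", "0", device]


def disable_periodic_fsck(device: str, dry_run: bool = True) -> List[str]:
    """Return the command; only execute it when dry_run is False.

    Running it for real requires root and a valid ext2/ext3 device.
    """
    cmd = build_disable_fsck_command(device)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Dry run with a placeholder device name; nothing is executed.
command = disable_periodic_fsck("/dev/sdX1")
print(" ".join(command))
```

The current settings (maximum mount count, check interval) can be inspected beforehand with `tune2fs -l` on the same device.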

Theo Band

12:31 p.m.

Tom Brown wrote:

...

gotcha though: be careful to tune the fs to not automagically fsck the filesystem on boot. When you have a large fileserver or mail store that even a scheduled reboot can cause hours of downtime as it runs its 'required' fsck!

There must be a reason to do fsck every now and then (once a year)? Could it be that some cleanup is needed even for ext3? I also have a server that has been up for more than half a year, and I was wondering whether to reboot it just as a precaution. I refrain from kernel updates for internal machines; they just create a lot of work due to driver re-compilation. (Don't fix it if it ain't broken :-)

Theo

chrism＠imntv.com

12:58 p.m.

D Ivago wrote:

...

Hi,

I was just wondering if I should reboot some servers that are running over 180 days?

They are still stable and have no problems, also top shows no zombie processes or such, but maybe it's better for the hardware (like ext3 disk checks f.e.) to reboot every six months...

btw this uptime really confirms me how stable Centos 4.x really is and so I wonder how long some people's uptimes on the list are ;)

I have some Linux machines that have been up 1-2 years. It certainly wouldn't hurt anything if you reboot them, but hardly seems necessary if they're stable and patched to current levels. I had a RedHat 7.3 box that was up for several years. It finally rebooted when there was some maintenance in that wing of the datacenter that required them to interrupt power for a few minutes.

Cheers,
