Hi,
I was just wondering whether I should reboot some servers that have been running for over 180 days.
They are still stable and have no problems, and top shows no zombie processes or the like, but maybe it's better for the hardware (and for things like ext3 disk checks) to reboot every six months...
BTW, this uptime really confirms to me how stable CentOS 4.x is, so I wonder how long some people's uptimes on the list are ;)
rmc
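A minimal Python sketch for checking this on a Linux box, assuming the standard /proc/uptime format (first field: seconds since boot; second field: accumulated idle time). The function names and the 180-day threshold parameter are illustrative, not taken from any existing tool:

```python
def uptime_days(proc_uptime_text: str) -> float:
    """Convert the contents of /proc/uptime into days since boot."""
    # First field: seconds since boot; second field: accumulated idle seconds.
    seconds_since_boot = float(proc_uptime_text.split()[0])
    return seconds_since_boot / 86400.0


def over_threshold(proc_uptime_text: str, threshold_days: float = 180.0) -> bool:
    """Illustrative helper: True if the box has been up longer than threshold_days."""
    return uptime_days(proc_uptime_text) > threshold_days
```

On a live system: uptime_days(open("/proc/uptime").read()).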
On Tue, 2007-02-13 at 12:06 +0100, D Ivago wrote:
I only reboot on kernel upgrades, which usually come more often than every 6 months. But if you don't need to reboot for that reason, I would not reboot at all.
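The "reboot only on kernel upgrades" rule can be sketched as a comparison between the running kernel (os.uname().release, the same string uname -r prints) and the newest installed one. This is a sketch under assumptions: kernels are taken to be installed as /boot/vmlinuz-<release> as on CentOS, and version_key is a simplified ordering (digit runs compared as numbers) standing in for RPM's real version comparison, so kernel flavours such as ELsmp are not handled.

```python
import os
import re
from pathlib import Path


def version_key(release: str):
    """Simplified ordering: digit runs compare numerically, letter runs as text.

    Only an approximation of RPM's comparison, but enough to order strings
    such as 2.6.9-42.0.8.EL and 2.6.9-42.0.10.EL correctly.
    """
    parts = re.findall(r"\d+|[A-Za-z]+", release)
    # Tag each part so an int is never compared directly with a str.
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


def running_kernel() -> str:
    return os.uname().release


def installed_kernels(boot_dir: str = "/boot") -> list:
    # Assumes the /boot/vmlinuz-<release> naming used by CentOS kernel RPMs.
    return [p.name[len("vmlinuz-"):] for p in Path(boot_dir).glob("vmlinuz-*")]


def reboot_needed(running: str, installed: list) -> bool:
    """True if some installed kernel sorts newer than the running one."""
    if not installed:
        return False
    newest = max(installed, key=version_key)
    return version_key(newest) > version_key(running)
```

Usage: reboot_needed(running_kernel(), installed_kernels()).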
rmc
You should consider upgrading your kernels when security updates come out... just to be safe, especially for machines touching the internet.
I usually upgrade my kernels because I like to use LVM snapshots for backups, and those have only really started working semi-well since 4.3, and even better in 4.4... so most of my machines get rebooted with every new kernel, which is at least 2-3 times a year (sometimes more often).
That being said, I do have a non-internet-facing machine that has not been rebooted since it was installed with CentOS-4.0 on March 1, 2005. It is an internal router on my employer's infrastructure and has been up for almost 2 years (it was installed the day before CentOS-4 was officially released).
Thanks, Johnny Hughes
Johnny Hughes wrote:
I only reboot on kernel upgrades, that is usually more often than 6 months. But if you don't need to reboot for that reason, I would not reboot at all.
Kernel and glibc upgrades.
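A likely reason glibc belongs on that list: processes started before an upgrade keep using the old, now-deleted copy of the library until they are restarted, and nearly everything links against libc. One common way to find them is to look for mappings marked (deleted) in /proc/<pid>/maps. A sketch assuming that format; the function names and the plain "libc" substring match are illustrative, not a complete checker:

```python
from pathlib import Path


def stale_libraries(maps_text: str, name_fragment: str = "libc") -> set:
    """Return deleted files matching name_fragment in one /proc/<pid>/maps listing.

    Each maps line is: address perms offset dev inode [pathname];
    the kernel appends " (deleted)" to the pathname once the file is unlinked.
    """
    stale = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # pathname is the sixth field, if present
        if len(fields) < 6:
            continue
        path = fields[5]
        if path.endswith(" (deleted)") and name_fragment in path:
            stale.add(path[: -len(" (deleted)")])
    return stale


def pids_using_stale_libc(proc_dir: str = "/proc") -> dict:
    """Map PID -> stale library paths for every readable /proc/<pid>/maps."""
    result = {}
    for maps in Path(proc_dir).glob("[0-9]*/maps"):
        try:
            stale = stale_libraries(maps.read_text())
        except OSError:  # process exited meanwhile, or permission denied
            continue
        if stale:
            result[int(maps.parent.name)] = stale
    return result
```

Any PID that shows up is still running the pre-upgrade library; restarting those services (or rebooting) picks up the new one.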
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: Tuesday, February 13, 2007 6:30 AM To: CentOS ML Subject: Re: [CentOS] reboot long uptimes?
My uptime on some of our boxes is pretty bad. We have roughly 250 CentOS 4.x boxes here, and I'd say probably 25% of them initially suffer from some sort of bug with cpuspeed that causes kernel panics (until we disable cpuspeed). Then there's this other curious thing that happens with the filesystem: they will occasionally start spamming the ext3-fs "Journal has aborted" message until we reboot the boxes (nothing is wrong with the hardware in any of these cases).
Other than those 75 or so issues, no problems at all.
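To see which devices are affected and how often, a small sketch that counts these errors per device in dmesg output or /var/log/messages. It assumes ext3 reports them as "EXT3-fs error (device sda1) in <function>: Journal has aborted"; the wording can vary between kernel versions, so treat the pattern as an assumption to adjust:

```python
import re
from collections import Counter

# Assumed message shape; adjust to what your kernel actually logs.
ABORT_RE = re.compile(r"EXT3-fs error \(device ([^)]+)\).*Journal has aborted")


def journal_aborts_by_device(log_text: str) -> Counter:
    """Count 'Journal has aborted' errors per block device in a kernel log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```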
-Drew
Drew Weaver wrote:
About the only other reason I can think of is just to make sure it will restart when an emergency arises. For instance: fans, drives, etc.
Some servers will balk if a fan doesn't run. Some servers balk if a hard drive isn't up to speed. These kinds of things only show up during a reboot. In the case of SCSI RAIDs with hot-swap drives, if a drive goes bad, some equipment will require some action before the boot can continue; some won't.
For instance, consider hot-swappable RAID5:
If it's one drive in the RAID, no biggie; if it's two and you don't have hot spares, that is a bigger issue. 'Scheduled' reboots, like when a new kernel comes out and you have time to be there to do something (or have someone there if needed), are a good time to make sure the server's self-checks pass.
Basically, the longer the time between reboots, the more likely an error will occur. And it would be really bad if three or four of your drives suddenly didn't have enough strength to get up to speed; better that it is only one, which can easily be swapped out.
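To put rough numbers on the one-drive-versus-two point: suppose each of n drives independently has probability q of turning out dead at a given boot (q is a hypothetical figure that would grow with the time between reboots). A RAID5 set without a hot spare only fails to come up when two or more drives are dead. A minimal sketch under that independence assumption:

```python
def p_raid5_lost(n: int, q: float) -> float:
    """P(two or more of n independent drives have failed), i.e. RAID5 is lost.

    Computed as 1 - P(no failures) - P(exactly one failure).
    """
    p_none = (1.0 - q) ** n
    p_exactly_one = n * q * (1.0 - q) ** (n - 1)
    return 1.0 - p_none - p_exactly_one
```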
Best, John Hinton
That's not really statistically accurate.
Event X occurring or not occurring has no probable impact on whether random event Y occurs,
where X = rebooting and Y = 'something funky'.
Something funky could happen 5 minutes after the system starts, or 5 years.
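Drew's argument corresponds to a constant-hazard (memoryless) model: if a failure strikes on any given day with a fixed probability p, independent of history, then the chance of failing tomorrow, given that the box has already survived k days, is still p whatever k is. A small sketch of that geometric model (p is a hypothetical daily probability):

```python
def p_fail_on_day(k: int, p: float) -> float:
    """Geometric model: P(the first failure happens on day k), for k >= 1."""
    return (1.0 - p) ** (k - 1) * p


def p_survive(k: int, p: float) -> float:
    """P(no failure during the first k days)."""
    return (1.0 - p) ** k


def hazard_after_uptime(k: int, p: float) -> float:
    """P(fail on day k + 1 | survived k days); equals p for every k."""
    return p_fail_on_day(k + 1, p) / p_survive(k, p)
```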
-Drew
Drew - I don't think you are correct about those events being independent.
'X' isn't *rebooting*; it is the number of days *between* reboots.
Let me define/confirm some of the terms: a "drive error" is an error that doesn't kill the drive immediately, but lurks until the next bounce; the "reboot period" is the number of days between reboots.
If the probability of a drive failing in any one 24h period is 'p', then the probability of a drive failing in a reboot period of 7 days (i.e. for a Windows Vista server ;) ) is roughly 7p (for small p). If the reboot period is one year, the probability is roughly 365p.
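Strictly, days times p is the small-p approximation. With independent days, the exact probability of at least one failure in d days is 1 - (1 - p)^d, which is slightly below d*p and, unlike d*p, can never exceed 1. A quick comparison, with p chosen arbitrarily for illustration:

```python
def p_fail_exact(days: int, p: float) -> float:
    """P(at least one failure in `days` independent days), daily probability p."""
    return 1.0 - (1.0 - p) ** days


def p_fail_linear(days: int, p: float) -> float:
    """The linear estimate days * p; only sensible while the result is small."""
    return days * p


P_DAILY = 1e-4  # arbitrary illustrative value, not a measured failure rate
for days in (1, 7, 365):
    print(f"{days:3d} days: linear {p_fail_linear(days, P_DAILY):.6f}, "
          f"exact {p_fail_exact(days, P_DAILY):.6f}")
```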
One point John is making (I think!) is that, particularly with RAID arrays, dealing with drive errors one at a time is easier than waiting until there are several.
The point in question: how does a long reboot period contribute to the probability of more than one drive error occurring at any boot event?
Statistically, I believe the following is true. If (as in the above example) the probability of one drive failing is p(1 drive failing) = 365p, then, assuming independent probabilities, the probability of two drives failing is p(2 drives failing) = 365p * 365p, or days^2 * p^2.
Compare this to a 1-day reboot period (i.e. an MS Exchange box?): p(1 drive failing) = p, p(2 drives failing) = p^2.
So the probability of 'problems' (i.e. one drive failing) is linear with respect to the reboot period (days times p). The probability of 'disaster' (i.e. two drives failing) is massively higher with long reboot periods: about 133,000 times higher (365^2 = 133,225) for 365 days than for 1 day. Of course, 'p' is a very low number, we hope!
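Written out under the same assumptions (independent drives, daily failure probability p, linear approximation for small p), with d the reboot period in days:

```latex
P_1(d) \approx d\,p, \qquad P_2(d) \approx (d\,p)^2 = d^2 p^2,
\qquad
\frac{P_2(365)}{P_2(1)} = \frac{365^2 p^2}{p^2} = 365^2 = 133{,}225 .
```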
These are the same calculations as for failure in RAID arrays: as non-intuitive as it may be, more drives in your array means a *greater* risk of a (any) drive failure. However, you can of course mitigate the *effect* of this easily with hot spares.
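The "more drives, more risk" point in numbers: if each of n drives independently fails with probability q over some period, the chance that at least one of them fails is 1 - (1 - q)^n, which grows with n. A sketch with q as an arbitrary illustrative value:

```python
def p_any_drive_fails(n_drives: int, q: float) -> float:
    """P(at least one of n independent drives fails), per-drive probability q."""
    return 1.0 - (1.0 - q) ** n_drives


Q = 0.01  # arbitrary illustrative per-drive probability for one period
for n in (1, 4, 8, 16):
    print(f"{n:2d} drives: {p_any_drive_fails(n, Q):.4f}")
```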
Food for thought: how do we mitigate the effect of multiple failures of this type? Imagine a box that has been running for 10 years. We have to expect that it will not keep BIOS time during a cold reboot, which is not a problem with ntp. But what about the BIOS on the motherboard, video cards, RAID controller, etc.? Can NVRAM be trusted to be 'NV' after 10 years of being hot? Obviously the data is safe because of our meticulous backups???
Regards,
MrKiwi.
John Hinton wrote:
How many spares do you carry?
If you want to do scheduled hardware maintenance, that's one thing. Doing hardware maintenance because something broke during software maintenance is something entirely different.
If I schedule a reboot at 18:00 when hardly anyone's around, and the system fails because the hardware is suddenly broken, we're in trouble.
Drew Weaver spake the following on 2/13/2007 7:03 AM:
I have been seeing the ext3 errors too. I think it has something to do with the crappy Adaptec RAID cards in the servers. I'm going to replace them with 3ware 9550s as soon as I can work out the migration.
Personally, no. I never reboot unless I have to, and I have some Linux boxes with 4 years of uptime or more.
Gotcha, though: be careful to tune the fs so it doesn't automagically fsck the filesystem on boot. With a large fileserver or mail store, even a scheduled reboot can cause hours of downtime as it runs its 'required' fsck!
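On ext2/ext3 the forced boot-time check comes from two superblock settings that tune2fs -l /dev/<device> lists as "Maximum mount count" and "Check interval"; tune2fs -c 0 -i 0 /dev/<device> turns both off. A minimal sketch, assuming the "Field name: value" layout of tune2fs -l output and its ctime-style "Last checked" date, that predicts whether the next boot will force an fsck (the function names are illustrative):

```python
from datetime import datetime, timedelta


def parse_tune2fs(text: str) -> dict:
    """Turn tune2fs -l output ("Field name:   value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def fsck_forced_at_next_boot(text: str, now: datetime) -> bool:
    """Predict whether the mount-count or interval rule will force a check."""
    f = parse_tune2fs(text)
    mounts = int(f["Mount count"])
    max_mounts = int(f["Maximum mount count"])  # 0 or -1 disables this rule
    # "Check interval" looks like "15552000 (6 months)"; 0 disables the rule.
    interval_s = int(f["Check interval"].split()[0])
    # "Last checked" is ctime-style, e.g. "Tue Feb 13 12:06:00 2007".
    last_checked = datetime.strptime(" ".join(f["Last checked"].split()),
                                     "%a %b %d %H:%M:%S %Y")
    by_count = max_mounts > 0 and mounts >= max_mounts
    by_time = interval_s > 0 and now >= last_checked + timedelta(seconds=interval_s)
    return by_count or by_time
```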
Tom Brown wrote:
gotcha though: be careful to tune the fs to not automagically fsck the filesystem on boot. When you have a large fileserver or mail store that even a scheduled reboot can cause hours of downtime as it runs its 'required' fsck!
There must be a reason to do an fsck every now and then (once a year)? Could it be that some cleanup is needed even for ext3? I also have a server that has been up for more than half a year, and I was wondering whether to reboot it just as a precaution. I refrain from kernel updates on internal machines; it just creates a lot of work due to driver recompilation. (Don't fix it if it ain't broken :-)
Theo
D Ivago wrote:
I have some Linux machines that have been up 1-2 years. It certainly wouldn't hurt anything to reboot them, but it hardly seems necessary if they're stable and patched to current levels. I had a Red Hat 7.3 box that was up for several years. It finally rebooted when maintenance in that wing of the datacenter required them to interrupt power for a few minutes.
Cheers,