Re: [Ci-users] Unexpected outage 17:00 UTC Today - Service Restored

14 Jun 2017


Hi KB,
In the future our team would like to help with Jenkins maintenance and
issues.  This is something I have spoken about with Brian.
Let me know if this is an option you would like to pursue in the near term.
On Wed, Jun 14, 2017 at 12:20 PM, Karanbir Singh kbsingh@centos.org wrote:
...
On 14/06/17 10:51, Karanbir Singh wrote:
...
On 14/06/17 08:18, Daniel Horák wrote:
...
Hi Brian,
I see lots of slaves offline. Is this connected to yesterday's outage,
or is it a different issue?
Thanks,
Daniel
On 06/13/17 19:57, Brian Stinson wrote:
...
Hi Folks,
Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC,
service was degraded for about 10 minutes, and service was fully
restored at around 17:24.
I've increased the open-files limit for jenkins and am working on tuning
the garbage collector to mitigate this in the future.
Thanks for your patience, and apologies for any inconvenience.
I noticed a lot of slaves were down, and was pointed to this by a few
people on chat.openshift.io and irc.freenode. On investigation, it
looked like the Jenkins master had exhausted RAM, and other jobs on the
machine were saturating the CPU with loads up to 50.x; I had to restart
the Jenkins master to bring services back.
Once Brian is online, he will likely do a more thorough investigation
and get back with details.
Service went down again a few minutes back; I have restarted Jenkins and
it's up again.
Brian is on a long-haul flight out of the US at the moment. I will try
to keep an eye on things, but we're going to need him to look when he can.
--
Karanbir Singh, Project Lead, The CentOS Project
+44-207-0999389 | http://www.centos.org/ | twitter.com/CentOS
GnuPG Key : http://www.karan.org/publickey.asc

Ci-users mailing list
Ci-users@centos.org
https://lists.centos.org/mailman/listinfo/ci-users
-- 
-== @ri ==-
