[Ci-users] Unexpected outage 17:00 UTC Today - Service Restored

Wed Jun 14 17:28:54 UTC 2017
Karanbir Singh <kbsingh at centos.org>

hi Ari,

Absolutely! Let's see if we can get Brian for some time later this week,
or early next week, and thrash through some options.

Regards,


On 14/06/17 18:05, Ari LiVigni wrote:
> Hi KB,
> 
> In the future our team would like to help with Jenkins maintenance and
> issues.  This is something I have spoken about with Brian.
> Let me know if this is an option you would like to pursue in the near term.
> 
> 
> 
> On Wed, Jun 14, 2017 at 12:20 PM, Karanbir Singh <kbsingh at centos.org> wrote:
> 
> 
> 
>     On 14/06/17 10:51, Karanbir Singh wrote:
>     >
>     >
>     > On 14/06/17 08:18, Daniel Horák wrote:
>     >> Hi Brian,
>     >> I see lots of slaves offline, is it connected to yesterday's outage
>     >> or is it a different issue?
>     >>
>     >> Thanks,
>     >> Daniel
>     >>
>     >> On 06/13/17 19:57, Brian Stinson wrote:
>     >>> Hi Folks,
>     >>>
>     >>> Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC.
>     >>> Service was degraded for about 10 minutes, and was fully
>     >>> restored at around 17:24.
>     >>>
>     >>> I've increased the open-files limit for jenkins and am working on tuning
>     >>> the garbage collector to mitigate this in the future.
>     >>>
>     >>> Thanks for your patience, and apologies for any inconvenience.
>     >>>
>     >
>     > I noticed a lot of slaves were down, and was pointed to this by a few
>     > people on chat.openshift.io <http://chat.openshift.io> and irc.freenode. On
>     > investigation, it looked like the jenkins master had exhausted RAM and other
>     > jobs on the machine were killing the CPU with loads up to 50.x; I had to
>     > restart the jenkins master to bring services back.
>     >
>     > once Brian is online, he will likely do a more thorough investigation and
>     > get back with details.
>     >
> 
>     service went down again a few minutes back, I have restarted jenkins and
>     it's up again.
> 
>     Brian is on a long haul flight out of the US at the moment, I will try
>     and keep an eye on things, but we're going to need him to look when
>     he can
> 
> 


-- 
Karanbir Singh, Project Lead, The CentOS Project
+44-207-0999389 | http://www.centos.org/ | twitter.com/CentOS
GnuPG Key : http://www.karan.org/publickey.asc
