[Ci-users] Unexpected outage 17:00 UTC Today - Service Restored
kbsingh at centos.org
Wed Jun 14 09:51:47 UTC 2017
On 14/06/17 08:18, Daniel Horák wrote:
> Hi Brian,
> I see lots of slaves offline. Is this connected to yesterday's outage,
> or is it a different issue?
> On 06/13/17 19:57, Brian Stinson wrote:
>> Hi Folks,
>> Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC.
>> Service was degraded for about 10 minutes and was fully restored at
>> around 17:24.
>> I've increased the open-files limit for jenkins and am working on tuning
>> the garbage collector to mitigate this in the future.
>> Thanks for your patience, and apologies for any inconvenience.
I noticed a lot of slaves were down, and a few people on
chat.openshift.io and irc.freenode pointed me to this. On investigation,
it looked like the Jenkins master had exhausted RAM, and other jobs on
the machine were killing the CPU with loads up to 50.x; I had to restart
the Jenkins master to bring services back.
Once Brian is online, he will likely do a more thorough investigation
and get back with details.
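
For anyone who wants to watch for the file-descriptor exhaustion Brian
described, here is a rough sketch of a check, assuming a Linux host where
/proc is available. The function names and the 80% warning threshold are
made up for illustration and are not anything we run on ci.centos.org; it
only compares the number of descriptors a process holds against its soft
"Max open files" limit.

```python
import os


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None when the limit is 'unlimited' or the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def count_open_fds(pid):
    """Count descriptors a process currently holds (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def near_limit(open_fds, soft_limit, threshold=0.8):
    """True when usage has reached the given fraction of the soft limit.

    The 0.8 default is an arbitrary illustrative threshold.
    """
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit


def check_process(pid):
    """Return (open_fds, soft_limit, warn) for one process."""
    with open(f"/proc/{pid}/limits") as fh:
        soft = parse_soft_nofile(fh.read())
    used = count_open_fds(pid)
    return used, soft, near_limit(used, soft)
```

Calling check_process() with the PID of the Jenkins JVM (run as the same
user or as root, since /proc/<pid>/fd is not world-readable) from cron
would give some warning before the limit is hit again.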
Karanbir Singh, Project Lead, The CentOS Project
+44-207-0999389 | http://www.centos.org/ | twitter.com/CentOS
GnuPG Key : http://www.karan.org/publickey.asc