[Ci-users] Unexpected outage 17:00 UTC Today - Service Restored

Wed Jun 14 16:20:13 UTC 2017
Karanbir Singh <kbsingh at centos.org>

On 14/06/17 10:51, Karanbir Singh wrote:
> On 14/06/17 08:18, Daniel Horák wrote:
>> Hi Brian,
>> I see lots of slaves offline. Is it connected to yesterday's outage,
>> or is it a different issue?
>> Thanks,
>> Daniel
>> On 06/13/17 19:57, Brian Stinson wrote:
>>> Hi Folks,
>>> Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC.
>>> Service was degraded for about 10 minutes and was fully
>>> restored at around 17:24.
>>> I've increased the open-files limit for jenkins and am working on tuning
>>> the garbage collector to mitigate this in the future.
>>> Thanks for your patience, and apologies for any inconvenience.
> I noticed a lot of slaves were down, and was pointed to this by a few
> people on chat.openshift.io and irc.freenode. On investigation, it
> looked like the jenkins master had exhausted RAM, and other jobs on the
> machine were killing the CPU with loads up to 50.x; I had to restart the
> jenkins master to bring services back.
> Once Brian is online, he will likely do a more thorough investigation and
> get back with details.

Service went down again a few minutes ago; I have restarted jenkins and
it's up again.

Brian is on a long-haul flight out of the US at the moment. I will try
to keep an eye on things, but we're going to need him to look when he can.

Karanbir Singh, Project Lead, The CentOS Project
+44-207-0999389 | http://www.centos.org/ | twitter.com/CentOS
GnuPG Key : http://www.karan.org/publickey.asc
