On 14/06/17 08:18, Daniel Horák wrote:
Hi Brian, I see lots of slaves offline. Is this connected to yesterday's outage, or is it a different issue?
Thanks, Daniel
On 06/13/17 19:57, Brian Stinson wrote:
Hi Folks,
Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC. Service was degraded for about 10 minutes and was fully restored at around 17:24.
I've increased the open-files limit for Jenkins and am working on tuning the garbage collector to mitigate this in the future.
Thanks for your patience, and apologies for any inconvenience.
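For anyone who wants to watch for this kind of descriptor leak, here is a minimal sketch (not what was actually run on the Jenkins host) that compares a process's open file descriptors against its soft open-files limit. It assumes Linux with /proc mounted; the function names and the 0.8 warning threshold are hypothetical choices for illustration.

```python
import os


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is 'unlimited' or the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns after the three-word name: soft limit, hard limit, units
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def open_fd_count(pid):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage_ratio(open_count, limit):
    """Fraction of the limit in use, or None when there is no finite limit."""
    if not limit:
        return None
    return open_count / limit


if __name__ == "__main__":
    pid = os.getpid()  # replace with the PID of the process to inspect
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_soft_nofile(f.read())
    ratio = fd_usage_ratio(open_fd_count(pid), limit)
    # 0.8 is an arbitrary illustrative threshold, not a recommended value
    if ratio is not None and ratio > 0.8:
        print(f"pid {pid}: {ratio:.0%} of open-files limit in use")
```

Reading another user's /proc/<pid>/fd generally requires running as that user or as root.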
I noticed a lot of slaves were down, and was pointed to this by a few people on chat.openshift.io and irc.freenode. On investigation, it looked like the Jenkins master had exhausted RAM, and other jobs on the machine were killing the CPU with loads up to 50.x; I had to restart the Jenkins master to bring services back.
Once Brian is online, he will likely do a more thorough investigation and get back with details.
regards