Hi Ari,
Absolutely! Let's see if we can get Brian for some time later this week, or early next week, and thrash through some options.
Regards,
On 14/06/17 18:05, Ari LiVigni wrote:
Hi KB,
In the future our team would like to help with Jenkins maintenance and issues. This is something I have spoken about with Brian. Let me know if this is an option you would like to pursue in the near term.
On Wed, Jun 14, 2017 at 12:20 PM, Karanbir Singh <kbsingh@centos.org> wrote:
On 14/06/17 10:51, Karanbir Singh wrote:
>
> On 14/06/17 08:18, Daniel Horák wrote:
>> Hi Brian,
>> I see lots of slaves offline, is it connected to yesterday's outage
>> or is it a different issue?
>>
>> Thanks,
>> Daniel
>>
>> On 06/13/17 19:57, Brian Stinson wrote:
>>> Hi Folks,
>>>
>>> Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC,
>>> service was degraded for about 10 minutes, and service was fully
>>> restored at around 17:24.
>>>
>>> I've increased the open-files limit for jenkins and am working on tuning
>>> the garbage collector to mitigate this in the future.
>>>
>>> Thanks for your patience, and apologies for any inconvenience.
>>>
>
> I noticed a lot of slaves were down, and was pointed to this by a few
> people on chat.openshift.io and irc.freenode: on investigation it
> looked like the jenkins master had exhausted RAM and other jobs on the
> machine were killing the CPU with loads up to 50.x; I had to restart the
> jenkins master to bring services back.
>
> Once Brian is online, he will likely do a more thorough investigation and
> get back with details.
>

Service went down again a few minutes back; I have restarted jenkins and it's up again. Brian is on a long-haul flight out of the US at the moment. I will try and keep an eye on things, but we're going to need him to look when he can.