On Wed, Jun 14, 2017 at 1:32 PM, Karanbir Singh <kbsingh at centos.org> wrote:

> On 14/06/17 15:40, Fabian Arrotin wrote:
> > On 14/06/17 11:51, Karanbir Singh wrote:
> >>
> >> On 14/06/17 08:18, Daniel Horák wrote:
> >>> Hi Brian,
> >>> I see lots of slaves offline. Is it connected to yesterday's outage
> >>> or is it a different issue?
> >>>
> >>> Thanks,
> >>> Daniel
> >>>
> >>> On 06/13/17 19:57, Brian Stinson wrote:
> >>>> Hi Folks,
> >>>>
> >>>> Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC,
> >>>> service was degraded for about 10 minutes, and service was fully
> >>>> restored at around 17:24.
> >>>>
> >>>> I've increased the open-files limit for jenkins and am working on tuning
> >>>> the garbage collector to mitigate this in the future.
> >>>>
> >>>> Thanks for your patience, and apologies for any inconvenience.
> >>>>
> >>
> >> I noticed a lot of slaves were down, and was pointed to this by a few
> >> people on chat.openshift.io and irc.freenode. On investigation it
> >> looked like the jenkins master had exhausted RAM and other jobs on the
> >> machine were killing the CPU with loads up to 50.x; I had to restart the
> >> jenkins master to bring services back.
> >>
> >> Once Brian is online, he will likely do a more thorough investigation and
> >> get back with details.
> >>
> >> regards
> >>
> >
> > I spoke with Brian last week about a plan to move Jenkins to another
> > node: currently the jenkins master is running on a small VM (2 vCPUs and
> > 4 GB of RAM), and the load average is indeed always high (above 20, to
> > give an example).
> > Let me sync with him (as we already have the node that will be used as
> > replacement) to schedule a maintenance window for this.
> >
>
> With 20 you might have caught it just before things went south, again.
> Let's get Jenkins moved to a new host, more RAM and compute etc., but I
> think we might need to look at what's going south here.
>
> I've disabled the JMS Plugin for now; that seems to have had a huge
> impact on the system stability. Am going to leave that off till we can
> work out what the underlying issue here is.
>
> Regards,
>

Scott wrote that plugin and can look at what is happening. We need it for
our pipeline triggering, and it has been working fine for a while, so it
would be good to understand the root cause before just disabling it.

> --
> Karanbir Singh, Project Lead, The CentOS Project
> +44-207-0999389 | http://www.centos.org/ | twitter.com/CentOS
> GnuPG Key : http://www.karan.org/publickey.asc

--
-== @ri ==-