On 14/06/17 15:40, Fabian Arrotin wrote:
> On 14/06/17 11:51, Karanbir Singh wrote:
>>
>>
>> On 14/06/17 08:18, Daniel Horák wrote:
>>> Hi Brian,
>>> I see lots of slaves offline, is it connected to yesterday's outage
>>> or is it a different issue?
>>>
>>> Thanks,
>>> Daniel
>>>
>>> On 06/13/17 19:57, Brian Stinson wrote:
>>>> Hi Folks,
>>>>
>>>> Jenkins was leaking file descriptors and hit a limit today at 17:00 UTC,
>>>> service was degraded for about 10 minutes, and service was fully
>>>> restored at around 17:24.
>>>>
>>>> I've increased the open-files limit for jenkins and am working on tuning
>>>> the garbage collector to mitigate this in the future.
>>>>
>>>> Thanks for your patience, and apologies for any inconvenience.
>>>>
>>
>> I noticed a lot of slaves were down, and was pointed to this by a few
>> people - on chat.openshift.io and irc.freenode : on investigation it
>> looked like jenkins master had exhausted ram and other jobs on the
>> machine were killing the cpu with loads up to 50.x; I had to restart the
>> jenkins master to bring services back.
>>
>> once Brian is online, he will likely do a more thorough investigation and
>> get back with details.
>>
>> regards
>>
>
> I spoke with Brian last week about a plan to move Jenkins to another
> node : actually jenkins master is running on a small VM (2 vcpus and 4GB
> of RAM), and load average is indeed always high (actually above 20, to
> give an example).
> Let me sync with him (as we already have the node that will be used as
> replacement) to schedule a maintenance window for this
>

With 20 you might have caught it just before things went south again.

Let's get Jenkins moved to a new host with more RAM and compute, but I
think we also need to look at what's going wrong here.

I've disabled the JMS Plugin for now; that seems to have had a huge
impact on system stability. I'm going to leave it off until we can
work out what the underlying issue is.

Regards,

-- 
Karanbir Singh, Project Lead, The CentOS Project
+44-207-0999389 | http://www.centos.org/ | twitter.com/CentOS
GnuPG Key : http://www.karan.org/publickey.asc