Issue Summary
=============
A misconfigured ulimit on jenkins.ci.centos.org caused Jenkins to fail with too many open files. This also caused the root volume to fill up because of noisy messages in the logs. Access to the Jenkins HTTP interface was affected during this period.
Root Cause
==========
During the reconfiguration and move to a new host, the 'nofile' ulimits for the jenkins user were reset to the default (4096). Jenkins reached this limit before the next scheduled garbage collection.
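As an illustration of how a process can approach its 'nofile' limit, here is a minimal Python sketch that compares the number of open file descriptors with the soft and hard limits. It is not part of our tooling: the function name `nofile_usage` is hypothetical, and counting entries in /proc/self/fd assumes a Linux host.

```python
import os
import resource


def nofile_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    # RLIMIT_NOFILE is the limit that 'ulimit -n' reports.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Linux-specific assumption: each entry in /proc/self/fd is one
        # open descriptor of this process.
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None  # /proc is not available on this platform
    return open_fds, soft, hard


if __name__ == "__main__":
    open_fds, soft, hard = nofile_usage()
    print(f"open fds: {open_fds}, soft limit: {soft}, hard limit: {hard}")
```

A check like this only inspects the process that runs it; a long-running service would need the same comparison made against its own PID.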
Recovery
========
At 08h12 we cleared out the Jenkins log to free up disk space, set the ulimits for the jenkins user to the appropriate value, and restarted Jenkins.
Corrective Measures
===================
The ulimit change has been added to our Ansible scripts and deployed.
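One way to confirm that a deployed limit has taken effect for a running service is to read the process's limits file. The sketch below is an assumption about how such a check could look, not the check we run: `parse_max_open_files` and `read_process_nofile` are hypothetical names, the PID must be supplied by the caller, and /proc/<pid>/limits exists only on Linux.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) as strings from the text of /proc/<pid>/limits."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units.
            # Values stay strings because a limit may read "unlimited".
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max open files' line found")


def read_process_nofile(pid):
    """Read the 'nofile' limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

Reading the limits of the live process, rather than the configuration file, catches the case where the configuration changed but the service was never restarted to pick it up.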
Impact
======
Jobs running during the window should have completed, but may not have reported their status back to Jenkins. SCM/GitHub jobs that would have been triggered during that period were picked up when the Jenkins service came back online.
Jobs with triggers through other means (messaging, HTTP POST, etc.) may not have been launched.
We appreciate your patience during this outage, and apologize for any inconvenience.
--
Brian Stinson
CentOS CI Infrastructure Team