Hello,
I've been looking at why all the recent jobs in
https://ci.centos.org/job/user-cont-conu-pr/
are pending.
The first pending job in the queue says
'userspace-containerization-ci-slave06 is offline', so I checked
https://ci.centos.org/computer/userspace-containerization-ci-slave06/
but don't see anything I could do there to resolve it.
I can 'ssh userspace-containerization@slave06.ci.centos.org' just fine,
so it doesn't seem to be offline.
What can I do to debug or resolve this?
Thanks,
Jiri
Hi Folks,
There are a couple of pending security fixes to Jenkins and plugins
outstanding.
I'll be restarting the ci.centos.org master as soon as we can find a
lull in the queue.
Jobs should pick up where they left off, and get queued back when the
master returns.
I'll send a note here when we're finished up so we can keep track. If
you have any questions let us know here or in #centos-devel on freenode.
Cheers!
--
Brian Stinson
CentOS CI Infrastructure Team
I am looking for a way to use Python 3 on the Jenkins slaves. Can someone provide more details on how this can be done? The simplest way seems to be installing an rh-python* package and enabling it via scl.
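A sketch of the SCL approach as a Jenkins "Execute shell" build step (this assumes the rh-python36 collection; the exact collection name available on the slaves may differ):

```shell
# Sketch of a Jenkins "Execute shell" build step.
# Assumes the rh-python36 software collection is installed on the
# slave; adjust the collection name to whatever is actually present.

# Run a single command under the collection:
scl enable rh-python36 -- python --version

# Or enable the collection for the remainder of the step:
source scl_source enable rh-python36
python --version
```

`scl enable` spawns a subshell for one command, while `source scl_source enable` modifies the current shell's environment, which is usually more convenient inside a build step.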
--
Siteshwar Vashisht
Summary:
A large subset of the application nodes in apps.ci.centos.org were
placed in an unschedulable state around 13h00 UTC on September 27th.
Nodes were rebooted and service was partially restored, but new behavior
was exhibited overnight. Pods were able to schedule on the nodes but DNS
was not functional. DNS service was restored at around 15h00 UTC on
September 28th.
Timeline:
27-Sept-2018 13h00 UTC - 28-Sept-2018 15h00 UTC
Root Cause:
A previously applied update (applied around 17-August) to selinux-policy
caused some files to be relabeled. We did not, at the time, schedule a
reboot, but routine restarts of the docker service caused the nodes to
enter a degraded state.
Further file relabels allowed the node boot process to complete, but
left the nodes in a degraded state.
Recovery:
Completed the rest of the pending updates, and rebooted the nodes to
clear the node-schedulable degradation. (27-Sept)
Triggered a full autorelabel and rebooted the nodes to clear the
node-boot degradation. (28-Sept)
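For reference, a full autorelabel is typically requested like this (a sketch of the standard mechanism; the exact commands used during recovery aren't recorded here):

```shell
# Ask SELinux to relabel the entire filesystem on the next boot,
# then reboot the node. The relabel runs early in boot and can take
# several minutes per node.
touch /.autorelabel
systemctl reboot
```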
Preventative Measures:
- Consider rebooting the nodes more often, perhaps on a regular
schedule to catch OS upgrade problems
- Complete the openshift-monitoring EPIC in the CI backlog, which will
add better checks for DNS.
Thank you very much for your patience during this outage.
--
Brian Stinson
CentOS CI Infrastructure Team
Hi,
I’m sure I’m doing something silly, but I’m not able to add two new SSSD team members to the Admin list on ci.centos.org so that the CI runs on their PRs without having to be explicitly approved.
Here’s what I did:
- went to https://ci.centos.org/
- logged in as the sssd user
- clicked on the ‘sssd’ project
- clicked on the ‘sssd-CentOS7’ project
- clicked on Configure on the left
- selected the “Build Triggers” tab
- there is a box called “Admin list” which contains usernames
- added the usernames there
- clicked “Save” and “Apply” on the lower bar
But still, both users’ PRs are being “gated” by the centos-ci user. Am I missing something? Does something need to be configured on the github side as well perhaps?
By the way, everything works for users who were added to the whitelist some time ago, but I’ve inherited the maintenance of the SSSD ci.centos.org setup, so this is the first time I’m adding new developers myself.
Thank you for your help.
Anyone else encountering DNS resolution issues?
Seeing this coming out of one of our apps:
```
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3.5/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib64/python3.5/socket.py", line 733, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:
...
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/lib/python3.5/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.5/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /user (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7febf8618b00>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
```
And indeed, running `getent ahostsv4 api.github.com` inside a running
container returns rc=2. Even `ci.centos.org` does not resolve.
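A quick way to check both names from inside a container, mirroring the `getent` call above (the host list is just the two names mentioned in this thread):

```shell
# Check IPv4 DNS resolution for each host; getent exits non-zero
# (rc=2) when the name cannot be resolved.
for host in api.github.com ci.centos.org; do
  if getent ahostsv4 "$host" > /dev/null; then
    echo "$host resolves"
  else
    echo "$host does NOT resolve (rc=$?)"
  fi
done
```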