CI-users October 2020

ci-users@lists.centos.org

7 participants
7 discussions

IBM Power ppc64 (Big-Endian) usage survey
by Fabian Arrotin 04 Feb '21

Hi all,

We seem to have an issue with an IBM Power 8 machine used within CI, so we have to re-balance the CI nodes that you can request through the Duffy API for ppc64/ppc64le.

My question is more of a survey: I think that most (if not all) CI projects that are still building (and testing in CI) target only the ppc64le architecture (Little Endian) and not the ppc64 (Big Endian) one.

We'd like to hear from you and, depending on the needs, we can eventually drop the ppc64 architecture for CI tests and so have more (re-balanced) ppc64le resources.

Opinions?

--
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab

Problem with parametrized jobs and URLTrigger
by Alfredo Moralejo Alonso 21 Dec '20

Hi,

We have been hitting a strange issue in some of the CloudSIG jobs since October 5th. We have some parameterized jobs triggered by URLTrigger [1]. Until a couple of days ago, those jobs were executed with the default value for each parameter [2], but now, when a job is triggered by URLTrigger, the parameters are not passed to the job at all [3].

If I use "Build with parameters" manually, the job runs fine using the default values.

Is anyone hitting similar issues? Any hint as to what the problem may be?

[1] https://ci.centos.org/view/rdo/view/weirdo-pipelines/view/weirdo-promote-te…
[2] https://ci.centos.org/view/rdo/view/weirdo-pipelines/view/weirdo-promote-te…
[3] https://ci.centos.org/view/rdo/view/weirdo-pipelines/view/weirdo-promote-te…

Best regards,
Alfredo

Using BuildConfigs/ImageStreams from OpenShift in Duffy machines
by Niels de Vos 26 Oct '20

Hi!

To speed up some of the testing we do on bare-metal machines provisioned through Duffy, I would like to pull pre-built images from the OpenShift registry. The images are built through a BuildConfig and placed in an ImageStream.

Now, it seems that the Duffy-provisioned bare-metal systems cannot pull from the internal OpenShift registry:

[root@n46 ~]# podman pull image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:5000/ceph-csi/ceph-csi:test
Trying to pull image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:5000/ceph-csi/ceph-csi:test...
Get https://image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:…: dial tcp 172.19.0.254:5000: connect: no route to host
Error: error pulling image "image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:5000/ceph-csi/ceph-csi:test": unable to pull image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:5000/ceph-csi/ceph-csi:test: unable to pull image: Error initializing source docker://image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:5000/ceph-csi/ceph-csi:test: error pinging docker registry image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:5000: Get https://image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org:…: dial tcp 172.19.0.254:5000: connect: no route to host

I wonder if this is intentional, or if it is a little too strict? If this cannot be allowed through the firewall, what is the recommended way to use these images? Maybe we should deploy our own registry and push the images there...

Thanks!
Niels
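The "no route to host" error above is raised at the TCP level, before any registry authentication takes place. A minimal sketch of a check that a job could run on a Duffy node before attempting the pull is shown here. The helper name `check_tcp` and the 5-second timeout are assumptions for illustration and are not part of Duffy, podman, or OpenShift; only the hostname and port are taken from the message.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, "") on success, or (False, error_text) when the
    connection fails (no route, refused, timeout, DNS failure, ...).
    """
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # socket.timeout and socket.gaierror are both OSError subclasses.
        return False, str(exc)


# Hypothetical usage with the registry host and port from the message:
#   registry = "image-registry.openshift-image-registry.svc.apps.ocp.ci.centos.org"
#   ok, err = check_tcp(registry, 5000)
#   print("reachable" if ok else f"unreachable: {err}")
```

If the check fails, the job can skip the pull and fall back to building the image locally instead of failing later inside `podman pull`.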

devtools-ci-slave04 is offline
by Katerina Foniok 09 Oct '20

Hello guys,

our jobs on ci.centos.org are pending because *devtools-ci-slave04* is offline. Can someone take a look, please?

One of the affected jobs is here <https://ci.centos.org/view/Devtools/job/devtools-rh-che-rh-che-prcheck-dev.…>.

Thank you! Have a great day,
Katka

[unscheduled outage] hardware issue impacting CI services (including https://ci.centos.org)
by Fabian Arrotin 04 Oct '20

Yesterday (Saturday) evening we got Zabbix notifications that some nodes in the CI environment were unreachable. After a quick look, I discovered that an embedded network switch in a chassis hosting multiple nodes (including but not limited to the Jenkins node behind ci.centos.org) went nuts. I tried a remote "hardware reset" and the nodes were back online after ~10min.

But this morning (Sunday), I saw through Zabbix that the same issue had happened again. Within the hour I did the "hardware reset" again, but this time even that doesn't work anymore.

So that means that we have a network switch that is no longer working. As that chassis (like almost *all* equipment in CI) *isn't* under warranty, we'll see on Monday what can be done and how we prioritize dispatching services elsewhere (which probably means powering down other services, depending on the priority that will be given), but it's easy to understand that we can't give any ETA at this point.

Thanks for your understanding,

--
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab

[infra outage] : nfs storage server hosting PVs for openshift
by Fabian Arrotin 02 Oct '20

We had a kernel panic on the storage box used as the NFS server for OpenShift (both okd and ocp), and the machine doesn't come back online because an md device refuses to start.

The machine is now in single-user mode so that we can analyze the situation and try to fix it.

We'll send more details and progress when possible.

--
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab

Infra : scheduled hardware maintenance (Openshift/NFS)
by Fabian Arrotin 01 Oct '20

Due to hardware maintenance that needs to take place on the NFS storage node used by OpenShift (both the "legacy" cluster and the current one, ocp), we'll have to shut down the OpenShift cluster and then proceed with the hardware maintenance on the NFS server (which itself needs to be powered down; there is no way to do that "online").

Migration is scheduled for "Wednesday September 30th, 12:00 pm UTC time".
You can convert to local time with $(date -d '2020-09-30 12:00 UTC')

The expected "downtime" is estimated at ~60 minutes, the time needed to shut down the machine, install new disks, restart the machine and also do some updates and tuning on the setup.

For more information about this, here are some relevant tickets that were created for the perf issue in OpenShift and NFS:
https://pagure.io/centos-infra/issue/53
https://pagure.io/centos-infra/issue/105
https://pagure.io/centos-infra/issue/85
https://pagure.io/centos-infra/issue/26

<subliminal message>
PS: worth noting that while we'll investigate reports on the new ocp cluster, we'll probably not spend time investigating the old/legacy one, which projects are supposed to migrate away from soon, as the legacy OpenShift setup will disappear soon (see https://pagure.io/centos-infra/issue/16)
</subliminal message>

Thanks for your understanding and patience.

on behalf of the CI Infra team,

--
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab
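For readers without GNU `date` at hand, the same UTC-to-local conversion mentioned in the announcement above can be sketched with the Python standard library. The function name `utc_to_local` is an assumption made for this illustration; the date and time are the ones given in the announcement (12:00 pm UTC, i.e. noon).

```python
from datetime import datetime, timezone


def utc_to_local(year, month, day, hour, minute=0):
    """Return the given UTC wall-clock time as an aware datetime
    expressed in the local timezone of the machine running the code."""
    utc_moment = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    # astimezone() without arguments converts to the system local timezone.
    return utc_moment.astimezone()


if __name__ == "__main__":
    # Maintenance window start from the announcement: 2020-09-30 12:00 UTC.
    local = utc_to_local(2020, 9, 30, 12, 0)
    print(local.strftime("%Y-%m-%d %H:%M %Z"))
```

The printed value depends on the timezone configured on the machine, so no fixed output is shown here.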
