[Ci-users] Re: [EXT] Re: CI infra outage (duffy.ci.centos.org) : Wednesday April 23rd

23 Apr 2025


On 23/04/2025 06:54, Fabian Arrotin wrote:
...
On 22/04/2025 22:55, Peter Georg wrote:
...
On 22/04/2025 14.16, Fabian Arrotin wrote:
...
On 22/04/2025 14:06, Fabian Arrotin wrote:
...
We recently discovered that the number of duffy ec2 instances (on the 
AWS side) is much higher than what Duffy thinks is provisioned. At some 
point it lost track of the ec2 instances that are really deployed, so 
we need to reconcile the DB with reality.
We just need to:
     stop duffy
     delete all duffy ec2 instances
     clean up the duffy DB
     restart duffy (it will then reprovision from scratch/zero)
Once it has restarted, you'll be able to resume your CI jobs and 
request duffy nodes again.
Maintenance is scheduled for "Wednesday April 23rd, 11:30 am UTC".
You can convert to local time with $(date -d '2025-04-23 11:30 UTC')
I'm sending this in advance so that you can either pause your builds 
or add logic to your provisioning scripts for duffy nodes that retries 
until the service is available again.
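The "retry until the service is available again" option could look roughly like the shell sketch below. It is only an illustration: `retry_until_available` is a hypothetical helper (not part of duffy), and the attempt count and delay are arbitrary values to adapt. The wrapped command should be whatever your job already uses to request a node.

```shell
# retry_until_available MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry_until_available() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        # Run the wrapped command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt}/${max_attempts} failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_until_available 60 30 request_node` would call a (hypothetical) `request_node` wrapper every 30 seconds for roughly half an hour before giving up.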
Just to add that we'll also decommission the ppc64le architecture in 
the Duffy pool, due to the upcoming DC move (Duffy will run entirely 
from the cloud, where it's not possible to request ppc64le arch anyway - 
see https://pagure.io/centos-infra/issue/1590 )
Just to clarify: This means that in the future there will no longer be 
any possibility for SIGs to run tests for ppc64le within the CentOS 
project or is this only temporary? If permanent, in the case of the 
Kmods SIG this would mean that we would either stop providing kernels 
for ppc64le or ship them without performing any tests, i.e. the kernel 
might not even boot. Or am I missing another option?
That's a very good question.
Based on actual consumption I was under the impression that it wasn't 
used at all, but you're right: in the last 30 days the average is 
0.002109 deployed c9s ppc64le nodes (with a peak of up to 3).
The problem is that we're not sure we'll keep access to Power9, and 
keeping opennebula (which itself needs an opennebula controller on a 
dedicated host, now used only for that single ppc64le hypervisor) 
doesn't make sense.
So let's call that "temporary": I'll try to plumb ppc64le back in, but 
outside of opennebula, using just cloud images (contextualized through 
a local cloud-init iso) or plain virt-install guests. I'll create 
another ticket for that, but opennebula definitely has to go away, and 
before the DC move.
In the meantime, can you try to just boot the ppc64le kernel (as I 
guess it's not a fully functional test) through qemu on a metal ec2 
host? I'll try to add ppc64le guest support soon, but I have multiple 
moving targets on my plate right now :/
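As a rough illustration of such a boot-only check, the sketch below starts a kernel under `qemu-system-ppc64` and looks for a marker string on the guest serial console. Everything specific here is an assumption rather than something the SIG or infra team uses: the `ppc64le_boot_check` helper name, the `QEMU_BIN` override, the 300 second limit, the 2048 MB of memory and the "Linux version" marker (which only shows that the kernel started, not that it reached userspace).

```shell
# ppc64le_boot_check KERNEL INITRD [TIMEOUT_SECONDS] [MARKER]
# Hypothetical helper: boots KERNEL in an emulated pseries guest and
# succeeds if MARKER shows up on the serial console before the timeout.
ppc64le_boot_check() {
    local kernel="$1" initrd="$2"
    local limit="${3:-300}" marker="${4:-Linux version}"
    local qemu="${QEMU_BIN:-qemu-system-ppc64}"   # override is an assumption, handy for testing
    local log rc
    log="$(mktemp)"
    # The guest is killed by timeout if it stays up, so the exit code is ignored;
    # only the serial log decides the result.
    timeout "$limit" "$qemu" \
        -machine pseries -m 2048 \
        -display none -monitor none -no-reboot \
        -serial "file:${log}" \
        -kernel "$kernel" -initrd "$initrd" \
        -append "console=hvc0" > /dev/null 2>&1 || true
    if grep -q -- "$marker" "$log"; then rc=0; else rc=1; fi
    rm -f "$log"
    return "$rc"
}
```

A call such as `ppc64le_boot_check ./vmlinuz ./initramfs.img` (both paths are placeholders) returns 0 when the marker was seen. On an x86_64 host this runs fully emulated, so the time limit may need to be generous.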
... and while I think I'd be able to add ppc64le arch support to the 
duffy pool, I'm not sure that it would be possible after the DC move: 
I'm still waiting on confirmation that we'd be able to get a tunnel 
between the AWS VPC and the new DC vlan (and it looks more and more 
like it will probably not be possible).
-- 
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | @arrfab[@fosstodon.org]
