On 23/04/2025 06:54, Fabian Arrotin wrote:
On 22/04/2025 22:55, Peter Georg wrote:
On 22/04/2025 14.16, Fabian Arrotin wrote:
On 22/04/2025 14:06, Fabian Arrotin wrote:
We recently discovered that the number of Duffy EC2 instances (AWS side) is much higher than what Duffy thinks is provisioned. At some point it lost track of the EC2 instances actually deployed, so we need to reconcile the DB with reality.
We just need to:
- stop duffy
- delete all duffy ec2 instances
- clean up the duffy DB
- restart duffy (it will then reprovision from scratch/zero)
Once it has restarted, you'll be able to resume your CI jobs and request Duffy nodes again.
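To illustrate the kind of drift described above, here is a minimal Python sketch. It is not Duffy's actual code: it assumes both inventories are already available as plain collections of instance IDs, and the function name and sample IDs are made up for illustration.

```python
def find_untracked(aws_instance_ids, duffy_tracked_ids):
    """Return IDs that exist on the AWS side but are unknown to the Duffy DB."""
    return sorted(set(aws_instance_ids) - set(duffy_tracked_ids))


# Hypothetical sample data, for illustration only.
aws_side = ["i-0aaa", "i-0bbb", "i-0ccc", "i-0ddd"]
duffy_db = ["i-0aaa", "i-0ccc"]

untracked = find_untracked(aws_side, duffy_db)
print(untracked)
```

The maintenance itself is not this selective: all Duffy EC2 instances are deleted and the pool is rebuilt from zero.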
Maintenance is scheduled for "Wednesday April 23rd, 11:30 am UTC". You can convert to local time with $(date -d '2025-04-23 11:30 UTC')
I'm sending this in advance so that you can either pause your builds or add logic to your provisioning scripts to retry requesting Duffy nodes until the service is available again.
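As a rough idea of what such retry logic could look like, here is a minimal Python sketch. It makes several assumptions: `request_node` stands for whatever call your provisioning script already uses to ask Duffy for a node, `ServiceUnavailable` is a hypothetical exception you would raise (or map to) when that call fails because the service is down, and the delay and timeout values are arbitrary placeholders.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical error raised while the service cannot hand out nodes."""


def retry_until_available(request_node, max_wait=3600.0, initial_delay=30.0,
                          max_delay=300.0, sleep=time.sleep,
                          clock=time.monotonic):
    """Call request_node() until it succeeds or max_wait seconds have passed."""
    deadline = clock() + max_wait
    delay = initial_delay
    while True:
        try:
            return request_node()
        except ServiceUnavailable:
            # Give up if waiting once more would overshoot the deadline.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            # Back off exponentially, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

The `sleep` and `clock` parameters only exist so that the loop can be exercised without really waiting. In a real script you would leave them at their defaults and pass your own node request function.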
Just to add that we'll also decommission the ppc64le architecture in the Duffy pool, due to the upcoming DC move. Duffy will then run entirely from the cloud, where it's not possible to request the ppc64le arch anyway (see https://pagure.io/centos-infra/issue/1590).
Just to clarify: does this mean that in the future SIGs will no longer have any possibility to run tests for ppc64le within the CentOS project, or is this only temporary? If it is permanent, for the Kmods SIG this would mean that we either stop providing kernels for ppc64le or ship them without performing any tests, i.e. the kernel might not even boot. Or am I missing another option?
That's a very good question. Based on actual consumption I was under the impression that it wasn't used at all, but you're right: over the last 30 days the average is 0.002109 deployed c9s ppc64le nodes (with a peak of up to 3). The problem is that we're not sure we'll keep access to Power9, and keeping opennebula doesn't make sense (it needs an opennebula controller on a dedicated host that is now used only for that single ppc64le hypervisor).
So let's make that "temporary": I'll try to plumb ppc64le back in, but outside of opennebula, using either cloud images (contextualized through a local cloud-init ISO) or plain virt-install guests. I'll create another ticket for that, but opennebula definitely has to go away, and before the DC move.
In the meantime, can you try to just boot the ppc64le kernel (as it's not a fully functional test, I guess) through qemu on a metal ec2 host? I'll try to add ppc64le guest support soon, but I have multiple moving targets on my plate right now :/
... and while I think I'd be able to add ppc64le arch support to the Duffy pool, I'm not sure it will be possible after the DC move: we're still waiting on confirmation that we can get a tunnel between the AWS VPC and the new DC VLAN (and it looks more and more like it probably won't be possible).