Hello!
After a couple of weeks of back and forth with the always helpful infra team, I was able to migrate most of our (systemd) jobs over to the EC2 machines. As we require at least access to KVM (and the EC2 VMs, unfortunately, don't support nested virt), I had to resort to metal machines instead of the "plain" VMs.
After monitoring the situation for a couple of days I noticed an issue[0] which might bite us in the future if/when other projects migrate over to the metal machines as well (since several of them require at least KVM too) - Duffy currently provisions only one metal machine at a time, and returns an API error for all other API requests for the same pool in the meantime:
can't reserve nodes: quantity=1 pool='metal-ec2-c5n-centos-8s-x86_64'
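On the client side, this means every job has to retry until the pool is free again. A minimal sketch of such a retry loop follows; `ReservationError`, `reserve_with_retry` and the `request_node` callable are hypothetical names for illustration only, not part of the Duffy API or client:

```python
import time


class ReservationError(Exception):
    """Hypothetical stand-in for the "can't reserve nodes" API error."""


def reserve_with_retry(request_node, attempts=5, delay=60.0, sleep=time.sleep):
    """Call request_node() until it succeeds or the attempts run out.

    request_node is an assumed callable that returns a node on success and
    raises ReservationError while the pool cannot serve the request.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request_node()
        except ReservationError:
            if attempt == attempts:
                # Out of attempts: let the caller see the last error.
                raise
            sleep(delay)
```

The fixed delay keeps the sketch short; a real job would probably want a cap on the total wait as well.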
As the provisioning takes a bit, this delay might stack up quite noticeably. For example, after firing up 10 jobs (current project quota) at once, all for the metal pool, the last one got the machine after ~30 minutes - and that's only one project. If/when other projects migrate over to the metal machines as well, this might quickly get out of hand.
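To illustrate how the delay stacks up, here is a rough back-of-the-envelope sketch. It assumes strictly serial provisioning and a per-node time of ~3 minutes, which is only derived from the 10 jobs / ~30 minutes observation above, not a measured value; `serial_wait_minutes` is a hypothetical helper:

```python
def serial_wait_minutes(position, minutes_per_node):
    """Wait for the request at `position` (1-based) when nodes are
    provisioned strictly one at a time."""
    return position * minutes_per_node


# Assumption: ~30 minutes for the 10th job implies ~3 minutes per node.
MINUTES_PER_NODE = 30 / 10

# Wait per job for one project using its full quota of 10 jobs.
waits = [serial_wait_minutes(i, MINUTES_PER_NODE) for i in range(1, 11)]

# Two such projects sharing the same pool: the last of 20 requests.
shared_pool_worst_case = serial_wait_minutes(20, MINUTES_PER_NODE)
```

Under these assumptions the wait grows linearly with the number of queued requests, so a second project with the same quota would roughly double the worst case.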
I understand that the metal machines are expensive, and I'm not sure how many other projects are eventually going to migrate over to them, but I guess in the future some balance will need to be found between the cost and the number of available metal nodes. Is this even up for discussion, or is the size of the metal pools fixed and can't/won't be adjusted?
Thank you.
Cheers, Frantisek
[0] https://pagure.io/centos-infra/issue/865#comment-811365
On 8/16/22 15:58, Camila Granella wrote:
Hello everyone,
This is a friendly reminder of the current and upcoming status of CentOS CI changes (check [1]).
Projects that opted in to continue on CentOS CI have been migrated, and the new Duffy API is available. With that, /phase 0/ has been completed. Regarding /phase 1/, we are still working on a permanent fix for the DB Concurrency issues [2]. Also, as for our new OpenShift deployment, we have a staging environment up and running, and it should be available at the beginning of September 2022.
In October 2022 we begin /phase 2/ when we will work through the following items (these were also previously communicated in [1]):
- legacy/compatibility API endpoint will hand over EC2 instances instead of local seamicro nodes (VMs vs bare metal)
- bare-metal options will be available through the new API only
- legacy seamicro and aarch64/ThunderX hardware are decommissioned
- only remaining "on-premises" option is ppc64le (local cloud)
Feel free to reach out if you have any questions or concerns.
The final deadline for decommissioning the old infrastructure (/phase 3/) is *December 2022*. We will be communicating further until then, and meanwhile, reach out to any of us in case you have any questions.
Regards,
[1] [ci-users] Changes on CentOS CI and next steps: https://lists.centos.org/pipermail/ci-users/2022-June/004547.html
[2] DB Concurrency issues: https://github.com/CentOS/duffy/issues/523
--
Camila Granella
Associate Manager, Software Engineering
Red Hat https://www.redhat.com/
CI-users mailing list CI-users@centos.org https://lists.centos.org/mailman/listinfo/ci-users