Re: [Ci-users] Changes to CentOS CI: reminder of Phase 1 and 2

19 Aug 2022


Hello!
I understand that the metal machines are expensive, and I'm not sure how
many other projects are eventually going to migrate over to them, but I
guess in the future some balance will need to be found between the cost
and the number of available metal nodes. Is this even up for discussion, or
is the size of the metal pools fixed and can't/won't be adjusted?
We're looking to optimize resource usage with the recent changes to CentOS
CI. From our side, the goal is to find a balance between adjusting to
tenants' needs (there are adaptations we could make to have more nodes
available, at the cost of increased resource consumption) and adjusting
projects' workflows to use EC2.
I'd appreciate your suggestions on how to make workflows more adaptable
to EC2.
Also, how much is this impacting critical deliveries on your side at the
moment? My goal here is to understand whether we need a more urgent
solution for you before going into deeper discussions. As I understand it,
we still have some bandwidth to find the best solution we can, although
this could become more critical in the future. Is that assumption correct?
Thank you for reaching out about this,
On Fri, Aug 19, 2022 at 8:35 AM Evgeni Golov <evgeni@redhat.com> wrote:
...
Moin,
On Fri, Aug 19, 2022 at 1:21 PM František Šumšal <frantisek@sumsal.cz>
wrote:
...
After a couple of weeks of back and forth with the always helpful infra
team I was able to migrate most of our (systemd) jobs over to the EC2
machines. As we require at least access to KVM (and the EC2 VMs,
unfortunately, don't support nested virt), I had to resort to metal
machines over the "plain" VMs.
We (Foreman) are in the same boat: our tests spawn multiple VMs, so we
require KVM access (metal or nested, with the latter sadly not
supported by EC2).
...
After monitoring the situation for a couple of days I noticed an
issue[0] which might bite us in the future if/when other projects migrate
over to the metal machines as well (since several of them require at least
KVM too) - Duffy currently provisions only one metal machine at a time, and
returns an API error for all other API requests for the same pool in the
meantime:
...
can't reserve nodes: quantity=1 pool='metal-ec2-c5n-centos-8s-x86_64'
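Until concurrent provisioning is possible, a client that hits this error
has to retry. The following is only a minimal sketch of a retry loop with
exponential backoff. The `reserve` callable, the `ReservationError`
exception and all timing values are hypothetical placeholders and are not
part of the Duffy client API.

```python
import time


class ReservationError(Exception):
    """Hypothetical error for a failed reservation (not a Duffy class)."""


def reserve_with_retry(reserve, pool, quantity=1, attempts=10,
                       delay=30.0, max_delay=300.0, sleep=time.sleep):
    """Call reserve(pool=..., quantity=...) until it succeeds.

    `reserve` is an assumed callable wrapping whatever API request the
    job uses. The wait doubles after each failure, capped at `max_delay`.
    """
    for attempt in range(1, attempts + 1):
        try:
            return reserve(pool=pool, quantity=quantity)
        except ReservationError:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Backing off keeps many parallel jobs from hammering the API while the pool
is busy, but it does not shorten the provisioning time itself.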
As provisioning takes a while, this delay might stack up quite
noticeably. For example, after firing up 10 jobs (current project quota) at
once, all for the metal pool, the last one got the machine after ~30
minutes - and that's only one project. If/when other projects migrate over
to the metal machines as well, this might quickly get out of hand.
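To make the stacking effect concrete, here is a rough back-of-the-envelope
sketch. It assumes strictly serial provisioning with a constant time per
node. The ~3 minutes per node is only inferred from the 10 jobs / ~30
minutes observation above and is not a measured value.

```python
def queue_waits(jobs, minutes_per_node):
    """Wait time (in minutes) for each job when nodes are provisioned
    strictly one after another, each taking `minutes_per_node`."""
    return [minutes_per_node * (position + 1) for position in range(jobs)]


# Inferred from the observation above: 10 jobs, last one served after ~30 min.
minutes_per_node = 30 / 10

one_project = queue_waits(10, minutes_per_node)
three_projects = queue_waits(30, minutes_per_node)

print(one_project[-1])     # wait of the last job with one project
print(three_projects[-1])  # wait of the last job with three such projects
```

Under these assumptions the worst-case wait grows linearly with the total
number of queued requests across all projects sharing the pool.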
Our tests run up to 8 parallel jobs, so yeah, I can totally see this
being a problem in the longer term for everybody.
We're currently investigating whether we can change our scheduling and
run multiple jobs on one metal host (it's big enough to host more than
the 3 VMs one job needs), but it doesn't seem too trivial right now.
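As a rough illustration of what such packing could save, the sketch below
computes how many jobs fit on one host and how many hosts a batch of jobs
would need. Only the 3 VMs per job figure comes from this thread. The host
capacity (`host_vm_slots`) is a made-up number, since the actual capacity
of a metal node is not stated here.

```python
import math


def jobs_per_host(host_vm_slots, vms_per_job=3):
    """How many whole jobs fit on one host with `host_vm_slots` VM slots."""
    return host_vm_slots // vms_per_job


def hosts_needed(jobs, host_vm_slots, vms_per_job=3):
    """Number of hosts required to run `jobs` jobs at the same time."""
    per_host = jobs_per_host(host_vm_slots, vms_per_job)
    if per_host == 0:
        raise ValueError("host is too small for a single job")
    return math.ceil(jobs / per_host)


# Hypothetical capacity of 12 VM slots per metal host, 8 parallel jobs.
print(jobs_per_host(12))    # jobs that fit on one host
print(hosts_needed(8, 12))  # hosts for 8 parallel jobs
```

Fewer hosts per run also means fewer serialized reservation requests
against the metal pool.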
Evgeni
--
Beste Grüße/Kind regards,
Evgeni Golov
Senior Software Engineer
________________________________________________________________________
Red Hat GmbH, https://de.redhat.com/, Registered seat: Werner von
Siemens Ring 14, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy
Ross

CI-users mailing list
CI-users@centos.org
https://lists.centos.org/mailman/listinfo/ci-users
-- 

Camila Granella

Associate Manager, Software Engineering

Red Hat https://www.redhat.com/
Twitter: https://twitter.com/redhat
LinkedIn: https://www.linkedin.com/company/red-hat
Facebook: https://www.facebook.com/RedHatInc
