Hey!

On 8/23/22 16:41, Camila Granella wrote:
> Hi all,
>
> Earlier today the infra team attempted to bump the amount of metal machines
> available for provisioning on Duffy.
> However, the AWS API returned that currently there is no capacity to provision
> metal machines in the Availability Zone we are currently in (us-east-1a).
> For this reason, we will need to default to the use of EC2.
>
> Let us know if you need anything from our end to support you adapting your
> workflows to it.

After thinking about this a bit more, I have one (quite naïve) idea: would it
be possible to get in touch with the "other side" (i.e. the people responsible
for AWS) and ask them about the possibility of enabling nested virt for the
CentOS CI pool? I have no idea what the reason is for not having nested virt
anywhere (I suspect it's security-related), or whether it's even possible to
enable it in the current EC2 infra, but having it enabled would, in the end,
benefit all involved parties (especially given that the infra is apparently
sponsored by AWS/Amazon).

As voiced by me and several projects currently utilizing CentOS CI, there are
certain workflows which can't be run on the current EC2 machines, and as much
as I'd like to use them (to avoid wasting resources unnecessarily), I simply
can't.

Again, this is just my late-night spitballing in hopes of finding some suitable
middle ground, so if it doesn't make sense, please let me know.

Cheers,
Frantisek

>
> Regards,
>
> On Mon, Aug 22, 2022 at 3:56 PM Vladimir Benes <benesv at email.cz> wrote:
>
> On Mon, 2022-08-22 at 13:59 +0200, František Šumšal wrote:
> >
> > On 8/22/22 13:28, Fabian Arrotin wrote:
> > > On 19/08/2022 15:31, František Šumšal wrote:
> > > > Hey,
> > > >
> > > > On 8/19/22 14:23, Camila Granella wrote:
> > > > > Hello!
> > > > >
> > > > > I understand that the metal machines are expensive, and I'm
> > > > > not sure how many other projects are eventually going to
> > > > > migrate over to them, but I guess in the future some balance
> > > > > will need to be found between the cost and available metal
> > > > > nodes. Is this even up for discussion, or is the size of the
> > > > > metal pools given and can't/won't be adjusted?
> > > > >
> > > > >
> > > > > We're looking to optimize resource usage with the recent
> > > > > changes to CentOS CI. From our side, the goal is to find a
> > > > > balance between adjusting to tenants' needs (there are
> > > > > adaptations we could do to have more nodes available with an
> > > > > increase in resource consumption) and adjusting projects'
> > > > > workflows to use EC2.
> > > > >
> > > > > I'd appreciate your suggestions on how to make
> > > > > workflows more adaptable to EC2.
> > > >
> > > > The main blocker for many projects is that EC2 VMs don't support
> > > > nested virtualization, which is really unfortunate, since using
> > > > the EC2 metal machines is indeed a "bit" overkill in many
> > > > scenarios (ours included). I spent a week playing with various
> > > > approaches to avoid this requirement, but failed (in our case it
> > > > would be running the VMs with TCG instead of KVM, but that makes
> > > > the tests flaky/unreliable in many cases, and some of them run
> > > > for several hours with this change).
> > > >
> > > > Going through many online resources just confirms this - EC2 VMs
> > > > don't support nested virt[0], which is sad, since, for example,
> > > > Microsoft's Azure apparently supports it[1][2] (and Google's
> > > > Compute Engine apparently supports it as well from a quick
> > > > lookup).
> > > >
> > > > I'm not really sure if there's an easy solution for this (if
> > > > any).
> > > > I'm at least trying to spread the workload on the machine
> > > > "to the limits" to utilize as much of the metal resources as
> > > > possible, which shortens the runtime of each job quite
> > > > considerably, but even that's not ideal (resource-wise).
> > > >
> > > > As I mentioned on IRC, maybe having Duffy change the pool size
> > > > dynamically based on the demand for the past hour or so would
> > > > help with the overall balance (to avoid wasting resources in
> > > > "quiet periods"), but that's just an idea off the top of my head,
> > > > I'm not sure how feasible it is or if it even makes sense.
> > > >
> > >
> > > Yes, it was always communicated that default EC2 instances don't
> > > support nested virt, as one requests a cloud VM, so not a
> > > hypervisor :)
> > > It's just before migrating to EC2 that we saw it was possible to
> > > deploy bare-metal options on the AWS side, but at a higher cost
> > > (obviously) than traditional EC2 instances (VMs).
> > >
> > > Can you explain why you'd need to have a hypervisor instead of VMs?
> > > I guess that troubleshooting comes to mind (`virsh console` to
> > > the rescue while it's not even possible with the EC2 instance as
> > > VM)?
> >
> > The systemd integration test suite builds an image for each test and
> > then runs it with both systemd-nspawn and directly with qemu/qemu-kvm,
> > since running systemd tests straight on the host is in many
> > cases dangerous (and in some cases it wouldn't be feasible at all,
> > since we need to test stuff that happens during (early) boot).
> > Running only the systemd-nspawn part would be an option, but this way
> > we'd lose a significant part of coverage (as with nspawn you can't
> > test the full boot process, and some tests don't run in nspawn at
> > all, like the systemd-udevd tests and other storage-related stuff).
> >
>
> NetworkManager needs some more power to start a qemu machine, as we have
> tests trying all possible remote root mounts via nfs/iscsi (over bond,
> bridge, vlans, etc.), so we have similar requirements to
> dracut/systemd for at least a part of our tests. We don't need
> something fancy, but we at least need to be able to execute a VM inside
> the testing machine to simulate the early boot (remote filesystems are
> hosted directly from the machine we run tests on). Maybe we can live
> with paravirt, we have to experiment a bit.
>
> Thank you,
> Vladimir
>
> > > _______________________________________________
> > > CI-users mailing list
> > > CI-users at centos.org
> > > https://lists.centos.org/mailman/listinfo/ci-users
> >
> > --
> > PGP Key ID: 0xFB738CE27B634E4B
>
> --
> Camila Granella
> Associate Manager, Software Engineering
> Red Hat <https://www.redhat.com/>

--
PGP Key ID: 0xFB738CE27B634E4B
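
P.S. For anyone adapting a test suite to run on both the metal and the plain EC2 nodes, the KVM-vs-TCG choice discussed above could be made at runtime. This is only a rough sketch under assumptions: the helper names `pick_qemu_accel` and `qemu_accel_args` are made up for illustration, and it assumes that a readable and writable `/dev/kvm` is a good enough signal that KVM can be used on the node. It only selects QEMU's `-accel` option and does nothing about the slowness or flakiness of TCG mentioned earlier in the thread.

```python
import os


def pick_qemu_accel(kvm_device="/dev/kvm"):
    """Return "kvm" if the KVM device node looks usable, otherwise "tcg".

    Assumption: the device node existing and being readable and writable
    by the current user is treated as "KVM is available".
    """
    if os.path.exists(kvm_device) and os.access(kvm_device, os.R_OK | os.W_OK):
        return "kvm"
    return "tcg"


def qemu_accel_args(kvm_device="/dev/kvm"):
    """Build the accelerator arguments to append to a qemu command line."""
    return ["-accel", pick_qemu_accel(kvm_device)]


if __name__ == "__main__":
    # Shows which accelerator would be selected on this machine.
    print(" ".join(qemu_accel_args()))
```

A job could also use the same check to skip or shorten the qemu-based tests when only TCG is available, instead of running them for hours.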