[Ci-users] Changes to CentOS CI: reminder of Phase 1 and 2

Mon Aug 22 11:28:29 UTC 2022
Fabian Arrotin <arrfab at centos.org>

On 19/08/2022 15:31, František Šumšal wrote:
> Hey,
> 
> On 8/19/22 14:23, Camila Granella wrote:
>> Hello!
>>
>>     I understand that the metal machines are expensive, and I'm not 
>> sure how many other projects are eventually going to migrate over to 
>> them, but I guess in the future some balance will need to be found 
>> between the cost and the number of available metal nodes. Is this even 
>> up for discussion, or is the size of the metal pools fixed and 
>> can't/won't be adjusted?
>>
>>
>> We're looking to optimize resource usage with the recent changes to 
>> CentOS CI. From our side, the goal is to find a balance between 
>> adjusting to tenants' needs (there are adaptations we could do to have 
>> more nodes available with an increase in resource consumption) and 
>> adjusting projects workflows to use EC2.
>>
>> I'd appreciate your suggestions on how to make workflows more 
>> adaptable to EC2.
> 
> The main blocker for many projects is that EC2 VMs don't support nested 
> virtualization, which is really unfortunate, since using the EC2 metal 
> machines is indeed a "bit" overkill in many scenarios (ours included). I 
> spent a week playing with various approaches to avoid this requirement, 
> but failed (in our case it would be running the VMs with TCG instead of 
> KVM, but that makes the tests flaky/unreliable in many cases, and some 
> of them run for several hours with this change).
> 
> Going through many online resources just confirms this - EC2 VMs don't 
> support nested virt[0], which is sad, since, for example, Microsoft's 
> Azure apparently supports it[1][2] (and Google's Compute Engine 
> apparently supports it as well from a quick lookup).
> 
> I'm not really sure if there's an easy solution for this (if any). I'm 
> at least trying to spread the workload on the machine "to the limits" to 
> utilize as much of the metal resources as possible, which shortens the 
> runtime of each job quite considerably, but even that's not ideal 
> (resource-wise).
> 
> As I mentioned on IRC, maybe having Duffy change the pool size 
> dynamically based on the demand for the past hour or so would help with 
> the overall balance (to avoid wasting resources in "quiet periods"), but 
> that's just an idea off the top of my head, I'm not sure how feasible it 
> is or if it even makes sense.
> 
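To make the "pool size follows the last hour's demand" idea concrete, here is a rough sketch of what such a policy could look like. It is not Duffy code and uses no Duffy API: the names `DemandWindow` and `target_pool_size`, the 25% headroom and the min/max bounds are all assumptions for illustration, and using the number of nodes requested in the window as the sizing signal is just one possible choice.

```python
import math
import time
from collections import deque


class DemandWindow:
    """Hypothetical helper: remembers node requests seen in a sliding window."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, nodes_requested), oldest first

    def record(self, nodes, now=None):
        """Note that `nodes` nodes were requested at time `now`."""
        now = time.time() if now is None else now
        self._events.append((now, nodes))

    def demand(self, now=None):
        """Return the number of nodes requested within the window."""
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        # Drop requests that have fallen out of the window.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        return sum(nodes for _, nodes in self._events)


def target_pool_size(recent_demand, min_size=1, max_size=10, headroom=0.25):
    """Size the pool to recent demand plus headroom, clamped to the bounds."""
    wanted = math.ceil(recent_demand * (1 + headroom))
    return max(min_size, min(max_size, wanted))
```

A periodic job could then call `target_pool_size(window.demand())` and grow or shrink the metal pool accordingly, so that the pool falls back to `min_size` during quiet periods. Whether that is practical depends on how long metal nodes take to provision, which the sketch ignores.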

Yes, it was always communicated that default EC2 instances don't 
support nested virt, as one requests a cloud VM, not a hypervisor :)
It was only just before migrating to EC2 that we saw it was possible to 
deploy bare-metal options on the AWS side, but at a higher cost 
(obviously) than traditional EC2 instances (VMs).

Can you explain why you'd need a hypervisor instead of VMs? I guess 
that troubleshooting comes to mind (`virsh console` to the rescue, 
while that isn't even possible when the EC2 instance is itself a VM)?
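
For tenants who are unsure whether a node they received can run KVM guests at all (as opposed to falling back to TCG), a quick check along these lines may help. It is only a sketch under a few assumptions: a Linux x86 host, where hardware virtualization shows up as the `vmx` (Intel) or `svm` (AMD) flag in `/proc/cpuinfo` and a usable KVM shows up as `/dev/kvm`. The function names are made up for this example.

```python
import os


def cpu_has_virt_extensions(cpuinfo_text):
    """Return True if any x86 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Lines look like "flags\t\t: fpu vme ... vmx ..."
            _, _, rest = line.partition(":")
            flags = rest.split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def kvm_usable(cpuinfo_path="/proc/cpuinfo", kvm_device="/dev/kvm"):
    """Best-effort check: CPU exposes virt extensions and /dev/kvm exists."""
    try:
        with open(cpuinfo_path) as fh:
            text = fh.read()
    except OSError:
        return False
    return cpu_has_virt_extensions(text) and os.path.exists(kvm_device)


if __name__ == "__main__":
    print("KVM usable" if kvm_usable() else "KVM not usable, expect TCG")
```

On a default EC2 VM this would be expected to report that KVM is not usable, and on a metal node that it is; it does not check permissions on `/dev/kvm`.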


-- 
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab