On 20/10/2022 20:30, František Šumšal wrote:
Hey!
<snip>
I originally reported it to the CentOS Infra tracker [0] but was advised to post it here, since this behavior is, for better or worse, expected, or at least that's how the T2 machines are advertised. Is there anything that can be done to mitigate this? The only available solution would be to move the job back to the metal nodes, but that's going against the original issue (and the metal pool is quite limited anyway).
Well, yes, and it was a known fact: the "Cloud [TM]" is about virtual machines and (normally) not about bare-metal options. I was even just happy that we can (ab)use a little bit the fact that AWS supports "metal" options, but clearly (as you discovered) only in very limited quantity and availability.
Unfortunately that's the only thing we (or I should say "AWS", which is sponsoring that infra) can offer.
Isn't there a possibility to switch your workflow to avoid QEMU binary emulation? IIRC you wanted to have a VM yourself, to be able to troubleshoot through console access in case something didn't come back online. What about running the tests directly on a t2 instance, and triggering another job that uses the VM setup only if the first one fails? (So just looking at that option *when* there is something to debug.) I hope that the systemd code is sane and so doesn't need someone to troubleshoot issues for each commit/build/test :)
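
To make the idea a bit more concrete, here is a rough sketch of the "only fall back when it fails" logic. It is only an illustration: `run_with_fallback` and both commands are hypothetical placeholders, not actual CentOS CI or systemd job definitions, and how a second job really gets triggered depends on your CI setup.

```python
import subprocess
import sys


def run_with_fallback(primary_cmd, debug_cmd):
    """Run primary_cmd; run debug_cmd only if the primary one fails.

    Returns (primary_returncode, debug_returncode), where
    debug_returncode is None when the debug job was never triggered.
    """
    # First attempt: the cheap job, directly on the instance.
    primary = subprocess.run(primary_cmd)
    if primary.returncode == 0:
        return primary.returncode, None

    # Only now pay for the slower job (e.g. the VM with console access).
    debug = subprocess.run(debug_cmd)
    return primary.returncode, debug.returncode


if __name__ == "__main__":
    # Placeholder commands standing in for the real test jobs (assumption).
    direct_job = [sys.executable, "-c", "raise SystemExit(1)"]
    vm_debug_job = [sys.executable, "-c", "print('collecting debug data')"]
    print(run_with_fallback(direct_job, vm_debug_job))
```

That way the expensive path would only be used for the (hopefully rare) failing runs.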