On 13/09/2022 10:07, Anoop C S wrote:
Hi,
After the migration to the new Duffy API, we started noticing the following RPM error intermittently on bare-metal nodes reserved from the EC2 pool:
. . .
Running transaction check
Waiting for process with pid 2881 to finish.
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
RPM: error: db5 error(-30969) from dbenv->open: BDB0091 DB_VERSION_MISMATCH: Database environment version mismatch
RPM: error: cannot open Packages index using db5 - (-30969)
RPM: error: cannot open Packages database in /var/lib/rpm
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Could not run transaction.
. . .
In particular, the line "Waiting for process with pid 2881 to finish." indicates that some DNF operation is running in the background, which may or may not conflict with the current DNF execution. With the help of others, I learned that a cloud-init service performs `dnf update` to compensate for the very old AMI image with which the node is provisioned. Please note that nodes are already marked ready in the pool before `dnf update` kicks in.
Therefore I suggest disabling the `dnf update` step in the cloud-init service, so that it does not interfere with other DNF operations done after the node is reserved by a tenant.
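Independently of that change, a tenant could guard against the race on its own side by waiting for cloud-init to finish before running any DNF command. The following is only a minimal sketch, assuming the node ships a cloud-init version that supports `cloud-init status --wait`; the function name `wait_for_cloud_init` and the `CLOUD_INIT` override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: block until cloud-init has finished before touching DNF/RPM.
# CLOUD_INIT is a hypothetical override variable (not part of cloud-init);
# it only lets the function be exercised where cloud-init is not installed.
CLOUD_INIT="${CLOUD_INIT:-cloud-init}"

wait_for_cloud_init() {
    # 'cloud-init status --wait' blocks until cloud-init is done and
    # exits non-zero if cloud-init ended in an error state.
    if "$CLOUD_INIT" status --wait >/dev/null 2>&1; then
        echo "cloud-init finished"
        return 0
    fi
    echo "cloud-init reported an error or is not available" >&2
    return 1
}
```

A job would then call `wait_for_cloud_init` as its first step and only proceed to its own `dnf` commands when it returns successfully.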
Please feel free to correct me on any of the details mentioned above and let me know your thoughts.
Thanks, Anoop C S.
(Follow-up on the thread started on the #centos-ci IRC channel)
We initially wanted to hand over CentOS nodes that would be "up2date" with respect to RPM packages, so we added a "dnf update -y" operation to our Ansible deployments through the EC2 user data field; cloud-init then updates the machine automatically when the EC2 instance initializes. If we have enough machines in the pool, this should be "transparent" for users. However, I see that tenants are starting to default to the metal instances (we have already warned against this, as tenants should start using the classic EC2 instances, which are large enough, but that is a different thread). As a result, these EC2 nodes enter the Duffy pool and are 'given' to tenants while cloud-init is still updating them.
I don't mind completely disabling that "dnf update" step in our EC2 config, so that each project/tenant would start with such an operation in its own CI/test workflow/pipeline.
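If the step moves to the tenants, the first stage of a pipeline could look roughly like the sketch below. This is an illustration rather than a recommended implementation: the function name `update_packages` and the `DNF` override variable are hypothetical, and the only real command used is `dnf -y update`.

```shell
#!/bin/sh
# Sketch of a tenant-side first pipeline step: bring packages up to date.
# DNF is a hypothetical override variable so the step can be dry-run
# (for example DNF=echo) without actually invoking dnf.
DNF="${DNF:-dnf}"

update_packages() {
    # Run the update and fail the step if dnf fails.
    if "$DNF" -y update; then
        echo "package update completed"
    else
        echo "package update failed" >&2
        return 1
    fi
}
```

Calling `update_packages` at the start of the job keeps the update under the tenant's control, so no other DNF process is expected to run at the same time.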
I am waiting for feedback from other tenants/projects. If they agree, it's just a git commit && git push operation on the CentOS infra side, and it will be reflected in newly deployed EC2 nodes (all of them, not only the bare-metal ones).