[Ci-users] Disabling dnf execution in background on an EC2 bare metal node

Tue Sep 13 09:22:10 UTC 2022
Fabian Arrotin <arrfab at centos.org>

On 13/09/2022 10:07, Anoop C S wrote:
> Hi,
> 
> Post migration to new Duffy API we started noticing the following RPM
> error intermittently on bare metal nodes reserved from EC2 pool:
> 
> . . .
> Running transaction check
> Waiting for process with pid 2881 to finish.
> Transaction check succeeded.
> Running transaction test
> Transaction test succeeded.
> Running transaction
> RPM: error: db5 error(-30969) from dbenv->open: BDB0091
> DB_VERSION_MISMATCH: Database environment version mismatch
> RPM: error: cannot open Packages index using db5 -  (-30969)
> RPM: error: cannot open Packages database in /var/lib/rpm
> The downloaded packages were saved in cache until the next successful
> transaction.
> You can remove cached packages by executing 'dnf clean packages'.
> Error: Could not run transaction.
> . . 
> 
> Especially the "Waiting for process with pid 2881 to finish." indicates
> some DNF operation in background which may or may not conflict with
> current DNF execution. With the help of others I got to know about a
> cloud-init service performing `dnf update` to cope with the very old
> AMI image with which it is provisioned. Please note that nodes are
> already marked ready in the pool before `dnf update` kicks in.
> 
> Therefore I put forward a suggestion to disable `dnf update` as part of
> cloud-init service such that it does not interfere with other DNF
> operations done after the node is reserved by a tenant.
> 
> Please feel free to correct me in any of the details mentioned above
> and let me know your thoughts.
> 
> 
> Thanks,
> Anoop C S.

(Follow-up to the thread started on the #centos-ci IRC channel)

As we initially wanted to hand over CentOS nodes that would be up to
date with respect to RPM packages, we indeed added a "dnf update -y"
operation to our Ansible deployments through the EC2 user data field,
so that cloud-init would automatically update the machine at EC2 init.
If we have enough machines in the pool, this should be transparent to
users, but I see that tenants are starting to default to the metal
instances (and we already clearly warned against that, as tenants
should start using the classic EC2 instances, which are large enough,
but that's a different thread). So while these EC2 nodes are entering
the Duffy pool, they are handed to tenants while cloud-init is still
updating them.

I don't mind completely disabling that "dnf update" step in our EC2
config, so that each project/tenant would instead start with such an
operation in their own CI/test workflow/pipeline.
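
For tenants wondering what such a first step could look like, here is a
minimal sketch, not something taken from our config. It assumes
cloud-init is installed on the node and that waiting for it to finish is
enough to avoid racing with a background "dnf update". The function name
prepare_node and the CLOUD_INIT / DNF override variables are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: prepare_node, CLOUD_INIT and DNF are hypothetical names,
# not part of Duffy or of the CentOS CI tooling.

prepare_node() {
    # If cloud-init is available, block until it has finished all of its
    # stages, including anything launched from the EC2 user data.
    if command -v "${CLOUD_INIT:-cloud-init}" >/dev/null 2>&1; then
        # A non-zero exit means cloud-init reported an error; warn and
        # carry on, so that the dnf call below surfaces any real problem.
        "${CLOUD_INIT:-cloud-init}" status --wait >/dev/null 2>&1 ||
            echo "cloud-init reported an error, continuing" >&2
    fi

    # Update the node ourselves; the function's exit status is dnf's.
    "${DNF:-dnf}" -y update
}
```

Calling prepare_node as the very first step of a job, before any other
dnf/rpm operation, should keep the job's own transactions from
overlapping with cloud-init. If the "dnf update" step is removed from
the EC2 config, the wait simply returns at once and the job does the
update itself.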

I'm waiting for feedback from other tenants/projects. If they agree,
it's just a git commit && git push operation on the CentOS infra side,
and it will then be reflected on newly deployed EC2 nodes (all of them,
not only the bare-metal ones).

-- 
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab