Hi,
After the migration to the new Duffy API, we started noticing the following RPM error intermittently on bare-metal nodes reserved from the EC2 pool:
. . .
Running transaction check
Waiting for process with pid 2881 to finish.
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
RPM: error: db5 error(-30969) from dbenv->open: BDB0091 DB_VERSION_MISMATCH: Database environment version mismatch
RPM: error: cannot open Packages index using db5 - (-30969)
RPM: error: cannot open Packages database in /var/lib/rpm
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Could not run transaction.
. . .
The "Waiting for process with pid 2881 to finish." line in particular indicates a DNF operation running in the background, which may or may not conflict with the current DNF execution. With the help of others I got to know that a cloud-init service performs a `dnf update` to cope with the very old AMI image the node is provisioned with. Please note that nodes are already marked ready in the pool before this `dnf update` kicks in.
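For anyone who wants to double-check this on a freshly reserved node, something along the lines of the commands below (standard cloud-init and ps tooling, nothing Duffy-specific; the exact pid and output will of course differ) should show whether cloud-init is still busy and which process is holding the transaction:

    # Ask cloud-init whether its boot-time modules (including any package update) have finished
    cloud-init status
    # List any dnf/rpm process still running in the background
    ps -eo pid,etime,cmd | grep -E '[d]nf|[r]pm'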
Therefore I put forward a suggestion to disable the `dnf update` step in the cloud-init service so that it does not interfere with other DNF operations performed after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
Thanks, Anoop C S.
On 13/09/2022 10:07, Anoop C S wrote:
Hi,
After the migration to the new Duffy API, we started noticing the following RPM error intermittently on bare-metal nodes reserved from the EC2 pool:
. . .
Running transaction check
Waiting for process with pid 2881 to finish.
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
RPM: error: db5 error(-30969) from dbenv->open: BDB0091 DB_VERSION_MISMATCH: Database environment version mismatch
RPM: error: cannot open Packages index using db5 - (-30969)
RPM: error: cannot open Packages database in /var/lib/rpm
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Could not run transaction.
. . .
The "Waiting for process with pid 2881 to finish." line in particular indicates a DNF operation running in the background, which may or may not conflict with the current DNF execution. With the help of others I got to know that a cloud-init service performs a `dnf update` to cope with the very old AMI image the node is provisioned with. Please note that nodes are already marked ready in the pool before this `dnf update` kicks in.
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
Thanks, Anoop C S.
(Follow-up on the thread started on #centos-ci irc channel)
As we initially wanted to hand over CentOS nodes that would be "up2date" wrt RPM packages, we indeed added the "dnf update -y" operation in our Ansible deployments through the EC2 user data field, so that cloud-init would automatically update the machine on EC2 init. If we have enough machines in the pool it should be "transparent" for users, but I see that tenants are starting to default to the metal instances (and clearly we already warned against it, as tenants should start using the classic EC2 instances, which are large enough, but that's a different thread), so while these EC2 nodes are entering the Duffy pool, they are 'given' to tenants while cloud-init is still updating them.
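To make it concrete, the effect is roughly what a minimal user data script like the one below would produce; this is only an illustrative sketch, not the exact content of our deployed EC2 user data:

    #!/bin/bash
    # Hypothetical EC2 user data: cloud-init runs this once at first boot,
    # so the package update happens in the background after the node comes up.
    dnf update -y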
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
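In practice that would just mean each tenant adding something like this at the very start of their job, right after the Duffy node is reserved (a sketch only; where exactly it lives is up to each project):

    # First step of the CI job on the freshly reserved node,
    # before any project-specific `dnf install` steps
    dnf update -y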
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
On 13/09/2022 10:07, Anoop C S wrote:
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
(Follow-up on the thread started on #centos-ci irc channel)
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
With an updated EC2 cloud image available (and deployed) for CentOS Stream 8, where do we stand w.r.t. this request? As far as our project is concerned, we haven't seen any of the mentioned DNF/RPM issues in our jobs since the update to the recent EC2 cloud image for CentOS Stream 8 on metal instances.
Regards, Anoop C S.
On 21/09/2022 08:47, Anoop C S wrote:
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
On 13/09/2022 10:07, Anoop C S wrote:
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
(Follow-up on the thread started on #centos-ci irc channel)
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
With an updated EC2 cloud image available (and deployed) for CentOS Stream 8, where do we stand w.r.t. this request? As far as our project is concerned, we haven't seen any of the mentioned DNF/RPM issues in our jobs since the update to the recent EC2 cloud image for CentOS Stream 8 on metal instances.
Regards, Anoop C S.
No other feedback received so far, but if the updates "delta" to be applied by cloud-init is now really small, that can explain why nobody is suffering from the dnf lock behaviour. I'd (personally) like to keep it like that: the provisioned instance is updated with $latest pkgs in the background.
If nobody minds that plan, it means nothing to change for now :)
PS: but that still means ensuring that we have a more up2date base image for CentOS Stream 8 and 9, and then it's only a matter of a config change to push in git for us to point to the new AMI.
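For reference, finding the most recent official CentOS Stream AMI can be done with a query along these lines; the owner account id and the name filter below are assumptions on my side and should be verified against the official CentOS AWS publication details:

    # Show the newest CentOS Stream 8 x86_64 AMI in the current region.
    # Owner id and name pattern are assumptions; verify before relying on them.
    aws ec2 describe-images \
        --owners 125523088429 \
        --filters "Name=name,Values=CentOS Stream 8*" "Name=architecture,Values=x86_64" \
        --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name,CreationDate]' \
        --output table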
On Wed, 2022-09-21 at 11:42 +0200, Fabian Arrotin wrote:
On 21/09/2022 08:47, Anoop C S wrote:
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
On 13/09/2022 10:07, Anoop C S wrote:
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
(Follow-up on the thread started on #centos-ci irc channel)
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
With an updated EC2 cloud image available (and deployed) for CentOS Stream 8, where do we stand w.r.t. this request? As far as our project is concerned, we haven't seen any of the mentioned DNF/RPM issues in our jobs since the update to the recent EC2 cloud image for CentOS Stream 8 on metal instances.
No other feedback received so far, but if the updates "delta" to be applied by cloud-init is now really small, that can explain why nobody is suffering from the dnf lock behaviour. I'd (personally) like to keep it like that: the provisioned instance is updated with $latest pkgs in the background.
Fair enough.
PS: but that still means ensuring that we have a more up2date base image for CentOS Stream 8 and 9, and then it's only a matter of a config change to push in git for us to point to the new AMI.
Correct. What would be an ideal time period to get the base image updated? Quarterly? Or even monthly?
Thanks, Anoop C S.
On Wed, 2022-09-21 at 12:17 +0530, Anoop C S wrote:
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
On 13/09/2022 10:07, Anoop C S wrote:
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
(Follow-up on the thread started on #centos-ci irc channel)
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
With an updated EC2 cloud image available (and deployed) for CentOS Stream 8, where do we stand w.r.t. this request? As far as our project is concerned, we haven't seen any of the mentioned DNF/RPM issues in our jobs since the update to the recent EC2 cloud image for CentOS Stream 8 on metal instances.
Now, after 2 months, we have started noticing these DNF errors again as more and more updates come in for CentOS Stream 8. Shall we reconsider disabling the `dnf update` step?
Thanks, Anoop C S.
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
On 13/09/2022 10:07, Anoop C S wrote:
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
Thanks, Anoop C S.
(Follow-up on the thread started on #centos-ci irc channel)
As we initially wanted to hand over CentOS nodes that would be "up2date" wrt RPM packages, we indeed added the "dnf update -y" operation in our Ansible deployments through the EC2 user data field, so that cloud-init would automatically update the machine on EC2 init. If we have enough machines in the pool it should be "transparent" for users, but I see that tenants are starting to default to the metal instances (and clearly we already warned against it, as tenants should start using the classic EC2 instances, which are large enough, but that's a different thread), so while these EC2 nodes are entering the Duffy pool, they are 'given' to tenants while cloud-init is still updating them.
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
I would like to hear from other tenants about any objections to the above suggestion. Many of our jobs performing `dnf update` or `dnf install` have started failing more frequently, which does not look good in their respective statuses.
If I don't hear any objections, my recommendation would be to disable `dnf update` from cloud-init on EC2 instances.
Thanks, Anoop C S.
On 28/11/2022 07:52, Anoop C S wrote:
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
On 13/09/2022 10:07, Anoop C S wrote:
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
Thanks, Anoop C S.
(Follow-up on the thread started on #centos-ci irc channel)
As we initially wanted to hand over CentOS nodes that would be "up2date" wrt RPM packages, we indeed added the "dnf update -y" operation in our Ansible deployments through the EC2 user data field, so that cloud-init would automatically update the machine on EC2 init. If we have enough machines in the pool it should be "transparent" for users, but I see that tenants are starting to default to the metal instances (and clearly we already warned against it, as tenants should start using the classic EC2 instances, which are large enough, but that's a different thread), so while these EC2 nodes are entering the Duffy pool, they are 'given' to tenants while cloud-init is still updating them.
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
I would like to hear from other tenants about any objections to the above suggestion. Many of our jobs performing `dnf update` or `dnf install` have started failing more frequently, which does not look good in their respective statuses.
If I don't hear any objections, my recommendation would be to disable `dnf update` from cloud-init on EC2 instances.
Thanks, Anoop C S.
Hi,
As nobody said anything about it, I just pushed the change to remove the "dnf update -y" part from the cloud-init configuration. Ideally we'd leave it there, but if all tenants are starting all their jobs themselves with such a dnf transaction, it's all good. Ideally we'd also have refreshed CentOS Stream 8 AMIs on a regular basis; updating the AMI id is just a git commit away for us (and Duffy would then provision these new ones instead).
Hey,
Apologies for a late response.
On 11/28/22 08:39, Fabian Arrotin wrote:
On 28/11/2022 07:52, Anoop C S wrote:
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
I would like to hear from other tenants about any objections to the above suggestion. Many of our jobs performing `dnf update` or `dnf install` have started failing more frequently, which does not look good in their respective statuses.
If I don't hear any objections, my recommendation would be to disable `dnf update` from cloud-init on EC2 instances.
Thanks, Anoop C S.
Hi,
As nobody said anything about it, I just pushed the change to remove the "dnf update -y" part from the cloud-init configuration.
In this case I agree with Anoop; even with the measures I have in place [0], I still occasionally bump into the original issue as well. So if it weren't for Anoop, I'd start complaining sooner or later anyway :)
Ideally we'd leave it there, but if all tenants are starting all their jobs themselves with such a dnf transaction, it's all good. Ideally we'd also have refreshed CentOS Stream 8 AMIs on a regular basis; updating the AMI id is just a git commit away for us (and Duffy would then provision these new ones instead).
That would definitely be ideal, but it looks like only C9S AMIs are updated regularly; I'm not quite sure why that's not the case for C8S as well.
[0] Calling `systemd-run --wait -p Wants=cloud-init.target -p After=cloud-init.target true` after getting a Duffy node
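For completeness, wired into a job that looks roughly like this (assuming root ssh access to the reserved node, with its hostname in $NODE):

    # Block until cloud-init.target -- and thus the background `dnf update` -- has completed
    ssh "root@${NODE}" \
        'systemd-run --wait -p Wants=cloud-init.target -p After=cloud-init.target true'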
On Mon, 2022-11-28 at 08:39 +0100, Fabian Arrotin wrote:
On 28/11/2022 07:52, Anoop C S wrote:
On Tue, 2022-09-13 at 11:22 +0200, Fabian Arrotin wrote:
On 13/09/2022 10:07, Anoop C S wrote:
Therefore I put forward a suggestion to disable `dnf update` as part of cloud-init service such that it does not interfere with other DNF operations done after the node is reserved by a tenant.
Please feel free to correct me in any of the details mentioned above and let me know your thoughts.
I don't mind completely disabling that "dnf update" step from our EC2 config, and then each project/tenant would start with such an operation in their CI/test workflow/pipeline.
Waiting for feedback from other tenants/projects; if agreed, it's just a git commit && git push operation on the CentOS Infra side and it will be reflected for newly deployed EC2 nodes (all of them, so not only the bare-metal ones).
I would like to hear from other tenants about any objections to the above suggestion. Many of our jobs performing `dnf update` or `dnf install` have started failing more frequently, which does not look good in their respective statuses.
If I don't hear any objections, my recommendation would be to disable `dnf update` from cloud-init on EC2 instances.
As nobody said anything about it, I just pushed the change to remove the "dnf update -y" part from the cloud-init configuration. Ideally we'd leave it there, but if all tenants are starting all their jobs themselves with such a dnf transaction, it's all good. Ideally we'd also have refreshed CentOS Stream 8 AMIs on a regular basis; updating the AMI id is just a git commit away for us (and Duffy would then provision these new ones instead).
Thank you, Fabian. I can confirm that our jobs are succeeding without any DNF errors.
Regards, Anoop C S.