Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
On 02/12/2022 09.13, Fabian Arrotin wrote:
> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.

Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
Might this issue be related to the cache/proxy used for the kojid nodes?
On 05/12/2022 12:45, Peter Georg wrote:
> On 02/12/2022 09.13, Fabian Arrotin wrote:
>> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
>> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
>
> Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
> Might this issue be related to the cache/proxy used for the kojid nodes?

No, not related, but last week we also suffered from massive remote koji API calls that were bringing cbs/koji to its knees. After discussing with Fedora infra team colleagues, they said they had had the same issue the week[s] before and had to just drop IP ranges to ensure koji.fedoraproject.org was back online. I quickly took the same emergency measure (I really had other things to do) to restore cbs.centos.org availability and so also just dropped some ranges. Probably GitLab is hosted in one of these ranges?
In all cases, it's better to open an infra ticket (in parallel, as it's also good to discuss on the centos-devel list too).
On Mon, Dec 5, 2022 at 9:10 AM Fabian Arrotin <arrfab@centos.org> wrote:
> On 05/12/2022 12:45, Peter Georg wrote:
>> On 02/12/2022 09.13, Fabian Arrotin wrote:
>>> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
>>> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
>>
>> Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
>> Might this issue be related to the cache/proxy used for the kojid nodes?
>
> No, not related, but last week we also suffered from massive remote koji API calls that were bringing cbs/koji to its knees. After discussing with Fedora infra team colleagues, they said they had had the same issue the week[s] before and had to just drop IP ranges to ensure koji.fedoraproject.org was back online. I quickly took the same emergency measure (I really had other things to do) to restore cbs.centos.org availability and so also just dropped some ranges. Probably GitLab is hosted in one of these ranges?
> In all cases, it's better to open an infra ticket (in parallel, as it's also good to discuss on the centos-devel list too).
GitLab.com is hosted in Google Cloud on Google Kubernetes Engine. If you blocked GCP IP addresses, then yes, that would happen.
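[Editor's note: since the question is whether GitLab's runner IPs fall inside the dropped ranges, a quick way to test that theory is to check an observed runner egress address against the blocked CIDR blocks. A minimal sketch using Python's stdlib ipaddress module — the CIDRs below are made-up placeholders, not the ranges actually blocked on cbs.centos.org:]

```python
import ipaddress

def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if the given IP falls inside any of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

# Placeholder CIDRs for illustration only -- not the real dropped ranges.
dropped = ["34.74.0.0/16", "35.196.0.0/16"]

print(ip_in_ranges("34.74.10.1", dropped))   # GCP-style address -> True
print(ip_in_ranges("192.0.2.1", dropped))    # TEST-NET address  -> False
```

[Google publishes its Cloud IP ranges in a machine-readable JSON feed, so the same check could be run against the official list instead of hard-coded placeholders.]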
On 05/12/2022 15.10, Fabian Arrotin wrote:
> On 05/12/2022 12:45, Peter Georg wrote:
>> On 02/12/2022 09.13, Fabian Arrotin wrote:
>>> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
>>> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
>>
>> Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
>> Might this issue be related to the cache/proxy used for the kojid nodes?
>
> No, not related, but last week we also suffered from massive remote koji API calls that were bringing cbs/koji to its knees. After discussing with Fedora infra team colleagues, they said they had had the same issue the week[s] before and had to just drop IP ranges to ensure koji.fedoraproject.org was back online. I quickly took the same emergency measure (I really had other things to do) to restore cbs.centos.org availability and so also just dropped some ranges. Probably GitLab is hosted in one of these ranges?

Probably. The runners are all deployed in GCP us-east1 (according to GitLab's documentation). This would explain why, within a job, either all cbs commands succeed or the very first one already fails (at least I have not encountered any other case yet).

> In all cases, it's better to open an infra ticket (in parallel, as it's also good to discuss on the centos-devel list too).
Opened an issue: https://pagure.io/centos-infra/issue/993
CentOS-devel mailing list
CentOS-devel@centos.org
https://lists.centos.org/mailman/listinfo/centos-devel