Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
On 02/12/2022 09.13, Fabian Arrotin wrote:
> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.

Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
Might this issue be related to the cache/proxy used for the kojid nodes?
On 05/12/2022 12:45, Peter Georg wrote:
> On 02/12/2022 09.13, Fabian Arrotin wrote:
>> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
>> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
>
> Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
> Might this issue be related to the cache/proxy used for the kojid nodes?

No, not related, but last week we also suffered from massive remote koji API calls that were bringing cbs/koji to its knees. After discussing with Fedora infra team colleagues, they said they had had the same issue the week[s] before and had to just drop IP ranges to ensure koji.fedoraproject.org was back online. I quickly took the same emergency measure (I really had other things to do) to restore cbs.centos.org availability and so also just dropped some ranges. Probably GitLab is hosted in one of these ranges?
In all cases, it's better to open an infra ticket (in parallel, as it's also good to discuss on the centos-devel list too).
On Mon, Dec 5, 2022 at 9:10 AM Fabian Arrotin <arrfab@centos.org> wrote:
> On 05/12/2022 12:45, Peter Georg wrote:
>> On 02/12/2022 09.13, Fabian Arrotin wrote:
>>> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
>>> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
>>
>> Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
>> Might this issue be related to the cache/proxy used for the kojid nodes?
>
> No, not related, but last week we also suffered from massive remote koji API calls that were bringing cbs/koji to its knees. After discussing with Fedora infra team colleagues, they said they had had the same issue the week[s] before and had to just drop IP ranges to ensure koji.fedoraproject.org was back online. I quickly took the same emergency measure (I really had other things to do) to restore cbs.centos.org availability and so also just dropped some ranges. Probably GitLab is hosted in one of these ranges?
> In all cases, it's better to open an infra ticket (in parallel, as it's also good to discuss on the centos-devel list too).
GitLab.com is hosted in Google Cloud on Google Kubernetes Engine. If you blocked GCP IP addresses, then yes, that would happen.
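[Editor's note: since the question is whether GitLab's runner IPs fall inside the dropped ranges, a quick way to test that theory is to check an observed runner egress address against the blocked CIDR blocks. A minimal sketch using Python's stdlib ipaddress module — the CIDRs below are made-up placeholders, not the ranges actually blocked on cbs.centos.org:]

```python
import ipaddress

def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if the given IP falls inside any of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

# Placeholder CIDRs for illustration only -- not the real dropped ranges.
dropped = ["34.74.0.0/16", "35.196.0.0/16"]

print(ip_in_ranges("34.74.10.1", dropped))   # GCP-style address -> True
print(ip_in_ranges("192.0.2.1", dropped))    # TEST-NET address  -> False
```

[Google publishes its Cloud IP ranges in a machine-readable JSON feed, so the same check could be run against the official list instead of hard-coded placeholders.]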
On 05/12/2022 15.10, Fabian Arrotin wrote:
> On 05/12/2022 12:45, Peter Georg wrote:
>> On 02/12/2022 09.13, Fabian Arrotin wrote:
>>> Just for awareness: I noticed this morning through monitoring that the cache/proxy used by the kojid builders (needed to reach out to gitlab/etc) was down during the night, so while kojihub itself was available, the kojid nodes weren't even checking in.
>>> I quickly reconfigured (with ansible) another caching proxy and so restored connectivity. If you noticed some build issues during those hours, please resubmit them, but it seems most builds were just queued and so picked up "normally" as soon as the connection was restored.
>>
>> Since approx. the 1st of December I've been noticing issues running cbs commands within GitLab CI. Most of the cbs commands run into a timeout; no connection can be established and no tasks are ever created. This issue still exists.
>> Might this issue be related to the cache/proxy used for the kojid nodes?
>
> No, not related, but last week we also suffered from massive remote koji API calls that were bringing cbs/koji to its knees. After discussing with Fedora infra team colleagues, they said they had had the same issue the week[s] before and had to just drop IP ranges to ensure koji.fedoraproject.org was back online. I quickly took the same emergency measure (I really had other things to do) to restore cbs.centos.org availability and so also just dropped some ranges. Probably GitLab is hosted in one of these ranges?

Probably. The runners are all deployed in GCP us-east1 (according to GitLab's documentation). This would explain why, within a job, either all cbs commands succeed or the very first one already fails (at least I have not encountered any other case yet).

> In all cases, it's better to open an infra ticket (in parallel, as it's also good to discuss on the centos-devel list too).
Opened an issue: https://pagure.io/centos-infra/issue/993
CentOS-devel mailing list
CentOS-devel@centos.org
https://lists.centos.org/mailman/listinfo/centos-devel