[Ci-users] devtools-ci-slave04 is offline

Fri Oct 9 11:28:15 UTC 2020
Katerina Foniok <kkanova at redhat.com>

Hello,
it seems that `devtools-ci-slave04` is down again.
Thank you, have a nice day
Katka

On Tue, Oct 6, 2020 at 9:52 AM Katerina Foniok <kkanova at redhat.com> wrote:

> Ah, ok, thank you very much for clarifying!
>
> On Tue, Oct 6, 2020 at 9:42 AM Vipul Siddharth <vipul at redhat.com> wrote:
>
>> On Tue, Oct 6, 2020 at 12:50 PM Katerina Foniok <kkanova at redhat.com>
>> wrote:
>> >
>> > So, I can see that access to Vault was disabled on purpose, so it
>> probably isn't related to the outage. Sorry for the false alarm.
>> >
>> > We can also see this error message in our jobs:
>> >>
>> >> "msg": "Exceeded maximum allowed fail nodes limit, please release
>> other machines to continue"
>> >
>> > An example of the job is here.
>> So when you mark a node as failed (usually when the job fails), the node
>> stays around for 12 hours in case someone wants to check manually
>> what went wrong.
>> Keeping too many nodes in the fail state becomes a bottleneck for the
>> Duffy pool, because those nodes can't be reprovisioned for the next
>> round of jobs (for 12 hours).
>> We have a limit on how many nodes can be in the fail state.
>> This message is expected, and you would have seen it when calling the
>> node/fail API, which should ideally be called when the job has failed.
>> So the error could be something else.
>>
>> > Thank you for taking a look,
>> > Katka
>> >
>> > On Tue, Oct 6, 2020 at 9:04 AM Katerina Foniok <kkanova at redhat.com>
>> wrote:
>> >>
>> >> Thank you, `devtools-ci-slave04` is running again, but it seems
>> that our jobs cannot get credentials from the Vault now. Could it be related
>> to the outage?
>> >>
>> >> On Tue, Oct 6, 2020 at 8:43 AM Vipul Siddharth <vipul at redhat.com>
>> wrote:
>> >>>
>> >>> On Tue, Oct 6, 2020 at 11:40 AM Katerina Foniok <kkanova at redhat.com>
>> wrote:
>> >>> >
>> >>> > Hello guys,
>> >>> >
>> >>> > our jobs on ci.centos.org are pending because
>> devtools-ci-slave04 is offline. Can someone take a look, please?
>> >>> fixed
>> >>> > One of the affected jobs is here.
>> >>> > Thank you!
>> >>> >
>> >>> > Have a great day,
>> >>> > Katka
>> >>> > _______________________________________________
>> >>> > CI-users mailing list
>> >>> > CI-users at centos.org
>> >>> > https://lists.centos.org/mailman/listinfo/ci-users
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Vipul Siddharth
>> >>> He/His/Him
>> >>> Fedora | CentOS CI Infrastructure Team
>> >>>
>>
>>
>>
>> --
>> Vipul Siddharth
>> He/His/Him
>> Fedora | CentOS CI Infrastructure Team
>>