Hi guys,
While investigating the hardware issue we had on one Seamicro chassis over the last few days (see previous thread), we lost another one (completely, at this point) during the night, so I have disabled all of its nodes too (that means 64 fewer bare-metal nodes in the pool).
I'll create a ticket with the DC to see if it's possible to investigate the issue, and I'll keep the list informed about the status.
Thanks for your understanding,
On 16/06/18 08:39, Fabian Arrotin wrote:
Just to let you know that we're still waiting on some input from the DC about that unreachable Seamicro chassis (Pufty), so I can't even give you an ETA on this.
OTOH, we were able to get the previous chassis (Gusty) back online. I did some parallel reinstalls over the whole weekend and yesterday, and it seems only one compute card (out of 64) really has a problem, so that specific node/card is now isolated and I've put the chassis back into action in the Duffy pool (nodes were reinstalled, and I see some were even deployed for CI projects today already).
More information about the Pufty chassis when I have something to report.
On 19/06/18 11:08, Fabian Arrotin wrote:
[update] The "Pufty" chassis is now back online, but still under investigation. We're "stress-testing" it over the weekend to see if it's working as it should (multiple reinstalls in parallel), and if that's OK, we'll add it back into the CI nodes pool.
Cheers,
Is this why I would be seeing issues getting Duffy nodes? I saw this on two separate Jenkins slaves.
-== @ri ==-
On 23/06/18 13:05, Ari LiVigni wrote:
Is this why I would be seeing issues getting Duffy nodes? I saw this on two separate Jenkins slaves.
No, as those nodes aren't actually used by Duffy, and there are more than enough nodes in the ready pool, so what's the error you're getting?
I wanted to get some Duffy resources to validate some playbook roles, and I used cico:
[fedora-atomic@slave03 ~]$ cico --api-key duffy.key node get
Starting new HTTP connection (1): admin.ci.centos.org
Resetting dropped connection: admin.ci.centos.org
The requested operation failed as no inventory is available.
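A minimal retry sketch around that same call (assuming the duffy.key file shown above, and assuming cico exits non-zero when allocation fails; the retry count and sleep values are arbitrary):

#!/bin/bash
# Retry Duffy node allocation via cico a few times, in case the
# "no inventory is available" condition is transient.
# Assumptions: duffy.key is in the current directory, cico is on PATH,
# and cico returns a non-zero exit code when allocation fails.
for attempt in 1 2 3 4 5; do
    if cico --api-key duffy.key node get; then
        echo "Node(s) allocated on attempt ${attempt}"
        exit 0
    fi
    echo "No inventory on attempt ${attempt}; sleeping 60s before retrying..."
    sleep 60
done
echo "Giving up after 5 attempts" >&2
exit 1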
Is there another way I should be allocating duffy resources?
-== @ri ==- My PGP fingerprint is F87F1EE7CD8BEE13