Yesterday (Saturday) evening we got Zabbix notifications that some nodes in the CI environment were unreachable. After a quick look, I discovered that an embedded network switch in a chassis hosting multiple nodes (including, but not limited to, the Jenkins node behind ci.centos.org) had gone nuts. I tried a remote "hardware reset" and the nodes were back online after ~10 min.

But this morning (Sunday), I saw through Zabbix that the same issue had happened again. Within the hour I did the "hardware reset" again, but this time even that no longer works. So that means we have a network switch that is no longer working.

As that chassis (like almost *all* equipment in CI) *isn't* under warranty, we'll see on Monday what can be done and how we prioritize dispatching services elsewhere (which probably means powering down other services, depending on the priority given), but it's easy to understand that we can't give any ETA at this point.

Thanks for your understanding,

--
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab