On 04/10/2020 09:00, Fabian Arrotin wrote:
> Yesterday (Saturday) evening we got zabbix notifications that some nodes
> in the CI environment were unreachable. After a quick look, I discovered
> that an embedded network switch in a chassis hosting multiple nodes
> (including but not limited to the jenkins node behind ci.centos.org)
> had gone nuts.
>
> I tried a remote "hardware reset" and the nodes were back online after
> ~10min.
>
> But this morning (Sunday), I saw through zabbix that the same issue had
> happened again. Within the hour I tried the "hardware reset" again, but
> this time even that doesn't work anymore.
>
> So that means that we have a network switch that is no longer working.
>
> As that chassis (like almost *all* equipment in CI) *isn't* under
> warranty, we'll see on Monday what can be done and how we prioritize
> dispatching services elsewhere (which probably means powering down
> other services, depending on the priorities that will be set), but
> it's easy to understand that we can't give any ETA at this point.
>
> Thanks for your understanding,
>

I put a quick workaround in place and jenkins (aka ci.centos.org) is now
back in action and working normally.
We'll see tomorrow about the other impacted services.

-- 
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab