[Ci-users] CentOS CI OCP 4 cluster down for emergency maintenance

Thu Jun 10 14:30:07 UTC 2021
David Kirwan <dkirwan at redhat.com>

The outage for the CentOS CI OCP 4 cluster is now over; service has been
fully restored with a temporary workaround.

We had a hardware failure on the storinator node `storage02`, which
provides storage services to our cluster. Logs show some issues with the
backplane.

As a temporary workaround, we have migrated this storage to an older node
(which is out of warranty). We'll have an on-site engineer visit the data
center early next week to diagnose the problem affecting the main
storinator node. At a future date, once this storinator node is
repaired/replaced, we will schedule an outage to migrate our storage back
to that device.

Tracking ticket [0] has been updated.


- [0] https://pagure.io/centos-infra/issue/353

On Wed, 9 Jun 2021 at 17:42, Vipul Siddharth <vipul at redhat.com> wrote:

> On Tue, Jun 8, 2021 at 6:33 PM David Kirwan <dkirwan at redhat.com> wrote:
> >
> > Hi ci-users,
> >
> > We're currently suffering an issue with our storage on the CentOS CI OCP
> > 4 cluster. We'll be taking the cluster down for emergency maintenance
> > immediately.
>
> This problem seems bigger than we had anticipated. So far, our
> investigation suggests a low-level hardware issue that will
> need an on-site visit. We may have to go with a server replacement
> (from the logs, this is a symptom of a backplane or controller
> issue), but we are hoping the on-site visit reveals something like "power
> cable not connected properly or low voltage".
>
> There is no estimated resolution time for this, but we will keep the
> ticket [0] up to date as we find out more.
>
> [0] https://pagure.io/centos-infra/issue/353
> --
> Vipul Siddharth
> He/His/Him
> Fedora and CentOS Infrastructure

-- 
David Kirwan
Software Engineer

Community Platform Engineering @ Red Hat

T: +(353) 86-8624108     IM: @dkirwan