The outage for the CentOS CI OCP 4 cluster is now over; service has been fully restored with a temporary workaround.
                                             
We had a hardware failure on the storinator node `storage02`, which provides storage services to our cluster. Logs show some issues with the backplane.
                                             
As a temporary workaround, we have migrated this storage to an older node (which is out of warranty). An on-site engineer will visit the data center early next week to diagnose the problem affecting the main storinator node. Once that node is repaired or replaced, we will schedule an outage to migrate our storage back to it.
                                             
Tracking ticket [0] has been updated.
                                             
                                             
- [0] https://pagure.io/centos-infra/issue/353  

On Wed, 9 Jun 2021 at 17:42, Vipul Siddharth <vipul@redhat.com> wrote:
On Tue, Jun 8, 2021 at 6:33 PM David Kirwan <dkirwan@redhat.com> wrote:
>
> Hi ci-users,
>
> We're currently suffering an issue with our storage on the CentOS CI OCP 4 cluster. We'll be taking the cluster down for emergency maintenance immediately.

This problem seems bigger than we had anticipated. So far, our
investigation suggests a low-level hardware issue that will need an
on-site visit. We may have to replace the server (the logs point to a
backplane or controller issue), but we are hoping the on-site visit
reveals something like "power cable not connected properly or low
voltage".

There is no estimated resolution time for this, but we will keep the
ticket [0] up to date as we find out more.

[0] https://pagure.io/centos-infra/issue/353
--
Vipul Siddharth
He/His/Him
Fedora and CentOS Infrastructure


--
David Kirwan
Software Engineer

Community Platform Engineering @ Red Hat

T: +(353) 86-8624108     IM: @dkirwan