[Ci-users] Infra : scheduled hardware maintenance (Openshift/NFS)

Wed Sep 30 22:24:23 UTC 2020
Brian Stinson <brian at bstinson.com>

Hi Folks,

Here's an update on where we are with the legacy (OKD 3.6) cluster. There were some integrity issues on the filesystem, so we decided to make sure we got a full xfs_repair done, and will be syncing the data to a new volume before we bring up the old cluster again.

The good news is: the xfs_repair run completed successfully.
The bad news is: the sync is going to take a number of hours yet. I don't have a good ETA for when the legacy cluster will come up again, but it may run into Thursday morning, US time.

This is another call for folks who would like to start a migration to the fancy OCP cluster: please file a ticket at https://pagure.io/centos-infra

-- 
  Brian Stinson
  brian at bstinson.com

On Wed, Sep 30, 2020, at 08:02, Vipul Siddharth wrote:
> On Wed, Sep 30, 2020 at 1:50 AM Fabian Arrotin <arrfab at centos.org> wrote:
> >
> > On 17/09/2020 16:30, Fabian Arrotin wrote:
> > > Due to hardware maintenance that needs to take place on the NFS
> > > storage node used by openshift (both the "legacy" cluster and the
> > > current one, ocp), we'll have to shut down the openshift cluster and
> > > then proceed with hardware maintenance on the NFS server (which itself
> > > needs to be powered down; there is no way to do that "online").
> > >
> > > Migration is scheduled for "Wednesday September 30th, 12:00 pm UTC".
> > > You can convert to local time with $(date -d '2020-09-30 12:00 UTC')
> > >
> > > The expected "downtime" is estimated at ~60 minutes, the time needed to
> > > shut down the machine, install new disks, restart the machine and also do
> > > some updates and tuning on the setup.
> 
> Due to some issues with the legacy cluster volume, this is taking longer
> than expected.
> We are working on it.
> Apologies for the inconvenience.
> 
> > >
> > > For more information about this, here are some relevant tickets that
> > > were created for the perf issue in openshift and nfs :
> > >
> > > https://pagure.io/centos-infra/issue/53
> > > https://pagure.io/centos-infra/issue/105
> > > https://pagure.io/centos-infra/issue/85
> > > https://pagure.io/centos-infra/issue/26
> > >
> > > <subliminal message>
> > > PS : worth noting that while we'll investigate reports on the new ocp
> > > cluster, we'll probably not spend time investigating issues on the
> > > old/legacy one. Projects are supposed to migrate away from it, as the
> > > legacy openshift setup will disappear soon (see
> > > https://pagure.io/centos-infra/issue/16)
> > > </subliminal message>
> > >
> >
> >
> > Reminder !  :-)
> >
> > Also, due to the time needed to properly/cleanly power down all
> > nodes, we decided to start at 11:00 am UTC, to be ready when the on-site
> > engineer starts un-racking the storage server for hardware maintenance
> > and puts it back online afterwards (we have a fixed appointment for when to do it)
> >
> > I'd like to remind all projects still on the old openshift cluster that
> > despite our calls to have projects migrated, only a very few did.
> > So we (centos ci infra team) will have a discussion about how to deal with
> > this, but at first sight we'll just announce a date/deadline for
> > decommissioning the old infra.
> >
> > --
> > Fabian Arrotin
> > The CentOS Project | https://www.centos.org
> > gpg key: 17F3B7A1 | twitter: @arrfab
> >
> > _______________________________________________
> > CI-users mailing list
> > CI-users at centos.org
> > https://lists.centos.org/mailman/listinfo/ci-users
> 
> 
> 
> -- 
> Vipul Siddharth
> He/His/Him
> Fedora | CentOS CI Infrastructure Team