Re: [CentOS-devel] major infra issue : impacting git.centos.org and cbs.centos.org

4 Mar 2024

Kudos Fabian for having taken care of this during your PTO.
On Mon, Mar 4, 2024 at 2:07 PM Amy Marrich amy@redhat.com wrote:
...
Thank you so much Fabian for doing all that while on PTO!
Amy
*Amy Marrich*
She/Her/Hers
Principal Technical Marketing Manager - Cloud Platforms
Red Hat, Inc https://www.redhat.com/
amy@redhat.com
Mobile: 954-818-0514
Slack:  amarrich
IRC: spotz
On Sun, Mar 3, 2024 at 3:47 PM Fabian Arrotin arrfab@centos.org wrote:
...
On 03/03/2024 20:27, Fabian Arrotin wrote:
...
On 03/03/2024 19:48, Fabian Arrotin wrote:
...
This evening (Sunday), I got a Zabbix notification that some services
hosted on the same hypervisor were down.
A quick investigation showed me that, despite the server running on a
hardware RAID controller, its firmware confirmed data loss and corruption.
Although I'm normally on PTO, I still wanted to restore services
quickly, so I'm working on redeploying the services from scratch and
restoring data from the last backup, and I hope to have news soon ...
Status update : the cbs.centos.org kojihub was fully reinstalled from
scratch on a different hypervisor, reconfigured by Ansible, and its DB
restored from the backup that was taken earlier today.
I quickly checked, and all operations seem to be working fine.
The only issue you might see is if you submitted a build
today *after* the PostgreSQL backup operation took place; if that's the
case, consider rebuilding your rpm (but it's usually quiet during the
weekend, especially on Sunday).
Next item to reinstall/restore : git.centos.org
https://git.centos.org is now also fully redeployed from scratch on a
different hypervisor, fully reconfigured by Ansible, and its data restored
from backup (that's the step that needed more time, as I had to restore
~1TiB of data from the remote backup server to the local pagure instance).
What I (quickly) tried after the service was restored :

* git pull from various repositories
* git commit and push to one specific branch (test only)
* verified mqtt notifications were also working
* push a random file to lookaside cache (testing identified fasjson api
  call to verify if I was allowed to push to a specific sig-infra branch)
Everything seems to work, but here is some interesting information : as
we fully redeployed the machine, the sshd_host_key changed, and the new one
can be viewed through the web UI : https://git.centos.org/ssh_info
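For anyone who wants to compare the new host key with what their ssh client reports, here is a minimal Python sketch that computes an OpenSSH-style SHA256 fingerprint from a public key line. It assumes the key is available in the usual "type base64 [comment]" form; the exact format shown on the ssh_info page is not stated in this message, and the function name sha256_fingerprint is only illustrative.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    Assumption: the line looks like "<key-type> <base64-blob> [comment]",
    as in a typical .pub file.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    # The fingerprint is the SHA-256 digest of the decoded key blob,
    # base64-encoded with the trailing '=' padding removed.
    blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

The returned string has the same shape as the fingerprint an OpenSSH client prints when it prompts about an unknown or changed host key, so the two can be compared by eye.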
It's also worth knowing that if you trust our CA, you shouldn't need to worry
about the key change, as the new sshd_host_key is also signed by the same CA.
That just means that you should have this entry (on a single line) in your
~/.ssh/known_hosts :
@cert-authority *.centos.org ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQDXmhva/yVOS6y/sR1Pjd+Gflzkl7azfl3ZIhex5kSHilUjT3DSjfXK0TgSHT93BCKs1/mT84ZKv6s+Ulfc3kC9aykJQnkWJ6I6CjIgfIM547VT2Egx5fKJZ/7yRedYf6HoVPZSAW5WYKZ0fq/DDoAFUuZJkkp3QEzh6TUiXif9qjCu3liXNgkS2uVIWc7+1QTLRxqU3/MCD1YxuOL8ShyMSHlGJTRMMTYq6aAFmlQ/FsA8deb9HeR3PaAZx7Q7jqmiJD5cx9XtrmgM4CCZNFxP9i0s+L7yDKzFQ1ecm1/vzouOsAVcSh7MiAexuBLgbUdhmBDGVEJYQDNENKOdaoiP
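As an illustration only, adding that entry can be scripted so that it is not duplicated on repeated runs. The Python sketch below assumes the whole @cert-authority entry is passed in as one string (marker, host pattern, key type and key on a single line); the function name ensure_known_hosts_line is hypothetical and not part of any CentOS tooling.

```python
from pathlib import Path


def ensure_known_hosts_line(known_hosts: Path, line: str) -> bool:
    """Append `line` to `known_hosts` unless an identical entry exists.

    Returns True if the file was modified, False if the entry was
    already present.
    """
    line = line.strip()
    text = known_hosts.read_text() if known_hosts.exists() else ""
    if line in (entry.strip() for entry in text.splitlines()):
        return False
    known_hosts.parent.mkdir(parents=True, exist_ok=True)
    with known_hosts.open("a") as handle:
        # Avoid gluing the new entry onto an unterminated last line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

It would typically be called with Path.home() / ".ssh" / "known_hosts" and the full entry from this message; a second call with the same entry leaves the file untouched.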
WRT content/git repositories : same remark as for kojihub/cbs : we
restored from backup, so you may have to push commits (if any) again
and/or re-upload assets to the lookaside cache if you used git.centos.org
this Sunday.
PS: I'm normally on PTO/Away/Grief mode, so I'm not normally paying
attention to the list or IRC. If you encounter any issue due to this
unscheduled outage, feel free to open a ticket on
pagure.io/centos-infra/issues
Kind Regards,
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | @arrfab[@fosstodon.org]

CentOS-devel mailing list
CentOS-devel@centos.org
https://lists.centos.org/mailman/listinfo/centos-devel
