Re: [CentOS-devel] major infra issue : impacting git.centos.org and cbs.centos.org

3 Mar 2024


      On 03/03/2024 20:27, Fabian Arrotin wrote:
...
On 03/03/2024 19:48, Fabian Arrotin wrote:
...
This evening (Sunday), I got a Zabbix notification that some services
hosted on the same hypervisor were down.
A quick investigation showed that, despite the server running on a
hardware RAID controller, its firmware confirmed data loss and corruption.
Although I'm normally on PTO, I still wanted to restore services
quickly, so I'm working on redeploying the services from scratch and
restoring data from the last backup. I hope to have news soon ...
Status update: the cbs.centos.org kojihub was fully reinstalled from
scratch on a different hypervisor and reconfigured by Ansible, and its
DB was restored from the backup taken earlier today.
I quickly checked, and all operations seem to be working fine.
The only issue you might see is if you submitted a build today *after*
the PostgreSQL backup operation took place; if that's the case,
consider rebuilding your RPM (but it's usually quiet during the
weekend, especially on Sunday).
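For illustration, here is a minimal Python sketch of that check: anything
submitted after the backup timestamp is missing from the restored DB and
would need to be rebuilt. The backup time and build names below are
hypothetical placeholders; the actual backup time is not stated in this
message.

```python
from datetime import datetime, timezone


def builds_to_resubmit(submissions, backup_time):
    """Return names of builds submitted strictly after the backup."""
    return [name for name, submitted in submissions if submitted > backup_time]


# Hypothetical values, for illustration only.
backup_time = datetime(2024, 3, 3, 12, 0, tzinfo=timezone.utc)
submissions = [
    ("example-pkg-1.0-1", datetime(2024, 3, 3, 9, 30, tzinfo=timezone.utc)),
    ("example-pkg-1.0-2", datetime(2024, 3, 3, 15, 45, tzinfo=timezone.utc)),
]

print(builds_to_resubmit(submissions, backup_time))
```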
Next item to reinstall/restore: git.centos.org
https://git.centos.org is now also fully redeployed from scratch on a
different hypervisor and fully reconfigured by Ansible, with its data
restored from backup (that's the step that needed more time, as I had to
restore ~1TiB of data from the remote backup server to the local pagure
instance).
What I (quickly) tested after the service was restored:
- git pull from various repositories
- git commit and push to one specific branch (test only)
- verified mqtt notifications were also working
- push a random file to the lookaside cache (which tests the
authenticated fasjson API call that verifies whether I was allowed to
push to a specific sig-infra branch)
Everything seems to work, but here is some useful information: as we
fully redeployed the machine, the sshd_host_key changed. The new one can
be viewed through the web UI: https://git.centos.org/ssh_info
It's also worth knowing that if you trust our CA, you shouldn't need to
worry about the key change, as the new sshd_host_key is also signed by
the same CA.
That just means that you should have this entry (on a single line) in
your ~/.ssh/known_hosts:
@cert-authority *.centos.org ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDXmhva/yVOS6y/sR1Pjd+Gflzkl7azfl3ZIhex5kSHilUjT3DSjfXK0TgSHT93BCKs1/mT84ZKv6s+Ulfc3kC9aykJQnkWJ6I6CjIgfIM547VT2Egx5fKJZ/7yRedYf6HoVPZSAW5WYKZ0fq/DDoAFUuZJkkp3QEzh6TUiXif9qjCu3liXNgkS2uVIWc7+1QTLRxqU3/MCD1YxuOL8ShyMSHlGJTRMMTYq6aAFmlQ/FsA8deb9HeR3PaAZx7Q7jqmiJD5cx9XtrmgM4CCZNFxP9i0s+L7yDKzFQ1ecm1/vzouOsAVcSh7MiAexuBLgbUdhmBDGVEJYQDNENKOdaoiP
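To check locally whether such an entry is already present, a minimal
Python sketch follows. It is an illustration, not a CentOS-provided
tool: the function name is made up, the sample key is a placeholder, and
the matching is simplified (it uses fnmatch as an approximation of ssh's
host patterns and ignores hashed hostnames and negated "!" patterns).

```python
from fnmatch import fnmatch


def trusts_ca_for(known_hosts_text, hostname):
    """Return True if a @cert-authority entry has a pattern matching hostname."""
    for raw in known_hosts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Expected layout: @cert-authority <patterns> <keytype> <base64 key>
        if fields[0] != "@cert-authority" or len(fields) < 4:
            continue
        patterns = fields[1].split(",")
        if any(fnmatch(hostname, p) for p in patterns if not p.startswith("!")):
            return True
    return False


# Placeholder key material, for illustration only.
sample = "# my CAs\n@cert-authority *.centos.org ssh-rsa AAAAplaceholder\n"
print(trusts_ca_for(sample, "git.centos.org"))
```

To run it against a real file, pass the contents of ~/.ssh/known_hosts
as the first argument instead of the sample string.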
Regarding content/git repositories: the same remark applies as for
kojihub/cbs. We restored from backup, so you may have to push commits
(if any) again and/or re-upload assets to the lookaside cache if you
used git.centos.org this Sunday.
PS: I'm normally in PTO/Away/Grief mode, so I'm not regularly paying
attention to the list or IRC. If you encounter any issue due to this
unscheduled outage, feel free to open a ticket on
pagure.io/centos-infra/issues
Kind Regards,
-- 
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | @arrfab[@fosstodon.org]
