<div dir="ltr"><div>Great work, thanks Fabian for fixing that over the weekend!</div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Mar 3, 2024 at 10:47 PM Fabian Arrotin <<a href="mailto:arrfab@centos.org">arrfab@centos.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 03/03/2024 20:27, Fabian Arrotin wrote:<br>
> On 03/03/2024 19:48, Fabian Arrotin wrote:<br>
>> This evening (Sunday), I got a Zabbix notification that some services <br>
>> hosted on the same hypervisor were down.<br>
>> A quick investigation showed that, despite the server running on a hardware <br>
>> RAID controller, its firmware confirmed data loss and corruption.<br>
>><br>
>> Although I'm normally on PTO, I still wanted to restore services <br>
>> quickly, so I'm working on redeploying the services from scratch and <br>
>> restoring data from the last backup. I hope to have news soon ...<br>
>><br>
> <br>
> Status update : <a href="http://cbs.centos.org" rel="noreferrer" target="_blank">cbs.centos.org</a> kojihub was fully reinstalled from <br>
> scratch on a different hypervisor, reconfigured by Ansible, and the DB was <br>
> restored from the backup taken earlier today.<br>
> <br>
> I quickly checked and all operations seem to be working fine.<br>
> The only issue you might see is if you submitted a build <br>
> today, *after* the postgresql backup operation took place. If that's the <br>
> case, consider rebuilding your rpm (but it's usually quiet during the <br>
> weekend, especially on Sunday)<br>
> <br>
> Next item to reinstall/restore : <a href="http://git.centos.org" rel="noreferrer" target="_blank">git.centos.org</a><br>
> <br>
<br>
<a href="https://git.centos.org" rel="noreferrer" target="_blank">https://git.centos.org</a> is now also fully redeployed from scratch on a <br>
different hypervisor, reconfigured fully by ansible and data restored <br>
from backup (that's the step that needed more time as I had to restore <br>
~1TiB of data from remote backup server to local pagure instance)<br>
<br>
What I (quickly) tried after the service was restored :<br>
- git pull from various repositories<br>
- git commit and push to one specific branch (test only)<br>
- verified mqtt notifications were also working<br>
- push a random file to lookaside cache (testing identified fasjson api <br>
call to verify if I was allowed to push to a specific sig-infra branch)<br>
<br>
Everything seems to work, but here is some useful information: as <br>
we fully redeployed the machine, the sshd_host_key changed; the new one can be viewed <br>
through the web UI : <a href="https://git.centos.org/ssh_info" rel="noreferrer" target="_blank">https://git.centos.org/ssh_info</a><br>
<br>
It's also worth knowing that if you trust our CA, you shouldn't need to worry <br>
about the key change, as the new sshd_host_key is also signed by the same CA.<br>
<br>
That just means you should have this entry (on a single line) in your ~/.ssh/known_hosts :<br>
<br>
@cert-authority *.<a href="http://centos.org" rel="noreferrer" target="_blank">centos.org</a> ssh-rsa <br>
AAAAB3NzaC1yc2EAAAADAQABAAABAQDXmhva/yVOS6y/sR1Pjd+Gflzkl7azfl3ZIhex5kSHilUjT3DSjfXK0TgSHT93BCKs1/mT84ZKv6s+Ulfc3kC9aykJQnkWJ6I6CjIgfIM547VT2Egx5fKJZ/7yRedYf6HoVPZSAW5WYKZ0fq/DDoAFUuZJkkp3QEzh6TUiXif9qjCu3liXNgkS2uVIWc7+1QTLRxqU3/MCD1YxuOL8ShyMSHlGJTRMMTYq6aAFmlQ/FsA8deb9HeR3PaAZx7Q7jqmiJD5cx9XtrmgM4CCZNFxP9i0s+L7yDKzFQ1ecm1/vzouOsAVcSh7MiAexuBLgbUdhmBDGVEJYQDNENKOdaoiP<br>
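<br>
To check whether a known_hosts file already carries such an entry for a given host, a minimal Python sketch could look like the one below. It assumes the usual known_hosts layout (marker, comma-separated host patterns, key type, key). The function name cert_authority_covers and the placeholder key are hypothetical, and fnmatch only approximates OpenSSH pattern matching.<br>
<pre>
```python
from fnmatch import fnmatchcase


def cert_authority_covers(known_hosts_text, hostname):
    """Return True if an @cert-authority entry has a host pattern matching hostname."""
    host = hostname.lower()
    for raw_line in known_hosts_text.splitlines():
        fields = raw_line.split()
        # A usable entry has at least: marker, host patterns, key type, key.
        if len(fields) >= 4 and fields[0] == "@cert-authority":
            patterns = [p.lower() for p in fields[1].split(",")]
            # A matching negated pattern ("!host") excludes the host for this entry.
            excluded = any(
                p.startswith("!") and fnmatchcase(host, p[1:]) for p in patterns
            )
            matched = any(
                not p.startswith("!") and fnmatchcase(host, p) for p in patterns
            )
            if matched and not excluded:
                return True
    return False


# Hypothetical sample entry; PLACEHOLDER_KEY stands in for the real base64 CA key.
sample = "@cert-authority *.centos.org ssh-rsa PLACEHOLDER_KEY"
print(cert_authority_covers(sample, "git.centos.org"))
```
</pre>
In practice you would pass the contents of ~/.ssh/known_hosts instead of the sample string.<br>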
<br>
<br>
WRT content/git repositories: same remark as for kojihub/cbs : we <br>
restored from backup, so you may have to push your commits again <br>
(if any) and/or re-upload assets to the lookaside cache if you used <a href="http://git.centos.org" rel="noreferrer" target="_blank">git.centos.org</a> <br>
this Sunday<br>
<br>
<br>
PS: I'm normally on PTO/Away/Grief mode, so I'm not paying much <br>
attention to the list or IRC. If you encounter any issue due to this <br>
unscheduled outage, feel free to open a ticket on <br>
<a href="http://pagure.io/centos-infra/issues" rel="noreferrer" target="_blank">pagure.io/centos-infra/issues</a><br>
<br>
Kind Regards,<br>
-- <br>
Fabian Arrotin<br>
The CentOS Project | <a href="https://www.centos.org" rel="noreferrer" target="_blank">https://www.centos.org</a><br>
gpg key: 17F3B7A1 | @arrfab[@<a href="http://fosstodon.org" rel="noreferrer" target="_blank">fosstodon.org</a>]<br>
<br>
_______________________________________________<br>
CentOS-devel mailing list<br>
<a href="mailto:CentOS-devel@centos.org" target="_blank">CentOS-devel@centos.org</a><br>
<a href="https://lists.centos.org/mailman/listinfo/centos-devel" rel="noreferrer" target="_blank">https://lists.centos.org/mailman/listinfo/centos-devel</a><br>
</blockquote></div>