After trying several paths, some suggested on this list, here are my results.
1) Fixing an unbootable system wasn't practical in my case. Fortunately, all my systems can be rebuilt from scratch.
2) When I was lucky enough to catch an updated system before reboot, backing out the defective updates wasn't possible. Yum said there were no prior versions.
3) The most reliable method I found for CentOS 7 was:
- Reinstall from scratch (luckily, my data files were safe and restorable).
- Before running any updates, apply the fix suggested by Red Hat and exclude updates to grub2, shim and mokutil.
- Without the above 'exclude', the system became unbootable after a yum update, even though the corrected versions of shim should have been loaded.
The system I'm dealing with is CentOS 7. I can easily rebuild it from scratch and test stuff without losing crucial data, if it would be helpful.
4) I haven't experimented yet with CentOS 8 because the hardware is remote and requires me to get a friend involved to help. My local hardware is not supported by CentOS 8, so it will remain on CentOS 7 until I replace the hardware or switch to a different Linux.
David
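The 'exclude' step from point 3) above could look roughly like the sketch below. It is only an illustration, not the exact commands used in the message: the helper name `update_cmd_with_excludes` is made up, and the package globs are assumed from the packages named there (grub2, shim, mokutil). The helper prints the yum command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print a "yum update" command line that skips the
# given package globs. It only prints, so it is safe to run anywhere.
update_cmd_with_excludes() {
    cmd="yum update"
    for pkg in "$@"; do
        # Single-quote each glob so the shell hands it to yum unexpanded.
        cmd="$cmd --exclude='$pkg'"
    done
    printf '%s\n' "$cmd"
}

# Globs assumed from the post: grub2, shim and mokutil.
update_cmd_with_excludes 'grub2*' 'shim*' 'mokutil'
```

To make the exclusion persistent, the same globs can go on an `exclude=` line in the `[main]` section of /etc/yum.conf; it should be removed again once fixed packages are available.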
On 03/08/2020 at 19:24, david wrote:
After trying several paths, some suggested on this list, here are my results.
Hi,
Just back from a hiking trip. One of my clients sent me a message that his CentOS server refuses to boot. So tomorrow I have to drive there to figure out what's going on. I guess there's a high probability it's the issue discussed in this thread.
Simple question: besides a tsunami of mailing list and forum messages, is there some to-the-point, reliable information about this mess? And some to-the-point, reliable information about how to fix it?
Thanks,
Niki
Hi all,
I had the same problem with my UEFI BIOS machine, and this is how I fixed it on CentOS 7:
1) Boot from a rescue Linux USB.
2) When the rescue system is running:
2.1) # chroot /mnt/sysimage
3) Configure the network:
3.1) # ip addr add X.X.X.X/X dev X
3.2) # ip route add default via X.X.X.X <--- default router
4) And finally:
# yum downgrade 'shim*' 'grub2*' mokutil
# exit
# reboot
I hope you can fix it with these steps.
On 4/8/20 at 0:56, Nicolas Kovacs wrote:
[...]
On 04.08.2020 at 08:31, lpeci lpeci@roa.es wrote:
[...]
# yum downgrade 'shim*' 'grub2*' mokutil
Updated, working packages are available now, so downgrading is no longer needed; another update will also work:
# yum makecache
# yum upgrade
You should see a shim-x64 package at version 15.8, which is the working version (15.7 caused the problem).
# exit
# reboot
I hope you can fix it with these steps.
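To confirm which shim build ended up installed after the upgrade, a check along these lines may help. It is a sketch under assumptions: the function name `shim_is_fixed` is made up, the two build strings are illustrative, and it assumes that the "15.8" mentioned above corresponds to RPM version 15, release 8 (so a string such as `15-8.el7` counts as fixed).

```shell
#!/bin/sh
# Hypothetical check: succeed if a "version-release" string is 15-8 or newer.
# Assumption: "15.8" in the message means RPM version 15, release 8.
shim_is_fixed() {
    version=${1%%-*}        # text before the first "-", e.g. "15"
    release=${1#*-}         # text after the first "-", e.g. "8.el7"
    release=${release%%.*}  # leading release number, e.g. "8"
    if [ "$version" -gt 15 ]; then
        return 0
    fi
    [ "$version" -eq 15 ] && [ "$release" -ge 8 ]
}

# Example with literal strings; no package database is touched here.
for build in 15-7.el7_9 15-8.el7; do
    if shim_is_fixed "$build"; then
        echo "$build: fixed"
    else
        echo "$build: affected"
    fi
done
```

On a real system the input string would come from `rpm -q --qf '%{VERSION}-%{RELEASE}\n' shim-x64`, run inside the chroot.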
On 8/4/20 2:31 AM, lpeci wrote:
- Config network:
3.1) # ip addr add X.X.X.X/X dev X
3.2) # ip route add default via X.X.X.X <--- default router
While I appreciate the thoughts behind this step in the instructions, and I thank you for the post that will be useful to those running fairly traditional servers, there are numerous cases where this simply will not work to bring up a network while booted into the rescue mode chroot. Not all, and maybe not even most, CentOS machines are traditional servers with simple direct ethernet connections that don't require more steps. I can just off the top of my head think of three cases where the above won't work:
Case 1: Virtualization host with a bridge on multiple VLANs over a bond. Depending upon the type of bond, it may or may not be possible to bring up the host's interface to the network with the commands above. More than half of my server machines here fall under this case.
Case 2: workstation with wired network and 802.1x authentication.
Case 3: workstation or laptop with only a wireless interface that requires a supplicant to authenticate. Yes, workstation and laptop installs of CentOS do exist and are actively used and are just as important to recover as any traditional server.
For my laptop, I was able to recover thanks to 'nmtui', the text-mode interactive interface to NetworkManager, which can bring up any of my WiFi SSIDs with authentication. If any of my virtualization hosts had hit this problem (none did, interestingly enough), nmtui would have allowed me to activate the bridge on the host admin VLAN quickly and easily. It is a nice interactive text interface that is dead simple to use quickly and accurately, with no extra steps to look up the interface name or any other details; nmtui just takes care of it in an intuitive manner.
On 8/4/20 1:31 AM, lpeci wrote:
[...]
The issues should now be resolved.
If you just mount /mnt/sysimage, set an IP address and upgrade (to get the new shim) .. then:
yum reinstall <latest-version>
Everything should just work.
On 8/4/20 9:51 AM, Johnny Hughes wrote:
[...]
sorry ..
yum reinstall kernel-<latest_version>
Once upon a time, Johnny Hughes johnny@centos.org said:
yum reinstall <latest-version>
I'm curious - why does the kernel need to be reinstalled? The shim-x64 package installs its files directly to the EFI partition where they are needed.
On Tue, 2020-08-04 at 10:36 -0500, Chris Adams wrote:
I'm curious - why does the kernel need to be reinstalled? The shim-x64 package installs its files directly to the EFI partition where they are needed.
+1
On 8/4/20 10:45 AM, ja wrote:
[...]
That is the easiest way for the initrd to be rebuilt .. which is what created the unbootable issue in the first place, at least in some circumstances.
You can also regenerate your initrd manually after installing the shim.
This is IF you are already in a failed boot condition from the bad install on Friday.
If you are doing the upgrade/install now from a bootable system, all you need to do is a normal update.
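Regenerating the initrd by hand, as mentioned above, might look roughly like this on CentOS 7. This is a sketch, not the procedure the CentOS team used: the function names are made up, the version string is only an example, and it assumes the usual /boot/initramfs-<version>.img naming. Inside a rescue chroot, `uname -r` reports the rescue kernel, so the installed kernel version is passed explicitly.

```shell
#!/bin/sh
# Hypothetical helper: path of the initramfs image for a kernel version,
# assuming the conventional /boot/initramfs-<version>.img layout.
initramfs_path() {
    printf '/boot/initramfs-%s.img\n' "$1"
}

# Rebuild the image for one kernel version with dracut (needs root).
# Defined only; nothing is rebuilt until you call it yourself.
rebuild_initramfs() {
    dracut -f "$(initramfs_path "$1")" "$1"
}

# Show which file would be regenerated for an example version string.
initramfs_path 3.10.0-1127.el7.x86_64
```

From inside the chroot, one would then call something like `rebuild_initramfs 3.10.0-1127.el7.x86_64`; `rpm -q kernel` lists the installed kernels (drop the leading `kernel-` to get the version string).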
On 04/08/2020 at 08:31, lpeci wrote:
[...]
Thanks for the detailed suggestion.
Unfortunately I couldn't recover the installation, and I had to redo everything from scratch, which cost me the first two days of my holidays.
One thought regularly kept crossing my mind:
"How on earth could this have passed Q & A ?"
:o)
Cheers,
Niki
On 04/08/2020 at 08:31, lpeci wrote:
[...]
Thanks for the detailed suggestion.
Unfortunately I couldn't recover the installation, and I had to redo everything from scratch, which cost me the first two days of my holidays.
Now you know how the Window$ admins suffered all the years :-)
One thought regularly kept crossing my mind:
"How on earth could this have passed Q & A ?"
Quite simple, I guess. It's one of the cases where you cannot test as easily as with other updates. Here you have to test on real hardware, different hardware of all kinds, and cannot do it in the cloud, a VM or whatever.
The only real solution I can think of to prevent this would be to make preview versions of updates available to the public so that a lot of people can test them on their hardware, hopefully spare hardware, and give feedback.
I think current business models do not support such an approach these days.
However, one can find strategies to survive. What I do is:
* Never update any system directly from what is provided online. Sync to local repositories first to control what is fed to your systems.
* Never blindly apply updates. Always test on less important systems or dedicated test systems first.
* If all goes well, update important systems. If you have multiple systems, update only one first as another test. Then update the others.
I have learned my lessons over the past decades, but this was a good wake-up call to follow the above rules even more strictly. Better safe than sorry.
Regards, Simon
On 8/6/2020 7:25 AM, Simon Matter via CentOS wrote:
The only real solution I can think of to prevent this would be to make preview versions of updates available to the public so that a lot of people can test them on their hardware, hopefully spare hardware, and give feedback.
A practical equivalent is simply to avoid applying updates for a week to see if someone else gets burned by them. I'm already waiting for a weekend so I don't disrupt work in case a catastrophe happens, and I wait at least a week and watch this list for any reports of disaster. So I haven't experienced this one. Let the impatient do your testing for you.
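The "wait a week" rule can be partly automated, for instance on a local mirror. A rough sketch with made-up names follows: it compares a package's build time in epoch seconds (as printed by `rpm -qp --qf '%{BUILDTIME}\n' file.rpm`) against the current time. Build time is only a proxy for when an update was published, and the timestamps below are arbitrary examples, so treat this as an illustration of the idea rather than a tested policy.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a package built at epoch $1 is at least
# $3 days old at epoch $2.
is_aged() {
    build=$1
    now=$2
    min_days=$3
    age_days=$(( (now - build) / 86400 ))
    [ "$age_days" -ge "$min_days" ]
}

# Example with fixed timestamps: 8 days apart, 7-day waiting period.
if is_aged 1596240000 1596931200 7; then
    echo "old enough to install"
else
    echo "still in quarantine"
fi
```

In practice the second argument would be `$(date +%s)`, and packages that fail the check would simply be left out of the repository the production machines update from.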
On 06/08/2020 at 16:53, Kenneth Porter wrote:
A practical equivalent is simply to avoid applying updates for a week to see if someone else gets burned by them. I'm already waiting for a weekend so I don't disrupt work in case a catastrophe happens, and I wait at least a week and watch this list for any reports of disaster. So I haven't experienced this one. Let the impatient do your testing for you.
So a zero-day becomes at least an eight-day.
:o)
On Thu, Aug 06, 2020 at 03:57:56PM +0200, Nicolas Kovacs wrote:
On 04/08/2020 at 08:31, lpeci wrote:
I had the same problem with my UEFI BIOS machine, and this is how I fixed it on CentOS 7:
While this worked for me, it might not work for you...
My "solution" was to boot the previous kernel, which came up just fine, then:
yum remove kernel.xx.yy.zz
yum install kernel.xx.yy.zz
which rebuilds the initrd, and voila!
Fred
[...]
Cheers,
Niki
--
Microlinux - Solutions informatiques durables
7, place de l'église - 30730 Montpezat
Site : https://www.microlinux.fr
Blog : https://blog.microlinux.fr
Mail : info@microlinux.fr
Tél. : 04 66 63 10 32
Mob. : 06 51 80 12 12
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
On 8/6/20 8:57 AM, Nicolas Kovacs wrote:
[...]
One thought regularly kept crossing my mind:
"How on earth could this have passed Q & A ?"
Well, I mean that would be a valid point if it happened for every install. The issue did not happen on every install. There is no way to test every single hardware and firmware combination for every single computer ever built :)
It would be great if things like this did not happen, but with the universe of possible combinations, I am surprised it does not happen more often.
We do run boot tests of every single kernel for CentOS. The RHEL team runs many more tests for RHEL. But every possible combination from every vendor can't possibly be tested. Right?
On 07/08/20 at 08:22, Johnny Hughes wrote:
"How on earth could this have passed Q & A ?"
Hi Johnny, Niki's question is legitimate and widely shared among users, so please don't see it as an attack. Many users really wondered whether this was tested before release, and I think many of us are incredulous at what happened in CentOS and upstream (especially upstream), but as you said, CentOS inherits RHEL's bugs. I'm reading about many users who lost their trust in RH after the last two problems (microcode and shim). This is bad for CentOS.
Well, I mean that would be a valid point if it happened for every install. The issue did not happen on every install. There is no way to test every single hardware and firmware combination for every single computer ever built :)
It would be great if things like this did not happen, but with the universe of possible combinations, I am surprised it does not happen more often.
Probably many users did not update their machines between the release of the bug and its resolution (thanks to your fast fix over the weekend, thank you), and many update their CentOS machines every two months (if not less often). I also think that many CentOS users have not voiced their disappointment or reported the issue on this list or in other channels. For example, I simply updated at the wrong time.
We do run boot tests of every single kernel for CentOS. The RHEL team runs many more tests for RHEL. But every possible combination from every vendor can't possibly be tested. Right?
You are right, but isn't UEFI a standard, and shouldn't it work the same across vendors? I ask because this patch broke all my UEFI workstations.
While the CentOS team may not have the resources to run this type of test, it would be great to know what happened in RHEL QA (RH being a giant) for this release, given the partnership between CentOS and RH, if you know something more about this.
Thank you.
On 07/08/2020 at 09:40, Alessandro Baggi wrote:
Probably many users did not update their machines between the release of the bug and its resolution (thanks to your fast fix over the weekend, thank you), and many update their CentOS machines every two months (if not less often). I also think that many CentOS users have not voiced their disappointment or reported the issue on this list or in other channels. For example, I simply updated at the wrong time.
I'm using yum-cron to keep all my servers updated on a daily basis.
And my question "How could this have passed Q & A" was obviously directed at Red Hat... and *not* at Johnny Hughes and the CentOS team who do their best to deliver the best possible downstream system. I raise my morning coffee mug to your health, guys.
Cheers,
Niki
On 8/7/20 3:46 AM, Nicolas Kovacs wrote:
[...]
I can assure you .. a BUNCH of testing was done. Because of the scope of this update, the CentOS team was looped in during the embargo stage (we normally are not .. Red Hat Engineering got permission to make this happen for this issue). Normally we see things that are open source only .. not embargoed content. Once the embargo gets lifted, the items become open source. Kudos to the RH team for making this happen.
The CentOS team worked with the RHEL team on this update for several days (more than a week, for sure, maybe 2 weeks).
I gained MUCH respect for all those guys .. especially Peter Jones. He is Mr. Secure Boot.
I personally tested both the c8 and c7 solutions on several machines (all I have access to, actually, including several personal machines that have Secure Boot). I saw some of the testing that happened on the RHEL side. It was extensive.
Microsoft, Debian, Ubuntu and others also had issues with this .. so if you are losing trust, you are losing it with all OS vendors WRT this issue.
All I can say is .. this issue was the hardest thing I have been involved with since starting with the CentOS Project 17 years ago.
Obviously, everyone involved in this build would have prevented this from happening if they could have. Secure Boot is complicated.
On 07/08/2020 10:01, Johnny Hughes wrote:
[...]
I'll just add to Johnny's already comprehensive reply. As a member of the CentOS QA team, I personally tested the update on 3 physical machines and all worked fine. Moreover, the QA team was not able to replicate the issue on a single physical machine available to them - the first indication of a problem came from public reports. We give up a huge amount of our personal time and resources to ensure CentOS (and RHEL) are the very best products they can be. I'm unsure what more could have been done.
On 8/7/20 5:30 AM, Phil Perry wrote:
[...]
Thanks Phil,
I very much appreciate all you and the rest of the QA team do.
I know it is a knee-jerk reaction to say .. how did that not get caught. I actually said it MYSELF for this very issue. But looking back, I am not sure how we could have caught it.
"Stuff Happens" :)
There are just a huge number of possible combinations.
On 8/7/20 5:30 AM, Phil Perry wrote:
[...]
I know it is a knee jerk reaction to say .. how did that not get caught. I actually said it MYSELF for this very issue. But looking back, I am not sure how we could have caught it.
"Stuff Happens" :)
Crowd testing? Feed the green bananas to the crowd and let them ripen. It works well for some of the biggest software companies :-)
At least it could make sense for directly hardware-related stuff like the kernel, boot loader, firmware/microcode and similar.
Regards, Simon
On 07/08/20 at 14:53, Johnny Hughes wrote:
[...]
Hi Johnny,
What is the current status of the notification tool for security updates on C8? Is there any chance of getting announcements on the mailing list for EL8 soon?
It would be great to have the tool working.
Thank you.
On 07.08.20 at 17:17, Alessandro Baggi wrote:
Hi Johnny,
What is the current status of the notification tool for security updates on C8? Is there any chance of getting announcements on the mailing list for EL8 soon?
It would be great to have the tool working.
As I understand it, some kind of mapping must be implemented for indexcode+gitcommitid between CentOS and RH ...
https://lists.centos.org/pipermail/centos/2020-August/351263.html
-- Leon
On 07/08/20 17:39, Leon Fauster via CentOS wrote:
As I understand it, some kind of mapping must be implemented for indexcode+gitcommitid between CentOS and RH ...
https://lists.centos.org/pipermail/centos/2020-August/351263.html
-- Leon
Hi Leon,
So we won't have announcements soon. Until that happens, why not push them to the list manually?
On 8/9/20 2:49 AM, Alessandro Baggi wrote:
Hi Leon,
So we won't have announcements soon. Until that happens, why not push them to the list manually?
Push what to the list .. I don't have anything to push?
We have this:
On 09/08/20 10:40, Johnny Hughes wrote:
Push what to the list .. I don't have anything to push?
We have this:
Hi Johnny,
thank you for the resource.
On 07/08/2020 at 11:01, Johnny Hughes wrote:
Microsoft, Debian, Ubuntu and others also had issues with this .. so if you are losing trust, you are losing it with all OS vendors WRT this issue.
All I can say is .. this issue was the hardest thing I have been involved with since starting with the CentOS Project 17 years ago.
Obviously, everyone involved in this build would have prevented this from happening if they could have. Secureboot is complicated.
In my head I've filed this under the "sh*t happens" category. Bad luck this happened on the first day of my holiday, so I had to cancel a hiking trip. :o)
This being said, rest assured my confidence in the CentOS project is still 100 % intact. On a side note, I've just published my third book about CentOS here in France.
Keep up the good work,
Niki
On 07/08/20 10:46, Nicolas Kovacs wrote:
On 07/08/2020 at 09:40, Alessandro Baggi wrote:
Probably many users did not update their machines between the release of the bug and its resolution (thanks to your fast fix over the weekend, thank you), and many update their CentOS machines every two months (if not less often). I also think that many users in the CentOS user base have not voiced their disappointment or reported the issue on this list or in other channels. I, for example, simply updated at the wrong time.
I'm using yum-cron to keep all my servers updated on a daily basis.
And my question "How could this have passed Q & A" was obviously directed at Red Hat... and *not* at Johnny Hughes and the CentOS team who do their best to deliver the best possible downstream system. I raise my morning coffee mug to your health, guys.
Cheers,
Niki
Hi Niki,
I understood what you meant.
On 8/7/20 2:40 AM, Alessandro Baggi wrote:
On 07/08/20 08:22, Johnny Hughes wrote:
"How on earth could this have passed Q & A ?"
Hi Johnny, Niki's question is widespread and legitimate, and it is on the minds of a great many users, so don't see it as an attack. Many, many users really wondered "was this tested before release?", and I think many of us are incredulous at what happened in CentOS and upstream (especially upstream), but as you said, CentOS inherits RHEL bugs. I'm reading about many users who have lost their trust in RH after the last two problems (microcode and shim). This is bad for CentOS.
Well, I mean that would be a valid point if it happened for every install. The issue did not happen on every install. There is no way to test every single hardware and firmware combination for every single computer ever built :)
It would be great if things like this did not happen, but with the universe of possible combinations, I am surprised it does not happen more often.
Probably many users did not update their machines between the release of the bug and its resolution (thanks to your fast fix over the weekend, thank you), and many update their CentOS machines every two months (if not less often). I also think that many users in the CentOS user base have not voiced their disappointment or reported the issue on this list or in other channels. I, for example, simply updated at the wrong time.
We do run boot tests of every single kernel for CentOS. The RHEL team runs many more tests for RHEL. But every possible combination from every vendor can't possibly be tested. Right?
You are right, but isn't UEFI a standard, and shouldn't it work the same across vendors? I ask because this patch broke all my UEFI workstations.
It would be nice if it did .. however, this worked on many UEFI/Secureboot machines. It did not work on a small subset of machines.
While the CentOS team may not have the resources to run this type of test, it would be great to know what happened with RHEL QA (RH being a giant) for this release, and, given the partnership between CentOS and RH, whether you know anything more about this.
I have not seen the full post-event account of what actually happened. I do know that many Red Hatters worked many hours over the last weekend to fix it. I am sure a public post will be made (if not already there) .. if someone knows where it is, post a link.
If I don't see it posted soon, I'll look for it and post here.
On 07/08/20 10:47, Johnny Hughes wrote:
I have not seen the full post-event account of what actually happened. I do know that many Red Hatters worked many hours over the last weekend to fix it. I am sure a public post will be made (if not already there) .. if someone knows where it is, post a link.
If I don't see it posted soon, I'll look for it and post here.
Thank you Johnny.
Once upon a time, Alessandro Baggi alessandro.baggi@gmail.com said:
You are right, but isn't UEFI a standard, and shouldn't it work the same across vendors? I ask because this patch broke all my UEFI workstations.
The great thing about standards is there's so many to choose from! Also relevant: https://xkcd.com/927/
UEFI has gone through a number of revisions over the years, and has optional bits like Secure Boot (which itself has gone through revisions). Almost any set of standards has undefined corners where vendors interpret things differently. Vendors also have bugs in weird places sometimes.
The firmware and boot loaders arguably are the least "exercised" parts of a system - both change rarely and there are few implementations. There's not many combinations, and they don't change a lot.
I'm interested to read about the cause of this issue - something like this can be a lesson on "hmm, hadn't thought of that before" type things to watch for in other areas.
On Fri, 7 Aug 2020 at 09:15, Chris Adams linux@cmadams.net wrote:
Once upon a time, Alessandro Baggi alessandro.baggi@gmail.com said:
You are right, but isn't UEFI a standard, and shouldn't it work the same across vendors? I ask because this patch broke all my UEFI workstations.
The great thing about standards is there's so many to choose from! Also relevant: https://xkcd.com/927/
UEFI has gone through a number of revisions over the years, and has optional bits like Secure Boot (which itself has gone through revisions). Almost any set of standards has undefined corners where vendors interpret things differently. Vendors also have bugs in weird places sometimes.
I go with the line from the Pirates of the Caribbean movie: it is less of a rigid code and more a set of guidelines. Computer programmers are a surly lot, and most take any MUST/SHALL in a standard as a personal challenge: how to make it pass a test, but in an interesting way.
The firmware and boot loaders arguably are the least "exercised" parts of a system - both change rarely and there are few implementations. There's not many combinations, and they don't change a lot.
I'm interested to read about the cause of this issue - something like this can be a lesson on "hmm, hadn't thought of that before" type things to watch for in other areas.
-- Chris Adams linux@cmadams.net
_______________________________________________
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
On 07/08/20 15:46, Stephen John Smoogen wrote:
I go with the line from the Pirates of the Caribbean movie: it is less of a rigid code and more a set of guidelines. Computer programmers are a surly lot, and most take any MUST/SHALL in a standard as a personal challenge: how to make it pass a test, but in an interesting way.
+1 Jack :D
Once upon a time, Alessandro Baggi alessandro.baggi@gmail.com said:
You are right, but isn't UEFI a standard, and shouldn't it work the same across vendors? I ask because this patch broke all my UEFI workstations.
The great thing about standards is there's so many to choose from! Also relevant: https://xkcd.com/927/
UEFI has gone through a number of revisions over the years, and has optional bits like Secure Boot (which itself has gone through revisions). Almost any set of standards has undefined corners where vendors interpret things differently. Vendors also have bugs in weird places sometimes.
The firmware and boot loaders arguably are the least "exercised" parts of a system - both change rarely and there are few implementations. There's not many combinations, and they don't change a lot.
I'm interested to read about the cause of this issue - something like this can be a lesson on "hmm, hadn't thought of that before" type things to watch for in other areas.
If you ask me I think the real root of the problem is that the UEFI/Secure Boot developers didn't know KISS - or they forgot about it. Once such a beast is born you can not handle it correctly no matter how much you try.
Regards, Simon
Hi Alessandro,
Compared to Microsoft, both RH and SuSE are awesome. You always need a patch management strategy with locked repos (spacewalk/pulp) so that updates can be tested on less important systems prior to deployment on Prod. Keep in mind that Secureboot is hard to deploy in virtual environments, and thus testing is not so easy.
Of course, contributing to the community is always welcome.
Best Regards, Strahil Nikolov
On 7 August 2020 at 10:40:01 GMT+03:00, Alessandro Baggi alessandro.baggi@gmail.com wrote:
On 07/08/20 08:22, Johnny Hughes wrote:
"How on earth could this have passed Q & A ?"
Hi Johnny, Niki's question is widespread and legitimate, and it is on the minds of a great many users, so don't see it as an attack. Many, many users really wondered "was this tested before release?", and I think many of us are incredulous at what happened in CentOS and upstream (especially upstream), but as you said, CentOS inherits RHEL bugs. I'm reading about many users who have lost their trust in RH after the last two problems (microcode and shim). This is bad for CentOS.
Well, I mean that would be a valid point if it happened for every install. The issue did not happen on every install. There is no way to test every single hardware and firmware combination for every single computer ever built :)
It would be great if things like this did not happen, but with the universe of possible combinations, I am surprised it does not happen more often.
Probably many users did not update their machines between the release of the bug and its resolution (thanks to your fast fix over the weekend, thank you), and many update their CentOS machines every two months (if not less often). I also think that many users in the CentOS user base have not voiced their disappointment or reported the issue on this list or in other channels. I, for example, simply updated at the wrong time.
We do run boot tests of every single kernel for CentOS. The RHEL team runs many more tests for RHEL. But every possible combination from every vendor can't possibly be tested. Right?
You are right, but isn't UEFI a standard, and shouldn't it work the same across vendors? I ask because this patch broke all my UEFI workstations.
While the CentOS team may not have the resources to run this type of test, it would be great to know what happened with RHEL QA (RH being a giant) for this release, and, given the partnership between CentOS and RH, whether you know anything more about this.
Thank you.