Hi,
I recently installed some fresh CentOS 6.5 machines, and it took only about 20 minutes until the file system (ext4) was broken. By "broken" I mean that the system was no longer able to find vital system libraries!
I was able to reproduce it on very different systems:
- A freshly installed CentOS 6.5 64-bit system on a virtual machine (KVM)
- A system which I installed some weeks ago (also a KVM machine) and on which I did a "yum update" that upgraded to kernel 2.6.32-431.5.1.el6
- A freshly installed CentOS 6.5 64-bit system on native hardware, on a RAID 5 assembled with mdadm
- A CentOS 6.4 system installed some months ago on native hardware with mdadm RAID 5
The last system wasn't able to boot after the upgrade and crashed with several kernel panics. I can provide a screenshot if there's any interest, but the display resolution was quite low, so it does not contain much helpful information :-(
I assume that something is broken in the new kernel, or at least in the kernel module for the ext4 file system. You can reproduce it by just doing a "yum update" and rebooting the system. If it comes up, reboot it again and manually run an offline filesystem check, or just generate some write activity on the disks.
Is it just me? I don't use any third-party repositories, and it blows my mind that no one else seems to have noticed...!
Greetings from Wuppertal, Germany
Max
On 02/15/2014 11:33 AM, Max Grobecker wrote:
Is it just me? I don't use any third-party repositories, and it blows my mind that no one else seems to have noticed...!
I have the following VM I am currently using to build software on:
Linux sclbuild 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
It has this as the main filesystem:
/dev/mapper/vg_sclbuild-lv_root on / type ext4 (rw)
I am not having any issues with this Xen DomU VM, running on a Xen4CentOS6 Dom0.
Hi,
Thanks for your replies! Today I am unable to reproduce the filesystem errors. Maybe I got a bad mirror? That is very unlikely, since the PGP signature would then be broken as well...
Well, at least the problems with booting the machine still exist, but so far I have tested this only in virtual environments.
In about 50% of my boot attempts, the bootloader counts down and tries to start the default kernel. Instead, the system gets reset and the bootloader starts again. This happens even if I choose the previous kernel.
After a while the loop is stopped by a kernel panic. I'm not sure whether the system's kernel panicked or whether it's the bootloader itself... I attached two screenshots to this mail. This is the only information I can get so far.
This strange behaviour only affects KVM (HVM) machines which were recently upgraded to the new kernel. Other systems (surprisingly, MS Windows as well) are running and booting fine without problems. At the moment I'm unable to test whether this also happens on native machines without virtualization :-(
If you need more information, I can help ;-)
Greetings from Wuppertal, Germany
Max
On 15.02.2014 20:31, Ulf Volmer wrote:
On 02/15/2014 06:33 PM, Max Grobecker wrote:
Is it just me? I don't use any third-party repositories, and it blows my mind that no one else seems to have noticed...!
Just you.
2.6.32-431.5.1.el works here without any issues on physical and virtual platforms.
Regards
Ulf
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
You might have some hardware going bad underneath.
On 02/15/2014 03:30 PM, Max Grobecker wrote:
Hi,
Thanks for your replies! Today I am unable to reproduce the filesystem errors. Maybe I got a bad mirror? That is very unlikely, since the PGP signature would then be broken as well...
I tested one KVM machine, but this problem occurred on several different hardware hosts, most of which are running other virtual machines without problems...
I'm going to test whether it's related to the mirror I'm using. All machines where this is happening use the same mirror server. It's very unlikely, because a broken package would also have a broken signature, but it's the only thing that seems to explain what's going on here :-(
Max
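One way to rule the mirror in or out, sketched here as an illustration rather than taken from the thread: download the same package from two mirrors and compare their digests. The function names are hypothetical, and the check complements the signature check rather than replacing it.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_package(path_a, path_b):
    """True if two downloaded copies of a package are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If the copy from the suspect mirror and the copy from another mirror produce the same digest, the mirror is very unlikely to be the cause.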
On 16.02.2014 03:47, Gerry Reno wrote:
You might have some hardware going bad underneath.
OK, using the original CentOS mirror does not change anything :-(
Is there any way I can debug these kernel panics? The hardware I'm testing on is definitely working well (Memtest did not find any errors, and besides, this machine uses ECC RAM) and, as mentioned, no other machine on this host throws any errors.
The panics seem to be KVM-related... When powering up the machine, it boots without any problems. If I do a reboot, it never comes up again. It then gets stuck in a "bootloader loop": the bootloader shows up, tries to start something, and the system is reset instantly. The last thing I can see before the reset occurs is "Probing EDD (edd=off to disable)... ok". Then the machine is reset and the bootloader comes up again.
If I add "edd=off" to the kernel parameters before booting, it gets stuck with a cursor in the top left corner and nothing happens: it doesn't do anything on the disks and does not consume any CPU time.
This machine is running on a Debian Wheezy host with kernel 3.2.0-4-amd64 and QEMU 1.1.2 / libvirtd 0.9.12.
Is there anything I could do to debug this more deeply? At the moment I have to shut the machine off whenever I want to reboot it...
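A common approach, not proposed by anyone in this thread, is to send kernel messages to a serial console so the full panic text can be captured on the host instead of from a low-resolution screenshot. The sketch below appends boot parameters to a GRUB "kernel" line. The helper name and the sample line are hypothetical, and it assumes the guest has a serial device defined.

```python
def add_kernel_params(kernel_line, params):
    """Append boot parameters to a GRUB 'kernel' line.

    Parameters that are already present as an exact token are skipped,
    and the original indentation of the line is preserved.
    """
    existing = set(kernel_line.split())
    result = kernel_line.rstrip()
    for param in params:
        if param not in existing:
            result += " " + param
            existing.add(param)
    return result


# Hypothetical sample line; the real one lives in the guest's GRUB config.
line = "\tkernel /vmlinuz-2.6.32-431.5.1.el6.x86_64 ro root=/dev/vda1"
print(add_kernel_params(line, ["console=tty0", "console=ttyS0,115200"]))
```

With both console entries present, messages still appear on the virtual screen and are also written to the first serial port.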
The collapsing file system has been demystified: my colleague simply forgot to reboot the systems after upgrading to the new kernel version. But in my opinion, that shouldn't happen either...
Thank you!
Greetings from Wuppertal
Max
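The missed reboot described above can be detected by comparing the running kernel release with the installed ones. This is a minimal sketch with hypothetical function names. It compares only the numeric fields before ".el", which is a simplification and not RPM's full version-comparison algorithm.

```python
import re


def version_key(kernel_release):
    """Turn '2.6.32-431.5.1.el6.x86_64' into a tuple of ints for comparison.

    Simplification: only the numeric fields before '.el' are used.
    """
    numeric_part = kernel_release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


def reboot_pending(running, installed):
    """True if any installed kernel is newer than the running one."""
    return any(version_key(k) > version_key(running) for k in installed)


# Sample values; in practice 'running' would come from `uname -r`
# and 'installed' from `rpm -q kernel`.
installed = ["2.6.32-431.el6.x86_64", "2.6.32-431.5.1.el6.x86_64"]
print(reboot_pending("2.6.32-431.el6.x86_64", installed))
```

A True result means a newer kernel is installed than the one currently running, so the machine has not been rebooted since the update.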
On 16.02.2014 14:55, Max Grobecker wrote:
I tested one KVM machine, but this problem occurred on several different hardware hosts, most of which are running other virtual machines without problems...
Are there any ext4 or kernel errors in the logs, or anything at all? AFAIR there was once a problem with virtio disk drivers in C5 KVM guests.
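A quick way to answer the question about log errors is to filter the system log for typical kernel and ext4 failure messages. The pattern list below is an assumption about what is worth flagging, not an exhaustive set, and the function name is hypothetical.

```python
import re

# Message fragments that usually indicate filesystem or kernel trouble.
SUSPICIOUS = re.compile(
    r"EXT4-fs (error|warning)|Kernel panic|I/O error|Oops", re.IGNORECASE
)


def suspicious_lines(lines):
    """Return the log lines that match one of the suspicious patterns."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]
```

On a CentOS 6 guest this could be fed the open file /var/log/messages, or the output of dmesg split into lines.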