Dear all,
I'm experiencing a weird crash with one of our desktops running CentOS 5.1.
We have 5 identical machines; only one has this problem.
Right after it starts, it kernel panics. Unfortunately, this backtrace is all I've managed to get: the serial port isn't working, so it's a manual copy of what the screen shows.
[<c04721f3>] sync_buffer+0x0/0x33
[<c04bff66>] avc_has_perm+0x3a/0x44
[<c04c055d>] inode_has_perm+0x54
[<f8899829>] ext3_lookup+0x25/0xb7 [ext3]
[<c047b0d1>] do_lookup+0xb4/0x166
[<c047ce5b>] __link_path_walk+0x87a/0xd33
[<c047d35d>] link_path_walk+0x49/0xbd
[<c046f582>] sys_chdir+0x4f/0x57
[<c047d72a>] do_path_lookup+0x20e/0x25e
[<c0470daf>] get_empty_filp+0x99/0x15e
[<c047dfd7>] __path_lookup_intent_open+0x42/0x72
[<c047e056>] path_lookup_open+0xf/0x13
[<c047e15a>] open_namei+0x7b/0x609
[<c046e8ca>] do_filp_open+0x1c/0x31
[<c046f582>] sys_chdir+0x4f/0x57
[<c046e91d>] do_sys_open+0x3e/0xae
[<c046e9ba>] sys_open+0x16/0x18
[<c0404eff>] syscall_call+0x7/0xb
=======================
Code: 0c 29 d0 83 e8 18 c1 e8 03 39 c1 74 29 68 d4 22 8a f8 68 80 01 00 00 68 3e 21 8a f8 68 68 12 8a f8 68 4e 21 8a f8 e8 d0 f1 b8 c7 <0f> 0b 80 01 3e 21 8a f8 83 c4 14 8b 44 24 28 89 44 24 0c 66 8b
EIP: [<f8897626>] dx_probe+0x16e/0x2b4 [ext3] SS:ESP 0068:f7e47cc4
<0>Kernel panic - not syncing: Fatal exception
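For anyone wanting to sift through a hand-copied trace like this one, here is a minimal Python sketch that splits it into entries and lists which kernel modules appear. The names `FRAME`, `parse_frames` and `modules_involved` are made up for illustration, and the pattern assumes the 32-bit `[<address>] symbol+offset/size [module]` layout seen above, where the size and the module are optional. On this trace it would point at ext3 (both `ext3_lookup` and the faulting `dx_probe`), which is only a hint about where to look, not a diagnosis.

```python
import re

# One trace entry, e.g. "[<f8899829>] ext3_lookup+0x25/0xb7 [ext3]".
# The "/size" part and the trailing "[module]" are optional, because a
# hand-copied trace may drop them (see "inode_has_perm+0x54" above).
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+"
    r"(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)"
    r"(?:/(?P<size>0x[0-9a-f]+))?"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per trace entry found in text, in order."""
    return [m.groupdict() for m in FRAME.finditer(text)]


def modules_involved(text):
    """Return the set of module names mentioned in the trace."""
    return {f["module"] for f in parse_frames(text) if f["module"]}


# A few entries copied from the trace above, including the EIP line.
sample = (
    "[<c04c055d>] inode_has_perm+0x54 "
    "[<f8899829>] ext3_lookup+0x25/0xb7 [ext3] "
    "[<c047b0d1>] do_lookup+0xb4/0x166 "
    "EIP: [<f8897626>] dx_probe+0x16e/0x2b4 [ext3]"
)

print(modules_involved(sample))
```

The same functions work on the full trace whether it is pasted as one line or one entry per line, since the pattern does not depend on line breaks.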
This machine is dual booting windows and CentOS. Windows has no problem and will run just fine (so I believe this is not a hardware issue). Interestingly, this machine worked fine for several weeks and one day decided to crash within the first few seconds of boot. I have tried several kernels, including the latest one as of today; none works any better.
If I boot the CentOS DVD, it crashes just the same when it gets to searching for an existing CentOS installation.
If I format the Linux partition, I can then re-install CentOS and update it just fine... Then a few days later it crashes just the same right after booting.
This machine has a Gigabyte P35-DS3P motherboard, 2GB of DDR2-667 RAM, and an Intel Core 2 Quad Q6600. It has two 250GB hard drives in RAID1 using the BIOS RAID support (so I can have RAID on both Windows and Linux). It's running the 32-bit version of CentOS 5.
Any ideas what the problem could be?
Any help will be greatly appreciated.
Thank you Jean-Yves
On Jan 3, 2008 8:54 AM, Jean-Yves Avenard jyavenard@gmail.com wrote:
We have 5 identical machines; only one has this problem.
Odd. Have you checked the memory with memtest?
This machine is dual booting windows and CentOS. Windows has no problem and will run just fine (so I believe this is not a hardware issue).
Which version of Windows? This shouldn't make a difference, but it's been known to cause issues.
Interestingly, this machine worked fine for several weeks and one day decided to crash within the first few seconds of boot.
Anything special happen on that day? Windows updates, linux updates, application updates etc?
Hi
On Jan 4, 2008 2:43 AM, Jim Perrin jperrin@gmail.com wrote:
Odd. Have you checked the memory with memtest?
I haven't, simply because it runs so well under Windows.
But I had planned to buy new RAM tomorrow; it's so cheap these days...
Which version of Windows? This shouldn't make a difference, but it's been known to cause issues.
Windows XP SP2.
Anything special happen on that day? Windows updates, linux updates, application updates etc?
Well, the user told me he didn't. The reason being that Linux had just been fully re-installed (due to the same crash), Windows had been running for several weeks with no reason to upgrade, and people are instructed not to install any updates.
Windows is installed with the extfs driver.
Now thinking about it, I did modify the sudoers list from Windows by editing /etc/sudoers.
I actually did think that the issue could be with the Intel RAID not being handled properly and that somehow Windows wrote something on the ext partition that made the linux driver crash.
It's very annoying. Jean-Yves
on 1/3/2008 7:53 AM Jean-Yves Avenard spake the following:
Hi
On Jan 4, 2008 2:43 AM, Jim Perrin jperrin@gmail.com wrote:
Odd. Have you checked the memory with memtest?
I haven't, simply because it runs so well under Windows.
But I had planned to buy new RAM tomorrow; it's so cheap these days...
Which version of Windows? This shouldn't make a difference, but it's been known to cause issues.
Windows XP SP2.
Anything special happen on that day? Windows updates, linux updates, application updates etc?
Well, the user told me he didn't. The reason being that Linux had just been fully re-installed (due to the same crash), Windows had been running for several weeks with no reason to upgrade, and people are instructed not to install any updates.
Windows is installed with the extfs driver.
Now thinking about it, I did modify the sudoers list from Windows by editing /etc/sudoers.
Try fixing that, you could dos2unix that file, or restore from another machine.
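If dos2unix isn't at hand, the same check can be done in a few lines of Python. This is only a sketch of the idea: `has_crlf` and `to_unix` are hypothetical helper names, and the sample bytes stand in for the file's contents rather than being read from the real /etc/sudoers.

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the data contains any Windows-style CRLF line ending."""
    return b"\r\n" in data


def to_unix(data: bytes) -> bytes:
    """Return a copy of the data with CRLF line endings converted to LF."""
    return data.replace(b"\r\n", b"\n")


# Stand-in contents with Windows line endings (an assumption for the demo).
sample = b"root ALL=(ALL) ALL\r\n%wheel ALL=(ALL) ALL\r\n"

print(has_crlf(sample))
print(has_crlf(to_unix(sample)))
```

To apply it to a real file, read it in binary mode, write the converted bytes to a copy, and keep the original until the copy has been checked.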
I actually did think that the issue could be with the Intel RAID not being handled properly and that somehow Windows wrote something on the ext partition that made the linux driver crash.
It's very annoying. Jean-Yves
Hi
On Jan 4, 2008 10:33 AM, Scott Silva ssilva@sgvwater.com wrote:
Try fixing that, you could dos2unix that file, or restore from another machine.
I used UltraEdit to edit that file; it preserves Unix line endings. Do you actually think that deleting /etc/sudoers will suddenly prevent a kernel panic during boot? Or stop the install DVD from crashing when searching for an existing CentOS installation?
Jean-Yves
Hi again
On Jan 4, 2008 11:57 AM, Jean-Yves Avenard jyavenard@gmail.com wrote:
I used UltraEdit to edit that file; it preserves Unix line endings. Do you actually think that deleting /etc/sudoers will suddenly prevent a kernel panic during boot? Or stop the install DVD from crashing when searching for an existing CentOS installation?
I've tried deleting the file from Windows and copying the file from another machine.
Same result.
When booting the CentOS 5.1 DVD in rescue mode, as soon as I mount the RAID array, I get a kernel panic.
Sounds like a bug to me.
Jean-Yves
Hi again
On Jan 4, 2008 4:56 PM, Jean-Yves Avenard jyavenard@gmail.com wrote:
Sounds like a bug to me.
I have tried booting the rescue DVD of Fedora 7, and it crashed just the same when trying to mount the Linux partition on the RAID1 array.
However, Fedora 8 boots fine, and I was able to mount the partition without any problems.
I'm currently compiling kernel 2.6.26.12 and will see if it works with that kernel.
If anyone can explain why the stock CentOS kernel crashes when mounting a Linux partition on an Intel BIOS RAID1 array, I'm all ears!
Jean-Yves
on 1/3/2008 11:30 PM Jean-Yves Avenard spake the following:
Hi again
On Jan 4, 2008 4:56 PM, Jean-Yves Avenard jyavenard@gmail.com wrote:
Sounds like a bug to me.
I have tried booting the rescue DVD of Fedora 7, and it crashed just the same when trying to mount the Linux partition on the RAID1 array.
However, Fedora 8 boots fine, and I was able to mount the partition without any problems.
I'm currently compiling kernel 2.6.26.12 and will see if it works with that kernel.
If anyone can explain why the stock CentOS kernel crashes when mounting a Linux partition on an Intel BIOS RAID1 array, I'm all ears!
Jean-Yves
AFAIR that is fakeraid anyway. Maybe one of the drives is having a problem. Could be that the dmraid driver isn't as robust as software raid with drive problems. You could eliminate the hardware (except the drives) by swapping drives from one of the known working machines into this one and see what happens. IF it borks, bad hardware, if not, bad drives or bad install.
Hi
On Jan 5, 2008 2:46 AM, Scott Silva ssilva@sgvwater.com wrote:
AFAIR that is fakeraid anyway. Maybe one of the drives is having a problem. Could be that the dmraid driver isn't as robust as software raid with drive problems. You could eliminate the hardware (except the drives) by swapping drives from one of the known working machines into this one and see what happens. IF it borks, bad hardware, if not, bad drives or bad install.
It may be fakeraid, but it's the only RAID solution that works in both Windows and Linux and allows transferring files between both systems. If I were to use Linux software RAID, I wouldn't be able to access the Linux partition under Windows.
I doubt it's a hardware issue for the following reasons:
1- Windows works fine with intensive disk activity.
2- Fedora 8 works fine too.
3- It's only the CentOS kernel that crashes when using those drives.
Jean-Yves
Jean-Yves Avenard wrote:
Hi
On Jan 5, 2008 2:46 AM, Scott Silva ssilva@sgvwater.com wrote:
AFAIR that is fakeraid anyway. Maybe one of the drives is having a problem. Could be that the dmraid driver isn't as robust as software raid with drive problems. You could eliminate the hardware (except the drives) by swapping drives from one of the known working machines into this one and see what happens. IF it borks, bad hardware, if not, bad drives or bad install.
It may be fakeraid, but it's the only RAID solution that works in both Windows and Linux and allows transferring files between both systems. If I were to use Linux software RAID, I wouldn't be able to access the Linux partition under Windows.
I doubt it's a hardware issue for the following reasons:
1- Windows works fine with intensive disk activity.
2- Fedora 8 works fine too.
3- It's only the CentOS kernel that crashes when using those drives.
I am sure it is a raid driver issue.
Have you considered running one of the OSes (Linux or Windows) in a VM?
That way, you can share files and have both available at the same time, unlike dual boot, where you need to reboot to get the other OS.
In that scenario you do not need to use fakeraid, as one OS can be a virtual file system inside the other.
CentOS does not do the "latest and greatest" drivers very well ... though bugfixes may be brought back from newer kernels, I would not count on that happening.
Hi
On Jan 6, 2008 12:03 AM, Johnny Hughes johnny@centos.org wrote:
I am sure it is a raid driver issue.
Have you considered running one of the OSes (Linux or Windows) in a VM?
That way, you can share files and have both available at the same time, unlike dual boot, where you need to reboot to get the other OS.
I think I know what is happening and what happened.
After repairing the partition from the Fedora 8 DVD, I rebooted from the disk into CentOS.
I then compiled the latest kernel from kernel.org and rebooted. It failed to see the RAID array and mounted the disk individually (from /dev/sda2).
When I rebooted again using the CentOS kernel, I got a kernel panic again.
I think this is what happened earlier on. The EXT2/3 driver I used in Windows wasn't using the RAID array but writing directly to the physical disk, thereby damaging the RAID mirror. When CentOS booted, it got confused by the data it was reading and panicked.
The same thing happened with my compiled kernel: it wrote data to the first disk only rather than to the mirror.
For some reason, the CentOS kernel RAID driver doesn't handle that error properly and panics.
The Fedora 8 kernel does handle it properly and is able to access the disk.
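One way to test the "mirror halves have diverged" theory is to compare the two members and see where they first differ. Below is a minimal Python sketch of that comparison. The function name `first_mismatch` and the image file names are hypothetical, and the demo runs on two small temporary files instead of real disks. On a real system the two paths would be read-only images of the same partition from each member. A difference is only a hint: any area where the BIOS RAID keeps its own per-disk metadata may legitimately differ between members.

```python
import os
import tempfile


def first_mismatch(path_a, path_b, chunk_size=1024 * 1024):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                # Find the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the common part: one side is shorter.
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                return None  # both files ended with no difference found
            offset += len(block_a)


def demo():
    """Build two fake mirror members that differ in one byte and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        member_a = os.path.join(tmp, "member_a.img")
        member_b = os.path.join(tmp, "member_b.img")
        data = bytes(range(256)) * 64           # 16384 bytes of test data
        changed = bytearray(data)
        changed[5000] ^= 0xFF                   # simulate a write to one disk only
        with open(member_a, "wb") as f:
            f.write(data)
        with open(member_b, "wb") as f:
            f.write(bytes(changed))
        return first_mismatch(member_a, member_b, chunk_size=4096)


print(demo())
```

Reading in fixed-size chunks keeps memory use constant, so the same approach scales to images of a 250GB disk.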
Now I have no idea how to compile a newer kernel and make it recognise the RAID array. There aren't many tools available to manage the RAID mirror. It looks like one kernel sees the RAID disk under one name and the CentOS kernel under another.
Jean-Yves