Kernel bug in software RAID?

Matt Lawrence

8 May 2005

3:21 a.m.

After doing lots of research, it seems that there is a known bug in the software RAID for various versions of the 2.6 kernel. It seems to be affecting a system that a friend of mine has put together, so I'm guessing that RH has not back ported a fix into the released kernel. Does anyone here have any idea when a fix will make it into the CentOS4 distribution?

Matt Lawrence "Your friendly neighborhood sysadmin" 512.838.2645 T/L 678-2645 512.351.1061 (cell)

Matt Dainty

8 May 2005

11:12 a.m.

On 8 May 2005, at 04:21, Matt Lawrence wrote:

...

After doing lots of research, it seems that there is a known bug in the software RAID for various versions of the 2.6 kernel. It seems to be affecting a system that a friend of mine has put together, so I'm guessing that RH has not back ported a fix into the released kernel. Does anyone here have any idea when a fix will make it into the CentOS4 distribution?

And the details of this known bug are...

The short answer is it will appear in CentOS 4 when it appears in RHEL 4.

Matt

Johnny Hughes

3:18 p.m.

On Sun, 2005-05-08 at 12:12 +0100, Matt Dainty wrote:

...

On 8 May 2005, at 04:21, Matt Lawrence wrote:

...
After doing lots of research, it seems that there is a known bug in the software RAID for various versions of the 2.6 kernel. It seems to be affecting a system that a friend of mine has put together, so I'm guessing that RH has not back ported a fix into the released kernel. Does anyone here have any idea when a fix will make it into the CentOS4 distribution?

And the details of this known bug are...

?

The short answer is it will appear in CentOS 4 when it appears in RHEL 4.

Unfortunately, Matt is correct. The only time that CentOS would release a patch that is a bugfix and not already released by RedHat would be if the bug rendered the OS unusable to the majority of people.

An example is the Glade-2 bug ... it affected all users and was an easy fix. Another example is Thunderbird, which would not install at all as compiled. In both of these cases, the bug was clearly defined and a fix had already been developed and released by RedHat, just not yet rolled into RHEL ... and the package in question was totally non-functional without the fix.

We must maintain binary compatibility in the base system ... it is our number one goal. That means that bugs are also usually duplicated.

If you can define the specific bug and a fix, I would be happy to produce a test kernel with the fix included ... or provide you with a test kernel from what will become CentOS-4.1 (currently in internal testing).

Thanks, Johnny Hughes

Aleksandar Milivojevic

9 May 2005

1:39 p.m.

Johnny Hughes wrote:

...

If you can define the specific bug and a fix, I would be happy to produce a test kernel with the fix included ... or provide you with a test kernel from what will become CentOS-4.1 (currently in internal testing).

Since we are on the subject of kernel bugs, is the fix for bug #151284 (NFS data corruption when mmap is used) included in 4.1?

-- Aleksandar Milivojevic amilivojevic@pbl.ca Pollard Banknote Limited Systems Administrator 1499 Buffalo Place Tel: (204) 474-2323 ext 276 Winnipeg, MB R3T 1L7

Johnny Hughes

3:46 p.m.

On Mon, 2005-05-09 at 08:39 -0500, Aleksandar Milivojevic wrote:

...

Johnny Hughes wrote:

...
If you can define the specific bug and a fix, I would be happy to produce a test kernel with the fix included ... or provide you with a test kernel from what will become CentOS-4.1 (currently in internal testing).

Since we are on the subject of kernel bugs, is the fix for bug #151284 (NFS data corruption when mmap is used) included in 4.1?

In the beta kernel there are these fixes listed for mmap in the changelog ... (151284 is not listed):

- Fix possible futex mmap_sem deadlock
- Add the flex-mmap bits for s390/s390x (Pete Zaitcev)
- Add flex-mmap for x86-64 32 bit emulation

There may be a newer kernel though ... this one is kernel-2.6.9-6.37.EL.
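Checking a kernel changelog for a particular bug number, as done above, can be scripted. Below is a minimal Python sketch. It assumes the changelog text has already been exported, for example with `rpm -q --changelog kernel`, and that bug references appear as a bare or `#`-prefixed number. The function name `find_bug_references` is hypothetical.

```python
import re


def find_bug_references(changelog: str, bug: int) -> list[str]:
    """Return the changelog lines that mention the given bug number.

    The number must stand alone (so 151284 does not match 1512845),
    optionally preceded by '#'.
    """
    pattern = re.compile(rf"(?<!\d)#?{bug}(?!\d)")
    return [line.strip() for line in changelog.splitlines() if pattern.search(line)]


# Entries quoted from the kernel-2.6.9-6.37.EL changelog in the message above.
changelog = """\
- Fix possible futex mmap_sem deadlock
- Add the flex-mmap bits for s390/s390x (Pete Zaitcev)
- Add flex-mmap for x86-64 32 bit emulation
"""

print(find_bug_references(changelog, 151284))  # prints []
```

A changelog entry may describe a fix without citing its bug number. An empty result, as here, therefore does not prove the fix is absent.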

Les Mikesell

5:46 p.m.

On Mon, 2005-05-09 at 10:46, Johnny Hughes wrote:

...

I've run into something that might or might not be the same RAID bug in FC3, kernel 2.6.11-1.14_FC3. I am trying to use software RAID1 to mirror an internal IDE drive with a matching partition on an external FireWire drive. If I unmount the /dev/md? partition so it stays idle, I can usually make it through a full sync to the FireWire drive, but with the partition mounted the system will usually crash before the sync is complete. Also, if the partition is mounted after the sync completes, it has never run more than a day without crashing. So far I have not found any useful diagnostics logged anywhere. The filesystem involved is reiserfs, in case that makes a difference. Obviously I haven't tried this under CentOS yet, since neither FireWire nor reiserfs is supported, but the bug may really be in the RAID code.

-- Les Mikesell les@futuresource.com

Nigel Kendrick

3:38 p.m.

This bug wouldn't cause my ProLiant to fail to resync a RAID array after a reboot following the CentOS-4 install, would it?

I have just rebooted and get a message that the array is not clean. Both drives are accessed, and then for the second (IDE) drive I get:

hde: drive not ready for command

Then the system hangs.

The install went fine, albeit very slowly. I presumed this was because the OS was being installed on a software RAID 1 pair that was resyncing as it installed. Having said that, I installed CentOS-4 on a 300GB SATA RAID 1 pair on Friday and it whizzed through.

Nigel

-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Johnny Hughes Sent: 08 May 2005 16:19 To: CentOS ML Subject: Re: [CentOS] Kernel bug in software RAID?

On Sun, 2005-05-08 at 12:12 +0100, Matt Dainty wrote:

...

Aleksandar Milivojevic

3:53 p.m.

Nigel Kendrick wrote:

...

This bug wouldn't cause my ProLiant to fail to resync a RAID array after a reboot following the CentOS-4 install, would it?

I have just rebooted and get a message that the array is not clean, both drives are accessed and then for the second (IDE) drive I get:

hde: drive not ready for command

Then the system hangs.

The install went fine albeit very slowly - I presumed this was because the OS was being installed on a software RAID 1 pair and it was resyncing as it installed. Having said that I installed Centos-4 on a 300GB SATA RAID 1 pair on Friday and it whizzed through.

Installing on a software RAID1 device (/dev/md*) like that is normal when doing an initial installation on a freshly defined RAID1. The disks sync during the install (press Alt-F2 and type "cat /proc/mdstat" to watch), and when you reboot they continue syncing. This shouldn't be a problem.
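Resync progress can also be read programmatically instead of by eye. Below is a minimal Python sketch. It assumes the 2.6-era /proc/mdstat layout, in which:

- each array starts with an `mdN :` line;
- member state is shown as `[2/2] [UU]`;
- a running resync or recovery appears as `resync = 12.5%`.

`parse_mdstat`, `ArrayStatus`, and the sample text (device names and block counts) are hypothetical, written for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArrayStatus:
    name: str                  # e.g. "md0"
    members: str               # member map such as "UU" or "U_"
    operation: Optional[str]   # "resync", "recovery", or None if idle
    percent: Optional[float]   # progress of that operation, if any


ARRAY_RE = re.compile(r"^(md\d+)\s*:")
MEMBERS_RE = re.compile(r"\[\d+/\d+\]\s*\[([U_]+)\]")
PROGRESS_RE = re.compile(r"(resync|recovery)\s*=\s*([\d.]+)%")


def parse_mdstat(text: str) -> list[ArrayStatus]:
    arrays: list[ArrayStatus] = []
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            arrays.append(ArrayStatus(m.group(1), "", None, None))
            continue
        if not arrays:
            continue  # "Personalities" header or other preamble
        current = arrays[-1]
        m = MEMBERS_RE.search(line)
        if m:
            current.members = m.group(1)
        m = PROGRESS_RE.search(line)
        if m:
            current.operation = m.group(1)
            current.percent = float(m.group(2))
    return arrays


# Hypothetical snapshot of /proc/mdstat during an initial RAID1 resync.
sample = """\
Personalities : [raid1]
md0 : active raid1 hde1[1] hda1[0]
      104320 blocks [2/2] [UU]
      [==>..................]  resync = 12.5% (13040/104320) finish=1.2min speed=1234K/sec
unused devices: <none>
"""

for a in parse_mdstat(sample):
    print(a.name, a.members, a.operation, a.percent)  # prints md0 UU resync 12.5
```

On a running system you would pass it the real file, e.g. `parse_mdstat(open("/proc/mdstat").read())`. A `_` in the member map (such as `U_`) means a submirror is missing or failed.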

The only release that did it differently was one release of Fedora Core (I don't remember if it was 1 or 2). That one would create RAID1 arrays and force them not to resync. The assumption was that for ext3 and swap partitions it doesn't hurt if unused blocks differ between the submirrors, which is true as long as your file systems are clean. They changed back to the old behaviour at some point down the road.
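The reasoning behind skipping the initial resync can be made concrete with a toy model. This is a hypothetical Python sketch, not how md is implemented. The class `ToyMirror` and its methods are invented for illustration. Writes go to both halves and a read may be served by either half, so blocks the filesystem has written always agree, and only never-written blocks can differ. Assuming a clean filesystem only reads blocks it has written, it never notices the difference.

```python
import random


class ToyMirror:
    """Toy RAID1: every write goes to both halves, and a read may be served
    by either half. Created without an initial resync, so the halves start
    out holding unrelated leftover data (hypothetical model only)."""

    def __init__(self, nblocks: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Leftover garbage on each disk; never synchronized.
        self.halves = [[rng.randrange(256) for _ in range(nblocks)] for _ in range(2)]
        self.written: set[int] = set()

    def write(self, block: int, value: int) -> None:
        for half in self.halves:
            half[block] = value
        self.written.add(block)

    def read(self, block: int, half: int) -> int:
        return self.halves[half][block]

    def mismatched_blocks(self) -> list[int]:
        """Blocks a full compare of the two halves would flag."""
        a, b = self.halves
        return [i for i in range(len(a)) if a[i] != b[i]]


mirror = ToyMirror(nblocks=8)
mirror.write(0, 42)
mirror.write(3, 7)

# Written blocks read back the same no matter which half serves the read...
print(mirror.read(0, 0), mirror.read(0, 1))  # prints 42 42
# ...while any mismatch a full compare finds lies in never-written blocks.
print(set(mirror.mismatched_blocks()).isdisjoint(mirror.written))  # prints True
```

An unclean filesystem breaks this model. If it reads a block it never wrote, such as stale metadata, the answer can depend on which half serves the read.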

discuss@lists.centos.org

7 comments

6 participants

participants (6)

Aleksandar Milivojevic
Johnny Hughes
Les Mikesell
Matt Dainty
Matt Lawrence
Nigel Kendrick