[CentOS] raid5 crash

James Olin Oden james.oden at gmail.com
Sat Jul 9 15:58:49 UTC 2005


On 7/7/05, Farkas Levente <lfarkas at bppiac.hu> wrote:
> hi,
> after we switched our servers from centos-3 to centos-4 (aka rhel-4), one
> of our servers crashes about once a week without any oops. this happens
> with both the normal kernel-2.6.9-11.EL and
> kernel-2.6.9-11.106.unsupported. even after we changed the motherboard, the
> raid controller and the cables, we still got it. finally we started
> netdump, and yesterday we got a crash log and a core
> file. it seems there is a bug in the raid5 code of the kernel.
> this is our backup server with 8 x 200GB hdd in a raid5 (for the data)
> plus 2 x 40GB hdd in raid1 (for the system), running with a 3ware 8xxx raid
> controller. i attached the netdump log of the last crash.
> how can i fix it?
> yours.
>
Hi,

I have seen similar (but not quite the same) crashes in the raid code on
RHEL 3 kernels.   They have typically occurred due to a race condition
between something updating the linked lists of raid devices and
something trying to read them.  For RHEL 3, my co-workers and I found
where one particular race condition had been fixed in the 2.6 kernel and
backported it to the RHEL 3 kernel.   Ultimately this patch was placed in
one of the updates for the RHEL 3 kernel.
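
To make the kind of race I mean concrete, here is a minimal user-space
sketch in Python. It is not the kernel's raid code, and every name in it
(DeviceList, Node, demo, the device names) is made up for illustration.
One thread keeps adding and removing an entry in a linked list while
another walks it. Taking the same lock on both sides is what keeps the
reader from following a pointer the writer is in the middle of changing.

```python
import threading


class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


class DeviceList:
    """Hypothetical singly linked list of device names, guarded by one lock."""

    def __init__(self):
        self.head = None
        self.lock = threading.Lock()

    def add(self, name):
        node = Node(name)
        with self.lock:
            # Insert at the head while no reader can be walking the list.
            node.next = self.head
            self.head = node

    def remove(self, name):
        with self.lock:
            prev, cur = None, self.head
            while cur is not None:
                if cur.name == name:
                    if prev is None:
                        self.head = cur.next
                    else:
                        prev.next = cur.next
                    # A reader standing on cur without the lock would
                    # lose the rest of the list at this point.
                    cur.next = None
                    return True
                prev, cur = cur, cur.next
            return False

    def names(self):
        with self.lock:
            out = []
            cur = self.head
            while cur is not None:
                out.append(cur.name)
                cur = cur.next
            return out


def demo(rounds=2000):
    devices = DeviceList()
    for name in ("sdc", "sdb", "sda"):
        devices.add(name)
    errors = []

    def writer():
        for _ in range(rounds):
            devices.add("spare")
            devices.remove("spare")

    def reader():
        for _ in range(rounds):
            seen = devices.names()
            # The three permanent entries must always be visible, in order.
            if [n for n in seen if n != "spare"] != ["sda", "sdb", "sdc"]:
                errors.append(seen)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors, devices.names()


if __name__ == "__main__":
    errors, final = demo()
    print(len(errors), final)
```

If names() walked the list without taking the lock, a reader sitting on
the node being removed could see a truncated list. In a kernel the
equivalent mistake tends to end in a crash rather than a short list.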

Anyway, it is likely your problem is yet another race condition.  What
I would suggest is to get a box configured with true RHEL 4 and
reproduce the crash there.   Once it is reproduced, file a bugzilla report
with Red Hat.   We have had very good success with this approach for a
number of kernel bugs we found in the CentOS 3/RHEL 3 kernels. Fixes have
not always come quickly, but they generally do come.

Good Luck...james