[CentOS] task md1_resync:9770 blocked for more than 120 seconds and OOM errors

Sun Mar 20 14:33:40 UTC 2011
Kenni Lund <kenni at kelu.dk>

2011/3/20 Alexander Farber <alexander.farber at gmail.com>
>
> Thank you, I've decreased
> /proc/sys/dev/raid/speed_limit_max
> from 200000 to 100000.

200000 is just the theoretical maximum. If your disks max out at
80000, you'll need to set the limit lower than that for it to have
any effect. While syncing, you can check the current sync speed with:
cat /proc/mdstat
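
If you only want the speed figure, something like the following might
do. It is a minimal sketch that assumes the usual "speed=NNNNK/sec"
field md prints on the progress line during a resync; the function
name md_sync_speed is made up for illustration.

```shell
# Print the current resync speed(s) in K/sec, one per line.
# Reads /proc/mdstat by default; another file may be passed for testing.
md_sync_speed() {
    sed -n 's/.*speed=\([0-9][0-9]*\)K\/sec.*/\1/p' "${1:-/proc/mdstat}"
}

# Only query the live file if it exists (i.e. the md driver is loaded).
if [ -r /proc/mdstat ]; then
    md_sync_speed
fi
```

Nothing is printed when no array is currently syncing, since the
progress line is only present during a resync or rebuild.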

> I think I don't care about the sync speed,
> but I'd like to avoid the OOM errors and
> server lockup like I had yesterday

AFAIK the errors are harmless; they come from a locking bug in the
kernel that hasn't been fixed in CentOS 5 yet. They are not related to
any out-of-memory errors, and hence most likely not related to the
lockup you experienced.

2011/3/20 Markus Falb <markus.falb at fasel.at>:
> https://bugzilla.redhat.com/show_bug.cgi?id=573106#c31

Ahh, yes, I forgot about that bug report. According to that report,
the issue has been fixed in the upstream 5.6 kernel, so it will be
fixed in CentOS 5.6.

> I do not see how decreasing the speed_limit_max should avoid the
> mdX_resync warnings. I would expect more of these warnings now, because
> sync takes longer?

Hmm, I received the same error messages on a Core i7 system I
installed recently. While syncing, the system was close to being
completely unresponsive (it took ages just to get an SSH connection).
After limiting the I/O by setting a lower maximum sync speed, the
system became responsive and the messages disappeared. Comment #36 in
the bug report actually suggests the same workaround.
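
For reference, a rough sketch of how the limit can be lowered. The
helper name set_md_speed_limit_max and the value 50000 in the usage
note are only illustrative; pick a value below what your disks
actually sustain. Writing to the real file needs root.

```shell
# Write a new maximum resync speed (KB/s) to the md limit file.
# Defaults to the real /proc file; another path may be passed for testing.
set_md_speed_limit_max() {
    limit=$1
    target=${2:-/proc/sys/dev/raid/speed_limit_max}
    # Refuse anything that is not a plain non-negative integer.
    case $limit in
        ''|*[!0-9]*)
            echo "limit must be a whole number of KB/s" >&2
            return 1
            ;;
    esac
    echo "$limit" > "$target"
}
```

Usage would be "set_md_speed_limit_max 50000" as root. The setting
does not survive a reboot; as far as I know the same value can be made
permanent via the dev.raid.speed_limit_max sysctl in /etc/sysctl.conf.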

Best regards
Kenni