[CentOS] CentOS 6.6 - reshape of RAID 6 is stuck

Tue Aug 25 10:13:06 UTC 2015
Daniel Reich <Daniel.Reich at 2sic.com>

Hello

I have a CentOS 6.6 server with 13 disks in a RAID 6 array. Some weeks ago, I upgraded it to 17 disks, two of them configured as spares. The reshape ran normally at first, but it stopped at 69%.

md2 : active raid6 sdj1[0] sdg1[18](S) sdh1[2] sdi1[5] sdm1[15] sds1[12] sdr1[14] sdk1[9] sdo1[6] sdn1[13] sdl1[8] sdd1[20] sdf1[19] sdq1[16] sdb1[10] sde1[17](S) sdc1[21]
      19533803520 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
      [=============>.......]  reshape = 69.0% (1347861324/1953380352) finish=46103134.8min speed=0K/sec
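The stall is visible in the progress line itself (speed=0K/sec). A small Python sketch that parses /proc/mdstat progress lines of this form and flags an operation running at 0K/sec; the regex and the helper names (parse_progress, is_stalled) are hypothetical, not part of mdadm:

```python
import re

# Matches a /proc/mdstat progress line such as
# "[====>....]  reshape = 69.0% (1347861324/1953380352) finish=... speed=0K/sec"
PROGRESS_RE = re.compile(
    r"(?P<action>reshape|resync|recovery|check)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\).*?speed=(?P<speed>\d+)K/sec"
)


def parse_progress(line):
    """Return (action, done, total, speed_kb_per_s), or None if the line has no progress info."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return (m.group("action"), int(m.group("done")),
            int(m.group("total")), int(m.group("speed")))


def is_stalled(line):
    """Hypothetical heuristic: an operation is in progress but reports 0K/sec."""
    parsed = parse_progress(line)
    return parsed is not None and parsed[3] == 0


line = ("[=============>.......]  reshape = 69.0% (1347861324/1953380352) "
        "finish=46103134.8min speed=0K/sec")
action, done, total, speed = parse_progress(line)
print(action, f"{100 * done / total:.1f}%", "stalled" if is_stalled(line) else "running")
```

A line such as "resync=PENDING" carries no percentage, so parse_progress returns None for it rather than reporting a stall.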

I already tried stopping the array and starting it again; the reshape starts but stops again after a few minutes. If I reboot the server, the reshape does not start at all:

md2 : active raid6 sdj1[0] sdg1[18](S) sdh1[2] sdi1[5] sdm1[15] sds1[12] sdr1[14] sdk1[9] sdo1[6] sdn1[13] sdl1[8] sdd1[20] sdf1[19] sdq1[16] sdb1[10] sde1[17](S) sdc1[21]
      19533803520 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
       resync=PENDING

Only when I restart the array again does the reshape resume, and then it stops as shown above.

In the dmesg output and the messages log I only found:

dmesg
md/raid:md2: reshape: not enough stripes.  Needed 1024
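The number in this message can be reproduced by hand. A sketch, assuming the check in the kernel's drivers/md/raid5.c asks for four full chunks' worth of 4 KiB stripe heads, i.e. (chunk size in bytes / 4096) * 4, using the larger of the old and new chunk size; the helper names are hypothetical:

```python
STRIPE_SIZE = 4096  # assumed size of one stripe head page in bytes (x86 PAGE_SIZE)


def stripes_needed(chunk_kib, new_chunk_kib=None):
    """Stripe cache entries the reshape check is assumed to require: 4 chunks of 4 KiB pages."""
    largest = max(chunk_kib, new_chunk_kib or chunk_kib)
    return (largest * 1024 // STRIPE_SIZE) * 4


def cache_is_large_enough(stripe_cache_size, chunk_kib):
    """True if the configured stripe cache meets the assumed minimum."""
    return stripe_cache_size >= stripes_needed(chunk_kib)


print(stripes_needed(1024))                # 1024k chunk of md2
print(cache_is_large_enough(256, 1024))    # a cache of 256 entries
print(cache_is_large_enough(16384, 1024))  # a cache of 16384 entries
```

Under this assumption, the 1024k chunk of md2 gives exactly the "Needed 1024" from the log, so any stripe_cache_size of at least 1024 should satisfy this particular check.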

messages
23:14:56 data kernel: md/raid:md2: not clean -- starting background reconstruction
23:14:56 data kernel: md/raid:md2: reshape will continue
23:14:56 data kernel: md/raid:md2: device sdj1 operational as raid disk 0
23:14:56 data kernel: md/raid:md2: device sdh1 operational as raid disk 2
23:14:56 data kernel: md/raid:md2: device sdi1 operational as raid disk 5
23:14:56 data kernel: md/raid:md2: device sdn1 operational as raid disk 11
23:14:56 data kernel: md/raid:md2: device sds1 operational as raid disk 3
23:14:56 data kernel: md/raid:md2: device sdm1 operational as raid disk 1
23:14:56 data kernel: md/raid:md2: device sdf1 operational as raid disk 14
23:14:56 data kernel: md/raid:md2: device sdd1 operational as raid disk 13
23:14:56 data kernel: md/raid:md2: device sdb1 operational as raid disk 10
23:14:56 data kernel: md/raid:md2: device sdq1 operational as raid disk 7
23:14:56 data kernel: md/raid:md2: device sdr1 operational as raid disk 4
23:14:56 data kernel: md/raid:md2: device sdl1 operational as raid disk 8
23:14:56 data kernel: md/raid:md2: device sdk1 operational as raid disk 9
23:14:56 data kernel: md/raid:md2: device sdc1 operational as raid disk 12
23:14:56 data kernel: md/raid:md2: device sdo1 operational as raid disk 6
23:14:56 data kernel: md/raid:md2: allocated 0kB
23:14:56 data kernel: md/raid:md2: raid level 6 active with 15 out of 15 devices, algorithm 2
23:14:56 data kernel: md2: Warning: Device sdi1 is misaligned
23:14:56 data kernel: md2: detected capacity change from 0 to 20002614804480
23:14:56 data kernel: md2: unknown partition table
23:14:56 data kernel: XFS (md2): Mounting Filesystem
23:14:56 data kernel: md/raid:md2: reshape: not enough stripes.  Needed 1024
23:14:56 data kernel: XFS (md2): Ending clean mount
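The "operational as raid disk" lines can be cross-checked against the "15 out of 15 devices" summary. A short Python sketch (hypothetical helper names) that maps each raid disk slot to its device and lists any slot no device claimed:

```python
import re

SLOT_RE = re.compile(r"device (\S+) operational as raid disk (\d+)")


def slots_from_log(lines):
    """Map raid disk number -> device name from md 'operational as raid disk' messages."""
    slots = {}
    for line in lines:
        m = SLOT_RE.search(line)
        if m:
            slots[int(m.group(2))] = m.group(1)
    return slots


def missing_slots(slots, raid_disks):
    """Slot numbers in 0..raid_disks-1 that no device claimed."""
    return sorted(set(range(raid_disks)) - set(slots))


# Device/slot pairs copied from the messages log above.
pairs = [("sdj1", 0), ("sdh1", 2), ("sdi1", 5), ("sdn1", 11), ("sds1", 3),
         ("sdm1", 1), ("sdf1", 14), ("sdd1", 13), ("sdb1", 10), ("sdq1", 7),
         ("sdr1", 4), ("sdl1", 8), ("sdk1", 9), ("sdc1", 12), ("sdo1", 6)]
log = [f"md/raid:md2: device {dev} operational as raid disk {n}" for dev, n in pairs]

slots = slots_from_log(log)
print(len(slots), missing_slots(slots, 15))
```

For the log above, all 15 slots (0 to 14) are claimed, which agrees with the kernel's own summary line.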

So I increased the stripe cache size:
cat /sys/block/md2/md/stripe_cache_size
16384
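The same sysfs file can also be written to change the value. A minimal Python sketch with hypothetical helper names (writing requires root on a real system), plus a rough memory estimate that assumes the common rule of thumb of one 4 KiB page per member device per cache entry:

```python
from pathlib import Path

PAGE_SIZE = 4096  # assumed x86 page size in bytes


def stripe_cache_path(md_name, sysfs_root="/sys/block"):
    """Path of the md stripe_cache_size attribute, e.g. /sys/block/md2/md/stripe_cache_size."""
    return Path(sysfs_root) / md_name / "md" / "stripe_cache_size"


def read_stripe_cache(md_name, sysfs_root="/sys/block"):
    """Same as: cat /sys/block/MD/md/stripe_cache_size"""
    return int(stripe_cache_path(md_name, sysfs_root).read_text().strip())


def write_stripe_cache(md_name, entries, sysfs_root="/sys/block"):
    """Same as: echo ENTRIES > /sys/block/MD/md/stripe_cache_size"""
    stripe_cache_path(md_name, sysfs_root).write_text(f"{entries}\n")


def cache_memory_mib(entries, raid_disks, page_size=PAGE_SIZE):
    """Rough rule of thumb: entries * member devices * page size, in MiB."""
    return entries * raid_disks * page_size / (1024 * 1024)


print(cache_memory_mib(16384, 15))
```

By that estimate, 16384 entries on the 15 active members of md2 take roughly 960 MiB of RAM.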

But the reshape still does not progress, and the same error still appears in the logs.

Does anyone have an idea?

Regards
Daniel