RE: [CentOS] Question about RAID 5 array rebuild with mdadm

17 Apr 2008


      Mark Hennessy wrote:
...
I'm using CentOS 4.5 right now, and I had a RAID 5 array stop because  
two drives became unavailable.  After adjusting the cables on several  
occasions and shutting down and restarting, I was able to see the  
drives again.  This is when I snatched defeat from the jaws of  
victory.  Please, someone with vast knowledge of how RAID 5 with mdadm  
works, tell me if I have any chance at all that this array will pull  
through with most or all of my data.
It may be possible...
...
Background info about the machine
/dev/md0 is a RAID1 consisting of /dev/sda1 and /dev/sdb1
/dev/md1 is a RAID1 consisting of /dev/sda2 and /dev/sdb2
/dev/md2 (our special friend) is a RAID5 consisting of /dev/sd[c-j]
/dev/sdi and /dev/sdj were the drives that detached from the array and  
were marked as faulty.
I did the following things that in hindsight were probably VERY BAD
Step 1 (Misassign drives to wrong array):
I could probably have had things going again in a tenth of a second if  
I hadn't typed this:
mdadm --manage --add /dev/md0 /dev/sdi
mdadm --manage --add /dev/md0 /dev/sdj
This clobbered the superblock and replaced it with that of /dev/md0, yes?
Well, that's what mdadm --misc --examine on /dev/sdi and /dev/sdj said anyhow.
Hmm, not good, but we will mark this drive 'sdi' as bad.
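For anyone reading this later: a rough sketch of saving what `mdadm --examine` reports for each member before running anything that writes to the disks, so the chunk size and device order are on record. The output directory, the function name and the `DRY_RUN` switch are my own assumptions, not anything from the original post. With `DRY_RUN=1` it only prints the commands.

```shell
#!/bin/sh
# Sketch: record each member's md superblock before changing anything.
# DRY_RUN=1 (the default here) only prints what would be run.
DRY_RUN="${DRY_RUN:-1}"
OUT_DIR="${OUT_DIR:-/root/md2-superblocks}"   # hypothetical location

save_superblocks() {
    for dev in "$@"; do
        name=$(basename "$dev")
        if [ "$DRY_RUN" = "1" ]; then
            echo "mdadm --examine $dev > $OUT_DIR/$name.examine"
        else
            mkdir -p "$OUT_DIR"
            mdadm --examine "$dev" > "$OUT_DIR/$name.examine"
        fi
    done
}

save_superblocks /dev/sdc /dev/sdd /dev/sde /dev/sdf \
                 /dev/sdg /dev/sdh /dev/sdi /dev/sdj
```

Keep the saved files somewhere other than the array being repaired.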
...
Ok, so what next?
Step 2 (rebuild the array but make sure the params are right!):
I wipe out the superblocks on all of the drives in the array and  
rebuild with --assume-clean
for i in c d e f g h i j ; do mdadm --zero-superblock /dev/sd$i ; done
mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
Nooo, you need to make sure sdi is marked as 'bad' and kept offline. You
are going to need to assemble the array degraded, then add sdi as a
replacement and let it rebuild sdi from the parity.
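Roughly, the sequence described above could look like the sketch below. It assumes the members other than sdi still carry valid md2 superblocks and that their data is intact, which may not hold once the superblocks have been zeroed. The `run` wrapper and `DRY_RUN` switch are only there so the commands can be printed and checked before anything touches a disk.

```shell
#!/bin/sh
# Sketch: start the array degraded without the bad disk, then add it back.
# Assumes the remaining members still have valid md2 superblocks.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"          # print the command instead of running it
    else
        "$@"
    fi
}

ARRAY=/dev/md2
BAD=/dev/sdi
MEMBERS="/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdj"

recover_degraded() {
    # Assemble from every member except $BAD; --run starts the array
    # even though one device is missing. $MEMBERS is left unquoted on
    # purpose so it splits into separate arguments.
    run mdadm --assemble --run "$ARRAY" $MEMBERS
    # Add the excluded disk so md rebuilds it from the data and parity
    # held on the other members.
    run mdadm --manage "$ARRAY" --add "$BAD"
}

recover_degraded
```

mdadm also has a `--force` option for assembly when some superblocks look out of date. Whether that is appropriate depends on what `mdadm --examine` shows for each member.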
...
ok, now it says that the array is recovering and will take about 10  
hours to rebuild.
/dev/sd[c-i] say that they are "active sync" and /dev/sdj says it's a  
spare that's rebuilding.
But now I scroll back in my history and see that oops, the chunk size  
is WRONG.  Not only that, but I don't stop the array until the rebuild  
is at around 8%
Well, now I think it's all messed up.
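To see why the chunk size matters so much, here is a small arithmetic sketch of where a given array offset lands in an 8-disk RAID 5. It assumes the left-symmetric layout (mdadm's default for RAID 5) and uses 64K and 128K purely as hypothetical chunk sizes, since the real values are not given in this thread. Member numbers count from 0 in the order the devices were listed at creation.

```shell
#!/bin/sh
# Sketch: map an array offset to a member disk, assuming left-symmetric RAID 5.
# Chunk sizes below are hypothetical examples, not the poster's values.

locate() {
    offset_kib=$1; chunk_kib=$2; disks=$3
    data_disks=$((disks - 1))
    chunk=$((offset_kib / chunk_kib))          # logical chunk number
    stripe=$((chunk / data_disks))             # stripe holding that chunk
    within=$((chunk % data_disks))             # data slot inside the stripe
    parity=$((data_disks - stripe % disks))    # member holding parity
    data=$(( (parity + 1 + within) % disks ))  # member holding the data
    echo "chunk=${chunk_kib}K offset=${offset_kib}K data_disk=$data parity_disk=$parity"
}

locate 640 64 8
locate 640 128 8
```

With these numbers the same 640K offset is read from member 2 with 64K chunks but from member 5 with 128K chunks, so a filesystem read through the wrong geometry sees scrambled data even if nothing on the disks has changed.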
...
Ok, I stop the array and rebuild with
mdadm --create /dev/md2 --assume-clean --level=5 --chunk --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
Now it says it's going to take another 10 hours to rebuild.
It's truly hosed now.
...
How likely are my data irretrievable/gone and at what step would it  
have happened if so?
I hope you have backups, because you're going to need them.
If only you had posted to the list BEFORE trying to recover it without
knowing what to do.
-Ross
