[CentOS] Recover RAID

Thu Mar 4 01:54:28 UTC 2010
Jeff Sadino <jsadino.queens at gmail.com>

Hello Everyone,

First time CentOS poster :)  I have CentOS 4 installed on the head node of a
Sun Grid Engine cluster, with the disks set up in RAID.  The head node has four hard
drives, and I assume that drives 1 and 2 are in a raid and then drives 3 and
4 are in another raid.  I was trying to expand the OS partition on drive 1
because it was full.  I took drive 1 out, put it in my Fedora 8 box as a
secondary drive, booted up into Fedora, and saw it had the partition
structure:
/ 8GB
/var 4GB
/swap 1GB
and an "unknown" partition 101.4GB

I did a cp -rfa of the / and /var files as a backup (I know, not the best
way).  I then restarted the Fedora box into Windows to take a look at the
drive with Paragon Partition Manager.  Back in Fedora, using gparted, I
formatted the "unknown" partition as ext3 (I think that is where I made my
fatal mistake), moved the /swap to the middle of the drive, moved the /var
to the middle and expanded it to 10GB, and then expanded / to about 50GB to
fill up the rest.

I also took drive 2 out of the head node, put it in my Fedora box, and saw
it had the partition structure:
/swap 15GB
and an "unknown" partition 101.4GB

Ok, now when I put everything back into the head node, and reboot, the BIOS
sees all four drives, and from what I can tell, recognizes the first raid
(of drives 3 and 4), but says it can only find one disk for the second raid
(drives 1 and 2).  I can't find any way around this.

Looking at my /etc/raidtab file:
raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        persistent-superblock 1
        device          /dev/sdc1
        raid-disk       0
        device          /dev/sdd1
        raid-disk       1

raiddev /dev/md1
        raid-level      0
        nr-raid-disks   2
        persistent-superblock 1
        chunk-size     4
        device          /dev/sda4
        raid-disk       0
        device          /dev/sdb2
        raid-disk       1
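
To make that file easier to read at a glance, here is a minimal Python
sketch that parses the raidtab text above into a list of arrays.  The
function name parse_raidtab and the dictionary keys are hypothetical,
chosen only for illustration, and the parser only handles the directives
that appear in this file.

```python
# Illustrative only: parse_raidtab and its dict keys are made-up names,
# and only the directives present in the file above are handled.
RAIDTAB = """\
raiddev /dev/md0
        raid-level      1
        nr-raid-disks   2
        nr-spare-disks  0
        persistent-superblock 1
        device          /dev/sdc1
        raid-disk       0
        device          /dev/sdd1
        raid-disk       1

raiddev /dev/md1
        raid-level      0
        nr-raid-disks   2
        persistent-superblock 1
        chunk-size     4
        device          /dev/sda4
        raid-disk       0
        device          /dev/sdb2
        raid-disk       1
"""


def parse_raidtab(text):
    """Return one dict per 'raiddev' stanza with its level and members."""
    arrays = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        key, value = parts[0], parts[1]
        if key == "raiddev":
            current = {"name": value, "level": None, "devices": []}
            arrays.append(current)
        elif current is None:
            continue  # directive before any raiddev line; ignore it
        elif key == "raid-level":
            current["level"] = value  # kept as text, e.g. "0" or "1"
        elif key == "device":
            current["devices"].append(value)
    return arrays


ARRAYS = parse_raidtab(RAIDTAB)
for array in ARRAYS:
    print(array["name"], "raid-level", array["level"], array["devices"])
```

So md0 is a raid-level 1 pair on sdc1 and sdd1, and md1 is a raid-level 0
pair on sda4 and sdb2.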

It says it can bring up md0 ok, but not md1.  Right now, I am going to try
to restore the "unknown" partition that I deleted from drive 1 using the
"unknown" partition from drive 2.
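
One thing worth keeping in mind with that plan: the raidtab lists md1 as
raid-level 0, which stripes data across its members rather than mirroring
it.  The sketch below is a simplified Python model of that striping, not
the kernel's actual code.  It assumes two equal-sized members, a plain
round-robin layout, and that chunk-size 4 means 4 KB; the function name
locate is hypothetical.

```python
# Simplified model of raid-level 0 striping (an assumption-laden sketch,
# not the md driver itself): equal-sized members, round-robin chunks.
CHUNK_SIZE = 4 * 1024                  # "chunk-size 4", taken as 4 KB
MEMBERS = ["/dev/sda4", "/dev/sdb2"]   # members of md1 from the raidtab


def locate(logical_offset, chunk_size=CHUNK_SIZE, members=MEMBERS):
    """Map a byte offset within the array to (member, offset in member)."""
    chunk_index = logical_offset // chunk_size
    within_chunk = logical_offset % chunk_size
    member = members[chunk_index % len(members)]
    member_offset = (chunk_index // len(members)) * chunk_size + within_chunk
    return member, member_offset


# Walk the first few chunks of the array and show where each one lives.
for chunk in range(4):
    print(chunk, locate(chunk * CHUNK_SIZE))
```

Under those assumptions, consecutive chunks alternate between sda4 and
sdb2, so each member holds a different half of the data and neither one
is a copy of the other.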

Any ideas on how to get myself out of this mess?  I feel like I really
messed it up good.  This is a server for our work, and we have a couple of
years' worth of data on it, so I would really like to fix it rather than
reinstall.

Thank you greatly for any help!
Jeff Sadino