We have a machine that's a distro mirror - a *lot* of data, not just CentOS. We had the data on /dev/sdc. I added another drive, /dev/sdd, and created that as /dev/md4, with --missing, made an ext4 filesystem on it, and rsync'd everything from /dev/sdc.
Note that we did this on *raw*, unpartitioned drives (not my idea). I then umounted /dev/sdc, and mounted /dev/md4, and it looked fine; I added /dev/sdc to /dev/md4, and it started rebuilding.
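For reference, the sequence described above would have been roughly the following; the mount point /srv/mirror and the rsync options here are assumptions, not what was actually typed:

   # degraded RAID1 on the new drive, second member deliberately missing
   mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdd missing
   mkfs -t ext4 /dev/md4
   mount /dev/md4 /mnt/new
   # copy everything over from the still-mounted old drive
   rsync -aHAX /srv/mirror/ /mnt/new/
   # swap the mounts, then add the old drive as the second member
   umount /srv/mirror /mnt/new
   mount /dev/md4 /srv/mirror
   mdadm --add /dev/md4 /dev/sdc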
Then I was told to reboot it, right after the rebuild started. I don't know if that was the problem. At any rate, it came back up... and /dev/sdc is on as /dev/md127, and no /dev/md4, nothing in /etc/mdadm.conf, and, oh, yes:

   mdadm -A /dev/md4 /dev/sdd
   mdadm: Cannot assemble mbr metadata on /dev/sdd
   mdadm: /dev/sdd has no superblock - assembly aborted

Oh, and:

   mdadm -E /dev/sdd
   /dev/sdd:
      MBR Magic : aa55
      Partition[0] : 4294967295 sectors at 1 (type ee)
ee? A quick google says that indicates a legacy MBR, followed by an EFI....
I *REALLY* don't want to lose all that data. Any ideas?
mark
On 8/29/2014 14:26, m.roth@5-cent.us wrote:
Note that we did this on *raw*, unpartitioned drives (not my idea).
Nothing wrong with that, particularly with big "midden" volumes like this one.
I added /dev/sdc to /dev/md4, and it started rebuilding.
*facepalm*
You forgot the primary maxim of data integrity: two is one, one is none.
When you overwrote your original copy with what you thought was a clone, you reduced yourself to a single copy again. If anything is wrong with that copy, you now have two copies of the error.
What you *should* have done is buy two drives, set them up as a new mirror, copy the data over to them, then pull the old /dev/sdc and put it on a shelf as an offline archive mirror. /dev/sdc has probably already given you its rated service life, so it's not like you're really saving money here. The drive has already depreciated to zero.
You're probably going to spend more in terms of your time (salary + benefits) to fix this than the extra drive would have cost you, and at the end of it, you still won't have the security of that offline archive mirror.
I know this isn't the answer you wanted, but it's probably the answer a lot of people *wanted* to give, but chose not to, going by the crickets. (It's either that or the 3-day holiday weekend.)
I don't know how much I can help you. I have always used hardware RAID on Linux, even for simple mirrors.
I don't see why it matters that your /dev/sdd partitioning is different from your /dev/sdc. When you told it to blast /dev/sdc with the contents of /dev/sdd, it should have copied the partitioning, too.
Are you certain /dev/sdc is partially overwritten now? What happens if you try to mount it? If it mounts, go buy that second fresh disk, then set the mirror up correctly this time.
On Tue, Sep 2, 2014 at 12:42 PM, Warren Young warren@etr-usa.com wrote:
On 8/29/2014 14:26, m.roth@5-cent.us wrote:
Note that we did this on *raw*, unpartitioned drives (not my idea).
I added /dev/sdc to /dev/md4, and it started rebuilding.
I know this isn't the answer you wanted, but it's probably the answer a lot of people *wanted* to give, but chose not to, going by the crickets. (It's either that or the 3-day holiday weekend.)
I haven't used raw devices as members so I'm not sure I understand the scenario. However, I thought that devices over 2TB would not auto assemble so you would have to manually add the ARRAY entry for /dev/md4 in /etc/mdadm.conf containing /dev/sdd and /dev/sdc for the system to recognize it at bootup.
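For what it's worth, the easiest way to get that ARRAY line is to let mdadm generate it rather than typing it by hand; the UUID shown below is just a placeholder:

   # append an ARRAY line for the running array to the config
   mdadm --detail --scan >> /etc/mdadm.conf
   # which produces something along the lines of:
   # ARRAY /dev/md4 metadata=1.2 name=host:4 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx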
Are you certain /dev/sdc is partially overwritten now? What happens if you try to mount it? If it mounts, go buy that second fresh disk, then set the mirror up correctly this time.
But sdd _should_ have the correct data - it just isn't being detected as a raid member. I think with smaller devices - or at least devices with smaller partitions and FD type in the MBR it would have worked automatically with the kernel autodetect.
I'm the OP, here....
Les Mikesell wrote:
On Tue, Sep 2, 2014 at 12:42 PM, Warren Young warren@etr-usa.com wrote:
On 8/29/2014 14:26, m.roth@5-cent.us wrote:
Note that we did this on *raw*, unpartitioned drives (not my idea).
I added /dev/sdc to /dev/md4, and it started rebuilding.
<snip>
I haven't used raw devices as members so I'm not sure I understand the scenario. However, I thought that devices over 2TB would not auto assemble so you would have to manually add the ARRAY entry for /dev/md4 in /etc/mdadm.conf containing /dev/sdd and /dev/sdc for the system to recognize it at bootup.
Yeah. That was one thing I discovered. Silly me, assuming that mdadm would create an entry in /etc/mdadm.conf. And this is not something I do more than once or twice a year, and haven't this year (we have a good number of Dells with a PERC 7, or then there's the JetStors....).
Are you certain /dev/sdc is partially overwritten now? What happens if you try to mount it? If it mounts, go buy that second fresh disk, then set the mirror up correctly this time.
It was toast.
But sdd _should_ have the correct data - it just isn't being detected as a raid member. I think with smaller devices - or at least devices with smaller partitions and FD type in the MBR it would have worked automatically with the kernel autodetect.
Both had a GPT on them, just no partitions. And that's the thing that really puzzles me - why mdadm couldn't find the RAID info on /dev/sdd, which *had* been just fine.
Anyway, the upshot was my manager was rather annoyed - I *should* have pulled sdc, and put in a new one, and just let that go. I still think it would have failed, given the inability of mdadm to find the info on sdd. We wound up just remaking the RAID, and rebuilding the mirror over the weekend.
mark
On Tue, Sep 2, 2014 at 1:33 PM, m.roth@5-cent.us wrote:
I haven't used raw devices as members so I'm not sure I understand the scenario. However, I thought that devices over 2TB would not auto assemble so you would have to manually add the ARRAY entry for /dev/md4 in /etc/mdadm.conf containing /dev/sdd and /dev/sdc for the system to recognize it at bootup.
Yeah. That was one thing I discovered. Silly me, assuming that mdadm would create an entry in /etc/mdadm.conf. And this is not something I do more than once or twice a year, and haven't this year (we have a good number of Dells with a PERC 7, or then there's the JetStors....).
With devices < 2TB and MBR's, you don't need /etc/mdadm.conf - the kernel just figures it all out at boot time, regardless of the disk location or detection order. I have sometimes set up single partitions as 'broken' raids just to get that autodetect/mount effect on boxes where the disks are moved around a lot, because it worked long before distros started mounting by unique labels or uuids. And I miss it on big drives.
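Something like this, presumably - the old 0.90 metadata plus partition type fd is what the in-kernel autodetect keys on; the device names are only examples:

   # sda1 already set to partition type fd (Linux raid autodetect)
   mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sda1 missing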
But sdd _should_ have the correct data - it just isn't being detected as a raid member. I think with smaller devices - or at least devices with smaller partitions and FD type in the MBR it would have worked automatically with the kernel autodetect.
Both had a GPT on them, just no partitions. And that's the thing that really puzzles me - why mdadm couldn't find the RAID info on /dev/sdd, which *had* been just fine.
Anyway, the upshot was my manager was rather annoyed - I *should* have pulled sdc, and put in a new one, and just let that go. I still think it would have failed, given the inability of mdadm to find the info on sdd. We wound up just remaking the RAID, and rebuilding the mirror over the weekend.
I think either adding the ARRAY entry in /etc/mdadm.conf and rebooting or some invocation of mdadm could have revived /dev/md4 with /dev/sdd (and the contents you wanted) active.
Les Mikesell wrote:
On Tue, Sep 2, 2014 at 1:33 PM, m.roth@5-cent.us wrote:
<snip>
But sdd _should_ have the correct data - it just isn't being detected as a raid member. I think with smaller devices - or at least devices with smaller partitions and FD type in the MBR it would have worked automatically with the kernel autodetect.
<snip>
Anyway, the upshot was my manager was rather annoyed - I *should* have pulled sdc, and put in a new one, and just let that go. I still think it would have failed, given the inability of mdadm to find the info on sdd. We wound up just remaking the RAID, and rebuilding the mirror over the weekend.
I think either adding the ARRAY entry in /etc/mdadm.conf and rebooting or some invocation of mdadm could have revived /dev/md4 with /dev/sdd (and the contents you wanted) active.
Tried that. No joy.
mark
Hmm, very bad idea to create a file system on the raw disk. Swap-type partitions know how to handle this well, but for a partition with data, why take the chance that something will write an MBR there? That's what happened, I bet.
The procedure is this:
Create partition 1 on the new, unused drive (use all the space). That leaves room for the MBR.
Create the mirror on this new drive using "--missing", then run mkfs -t ext4 on the new mirror device.
mdadm -D /dev/mdx (where x is the number of the mirror) should show 1 drive in the mirror. cat /proc/mdstat should show the same thing.
Now, copy the data from the old disk to the new "mirrored" disk.
When done, reboot. Yes, reboot now. If something had gone very wrong you would not lose your data, and you should see your data on both disks.
OK, you rebooted and you see the data on both disks. Now fdisk the old disk: create one partition, add the partition to the mirror, wait for the sync to finish, and reboot again.
You should be able to see your data mirrored.
..That's the right way!
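A rough sketch of that procedure in commands; since the drives in this thread are 4TB, parted/GPT stands in here for the fdisk/MBR mentioned above, and the device names follow the thread:

   # new drive: one partition spanning the disk
   parted -s /dev/sdd mklabel gpt
   parted -s /dev/sdd mkpart primary 1MiB 100%
   # degraded mirror on the partition, filesystem on the md device
   mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdd1 missing
   mkfs -t ext4 /dev/md4
   mdadm -D /dev/md4        # should show one active device, one missing
   cat /proc/mdstat
   # copy the data over, reboot, check it on both disks, then bring in the old drive
   parted -s /dev/sdc mklabel gpt
   parted -s /dev/sdc mkpart primary 1MiB 100%
   mdadm --add /dev/md4 /dev/sdc1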
Ok, so you did not do this and something tried to write the MBR in the raw disk and you lost all your data???
Well maybe.
Try using fsck with alternate superblocks. The first superblock should be 32.
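Roughly like this; note that the backup superblock locations depend on the block size (with 4K blocks the first backup is usually at 32768), so it is worth asking mke2fs where they would be before guessing:

   # dry run: prints where the superblock backups would live, writes nothing
   # (pass the same options the original mkfs used, if any were non-default)
   mke2fs -n /dev/sdd
   # then point fsck at one of the listed backups
   e2fsck -b 32768 /dev/sdd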
Good luck dude.
GKH.
GKH wrote:
Hmm, very bad idea to create a file system on the raw disk. The swap type partitions know how to handle this well but for a partition with data why take the chance that something will write the MBR there. That's what happenned I bet.
I know how to do this - it *is* how I started. Also, I guess you didn't read the original post - these are 4TB drives, so no MBR, GPT only. <snip> And my manager has taken a fancy to raw drives; not sure why.
mark
On 09/02/2014 04:16 PM, m.roth@5-cent.us wrote:
I know how to do this - it *is* how I started. Also, I guess you didn't read the original post - these are 4TB drives, so no MBR, GPT only.
<snip> And my manager has taken a fancy to raw drives; not sure why.
Wait just a minute. How can you use the raw device but still have a GPT on it? That doesn't seem right, to have a GUID Partition Table but no partitions.
On 09/02/2014 07:36 PM, Joseph L. Casale wrote:
Wait just a minute. How can you use the raw device but still have a GPT on it? That doesn't seem right, to have a GUID Partition Table but no partitions.
Have you never deleted all the partitions on a disk under any scheme before?
Of course; but in the context of an MD RAID device with member devices as raw disks I would not expect a partition table of any kind, GPT or otherwise. Whether it can be there or not is not my point; it's whether it's expected or not.
Now, for C6 the default RAID superblock is version 1.2; but if you were to create a version 1.1 superblock it would go on the very first sector of the raw device, and would overwrite the partition table. (The 1.2 superblock goes 4K in from the first sector; prior to 1.1 the superblock went to the last sector of the drive).
Of course, ext4 at least for block group 0 skips the first 1k bytes.....
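To see which metadata version is on a member, or to pin one explicitly at creation time (device names here are only illustrative):

   mdadm --examine /dev/sdd | grep -i version
   # ask for 1.2 metadata explicitly - it sits 4K in from the start of the member
   mdadm --create /dev/md4 --level=1 --raid-devices=2 --metadata=1.2 /dev/sdd missing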
On Thu, Sep 4, 2014 at 9:59 AM, Lamar Owen lowen@pari.edu wrote:
Of course; but in the context of an MD RAID device with member devices as raw disks I would not expect a partition table of any kind, GPT or otherwise. Whether it can be there or not is not my point; it's whether it's expected or not.
Now, for C6 the default RAID superblock is version 1.2; but if you were to create a version 1.1 superblock it would go on the very first sector of the raw device, and would overwrite the partition table. (The 1.2 superblock goes 4K in from the first sector; prior to 1.1 the superblock went to the last sector of the drive).
Does that mean autodetection/assembly would be possible with 1.2 but not 1.1? I've always considered that to be one of the best features of software raid.
Of course, ext4 at least for block group 0 skips the first 1k bytes.....
How does this mesh with the ability to mount a RAID1 member as a normal non-raid partition? I've done that for data recovery but never knew if it was safe to write that way.
On 09/04/2014 01:35 PM, Les Mikesell wrote:
On Thu, Sep 4, 2014 at 9:59 AM, Lamar Owen lowen@pari.edu wrote:
.. (The 1.2 superblock goes 4K in from the first sector; prior to 1.1 the superblock went to the last sector of the drive).
Does that mean autodetection/assembly would be possible with 1.2 but not 1.1? I've always considered that to be one of the best features of software raid.
Don't know. Try it and let us know.....
Of course, ext4 at least for block group 0 skips the first 1k bytes.....
How does this mesh with the ability to mount a RAID1 member as a normal non-raid partition? I've done that for data recovery but never knew if it was safe to write that way.
Good question; try it and let us know. I have never tried it.
Ok, folks,
Here's the answer: making a software RAID on a bare drive with no GPT works fine. If it has a GPT, and no partition, it fails on reboot, even with an /etc/mdadm.conf.
I've proved this: first, I created the array on the bare drive, rebooted, and /dev/md0 was there. Then I used parted to create a GPT, then created the array; after a reboot there was no md0, even with mdadm --assemble, even with an /etc/mdadm.conf. Finally, I got rid of the disk label (used parted to make an msdos label, then zeroed out the beginning of the disk), again made the RAID on the bare drives, rebooted, and md0 is there.
So that's what killed me. Admins, take heed....
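A reconstruction of that test sequence, with illustrative device names from the scratch box; wipefs would do the cleanup just as well as dd:

   # 1) bare drives, no label: array survives a reboot
   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdx /dev/sdy
   # 2) GPT label, no partitions: no md0 after reboot, assembly fails
   parted -s /dev/sdx mklabel gpt
   parted -s /dev/sdy mklabel gpt
   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdx /dev/sdy
   # 3) kill the label (msdos label, then zero the start), recreate: fine again
   parted -s /dev/sdx mklabel msdos ; dd if=/dev/zero of=/dev/sdx bs=1M count=10
   parted -s /dev/sdy mklabel msdos ; dd if=/dev/zero of=/dev/sdy bs=1M count=10
   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdx /dev/sdy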
mark
On Thu, Sep 4, 2014 at 3:30 PM, m.roth@5-cent.us wrote:
Ok, folks,
Here's the answer: making a software RAID on a bare drive with no GPT works fine. If it has a GPT, and no partition, it fails on reboot, even with an /etc/mdadm.conf.
I've proved this: first, I created the array on the bare drive, rebooted, and /dev/md0 was there. Then I used parted to create a GPT, then created the array; after a reboot there was no md0, even with mdadm --assemble, even with an /etc/mdadm.conf. Finally, I got rid of the disk label (used parted to make an msdos label, then zeroed out the beginning of the disk), again made the RAID on the bare drives, rebooted, and md0 is there.
So that's what killed me. Admins, take heed....
mark
Thanks for the followup.
:)
Tom Bishop wrote:
On Thu, Sep 4, 2014 at 3:30 PM, m.roth@5-cent.us wrote:
Here's the answer: making a software RAID on a bare drive with no GPT works fine. If it has a GPT, and no partition, it fails on reboot, even with an /etc/mdadm.conf.
I've proved this: first, I created the array on the bare drive, rebooted, and /dev/md0 was there. Then I used parted to create a GPT, then created the array; after a reboot there was no md0, even with mdadm --assemble, even with an /etc/mdadm.conf. Finally, I got rid of the disk label (used parted to make an msdos label, then zeroed out the beginning of the disk), again made the RAID on the bare drives, rebooted, and md0 is there.
So that's what killed me. Admins, take heed....
Thanks for the followup.
:)
Yup. My manager had asked me to figure it out, and as it just so happened, our director wound up needing more space on another machine - one I could reboot at will - so I could do the testing.
mark, watch out for the gotchas
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of m.roth@5-cent.us
Sent: Thursday, September 04, 2014 4:31 PM
To: CentOS mailing list
Subject: Re: [CentOS] *very* ugly mdadm issue [Solved, badly]
Ok, folks,
Here's the answer: making a software RAID on a bare drive with no GPT works fine. If it has a GPT, and no partition, it fails on reboot, even with an /etc/mdadm.conf.
I've proved this: first, I created the array on the bare drive, rebooted, and /dev/md0 was there. Then I used parted to create a GPT, then created the array; after a reboot there was no md0, even with mdadm --assemble, even with an /etc/mdadm.conf. Finally, I got rid of the disk label (used parted to make an msdos label, then zeroed out the beginning of the disk), again made the RAID on the bare drives, rebooted, and md0 is there.
So that's what killed me. Admins, take heed....
mark
If you all would mind...
Until I read this thread, I've never heard of building RAIDs on bare metal drives. I'm assuming no partition table, just a disk label?
What is the advantage of doing this?
Many thanks,
Richard
---
Richard Zimmerman
Systems / Network Administrator
River Bend Hose Specialty, Inc.
1111 S Main Street
South Bend, IN 46601-3337
(574) 233-1133
(574) 280-7284 Fax
On 9/5/2014 07:18, Richard Zimmerman wrote:
Until I read this thread, I've never heard of building RAIDs on bare metal drives. I'm assuming no partition table, just a disk label?
I don't know what you mean by a disk label. BSD uses that term for their alternative to MBR and GPT partition tables, but I think you must mean something else.
In Linux terms, we're talking about /dev/sda, rather than /dev/sda1, for example.
What is the advantage of doing this?
The whole idea of a RAID is that you're going to take a bunch of member disks and combine them into a larger entity. On top of *that* you may wish to create partitions, LVMs, or whatever.
So the real question is, why do you believe you need to make each RAID member a *partition* on a disk, instead of just taking over the entire disk? Unless you're going to do something insane like:

/dev/md0
    /dev/sda1
    /dev/sdb1
/dev/md1
    /dev/sda2
    /dev/sdb2
...you're not going to get any direct utility from composing a RAID from partitions on the RAID member drives.
(Why "insane?" Because now any I/O to /dev/md1 interferes with I/O to /dev/md0, because you only have two head assemblies, so you've wiped out the speed advantages you get from RAID-0 or -1.)
There are ancillary benefits, like the fact that a RAID element that spans the entire partition is inherently 4k-aligned. When there is a partition table taking space at the start of the first cylinder, you have to leave the rest of that cylinder unused in order to get back into 4k alignment.
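For comparison, when you do partition, current tools handle the alignment for you - parted's conventional 1MiB start keeps the partition 4K-aligned, and it can verify that:

   parted -s /dev/sdx mklabel gpt
   parted -s /dev/sdx mkpart data 1MiB 100%
   parted -s /dev/sdx align-check optimal 1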
The only downside I saw in this thread is that when you pull such a disk out of a Linux software RAID and put it into another machine, you don't see a clear Linux partition table, so you might think it is an empty drive. But the same thing is true of a hardware RAID member, too.
On Fri, Sep 05, 2014 at 08:01:05AM -0600, Warren Young wrote:
On 9/5/2014 07:18, Richard Zimmerman wrote:
Until I read this thread, I've never heard of building RAIDs on bare metal drives. I'm assuming no partition table, just a disk label?
I don't know what you mean by a disk label. BSD uses that term for their alternative to MBR and GPT partition tables, but I think you must mean something else.
There is another method of disk naming - I think it gained popularity between plain /dev/sda names and UUIDs - that was something like LABEL=swap or LABEL=root. I haven't used it in years, so I don't remember the details.
As for building on bare metal, as it stands, during installation, the RedHat way is you make, for example, a /boot, / and swap, then make the same partitions on drive 2 (for a RAID-1). You then create 3 RAID devices, one for each partition. Then, when a drive fails, you have three devices to worry about. In contrast, as mentioned on the CentOS wiki, one can just do a normal install, then mirror the drive with just one RAID device. I'm guessing that is what was meant.
On 9/5/2014 08:18, Scott Robbins wrote:
On Fri, Sep 05, 2014 at 08:01:05AM -0600, Warren Young wrote:
I don't know what you mean by a disk label.
There is another method of disk naming - I think it gained popularity between plain /dev/sda names and UUIDs - that was something like LABEL=swap or LABEL=root.
That's a property of the filesystem, not the disk or partition. See tune2fs/mke2fs -L.
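For example, with a made-up label name:

   # set an ext2/3/4 volume label
   tune2fs -L mirror0 /dev/md4
   # then mount it by label in /etc/fstab:
   # LABEL=mirror0  /srv/mirror  ext4  defaults  0 2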
As for building on bare metal, as it stands, during installation, the RedHat way is you make, for example, a /boot, / and swap, then make the same partitions on drive 2 (for a RAID-1).
The system disk must be partitioned because you need a boot loader, which means you're already going to be taking over some space on cylinder 0, so the arguments in favor of raw disks have already gone out the window.
I'm only talking about data volumes here.
On Fri, 5 Sep 2014 10:18:13 -0400 Scott Robbins scottro@nyc.rr.com wrote:
On Fri, Sep 05, 2014 at 08:01:05AM -0600, Warren Young wrote:
On 9/5/2014 07:18, Richard Zimmerman wrote:
Until I read this thread, I've never heard of building RAIDs on bare metal drives. I'm assuming no partition table, just a disk label?
When the disk dies, the replacement disk must be exactly the same size. Been there, done that. I always make the partition a few GB smaller than the physical size. It's not always possible to get the same type of replacement disk.
My 2c, Bob
My understanding was that you CAN put in larger drives, BUT they get formatted identically to the smaller drive. There are some exceptions, but I don't remember what they are.
D.

On 9/5/2014 4:26 PM, Bob Marcan wrote:
On Fri, 5 Sep 2014 10:18:13 -0400 Scott Robbins scottro@nyc.rr.com wrote:
On Fri, Sep 05, 2014 at 08:01:05AM -0600, Warren Young wrote:
On 9/5/2014 07:18, Richard Zimmerman wrote:
Until I read this thread, I've never heard of building RAIDs on bare metal drives. I'm assuming no partition table, just a disk label?
When the disk dies, the replacement disk must be exactly the same size. Been there, done that. I always make the partition a few GB smaller than the physical size. It's not always possible to get the same type of replacement disk.
My 2c, Bob
On 2014-09-05, Bob Marcan bob.marcan@gmail.com wrote:
When the disk dies, the replacement disk must be exactly the same size. Been there, done that. I always make the partition a few GB smaller than the physical size. It's not always possible to get the same type of replacement disk.
I thought that newer versions of md could accommodate small size differences (just as hardware RAID controllers can). I know I have an md array with at least two different drive models (though IIRC all from the same manufacturer).
--keith
On Fri, Sep 5, 2014 at 9:01 AM, Warren Young warren@etr-usa.com wrote:
So the real question is, why do you believe you need to make each RAID member a *partition* on a disk, instead of just taking over the entire disk? Unless you're going to do something insane like:

/dev/md0
    /dev/sda1
    /dev/sdb1
/dev/md1
    /dev/sda2
    /dev/sdb2
...you're not going to get any direct utility from composing a RAID from partitions on the RAID member drives.
(Why "insane?" Because now any I/O to /dev/md1 interferes with I/O to /dev/md0, because you only have two head assemblies, so you've wiped out the speed advantages you get from RAID-0 or -1.)
Well, to exactly the same extent that putting multiple partitions and filesystems on a non-raid drive is insane for those reasons... And you generally can't avoid this if you want to boot from the same disks where you store data with mirroring. And the very nice up side is that you can now pull your drives out, put them in different bays, add others, etc. and the system will still assemble the right partitions into the right raid devices and mount them correctly. Or at least it would in the < 2TB days...
There are ancillary benefits, like the fact that a RAID element that spans the entire partition is inherently 4k-aligned. When there is a partition table taking space at the start of the first cylinder, you have to leave the rest of that cylinder unused in order to get back into 4k alignment.
Isn't it possible to duplicate that when you make a single partition and use the partition as a raid member? And get autoassembly if it is less than 2TB? I consider it a real loss that autoassembly doesn't work on large drives. People will almost certainly lose data in some scenarios as a result.
The only downside I saw in this thread is that when you pull such a disk out of a Linux software RAID and put it into another machine, you don't see a clear Linux partition table, so you might think it is an empty drive. But the same thing is true of a hardware RAID member, too.
I've always liked software raid1 just because you can recover the data from any single drive on any machine with a similar interface. But, I guess that's why we have backups...
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Les Mikesell
Sent: Friday, September 05, 2014 12:54 PM
To: CentOS mailing list
Subject: Re: [CentOS] Bare drive RAID question, was RE: *very* ugly mdadm issue [Solved, badly]
On Fri, Sep 5, 2014 at 9:01 AM, Warren Young warren@etr-usa.com wrote:
So the real question is, why do you believe you need to make each RAID member a *partition* on a disk, instead of just taking over the entire disk? Unless you're going to do something insane like:

/dev/md0
    /dev/sda1
    /dev/sdb1
/dev/md1
    /dev/sda2
    /dev/sdb2
...you're not going to get any direct utility from composing a RAID from partitions on the RAID member drives.
(Why "insane?" Because now any I/O to /dev/md1 interferes with I/O to /dev/md0, because you only have two head assemblies, so you've wiped out the speed advantages you get from RAID-0 or -1.)
Well, to exactly the same extent that putting multiple partitions and filesystems on a non-raid drive is insane for those reasons... And you generally can't avoid this if you want to boot from the same disks where you store data with mirroring. And the very nice up side is that you can now pull your drives out, put them in different bays, add others, etc. and the system will still assemble the right partitions into the right raid devices and mount them correctly. Or at least it would in the < 2TB days...
There are ancillary benefits, like the fact that a RAID element that spans the entire partition is inherently 4k-aligned. When there is a partition table taking space at the start of the first cylinder, you have to leave the rest of that cylinder unused in order to get back into 4k alignment.
Isn't it possible to duplicate that when you make a single partition and use the partition as a raid member? And get autoassembly if it is less than 2TB? I consider it a real loss that autoassembly doesn't work on large drives. People will almost certainly lose data in some scenarios as a result.
The only downside I saw in this thread is that when you pull such a disk out of a Linux software RAID and put it into another machine, you don't see a clear Linux partition table, so you might think it is an empty drive. But the same thing is true of a hardware RAID member, too.
I've always liked software raid1 just because you can recover the data from any single drive on any machine with a similar interface. But, I guess that's why we have backups...
I just wanted to say thank you for the replies.... Wow, I got schooled today (in a good way). Much learning going on in my corner of the world...
Richard
---
Richard Zimmerman
Systems / Network Administrator
River Bend Hose Specialty, Inc.
1111 S Main Street
South Bend, IN 46601-3337
(574) 233-1133
(574) 280-7284 Fax
On Fri, Sep 05, 2014 at 08:01:05AM -0600, Warren Young wrote:
So the real question is, why do you believe you need to make each RAID member a *partition* on a disk, instead of just taking over the entire disk? Unless you're going to do something insane like:
For me I have things like sda1 sdb2 sdc3 sdd4 and I align the partitions to the physical slot.
This makes it easier to see what is the failed disk; "sdc3 has fallen out of the array; that's the disk in slot 3".
Because today's sdc may be tomorrow's sdf, depending on any additional disks that have been added, or kernel device discovery order changes, or whatever.
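The persistent names udev already creates get at the same problem without tying partition numbers to slots - for example:

   # names tied to the controller path rather than discovery order
   ls -l /dev/disk/by-path/
   # names tied to the drive itself (model and serial)
   ls -l /dev/disk/by-id/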
On Fri, September 5, 2014 2:02 pm, Stephen Harris wrote:
For me I have things like sda1 sdb2 sdc3 sdd4 and I align the partitions to the physical slot.
What do you do when it comes to 5,... (as MBR only supports 4 primary partitions ;-) ?
This makes it easier to see what is the failed disk; "sdc3 has fallen out of the array; that's the disk in slot 3".
Because today's sdc may be tomorrow's sdf, depending on any additional disks that have been added, or kernel device discovery order changes, or whatever.
That's why I like the [block] device naming strictly derived from the topology of the machine (e.g. FreeBSD does it that way); then you know which physical drive (or other block device, e.g. an attached hardware RAID) a device /dev/da[x] is. I remember the hassle when Linux switched the numbering of network interfaces eth0, eth1,... from the order they are "detected" in to the reverse order (which probably stemmed from pushing them onto a stack and then pulling them back off) - or was it the other way around?
Valeri
++++++++++++++++++++++++++++++++++++++++ Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247 ++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev wrote:
On Fri, September 5, 2014 2:02 pm, Stephen Harris wrote:
For me I have things like sda1 sdb2 sdc3 sdd4 and I align the partitions to the physical slot.
What do you do when it comes to 5,... (as MBR only supports 4 primary partitions ;-) ?
Then you make something an extended partition.
This makes it easier to see what is the failed disk; "sdc3 has fallen out of the array; that's the disk in slot 3".
Because today's sdc may be tomorrow's sdf, depending on any additional disks that have been added, or kernel device discovery order changes, or whatever.
That's why I like the [block] device naming strictly derived from the topology of the machine (e.g. FreeBSD does it that way); then you know which physical drive (or other block device, e.g. an attached hardware RAID) a device /dev/da[x] is. I remember the hassle when Linux switched the numbering of network
How? I've had them move around on a non-RAID m/b (for example, a drive fails, you put one in an unused bay, and then you've got, say, sda, sdc and sdd, no sdb, until reboot). And even then, it's *still* a guessing game as to whether the hot-swap bays upper left, lower left, upper right, lower right are sda, sdb, sdc, sdd, or sda, sdc, sdb, sdd, or, for the fun one, whether lower right is sda....
mark
On Fri, Sep 5, 2014 at 4:05 PM, m.roth@5-cent.us wrote:
That's why I like the [block] device naming strictly derived from the topology of the machine (e.g. FreeBSD does it that way); then you know which physical drive (or other block device, e.g. an attached hardware RAID) a device /dev/da[x] is. I remember the hassle when Linux switched the numbering of network
How? I've had them move around on a non-RAID m/b (for example, a drive fails, and you put one in an unused bay, and then you've got, say, sda, sdc and sdd, no sdb, until reboot), and even then, it's *still* a guessing game as to whether hot-swap bay upper left, lower left, upper right lower right are sda, sdb, sdc, sdd, or sda, sdc, sdb, sdd, or, for the fun one, lower right is sda....
Removing the device from the SCSI subsystem helps alleviate this problem.
By "logically" removing the failed device, you free up /dev/sdb (that just failed) to then use that again for the replacement drive. In all my cases the new drive goes in the same slot and through experience the new drive's device name has been the same as what I removed.
There are RH docs on the commands for removing/adding from the SCSI subsystem. Matter of fact, I posted links to the RH docs on this topic a little while back.
[0] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/htm...
On 2014-09-05, Richard Zimmerman rzimmerman@riverbendhose.com wrote:
Until I read this thread, I've never heard of building RAIDs on bare metal drives. I'm assuming no partition table, just a disk label?
What is the advantage of doing this?
For just my purposes, the advantage is that I can treat my md RAID drives in the same way I treat my hardware RAID drives, which are bare drives. It's just easier conceptually for me to not have to remember to create a partition. As Warren said, this is for data volumes, not for arrays that need to host /boot or /.
--keith
On 2014-09-02, m.roth@5-cent.us m.roth@5-cent.us wrote:
And my manager has taken a fancy to raw drives; not sure why.
Some reasons have already been cited in this thread. No reasons are given on the page below, but the author of md and mdadm apparently prefers raw drives too.
https://raid.wiki.kernel.org/index.php/Partition_Types
I think the take-home message from that document is: "There is no right answer - you can choose."
--keith
On 2014-09-02, Warren Young warren@etr-usa.com wrote:
On 8/29/2014 14:26, m.roth@5-cent.us wrote:
Note that we did this on *raw*, unpartitioned drives (not my idea).
Nothing wrong with that, particularly with big "midden" volumes like this one.
Indeed--hardware RAID controllers don't partition their drives before creating their arrays.
I don't see why it matters that your /dev/sdd partitioning is different from your /dev/sdc. When you told it to blast /dev/sdc with the contents of /dev/sdd, it should have copied the partitioning, too.
If it was an rsync, then partitioning would not have been copied, just the filesystem contents.
As for the OP, while this certainly doesn't seem to be a problem with mdadm specifically (or linux md in general), the folks on the linux RAID mailing list may be able to help you recover (I, too, seldom use linux md, and do not know it well enough to be helpful).
http://vger.kernel.org/vger-lists.html#linux-raid
--keith
On 9/2/2014 12:05, Keith Keller wrote:
On 2014-09-02, Warren Young warren@etr-usa.com wrote:
On 8/29/2014 14:26, m.roth@5-cent.us wrote:
Note that we did this on *raw*, unpartitioned drives (not my idea).
Nothing wrong with that, particularly with big "midden" volumes like this one.
Indeed--hardware RAID controllers don't partition their drives before creating their arrays.
It also has the side benefit that you don't have to worry about 4K partition alignment. Starting at sector 0 means you're always aligned.
On Aug 29, 2014, at 3:26 PM, m.roth@5-cent.us wrote:
mdadm -E /dev/sdd
Just confirm that /dev/sdd is the new disk after you rebooted - the right model and serial number - since drive letters are assigned based on the order the block devices are detected, and so can change on reboot.
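A couple of quick ways to check which physical drive that is, for example:

   # model and serial straight from the drive
   smartctl -i /dev/sdd
   # or match the serial embedded in the persistent device name
   ls -l /dev/disk/by-id/ | grep sdd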
— Mark Tinberg mtinberg@wisc.edu