Hi.
CentOS 7.6.1810, fresh install - I use this as a base to create/upgrade new/old machines.
I was trying to set up two disks as a RAID1 array, using these lines:
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create --verbose /dev/md1 --level=0 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sdb3 /dev/sdc3
Then I did an lsblk and realized that I had used --level=0 instead of --level=1 (a typo). The SIZE was reported as double because I had created a striped set by mistake when I wanted a mirrored one.
Here my problem starts: I cannot get rid of the /dev/mdX devices no matter what I do (or try to do).
I tried to delete the mdX devices: I removed the disks by failing them, then removed each array md0, md1 and md2. I also did
dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz /dev/sdX)-1024)) count=1024
dd if=/dev/zero of=/dev/sdX bs=512 count=1024
mdadm --zero-superblock /dev/sdX
Then I wiped each partition of the drives using fdisk.
Now every time I start fdisk to set up a new set of partitions, as soon as I hit "w" in fdisk I see this in /var/log/messages:
Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before activating degraded array md2..
Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before activating degraded array md1..
Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before activating degraded array md0..
Feb 25 15:38:32 webber kernel: md/raid1:md0: active with 1 out of 2 mirrors
Feb 25 15:38:32 webber kernel: md0: detected capacity change from 0 to 5363466240
Feb 25 15:39:02 webber systemd: Created slice system-mdadm\x2dlast\x2dresort.slice.
Feb 25 15:39:02 webber systemd: Starting Activate md array md1 even though degraded...
Feb 25 15:39:02 webber systemd: Starting Activate md array md2 even though degraded...
Feb 25 15:39:02 webber kernel: md/raid1:md1: active with 0 out of 2 mirrors
Feb 25 15:39:02 webber kernel: md1: failed to create bitmap (-5)
Feb 25 15:39:02 webber mdadm: mdadm: failed to start array /dev/md/1: Input/output error
Feb 25 15:39:02 webber systemd: mdadm-last-resort@md1.service: main process exited, code=exited, status=1/FAILURE
I check /proc/mdstat and sure enough, there it is, trying to assemble an array that I DID NOT tell it to assemble.
I do NOT WANT this to happen; it creates the same "SHIT" (the incorrect arrays) over and over again (systemd frustration). So I tried to delete them again, wiped them again, killed processes, wiped disks.
No matter what I do, as soon as I hit "w" in fdisk, systemd tries to assemble the arrays again without letting me decide what to do.
Help! Jobst
Hi.
CentOS 7.6.1810, fresh install - I use this as a base to create/upgrade new/old machines.
I was trying to set up two disks as a RAID1 array, using these lines:
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create --verbose /dev/md1 --level=0 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sdb3 /dev/sdc3
Then I did an lsblk and realized that I had used --level=0 instead of --level=1 (a typo). The SIZE was reported as double because I had created a striped set by mistake when I wanted a mirrored one.
Here my problem starts: I cannot get rid of the /dev/mdX devices no matter what I do (or try to do).
I tried to delete the mdX devices: I removed the disks by failing them, then removed each array md0, md1 and md2. I also did
dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz /dev/sdX)-1024)) count=1024
I didn't check but are you really sure you're cleaning up the end of the drive? Maybe you should clean the end of every partition first because metadata may be written there.
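Something along these lines might do it for the partition ends, off the top of my head (untested, adjust the device names to yours):

for p in /dev/sd{b,c}{1,2,3}; do
  dd if=/dev/zero of=$p bs=512 seek=$(($(blockdev --getsz $p)-1024)) count=1024
done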
dd if=/dev/zero of=/dev/sdX bs=512 count=1024
mdadm --zero-superblock /dev/sdX
Then I wiped each partition of the drives using fdisk.
Now every time I start fdisk to set up a new set of partitions, as soon as I hit "w" in fdisk I see this in /var/log/messages:
Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before activating degraded array md2..
Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before activating degraded array md1..
Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before activating degraded array md0..
Feb 25 15:38:32 webber kernel: md/raid1:md0: active with 1 out of 2 mirrors
Feb 25 15:38:32 webber kernel: md0: detected capacity change from 0 to 5363466240
Feb 25 15:39:02 webber systemd: Created slice system-mdadm\x2dlast\x2dresort.slice.
Feb 25 15:39:02 webber systemd: Starting Activate md array md1 even though degraded...
Feb 25 15:39:02 webber systemd: Starting Activate md array md2 even though degraded...
Feb 25 15:39:02 webber kernel: md/raid1:md1: active with 0 out of 2 mirrors
Feb 25 15:39:02 webber kernel: md1: failed to create bitmap (-5)
Feb 25 15:39:02 webber mdadm: mdadm: failed to start array /dev/md/1: Input/output error
Feb 25 15:39:02 webber systemd: mdadm-last-resort@md1.service: main process exited, code=exited, status=1/FAILURE
I check /proc/mdstat and sure enough, there it is, trying to assemble an array that I DID NOT tell it to assemble.
I do NOT WANT this to happen; it creates the same "SHIT" (the incorrect arrays) over and over again (systemd frustration).
Noooooo, you're wiping it wrong :-)
So I tried to delete them again, wiped them again, killed processes, wiped disks.
No matter what I do, as soon as I hit "w" in fdisk, systemd tries to assemble the arrays again without letting me decide what to do.
<don't try this at home> Nothing easier than that, just terminate systemd while doing the disk management and restart it after you're done. BTW, PID is 1. </don't try this at home>
Seriously, there is certainly some systemd unit you may be able to deactivate before doing such things. However, I don't know which one it is.
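A starting point might be to look at which md related units are actually loaded, something like:

systemctl list-units 'md*'

and then stop or mask whatever looks responsible for the auto-assembly (I haven't tried this myself).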
I've been fighting similar crap: on HPE servers, whenever cciss_vol_status is run through the disk monitoring system and reports the hardware RAID status, systemd scans all partition tables and tries to detect LVM2 devices and whatever else. The kernel log is just filled with useless scans and I have no idea how to get rid of it. Nice new systemd world.
Regards, Simon
On Mon, Feb 25, 2019 at 06:50:11AM +0100, Simon Matter via CentOS (centos@centos.org) wrote:
Hi.
dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz /dev/sdX)-1024)) count=1024
I didn't check but are you really sure you're cleaning up the end of the drive? Maybe you should clean the end of every partition first because metadata may be written there.
Mmmmhhh, not sure. I ran fdisk on it, basically re-creating everything from scratch.
The "trying to re-create the mdX's" happens when I use "w" in fdisk. As soon as I hit "w" it starts re-creating the mdX devices!
That's the annoying part.
[snip]
No matter what I do, as soon as I hit "w" in fdisk, systemd tries to assemble the arrays again without letting me decide what to do.
<don't try this at home>
I am not ;-), it's @ work.
Jobst
In article 20190225050144.GA5984@button.barrett.com.au, Jobst Schmalenbach jobst@barrett.com.au wrote:
Hi.
CentOS 7.6.1810, fresh install - I use this as a base to create/upgrade new/old machines.
I was trying to set up two disks as a RAID1 array, using these lines:
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create --verbose /dev/md1 --level=0 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sdb3 /dev/sdc3
Then I did an lsblk and realized that I had used --level=0 instead of --level=1 (a typo). The SIZE was reported as double because I had created a striped set by mistake when I wanted a mirrored one.
Here my problem starts: I cannot get rid of the /dev/mdX devices no matter what I do (or try to do).
I tried to delete the mdX devices: I removed the disks by failing them, then removed each array md0, md1 and md2. I also did
dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz /dev/sdX)-1024)) count=1024
dd if=/dev/zero of=/dev/sdX bs=512 count=1024
mdadm --zero-superblock /dev/sdX
Then I wiped each partition of the drives using fdisk.
The superblock is a property of each partition, not just of the whole disk.
So I believe you need to do:
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb3
mdadm --zero-superblock /dev/sdc1
mdadm --zero-superblock /dev/sdc2
mdadm --zero-superblock /dev/sdc3
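Afterwards you can verify each partition with something like:

mdadm --examine /dev/sdb1

which should then report that no md superblock was detected on the partition.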
Cheers Tony
On Mon, Feb 25, 2019 at 11:23:12AM +0000, Tony Mountifield (tony@softins.co.uk) wrote:
In article 20190225050144.GA5984@button.barrett.com.au, Jobst Schmalenbach jobst@barrett.com.au wrote:
Hi. CentOS 7.6.1810, fresh install - I use this as a base to create/upgrade new/old machines.
I was trying to set up two disks as a RAID1 array, using these lines:
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create --verbose /dev/md1 --level=0 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create --verbose /dev/md2 --level=0 --raid-devices=2 /dev/sdb3 /dev/sdc3
Then I did an lsblk and realized that I had used --level=0 instead of --level=1 (a typo).
So I believe you need to do:
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
I actually deleted the partitions, at first using fdisk, then parted (I read a few ideas on the internet). From the second try onwards I also changed the partition sizes and filesystems. I also tried with one disk missing (either sda or sdb).
Jobst
On 2/24/19 9:01 PM, Jobst Schmalenbach wrote:
I tried to delete the mdX devices: I removed the disks by failing them, then removed each array md0, md1 and md2. I also did
dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz /dev/sdX)-1024)) count=1024
Clearing the initial sectors doesn't do anything to clear the data in the partitions. They don't become blank just because you remove them.
Partition your drives, and then use "wipefs -a /dev/sd{b,c}{1,2,3}"
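If you want to see which signatures are actually there before removing them, running wipefs without -a should just list them, e.g.:

wipefs /dev/sdb1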
I do NOT WANT this to happen; it creates the same "SHIT" (the incorrect arrays) over and over again (systemd frustration).
What makes you think this has *anything* to do with systemd? Bitching about systemd every time you hit a problem isn't helpful. Don't.
On 2/24/19 9:01 PM, Jobst Schmalenbach wrote:
I tried to delete the mdX devices: I removed the disks by failing them, then removed each array md0, md1 and md2. I also did
dd if=/dev/zero of=/dev/sdX bs=512 seek=$(($(blockdev --getsz /dev/sdX)-1024)) count=1024
Clearing the initial sectors doesn't do anything to clear the data in the partitions. They don't become blank just because you remove them.
Partition your drives, and then use "wipefs -a /dev/sd{b,c}{1,2,3}"
I do NOT WANT this to happen; it creates the same "SHIT" (the incorrect arrays) over and over again (systemd frustration).
What makes you think this has *anything* to do with systemd? Bitching about systemd every time you hit a problem isn't helpful. Don't.
If it's not systemd, who else does it? Can you elaborate, please?
Regards, Simon
On Mon, Feb 25, 2019 at 11:54 PM Simon Matter via CentOS centos@centos.org wrote:
What makes you think this has *anything* to do with systemd? Bitching about systemd every time you hit a problem isn't helpful. Don't.
If it's not systemd, who else does it? Can you elaborate, please?
I'll wager it's the mdadm.service unit. You're seeing systemd in the log because systemd has a unit loaded that's managing your md devices. The package mdadm installs these files:
/usr/lib/systemd/system/mdadm.service
/usr/lib/systemd/system/mdmonitor-takeover.service
/usr/lib/systemd/system/mdmonitor.service
Perhaps if you turn off these services, you'll be able to manage your disks without interference. I do not use mdadm on my system; I'm just looking at the content of the rpm file on rpmfind.net. That said, systemd isn't the culprit here. It's doing what it's supposed to (starting a managed service on demand).
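To see which md related units your box actually has, something like this should do (just a grep over the unit file list):

systemctl list-unit-files | grep -i md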
I do concede the logs are confusing. For example, this appears in my logs:
Feb 26 05:10:03 demeter systemd: Starting This service automatically renews any certbot certificates found...
While there is no indication in the log, this is being started by:
[cbell@demeter log]$ systemctl status certbot-renew.timer
● certbot-renew.timer - This is the timer to set the schedule for automated renewals
   Loaded: loaded (/usr/lib/systemd/system/certbot-renew.timer; enabled; vendor preset: disabled)
   Active: active (waiting) since Thu 2019-02-21 17:54:43 CST; 4 days ago
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
[cbell@demeter log]$
And you can see the log message through the service unit using journalctl:
[cbell@demeter log]$ journalctl -u certbot-renew.service | grep "This service" | tail -1
Feb 26 05:10:07 demeter.home systemd[1]: Started This service automatically renews any certbot certificates found.
[cbell@demeter log]$
You can see there's no indication in /var/log/messages that it's the certbot-renewal service (timer) that's logging this. So it's easy to misinterpret where the messages are coming from, like your mdadm messages. Perhaps having the journal indicate which service or timer is logging a message is a feature request for Lennart!
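For your md case the same trick should work to tie the messages to a unit, e.g. (instance name taken from your log above):

journalctl -u mdadm-last-resort@md1.service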
Hope this helps!
On Mon, Feb 25, 2019 at 11:54 PM Simon Matter via CentOS centos@centos.org wrote:
What makes you think this has *anything* to do with systemd? Bitching about systemd every time you hit a problem isn't helpful. Don't.
If it's not systemd, who else does it? Can you elaborate, please?
I'll wager it's the mdadm.service unit. You're seeing systemd in the log because systemd has a unit loaded that's managing your md devices. The package mdadm installs these files:
/usr/lib/systemd/system/mdadm.service
/usr/lib/systemd/system/mdmonitor-takeover.service
/usr/lib/systemd/system/mdmonitor.service
I'm not sure what your box runs but it's at least not CentOS 7.
CentOS 7 contains these md related units:
/usr/lib/systemd/system/mdadm-grow-continue@.service
/usr/lib/systemd/system/mdadm-last-resort@.service
/usr/lib/systemd/system/mdadm-last-resort@.timer
/usr/lib/systemd/system/mdmonitor.service
/usr/lib/systemd/system/mdmon@.service
The only md related daemon running besides systemd is mdadm. I've never seen such behavior with EL6 and the mdadm there, so I don't think mdadm itself does such things.
The message produced comes from mdadm-last-resort@.timer. Whatever triggers it is either systemd or something like systemd-udevd.
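If that timer is what keeps firing, it might be worth masking the instances while repartitioning and unmasking them afterwards (untested, just an idea):

systemctl mask mdadm-last-resort@md0.timer mdadm-last-resort@md1.timer mdadm-last-resort@md2.timer
# ... repartition / wipe the disks here ...
systemctl unmask mdadm-last-resort@md0.timer mdadm-last-resort@md1.timer mdadm-last-resort@md2.timer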
How is it not systemd doing it? Such things didn't happen with pre-systemd distributions.
Regards, Simon
On Tue, Feb 26, 2019 at 03:37:34PM +0100, Simon Matter via CentOS (centos@centos.org) wrote:
On Mon, Feb 25, 2019 at 11:54 PM Simon Matter via CentOS centos@centos.org wrote:
What makes you think this has *anything* to do with systemd? Bitching about systemd every time you hit a problem isn't helpful. Don't.
If it's not systemd, who else does it? Can you elaborate, please?
How is it not systemd doing it? Such things didn't happen with pre-systemd distributions.
I just had a hardware failure of a RAID controller (well, they fail, that's why we have backups). This means that after putting the drives onto a new controller I have to (re-)format them.
In CentOS 6 times this took me under an hour to fix, mostly due to the rsyncing time. Yesterday it took me over 6 hours to move a system.
Jobst
On 2/26/19 6:37 AM, Simon Matter via CentOS wrote:
How is it not systemd doing it? Such things didn't happen with pre-systemd distributions.
The following log is from a CentOS 6 system. I created RAID devices on two drives. I then stopped the RAID devices and dd'd over the beginning of each drive. I then re-partitioned the drives.
At that point, the RAID devices auto-assemble. They actually partially fail below, but the behavior this thread discusses is absolutely not systemd-specific.
What you're seeing is that you're wiping the partition, but not the RAID information inside the partitions. When you remove and then re-create the partitions, you're hot-adding RAID components to the system. They auto-assemble, as they have (or should have) for a long time. It's probably more reliable under newer revisions, but this is long-standing behavior.
The problem isn't systemd. The problem is that you're not wiping what you think you're wiping. You need to use "wipefs -a" on each partition that's a RAID component first, and then "wipefs -a" on the drive itself to get rid of the partition table.
[root@localhost ~]# dd if=/dev/zero of=/dev/vdb bs=512 count=1024
1024+0 records in
1024+0 records out
524288 bytes (524 kB) copied, 0.0757563 s, 6.9 MB/s
[root@localhost ~]# dd if=/dev/zero of=/dev/vdc bs=512 count=1024
1024+0 records in
1024+0 records out
524288 bytes (524 kB) copied, 0.0385181 s, 13.6 MB/s
[root@localhost ~]# kpartx -a /dev/vdb
Warning: Disk has a valid GPT signature but invalid PMBR.
Assuming this disk is *not* a GPT disk anymore. Use gpt kernel option to override. Use GNU Parted to correct disk.
[root@localhost ~]# kpartx -a /dev/vdc
Warning: Disk has a valid GPT signature but invalid PMBR.
Assuming this disk is *not* a GPT disk anymore. Use gpt kernel option to override. Use GNU Parted to correct disk.
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
unused devices: <none>
[root@localhost ~]# parted /dev/vdb -s mklabel gpt mkpart primary ext4 1M 200M mkpart primary ext4 200M 1224M mkpart primary ext4 1224M 100%
[root@localhost ~]# parted /dev/vdc -s mklabel gpt mkpart primary ext4 1M 200M mkpart primary ext4 200M 1224M mkpart primary ext4 1224M 100%
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
unused devices: <none>
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 vdc3[1] vdb3[0]
      19775360 blocks super 1.0 [2/2] [UU]

md1 : active raid1 vdb2[0]
      999360 blocks super 1.0 [2/1] [U_]

md0 : active raid1 vdb1[0]
      194496 blocks super 1.0 [2/1] [U_]
unused devices: <none>
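To spell out the wipefs sequence I'm suggesting above, it would look roughly like this (device names taken from the original mail, not from my test VM):

mdadm --stop /dev/md0
mdadm --stop /dev/md1
mdadm --stop /dev/md2
# clear the md superblocks inside each partition
wipefs -a /dev/sd{b,c}{1,2,3}
# then clear the partition tables themselves
wipefs -a /dev/sdb /dev/sdc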
On Mon, Feb 25, 2019 at 05:24:44PM -0800, Gordon Messmer (gordon.messmer@gmail.com) wrote:
On 2/24/19 9:01 PM, Jobst Schmalenbach wrote:
[snip]
What makes you think this has *anything* to do with systemd? Bitching about systemd every time you hit a problem isn't helpful. Don't.
Because of this.
Feb 25 15:38:32 webber systemd: Started Timer to wait for more drives before activating degraded array md2..