Software RAID Level 1, smartd and changing dev numbers

James Smallacombe

16 Feb 2011

5 p.m.

We have about 50 CentOS servers with software RAID level 1 (mirroring). Each week, we swap out one of the drives (the one in the second of four hot-swap bays, only the first two of which contain drives) on each server and take them offsite for safekeeping.

The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices. This makes the process slower by requiring more manual input where a script(s) could otherwise suffice.

It also confuses smartd, which AFAIK, needs the correct device names to report accurately.

Ideally, we'd like to force the OS at some level to always see these devices as /dev/sda and /dev/sdb. If not, is there at least some way to configure smartd to be "smart" and recognize which devices are in use?

TIA,

Robert Heller

16 Feb

5:30 p.m.

At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...

We have about 50 CentOS servers with software RAID level 1 (mirroring). Each week, we swap out one of the drives (the one in the second of four hot-swap bays, only the first two of which contain drives) on each server and take them offsite for safekeeping.

The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices. This makes the process slower by requiring more manual input where a script(s) could otherwise suffice.

I'm assuming these are actually SATA disks with a controller that supports hot-swap.

What I think is happening is that the kernel retains some 'memory' of the pulled drive (say /dev/sdb) and when the fresh drive is installed, a new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten by the time the next 'swap' and /dev/sdb is assigned to the next fresh disk.

Question: are you always swapping in a *new* disk each week or re-inserting the disk from the previous week?

...

It also confuses smartd, which AFAIK, needs the correct device names to report accurately.

Ideally, we'd like to force the OS at some level to always see these devices as /dev/sda and /dev/sdb. If not, is there at least some way to configure smartd to be "smart" and recognize which devices are in use?

The cure might be that you need to do a reboot to properly rescan the disks.

...

TIA, _______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos

-- Robert Heller -- 978-544-6933 / heller@deepsoft.com Deepwoods Software -- http://www.deepsoft.com/ () ascii ribbon campaign -- against html e-mail /\ www.asciiribbon.org -- against proprietary attachments

James Smallacombe

5:38 p.m.

...

At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...
We have about 50 CentOS servers with software RAID level 1 (mirroring). Each week, we swap out one of the drives (the one in the second of four hot-swap bays, only the first two of which contain drives) on each server and take them offsite for safekeeping.

The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices. This makes the process slower by requiring more manual input where a script(s) could otherwise suffice.

I'm assuming these are actually SATA disks with a controller that supports hot-swap.

Correct.

...

What I think is happening is that the kernel retains some 'memory' of the pulled drive (say /dev/sdb) and when the fresh drive is installed, a new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten by the time the next 'swap' and /dev/sdb is assigned to the next fresh disk.

Interesting...one would think that this behavior would be consistent across all servers then, but it isn't. Most accept the same dev, /dev/sdb, but some assign /dev/sdc. Is there a way to just disable /dev/sdc and force the kernel to use /dev/sdb every time?

...

Question: are you always swapping in a *new* disk each week or re-inserting the disk from the previous week?

It's a rotation, so re-inserting from the previous week.

...

...
It also confuses smartd, which AFAIK, needs the correct device names to report accurately.

Ideally, we'd like to force the OS at some level to always see these devices as /dev/sda and /dev/sdb. If not, is there at least some way to configure smartd to be "smart" and recognize which devices are in use?

The cure might be that you need to do a reboot to properly rescan the disks.

Ugh. Thanks for your response.

Robert Heller

6:41 p.m.

At Wed, 16 Feb 2011 12:38:53 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...

...
At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...
We have about 50 CentOS servers with software RAID level 1 (mirroring). Each week, we swap out one of the drives (the one in the second of four hot-swap bays, only the first two of which contain drives) on each server and take them offsite for safekeeping.

The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices. This makes the process slower by requiring more manual input where a script(s) could otherwise suffice.

I'm assuming these are actually SATA disks with a controller that supports hot-swap.

Correct.

...
What I think is happening is that the kernel retains some 'memory' of the pulled drive (say /dev/sdb) and when the fresh drive is installed, a new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten by the time the next 'swap' and /dev/sdb is assigned to the next fresh disk.

Interesting...one would think that this behavior would be consistent across all servers then, but it isn't. Most accept the same dev, /dev/sdb, but some assign /dev/sdc. Is there a way to just disable /dev/sdc and force the kernel to use /dev/sdb every time?

It could be something as simple as 'timing'. Like how long it takes for the kernel to get around to re-cycling the device objects. I would also look real closely at the *exact* order of tasks (mdadm -f ..., mdadm -r ..) and how much time there is between these tasks and how 'busy' the specific machine is. It could be that the disk is being pulled too soon or not enough time is left between the 'fail' and the 'remove' -- that is the kernel is still doing something with the disk (eg has some 'unfinished business') and is thus not releasing the device object. It is likely that the amount of time needed for things to 'settle' will vary based on things like system load and just what the system is doing (eg a database server will be different from a file server which will be different from a DNS server, etc.). And it might also depend on the size of the disks and the type of controller (and the driver it uses).
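One way to take the guesswork out of the timing described above is to check the member's state in /proc/mdstat before the drive is physically pulled. The following is a minimal Python sketch and not something posted in this thread: the function names, the array /dev/md0 and the member /dev/sdb1 are assumptions for illustration, and the mdadm commands are only built as argument lists, never executed.

```python
import re


def member_state(mdstat_text, array, member):
    """Return 'active', 'faulty' or 'absent' for a member (e.g. 'sdb1')
    of an md array (e.g. 'md0'), given the text of /proc/mdstat."""
    for line in mdstat_text.splitlines():
        if not line.startswith(array + " :"):
            continue
        # Member tokens look like 'sdb1[1]' or 'sdb1[1](F)' when failed.
        for token in line.split(":", 1)[1].split():
            m = re.match(r"^([A-Za-z0-9_-]+)\[\d+\](\([A-Z]\))*$", token)
            if m and m.group(1) == member:
                return "faulty" if "(F)" in token else "active"
        return "absent"
    return "absent"


def swap_out_commands(array_dev, member_dev):
    """Build (but do not run) the fail-then-remove sequence for one member."""
    return [
        ["mdadm", array_dev, "--fail", member_dev],
        ["mdadm", array_dev, "--remove", member_dev],
    ]


if __name__ == "__main__":
    # Hypothetical names; adjust to the array and member actually in use.
    for argv in swap_out_commands("/dev/md0", "/dev/sdb1"):
        print(" ".join(argv))
```

A swap script could run the fail step, poll until member_state() reports "faulty", run the remove step, and only signal the tech to pull the drive once the member is reported "absent". Whether that alone stops the sdb/sdc switching is untested here.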

...

...
Question: are you always swapping in a *new* disk each week or re-inserting the disk from the previous week?

It's a rotation, so re-inserting from the previous week.

Umm. It has been stated elsewhere, but RAID is not really a substitute for proper backups.

...

...
...
It also confuses smartd, which AFAIK, needs the correct device names to report accurately.

Ideally, we'd like to force the OS at some level to always see these devices as /dev/sda and /dev/sdb. If not, is there at least some way to configure smartd to be "smart" and recognize which devices are in use?

The cure might be that you need to do a reboot to properly rescan the disks.

Ugh. Thanks for your response.

James Smallacombe

6:47 p.m.

...

At Wed, 16 Feb 2011 12:38:53 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...
...
At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...
We have about 50 CentOS servers with software RAID level 1 (mirroring). Each week, we swap out one of the drives (the one in the second of four hot-swap bays, only the first two of which contain drives) on each server and take them offsite for safekeeping.

The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices. This makes the process slower by requiring more manual input where a script(s) could otherwise suffice.

I'm assuming these are actually SATA disks with a controller that supports hot-swap.

Correct.

...
What I think is happening is that the kernel retains some 'memory' of the pulled drive (say /dev/sdb) and when the fresh drive is installed, a new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten by the time the next 'swap' and /dev/sdb is assigned to the next fresh disk.

Interesting...one would think that this behavior would be consistent across all servers then, but it isn't. Most accept the same dev, /dev/sdb, but some assign /dev/sdc. Is there a way to just disable /dev/sdc and force the kernel to use /dev/sdb every time?

It could be something as simple as 'timing'. Like how long it takes for the kernel to get around to re-cycling the device objects. I would also look real closely at the *exact* order of tasks (mdadm -f ..., mdadm -r ..) and how much time there is between these tasks and how 'busy' the specific machine is. It could be that the disk is being pulled too soon or not enough time is left between the 'fail' and the 'remove' -- that is the kernel is still doing something with the disk (eg has some 'unfinished business') and is thus not releasing the device object. It is likely that the amount of time needed for things to 'settle' will vary based on things like system load and just what the system is doing (eg a database server will be different from a file server which will be different from a DNS server, etc.). And it might also depend on the size of the disks and the type of controller (and the driver it uses).

Interesting...I will discuss with the tech who swaps the drives out.

...

...
...
Question: are you always swapping in a *new* disk each week or re-inserting the disk from the previous week?

It's a rotation, so re-inserting from the previous week.

Umm. It has been stated elsewhere, but RAID is not really a substitute for proper backups.

I agree. Proper archiving is also in place. This system is also in place, to allow for a faster recovery in the event of other hardware failure. It has been useful many times already.

Brian Mathis

7:19 p.m.

On Wed, Feb 16, 2011 at 1:41 PM, Robert Heller heller@deepsoft.com wrote:

...

Umm. It has been stated elsewhere, but RAID is not really a substitute for proper backups.

[...]

I know this is the popular thing to say, but it should not be said blindly. This case is an example of exactly where it is not appropriate to say such a thing. The OP is clearly using the mirroring ability of RAID1, then breaking the mirror to move the copy offsite. In fact, this is exactly an implementation of "proper backups".

For further information, when people say "RAID is not backup," they are referring to the situation where people rely solely on RAID to cover all aspects of backup. They simply don't think through all the scenarios of when you need a backup, such as when files are deleted, filesystem corruption, fire/flood, virus, etc... People using RAID like this don't have tapes, don't have offsites, and rely on all data sitting within the machine to be safe. Again, that's clearly not how it's being used here.

Keith Roberts

6:46 p.m.

On Wed, 16 Feb 2011, James Smallacombe wrote:

...

To: CentOS mailing list centos@centos.org From: James Smallacombe james@sicom.com Subject: Re: [CentOS] Software RAID Level 1, smartd and changing dev numbers

...
At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...
The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices. This makes the process slower by requiring more manual input where a script(s) could otherwise suffice.

I'm assuming these are actually SATA disks with a controller that supports hot-swap.

Correct.

...
What I think is happening is that the kernel retains some 'memory' of the pulled drive (say /dev/sdb) and when the fresh drive is installed, a new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten by the time the next 'swap' and /dev/sdb is assigned to the next fresh disk.

Interesting...one would think that this behavior would be consistent across all servers then, but it isn't. Most accept the same dev, /dev/sdb, but some assign /dev/sdc. Is there a way to just disable /dev/sdc and force the kernel to use /dev/sdb every time?

Can you identify any differences in the machines that don't re-assign the dev files, and the machines that do?

Is this anything to do with UUID's on the drives/partitions?

What partitions do you have on the RAID drives?

How are the drives set up as RAID - as bare drives/partitions, or via LVM?

Keith

----------------------------------------------------------------- Websites: http://www.karsites.net http://www.php-debuggers.net http://www.raised-from-the-dead.org.uk

All email addresses are challenge-response protected with TMDA [http://tmda.net] -----------------------------------------------------------------

yonatan pingle

6:27 p.m.

Running partprobe as root should refresh the kernel's partition / disk cache, so a reboot should not be needed.

On Wed, Feb 16, 2011 at 7:30 PM, Robert Heller heller@deepsoft.com wrote:

...

At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...
We have about 50 CentOS servers with software RAID level 1 (mirroring). Each week, we swap out one of the drives (the one in the second of four hot-swap bays, only the first two of which contain drives) on each server and take them offsite for safekeeping.

The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices. This makes the process slower by requiring more manual input where a script(s) could otherwise suffice.

I'm assuming these are actually SATA disks with a controller that supports hot-swap.

What I think is happening is that the kernel retains some 'memory' of the pulled drive (say /dev/sdb) and when the fresh drive is installed, a new dev file is created (/dev/sdc). Eventually, /dev/sdb is forgotten by the time the next 'swap' and /dev/sdb is assigned to the next fresh disk.

Question: are you always swapping in a *new* disk each week or re-inserting the disk from the previous week?

...
It also confuses smartd, which AFAIK, needs the correct device names to report accurately.

Ideally, we'd like to force the OS at some level to always see these devices as /dev/sda and /dev/sdb. If not, is there at least some way to configure smartd to be "smart" and recognize which devices are in use?

The cure might be that you need to do a reboot to properly rescan the disks.


-- Best Regards, Yonatan Pingle RHCT | RHCSA | CCNA1

compdoc

6:09 p.m.

...

The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices.

I use the UUID in fstab rather than '/dev/sda', etc

Les Mikesell

6:15 p.m.

On 2/16/2011 12:09 PM, compdoc wrote:

...

...
The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices.

I use the UUID in fstab rather than '/dev/sda', etc

In this case it would be something you give to mdadm to add a device back to a set. And you'd have to know which one in a rotation was coming back to which machine, something you wouldn't otherwise have to track since it is going to overwrite everything with the re-sync anyway.
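A related idea that nobody in this thread has tested: instead of guessing between /dev/sdb and /dev/sdc, a script could look up whichever kernel name a persistent udev symlink currently points at and hand that to mdadm. The minimal Python sketch below assumes udev populates /dev/disk/by-path on these systems; the link name in the example is hypothetical, the function names are made up for illustration, and the command is only built, not run.

```python
import os


def current_device(persistent_link):
    """Resolve a persistent symlink (e.g. under /dev/disk/by-path)
    to the kernel device node it points at right now."""
    return os.path.realpath(persistent_link)


def re_add_command(array_dev, persistent_link):
    """Build (but do not run) the mdadm command that adds the device
    behind persistent_link back into array_dev."""
    return ["mdadm", array_dev, "--add", current_device(persistent_link)]


if __name__ == "__main__":
    # Hypothetical link name for the partition in the second bay.
    link = "/dev/disk/by-path/HYPOTHETICAL-second-bay-part1"
    print(" ".join(re_add_command("/dev/md0", link)))
```

Because a by-path name describes the controller port rather than the disk, it would not require tracking which drive of the rotation is coming back.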

-- Les Mikesell lesmikesell@gmail.com

James Smallacombe

6:43 p.m.

...

On 2/16/2011 12:09 PM, compdoc wrote:

...
...
The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices.

I use the UUID in fstab rather than '/dev/sda', etc

In this case it would be something you give to mdadm to add a device back to a set. And you'd have to know which one in a rotation was coming back to which machine, something you wouldn't otherwise have to track since it is going to overwrite everything with the re-sync anyway.

We do track (and physically label) that, because there are drives of different size/manufacturer/geometry on different servers, so that would be ok.

However, we're not set up for UUIDs, the fstab just shows /dev/md0, etc. Perhaps this is the answer for us, but I'll have to look into how tricky it would be to migrate roughly 50 production servers.

Thanks again!

Robert Heller

6:56 p.m.

At Wed, 16 Feb 2011 13:43:16 -0500 (EST) CentOS mailing list centos@centos.org wrote:

...

...
On 2/16/2011 12:09 PM, compdoc wrote:

...
...
The problem is, the kernel seemingly randomly switches between /dev/sdb and /dev/sdc for these devices.

I use the UUID in fstab rather than '/dev/sda', etc

In this case it would be something you give to mdadm to add a device back to a set. And you'd have to know which one in a rotation was coming back to which machine, something you wouldn't otherwise have to track since it is going to overwrite everything with the re-sync anyway.

We do track (and physically label) that, because there are drives of different size/manufacturer/geometry on different servers, so that would be ok.

Thought question: is there any *pattern* to the seeming randomness of the /dev/sdb vs. /dev/sdc business? Do disks of certain sizes/manufacturer/geometry do the switch more or less often?

...

However, we're not set up for UUIDs, the fstab just shows /dev/md0, etc. Perhaps this is the answer for us, but I'll have to look into how tricky it would be to migrate roughly 50 production servers.

Thanks again!

compdoc

8:09 p.m.

...

However, we're not set up for UUIDs, the fstab just shows /dev/md0, etc.

I mentioned it because I recently installed and set up servers with ubuntu 10.04 and fedora 14, while I was waiting for C6. Using the UUID is the default now.

I also found it works fine in C5.5 - you just substitute the UUID for the /dev and format the fstab line properly.
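The substitution described above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and the device-to-UUID mapping is an assumption that would have to be filled in from the real filesystems. Entries with no UUID in the mapping, such as /dev/md0, are left exactly as they are.

```python
def fstab_line_with_uuid(line, uuid_by_device):
    """Return the fstab line with its first field replaced by UUID=<uuid>
    when the device is in uuid_by_device; otherwise return it unchanged."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank lines and comments pass through untouched
    fields = stripped.split()
    uuid = uuid_by_device.get(fields[0])
    if uuid is None:
        return line  # no UUID known for this device: leave the entry alone
    fields[0] = "UUID=" + uuid
    return "\t".join(fields)


if __name__ == "__main__":
    # Placeholder UUID, not taken from any real system.
    mapping = {"/dev/sda1": "HYPOTHETICAL-UUID"}
    print(fstab_line_with_uuid("/dev/sda1 /boot ext3 defaults 1 2", mapping))
    print(fstab_line_with_uuid("/dev/md0 / ext3 defaults 1 1", mapping))
```

The remaining fields (mount point, filesystem type, options, dump, pass) keep their order; only the device field changes.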

However I use raid cards, and I don't know if mdadm can work with the UUID in centos. Sorry if it doesn't...

Tom H

18 Feb

2:05 a.m.

On Wed, Feb 16, 2011 at 3:09 PM, compdoc compdoc@hotrodpc.com wrote:

...

...
However, we're not set up for UUIDs, the fstab just shows /dev/md0, etc.

I mentioned it because I recently installed and set up servers with ubuntu 10.04 and fedora 14, while I was waiting for C6. Using the UUID is the default now.

In Ubuntu and Fedora, UUID's the default replacement of "/dev/sdXY" devices, but md and lvm devices are referred to in more "traditional" fstab stanzas.

Scott Robbins

3:22 a.m.

On Thu, Feb 17, 2011 at 09:05:41PM -0500, Tom H wrote:

...

On Wed, Feb 16, 2011 at 3:09 PM, compdoc compdoc@hotrodpc.com wrote:

...
...
In Ubuntu and Fedora, UUID's the default replacement of "/dev/sdXY" devices, but md and lvm devices are referred to in more "traditional" fstab stanzas.

Possibly worth mentioning that it does sometimes break--at least in Fedora, I can think of a few times it's happened to me, and a few more times where it's happened on their forums, where an update would then fail to boot, saying, unable to locate root (or something similar) which could be fixed by changing the UUID to /dev/sdwhatever

-- Scott Robbins PGP keyID EB3467D6 ( 1B48 077D 66F6 9DB0 FDC2 A409 FA54 EB34 67D6 ) gpg --keyserver pgp.mit.edu --recv-keys EB3467D6 Xander: Isn't that what they called The Slayer? Willow: Buffy, ohh scary. Xander: Someone has to talk to her people. That name is striking fear in nobody's hearts.

Scott Robbins

3:25 a.m.

On Thu, Feb 17, 2011 at 10:22:44PM -0500, Scott Robbins wrote:

...

On Thu, Feb 17, 2011 at 09:05:41PM -0500, Tom H wrote:

...
On Wed, Feb 16, 2011 at 3:09 PM, compdoc compdoc@hotrodpc.com wrote:

...
...
In Ubuntu and Fedora, UUID's the default replacement of "/dev/sdXY" devices, but md and lvm devices are referred to in more "traditional" fstab stanzas.

Possibly worth mentioning that it does sometimes break--at least in Fedora, I can think of a few times it's happened to me, and a few more times where it's happened on their forums, where an update would then fail to boot, saying, unable to locate root (or something similar) which could be fixed by changing the UUID to /dev/sdwhatever

To reply to my own email, I should add that I don't want to spread FUD. Most of the time UUID works as it should, and can be handy for things such as installing from a USB, where the system thinks the USB is /dev/sda. I just wanted to point out that on VERY RARE occasions, it has failed to work.

-- Scott Robbins PGP keyID EB3467D6 ( 1B48 077D 66F6 9DB0 FDC2 A409 FA54 EB34 67D6 ) gpg --keyserver pgp.mit.edu --recv-keys EB3467D6 Xander: And they say that young people don't learn anything in high school nowadays, but I've learned to be afraid.

Tom H

3:46 p.m.

On Thu, Feb 17, 2011 at 10:22 PM, Scott Robbins scottro@nyc.rr.com wrote:

...

On Thu, Feb 17, 2011 at 09:05:41PM -0500, Tom H wrote:

...
On Wed, Feb 16, 2011 at 3:09 PM, compdoc compdoc@hotrodpc.com wrote:

...
...
In Ubuntu and Fedora, UUID's the default replacement of "/dev/sdXY" devices, but md and lvm devices are referred to in more "traditional" fstab stanzas.

Possibly worth mentioning that it does sometimes break--at least in Fedora, I can think of a few times it's happened to me, and a few more times where it's happened on their forums, where an update would then fail to boot, saying, unable to locate root (or something similar) which could be fixed by changing the UUID to /dev/sdwhatever

I've only seen two cases of UUIDs "breaking".

1. You put a second Linux install on the box, mkswap's run during the installation process and the UUID of the swap partition's modified so that the initial install cannot recognize its swap partition.

2. There's more than one filesystem signature in the MBR and mount's therefore (understandably) confused.
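The first case above (a UUID in fstab that no longer exists after mkswap has been rerun) is easy to check for ahead of a reboot. The Python sketch below is an illustration only: the function names are made up, and it assumes the caller supplies the list of UUIDs currently present, for example from the directory listing of /dev/disk/by-uuid on systems where udev creates it.

```python
def uuids_in_fstab(fstab_text):
    """Collect the UUIDs used as UUID=<uuid> in the first field of each entry."""
    found = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blank lines and comments; keep only UUID= style entries.
        if fields and not fields[0].startswith("#") and fields[0].startswith("UUID="):
            found.append(fields[0][len("UUID="):])
    return found


def missing_uuids(fstab_text, present_uuids):
    """Return the UUIDs that fstab refers to but that are not currently present."""
    present = set(present_uuids)
    return [u for u in uuids_in_fstab(fstab_text) if u not in present]


if __name__ == "__main__":
    import os
    with open("/etc/fstab") as handle:
        text = handle.read()
    by_uuid = "/dev/disk/by-uuid"  # assumed to exist on the system being checked
    present = os.listdir(by_uuid) if os.path.isdir(by_uuid) else []
    for uuid in missing_uuids(text, present):
        print("fstab refers to a UUID that is not present:", uuid)
```

It does not detect the second case; conflicting filesystem signatures need a tool that inspects the disk itself.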

