> At Wed, 16 Feb 2011 12:38:53 -0500 (EST) CentOS mailing list
> <centos at centos.org> wrote:
>
>>
>> > At Wed, 16 Feb 2011 12:00:27 -0500 (EST) CentOS mailing list
>> > <centos at centos.org> wrote:
>> >
>> >>
>> >> We have about 50 CentOS servers with software RAID level 1
>> >> (mirroring). Each week, we swap out one of the drives (the one in
>> >> the second of four hot-swap bays, only the first two of which
>> >> contain drives) on each server and take them offsite for
>> >> safekeeping.
>> >>
>> >> The problem is, the kernel seemingly randomly switches between
>> >> /dev/sdb and /dev/sdc for these devices. This makes the process
>> >> slower by requiring more manual input where a script(s) could
>> >> otherwise suffice.
>> >
>> > I'm assuming these are actually SATA disks with a controller that
>> > supports hot-swap.
>>
>> Correct.
>>
>> > What I think is happening is that the kernel retains some 'memory' of
>> > the pulled drive (say /dev/sdb) and when the fresh drive is installed,
>> > a new dev file is created (/dev/sdc). Eventually, /dev/sdb is
>> > forgotten by the time the next 'swap' and /dev/sdb is assigned to the
>> > next fresh disk.
>>
>> Interesting...one would think that this behavior would be consistent
>> across all servers then, but it isn't. Most accept the same dev,
>> /dev/sdb, but some assign /dev/sdc. Is there a way to just disable
>> /dev/sdc and force the kernel to use /dev/sdb every time?
>
> It could be something as simple as 'timing'. Like how long it takes for
> the kernel to get around to re-cycling the device objects. I would also
> look real closely at the *exact* order of tasks (mdadm -f ..., mdadm -r
> ..) and how much time there is between these tasks and how 'busy' the
> specific machine is.
> It could be that the disk is being pulled too soon
> or not enough time is left between the 'fail' and the 'remove' -- that
> is the kernel is still doing something with the disk (eg has some
> 'unfinished business') and is thus not releasing the device object. It
> is likely that the amount of time needed for things to 'settle' will
> vary based on things like system load and just what the system is doing
> (eg a database server will be different from a file server which will be
> different from a DNS server, etc.). And it might also depend on the
> size of the disks and the type of controller (and the driver it uses).

Interesting... I will discuss this with the tech who swaps the drives.

>> > Question: are you always swapping in a *new* disk each week or
>> > re-inserting the disk from the previous week?
>>
>> It's a rotation, so re-inserting from the previous week.
>
> Umm. It has been stated elsewhere, but RAID is not really a substitute
> for proper backups.

I agree. Proper archiving is in place as well; this rotation exists to
allow faster recovery in the event of other hardware failures, and it has
already proven useful many times.
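For the scripting side, one option (a sketch only, not tested on these servers) is to stop depending on the sdX name altogether. udev normally maintains persistent symlinks under /dev/disk/by-path/ that follow the physical controller port, so a script can resolve the second bay's link to whatever kernel name it has this week. It can then confirm in /proc/mdstat that the disk has really left the array before the tech pulls it, which addresses the 'fail' then 'remove' timing point above. The by-path link name, the array name md0, and all helper names below are hypothetical placeholders. Check `ls -l /dev/disk/by-path/` on each server for the real link. The code assumes Python 3; the stock Python on CentOS 5 is too old for it.

```python
import os
import re
import time

# Hypothetical persistent link for the second hot-swap bay; the real name
# differs per controller, so look it up with `ls -l /dev/disk/by-path/`.
BAY2_LINK = "/dev/disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0"


def kernel_name(link):
    """Return the current kernel name (e.g. 'sdb' or 'sdc') behind a udev symlink."""
    return os.path.basename(os.path.realpath(link))


def array_members(mdstat_text, array):
    """List the member devices of `array` as shown in /proc/mdstat text.

    An array line looks like: 'md0 : active raid1 sdb1[1] sda1[0]'.
    """
    for line in mdstat_text.splitlines():
        if line.startswith(array + " "):
            # Members are tokens of the form name[index], possibly followed by (F).
            return re.findall(r"(\w+)\[\d+\]", line)
    return []


def disk_in_array(disk, mdstat_text, array):
    """True if `disk` or one of its partitions (e.g. 'sdb1' for 'sdb') is still a member."""
    pattern = re.escape(disk) + r"\d*"
    return any(re.fullmatch(pattern, m) for m in array_members(mdstat_text, array))


def wait_until_removed(disk, array, timeout=60.0, poll=1.0, mdstat="/proc/mdstat"):
    """Poll /proc/mdstat until `disk` no longer appears in `array`; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        with open(mdstat) as f:
            if not disk_in_array(disk, f.read(), array):
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A swap script could then take `name = kernel_name(BAY2_LINK)` and, assuming the mirror is built on partition 1, run `mdadm /dev/md0 --fail /dev/<name>1` and `mdadm /dev/md0 --remove /dev/<name>1`. It would tell the tech to pull the drive only once `wait_until_removed(name, "md0")` returns True. This does not guarantee the kernel will reuse the old sdX name. It only makes the name irrelevant to the script and avoids pulling a disk that md has not yet let go of.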