[CentOS] Strange problem with LVM, device-mapper, and software RAID...

Fri Jul 22 19:08:11 UTC 2011
Robert Heller <heller at deepsoft.com>

Running on an up-to-date CentOS 5.6 x86_64 machine:

[heller at ravel ~]$ uname -a
Linux ravel.60villagedrive 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

with a TYAN Computer Corp S4881 motherboard, which has an nVidia
4-channel SATA controller.  It also has a Marvell Technology Group Ltd.
88SX7042 PCI-e 4-port SATA-II controller (rev 02).

This machine has a 120GB SATA disk on the motherboard controller as the
system disk:

[heller at ravel ~]$ sudo /sbin/fdisk -l /dev/sda

Disk /dev/sda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         125     1004031   83  Linux
/dev/sda2             126       14593   116214210   8e  Linux LVM

/dev/sda1 is /boot and /dev/sda2 is an LVM volume group (named
"RavelSystem") with two logical volumes (named "root" and "swap"),
containing the root file system and a base 1GB swap area.  So far so
good.
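(As a quick sanity check of my own, not part of the fdisk output: the
block count fdisk reports for /dev/sda2 follows from the CHS geometry it
prints.)

```shell
# /dev/sda2 spans cylinders 126..14593; fdisk reports units of
# 16065 sectors per cylinder (255 heads * 63 sectors) at 512 bytes
# each, and counts "Blocks" in 1 KiB units (2 sectors per block).
start=126; end=14593; sectors_per_cyl=16065
blocks=$(( (end - start + 1) * sectors_per_cyl / 2 ))
echo "$blocks"   # 116214210, matching the fdisk table above
```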

On the Marvell controller are four 1.5TB disks arranged as a RAID10
array:

[heller at ravel ~]$ cat /proc/mdstat
Personalities : [raid10] 
md1 : active raid10 sdg1[3] sdf1[2] sde1[1] sdd1[0]
      2930270208 blocks 512K chunks 2 far-copies [4/4] [UUUU]
      
unused devices: <none>
[heller at ravel ~]$ sudo /sbin/mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Tue Jun 21 19:04:19 2011
     Raid Level : raid10
     Array Size : 2930270208 (2794.52 GiB 3000.60 GB)
  Used Dev Size : 1465135616 (1397.26 GiB 1500.30 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jul 22 14:37:04 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : far=2
     Chunk Size : 512K

           UUID : 7a257206:a2b7c9d9:c5d004ae:2bdf6faf
         Events : 0.578

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1
       2       8       81        2      active sync   /dev/sdf1
       3       8       97        3      active sync   /dev/sdg1

This RAID10 array holds a second LVM volume group (named "RavelData2"),
also with two logical volumes (named "data" and "largeswap"): a large
data file system and a 16GB swap area.

We are getting a strange message from device-mapper at boot:

[heller at ravel ~]$ dmesg | grep device-mapper -A 5 -B 5
sdg: Write Protect is off
sdg: Mode Sense: 00 3a 00 00
SCSI device sdg: drive cache: write back
 sdg: sdg1
sd 7:0:0:0: Attached scsi disk sdg
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel at redhat.com
device-mapper: dm-raid45: initialized v0.2594l
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
type=1404 audit(1311358429.773:2): selinux=0 auid=4294967295 ses=4294967295
--
md: bind<sdg1>
md: running: <sdg1><sdf1><sde1><sdd1>
md: raid10 personality registered for level 10
raid10: raid set md1 active with 4 out of 4 devices
md: ... autorun DONE.
device-mapper: multipath: version 1.0.6 loaded
device-mapper: table: 253:6: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table
device-mapper: ioctl: device doesn't appear to be in the dev hash table.
EXT3 FS on dm-5, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 180 seconds

'253:6' happens to be the large swap space:

[heller at ravel ~]$ dir -l /dev/mapper/
total 0
crw------- 1 root root  10, 63 Jul 22 14:13 control
brw-rw---- 1 root disk 253,  2 Jul 22 14:13 nvidia_biaabdaf
brw-rw---- 1 root disk 253,  3 Jul 22 14:13 nvidia_biaabdafp1
brw-rw---- 1 root disk 253,  0 Jul 22 14:13 nvidia_efjjfcad
brw-rw---- 1 root disk 253,  1 Jul 22 14:13 nvidia_efjjfcadp1
brw-rw---- 1 root disk 253,  7 Jul 22 14:13 RavelData2-data
brw-rw---- 1 root disk 253,  6 Jul 22 14:13 RavelData2-largeswap
brw-rw---- 1 root disk 253,  5 Jul 22 14:13 RavelSystem-root
brw-rw---- 1 root disk 253,  4 Jul 22 14:13 RavelSystem-swap
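(Matching the minor number from the error against that listing
mechanically, e.g. with awk over the lines quoted above, points at the
same device.)

```shell
# Match major 253, minor 6 against the /dev/mapper listing quoted above
# (a few of its lines pasted here verbatim as sample input).
result=$(awk '$5 == "253," && $6 == "6" { print $NF }' <<'EOF'
crw------- 1 root root  10, 63 Jul 22 14:13 control
brw-rw---- 1 root disk 253,  7 Jul 22 14:13 RavelData2-data
brw-rw---- 1 root disk 253,  6 Jul 22 14:13 RavelData2-largeswap
brw-rw---- 1 root disk 253,  5 Jul 22 14:13 RavelSystem-root
EOF
)
echo "$result"   # RavelData2-largeswap
```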

The weirdness is this:

1) The large swap space is not being activated automatically at boot.
It can be activated manually with the swapon command, and as a stopgap
measure I've added a swapon command to rc.local.
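(For reference, the stopgap is just this one line; the device path is
the one shown in the /dev/mapper listing above.)

```shell
# Appended to /etc/rc.d/rc.local: activate the largeswap area by hand
# at the end of boot, since LVM/initscripts aren't doing it for us.
/sbin/swapon /dev/mapper/RavelData2-largeswap
```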

2) vgscan, vgchange, and vgdisplay do not see RavelData2, *even though*
the system has no problem mounting the data file system during boot,
swapon has no problem activating the largeswap area (at least once the
system is in full multi-user mode), and (apparently) the device mapper
is perfectly happy to do its thing (more or less) with these logical
volumes.

[heller at ravel ~]$ sudo /sbin/vgscan -v --ignorelockingfailure --mknodes -d
    Wiping cache of LVM-capable devices
    Wiping internal VG cache
  Reading all physical volumes.  This may take a while...
    Finding all volume groups
    Finding volume group "RavelSystem"
  Found volume group "RavelSystem" using metadata type lvm2
    Finding all logical volumes
[heller at ravel ~]$ sudo /sbin/vgchange -a y
  2 logical volume(s) in volume group "RavelSystem" now active
[heller at ravel ~]$ sudo /sbin/vgchange -a y RavelData2
  Volume group "RavelData2" not found
[heller at ravel ~]$ sudo /usr/sbin/vgdisplay RavelData2
  Volume group "RavelData2" not found
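One guess I'd like to rule out (pure speculation on my part, not
something I've confirmed): the LVM device filter in /etc/lvm/lvm.conf,
or the copy baked into the initrd, rejecting /dev/md1 so the PV on the
array is never scanned. An accept pattern for md devices would need to
look something like the following; the pattern and this little check of
it are my own hypothetical example, not from my actual lvm.conf.

```shell
# Hypothetical lvm.conf-style accept pattern for md arrays; here we
# just test the regex itself against the device the PV lives on.
pattern='^/dev/md[0-9]+$'
accepted=$(echo '/dev/md1' | grep -E "$pattern")
echo "$accepted"   # prints /dev/md1 if the pattern would accept it
```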


Does anyone have any guesses as to what is going on here?

-- 
Robert Heller             -- 978-544-6933 / heller at deepsoft.com
Deepwoods Software        -- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments