Hi,
I'm seeing some strange and dangerous things on an HPE server with an HPE Smart Array controller, where EL9 ends up with wrong device nodes/symlinks for the attached disks/RAID volumes:
(I didn't touch anything here, but at 08:09 some symlinks were changed)
/dev/disk/by-id/:
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  1 08:09 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  1 08:09 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdc2
/dev/disk/by-path/:
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  1 08:09 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdd
After rebooting, things are different but still wrong:
(here nothing has changed since boot, but the symlinks are already wrong)
/dev/disk/by-id/:
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdb
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdc
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2
/dev/disk/by-path/:
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdc
Note that two things are strange:
1) the /dev/sd* nodes come up in a random order after every restart:
# lsscsi
[1:0:0:0]    storage HP       P410i            6.64  -
[1:1:0:0]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdb
[1:1:0:1]    disk    HP       LOGICAL VOLUME   6.64  /dev/sda
[1:1:0:2]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdd
[1:1:0:3]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdc
2) some symlinks created by udev are just wrong and therefore very dangerous to use:
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2
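One quick way to spot problem 2) is to check every -partN link against the link it extends. A partition link should resolve to a partition of the same disk that the base link resolves to. Below is a minimal Python 3 sketch of that check. It is not an existing tool, and the script and function names are made up for illustration. It assumes that sysfs lists a partition under its parent disk, so that /sys/class/block/sdb1 resolves into .../block/sdb/sdb1.

```python
#!/usr/bin/env python3
"""Sketch: flag -partN links that do not belong to the disk their base link points to."""
import os
import re
import sys

# "<base>-part<N>", the naming udev uses for partition links in by-id/by-path
PART_LINK = re.compile(r"^(?P<base>.+)-part\d+$")


def sysfs_parent(part: str) -> str:
    """Parent disk of a partition, e.g. 'sdb1' -> 'sdb', read from sysfs."""
    real = os.path.realpath(os.path.join("/sys/class/block", part))
    return os.path.basename(os.path.dirname(real))


def read_links(directory: str) -> dict[str, str]:
    """Map every symlink in directory to the kernel name it resolves to."""
    return {
        entry.name: os.path.basename(os.path.realpath(entry.path))
        for entry in os.scandir(directory)
        if entry.is_symlink()
    }


def find_mismatches(links: dict[str, str], parent_of=sysfs_parent) -> list[tuple[str, str, str]]:
    """Return (partition link, its parent disk, base link target) for each mismatch."""
    mismatches = []
    for name in sorted(links):
        m = PART_LINK.match(name)
        if m is None or m.group("base") not in links:
            continue  # not a partition link, or its base link is missing
        disk = links[m.group("base")]
        parent = parent_of(links[name])
        if parent != disk:
            mismatches.append((name, parent, disk))
    return mismatches


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "/dev/disk/by-id"
    if os.path.isdir(directory):
        for name, parent, disk in find_mismatches(read_links(directory)):
            print(f"MISMATCH {name}: partition of {parent}, but base link -> {disk}")
```

You could save it as a hypothetical check_links.py and run `python3 check_links.py`, or pass /dev/disk/by-path as the argument. On the listing above, it should flag scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 and -part2. Those links resolve to sdb1/sdb2, while their base link points to sda.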
While 1) may be expected (???), I think 2) should really never happen.
I've tried to find out where things go wrong, but the whole udev stuff started to hurt my brain :)
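This might be worth checking, though it is an assumption and not a diagnosis. In both listings, the per-volume scsi-0HP_LOGICAL_VOLUME_* links agree with the by-path links, and after the reboot both also agree with lsscsi. Only the single scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 name disagrees. If several logical volumes report the same serial, they would all claim that one by-id name, and udev can point it at only one of them. The Python 3 sketch below runs `udevadm info --query=symlink --name=<dev>` for each /dev/sd* node and lists every symlink claimed by more than one device. It assumes that udev's database keeps each device's claimed symlinks even while another device owns the link. The helper names are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch: list udev symlinks that more than one block device claims."""
import glob
import shutil
import subprocess
from collections import defaultdict


def claimed_links(devnode: str) -> list[str]:
    """Symlinks (relative to /dev) that udev's database records for devnode."""
    result = subprocess.run(
        ["udevadm", "info", "--query=symlink", f"--name={devnode}"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.split() if result.returncode == 0 else []


def shared_links(links_by_dev: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each symlink claimed by two or more devices to those devices."""
    owners = defaultdict(list)
    for dev in sorted(links_by_dev):
        for link in set(links_by_dev[dev]):  # count each device once per link
            owners[link].append(dev)
    return {link: devs for link, devs in sorted(owners.items()) if len(devs) > 1}


if __name__ == "__main__" and shutil.which("udevadm"):
    devices = sorted(glob.glob("/dev/sd*"))  # whole disks and partitions
    for link, devs in shared_links({d: claimed_links(d) for d in devices}).items():
        print(f"/dev/{link} is claimed by: {', '.join(devs)}")
```

If the hypothesis holds, the output should show disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 claimed by several sd devices. Any link reported this way is unsafe for destructive operations, whichever device it currently points to.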
I'm quite sure HPE Smart Array based servers are common, so my big question is: do others see the same?
While it's possible to live with this mess, I'd really like to fix it somehow.
Thanks, Simon
Simon Matter <simon.matter@invoca.ch> wrote:
> - some symlinks created by udev are just wrong and therefore very
> dangerous to use:
> scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
> scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
> scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2
I think it may be caused by the sd driver's asynchronous scanning. I have been lucky not to see this before. NVMe may have similar issues, but nvme has a boot parameter to avoid it, and SUSE has a boot parameter to avoid it too. With EL9 we will have to wait until EL 9.3, if we are lucky. I have reported the issue: https://bugzilla.redhat.com/show_bug.cgi?id=2140017
Hi,
Thanks for confirming that I'm not alone with this "feature".
In the example above, it's great fun if you want to wipe the two partitions on /dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 and then wipe the device itself. You end up wiping the wrong disk!
When I see such things my blood starts boiling :(
Regards, Simon
Simon Matter <simon.matter@invoca.ch> wrote:
> You end up wiping the wrong disk!
> When I see such things my blood starts boiling :(
So, as I said, I have been lucky that my storage controllers don't behave like this. NVMe has a similar problem, and some people have destroyed the wrong drive: https://github.com/linux-nvme/nvme-cli/issues/501