[CentOS] EL9/udev generates wrong device nodes/symlinks with HPE Smart Array controller

Wed Mar 1 11:22:35 UTC 2023
Simon Matter <simon.matter at invoca.ch>

Hi,

I see some strange and dangerous things happening on an HPE server with an
HPE Smart Array controller, where EL9 ends up with wrong device
nodes/symlinks to the attached disks/RAID volumes:

(I didn't touch anything here, but at 08:09 some symlinks changed)
/dev/disk/by-id/:
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  1 08:09 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  1 08:09 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdc2

/dev/disk/by-path/:
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  1 08:09 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdd

After rebooting, things are different but still wrong:

(here nothing has changed after boot but symlinks are already wrong)
/dev/disk/by-id/:
lrwxrwxrwx 1 root root   9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdb
lrwxrwxrwx 1 root root  10 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdb2
lrwxrwxrwx 1 root root   9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sda
lrwxrwxrwx 1 root root   9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sdd
lrwxrwxrwx 1 root root   9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdc
lrwxrwxrwx 1 root root   9 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root  10 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2

/dev/disk/by-path/:
lrwxrwxrwx 1 root root   9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx 1 root root  10 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root   9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sda
lrwxrwxrwx 1 root root   9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sdd
lrwxrwxrwx 1 root root   9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdc

Note that two things are strange:

1) the /dev/sd* nodes are in a random order after every restart.
# lsscsi
[1:0:0:0]    storage HP       P410i            6.64  -
[1:1:0:0]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdb
[1:1:0:1]    disk    HP       LOGICAL VOLUME   6.64  /dev/sda
[1:1:0:2]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdd
[1:1:0:3]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdc

2) some symlinks created by udev are just wrong and therefore very
dangerous to use (the disk link points to sda, its partition links to sdb):
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2
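
The mismatch in point 2 can be detected mechanically. The following is only a
rough sketch, not a fix: it assumes that a "NAME-partN" link should resolve to
partition N of whatever "NAME" resolves to. The function names
(find_mismatched_partitions, read_links) are made up for this illustration, and
the sample data is copied from the listing after the reboot.

```python
import os
import re

# Matches link names such as "scsi-...-part1" and captures parent name + number.
PART_SUFFIX = re.compile(r"^(?P<parent>.+)-part(?P<num>\d+)$")


def find_mismatched_partitions(links):
    """Return (link_name, partition_node, parent_disk_node) for every
    partition link whose target is not a partition of its parent's target.

    `links` maps a symlink name to its target, e.g. "../../sda".
    """
    problems = []
    for name, target in sorted(links.items()):
        match = PART_SUFFIX.match(name)
        if not match:
            continue  # whole-disk link, nothing to compare
        parent_target = links.get(match.group("parent"))
        if parent_target is None:
            continue  # no whole-disk link to compare against
        disk = os.path.basename(parent_target)
        part = os.path.basename(target)
        num = match.group("num")
        # sdb1 belongs to sdb; nvme0n1p1 belongs to nvme0n1.
        if part not in (disk + num, disk + "p" + num):
            problems.append((name, part, disk))
    return problems


def read_links(directory="/dev/disk/by-id"):
    """Read all symlinks in `directory` (hypothetical helper for live use)."""
    result = {}
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        if os.path.islink(path):
            result[entry] = os.readlink(path)
    return result


# Sample taken from the post-reboot /dev/disk/by-id listing.
sample_links = {
    "scsi-0HP_LOGICAL_VOLUME_00000000": "../../sdb",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part1": "../../sdb1",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part2": "../../sdb2",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0": "../../sda",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1": "../../sdb1",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2": "../../sdb2",
}

for name, part, disk in find_mismatched_partitions(sample_links):
    print(f"{name}: points to {part}, but parent link points to {disk}")
```

On a live system, find_mismatched_partitions(read_links()) would run the same
check against the real /dev/disk/by-id directory.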

While 1) may be expected (???), I think 2) should really not happen.
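
For point 1, it may help to record which /dev/sd* node each logical volume got
on a given boot, keyed by the H:C:T:L address that lsscsi prints. A minimal
sketch, assuming the default lsscsi column layout shown above (address first,
device type second, device node last); parse_lsscsi is a made-up name for this
illustration:

```python
def parse_lsscsi(text):
    """Map 'H:C:T:L' -> device node for every 'disk' line of lsscsi output."""
    disks = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect at least: [H:C:T:L] <type> ... <device node>
        if len(fields) < 3 or not fields[0].startswith("["):
            continue
        if fields[1] != "disk":
            continue  # skip the controller ("storage") entry
        disks[fields[0].strip("[]")] = fields[-1]
    return disks


# Sample copied from the lsscsi output in this post.
sample = """\
# lsscsi
[1:0:0:0]    storage HP       P410i            6.64  -
[1:1:0:0]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdb
[1:1:0:1]    disk    HP       LOGICAL VOLUME   6.64  /dev/sda
[1:1:0:2]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdd
[1:1:0:3]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdc
"""

for hctl, node in sorted(parse_lsscsi(sample).items()):
    print(hctl, node)
```

Saving this mapping after each boot and comparing the results would show how
the assignment moves around between restarts.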

I've tried to find out where things go wrong but the whole udev stuff
started to hurt my brain :)

I'm quite sure HPE Smart Array based servers are quite common, so my big
question is: do others see the same?

While it's possible to live with this mess, I'd really like to fix it somehow.

Thanks,
Simon