[CentOS] CentOS 7, Xeon CPUs, not booting, [SOLVED], bug filed

Thu Feb 18 21:25:00 UTC 2016
m.roth at 5-cent.us <m.roth at 5-cent.us>

Paul Heinlein wrote:
> On Thu, 18 Feb 2016, m.roth at 5-cent.us wrote:
>
>> This is happening on anything other than plain vanilla Dell servers. One
>> R730, with dual Tesla cards, one R420, with a fibre card for a RAID
>> device, it never switches root. All these systems have Xeons, not AMD
>> CPUs.
>>
>> We've had this with every one of the 327 kernels. In addition, it seems
>> to happen also with the 229.20.1; the 229.14.1 has no such problem.
>>
>> From the rdsosreport:
>> starting at line 126:
>> /dev/disk/by-label:
>> total 0
>> lrwxrwxrwx 1 root 0 10 Jan 27 19:03 SWAP -> ../../sda2
>> lrwxrwxrwx 1 root 0 10 Jan 27 19:03 \x2f -> ../../sda3
>> lrwxrwxrwx 1 root 0 10 Jan 27 19:03 \x2fboot -> ../../sda1
>>
>> Then, starting at line 1283:
>> [    3.317027] <servername> systemd[1]: Found device ST500NM0003-9ZM172 /.
>> [    3.317974] <servername> systemd[1]: Starting File System Check on /dev/disk/by-label/\x2f...
>> [    3.320089] <servername> systemd-fsck[590]: Failed to detect device /dev/disk/by-label//
>> [    3.320567] <servername> systemd[1]: systemd-fsck-root.service: main process exited, code=exited, status=1/FAILURE
>> [    3.320972] <servername> systemd[1]: Failed to start File System Check on /dev/disk/by-label/\x2f.
>>
>> Does *ANYONE* have any clues as to what's going on?
>>
>> Meanwhile, on a plain vanilla Dell R420, I see:
>> ll /dev/disk/by-label/
>> total 0
>> lrwxrwxrwx. 1 root root 10 Feb 17 10:06 SWAP -> ../../sda2
>> lrwxrwxrwx. 1 root root 10 Feb 17 10:06 boot -> ../../sda1
>> lrwxrwxrwx. 1 root root 10 Feb 17 10:06 root -> ../../sda3
>>
>> So, what is this by-label with the x2f, and why can't it find the
>> drives?
>>
>> Or do I have to file a bug report? This is a true show-stopper.
>
> Here are a few related thoughts:
>
> The 'x2f' looks very similar to me to %2F, the URL encoding for
> the forward slash (/).
>
> If you look in /usr/lib/udev/rules.d, you'll see rules like
>
> ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*",
> SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"
>
> where, if ID_FS_LABEL_ENC were equal to "/", then the rule would be
> disk/by-label// -- with two trailing slashes, which (perhaps) gets
> interpreted not as one slash (like cd might do) but as "/x2f".
>
> That's the end of random thought #1.
>
> The second is like it:
>
> A local C7 machine has this root entry in /etc/fstab:
>
>    /dev/mapper/vg00-rootdev  /  xfs  defaults  0  0
>
> When I search my system logs for messages like the ones in your
> original post, I see
>
>    systemd: Found device /dev/mapper/vg00-rootdev.
>    systemd: Starting File System Check on /dev/mapper/vg00-rootdev...
>
> It's only after that's complete that I get device-specific messages
> like
>
>    systemd: Found device ST9600204SS.
>
> So I'm interested to know the content of your /etc/fstab file.
>
> End of thought #2.

I just successfully brought up one that consistently failed. And filed a
bug report, 0010398.

What I did:
1. In /etc/fstab, I changed each LABEL= entry to the corresponding
   /dev/sda* device.
2. I rebuilt the initramfs with that change.
That still didn't do it.
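
For anyone wanting to repeat step 1 on several machines, here is a minimal
Python sketch of the substitution. The function name replace_labels and the
sample fstab lines (including the ext4 type and the option fields) are
hypothetical; only the label-to-device mapping comes from the rdsosreport
listing quoted above.

```python
def replace_labels(fstab_text, devices):
    """Swap LABEL=<name> in the first fstab field for a device path.

    devices maps a label (e.g. "/") to a device node (e.g. "/dev/sda3").
    Comment lines and lines whose label is not in the mapping are kept as-is.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("LABEL="):
            label = fields[0][len("LABEL="):]
            if label in devices:
                # Only the first occurrence is the device field.
                line = line.replace(fields[0], devices[label], 1)
        out.append(line)
    return "\n".join(out) + "\n"


# Mapping taken from the by-label listing in the rdsosreport.
DEVICES = {"/": "/dev/sda3", "/boot": "/dev/sda1", "SWAP": "/dev/sda2"}

# Hypothetical fstab content, for illustration only.
SAMPLE = (
    "LABEL=/      /      ext4  defaults  1 1\n"
    "LABEL=/boot  /boot  ext4  defaults  1 2\n"
    "LABEL=SWAP   swap   swap  defaults  0 0\n"
)

if __name__ == "__main__":
    print(replace_labels(SAMPLE, DEVICES), end="")
```

As noted, this change plus an initramfs rebuild was not enough on its own to
get the machine to boot.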

Finally, I did this: from the grub2 boot menu, I edited the kernel line so
that instead of reading ... root=LABEL=/, it read root=/dev/sda3, and it
booted with zero issues.

There is, therefore, a bug somewhere (in grub2? in the handoff to systemd?)
where root=LABEL=/ is not handled correctly.
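
On the \x2f question from earlier in the thread: the names under
/dev/disk/by-label come from ID_FS_LABEL_ENC, and as far as I can tell udev
hex-escapes any character it does not consider safe in a device-node name, so
a label of "/" shows up as \x2f. The Python sketch below is an approximation
of that encoding, not udev's actual code; the function name encode_label and
the exact set of safe punctuation are assumptions.

```python
# Punctuation assumed to be left unescaped by udev, in addition to
# ASCII letters and digits. This set is an assumption, not checked
# against the udev source on these machines.
SAFE_PUNCTUATION = "#+-.:=@_"


def encode_label(label):
    """Approximate the ID_FS_LABEL_ENC encoding of a filesystem label."""
    out = []
    for ch in label:
        if not ch.isascii():
            # Assumption: non-ASCII (valid UTF-8) characters pass through.
            out.append(ch)
        elif ch.isalnum() or ch in SAFE_PUNCTUATION:
            out.append(ch)
        else:
            # Everything else becomes \xNN with two lowercase hex digits.
            out.append("\\x%02x" % ord(ch))
    return "".join(out)


if __name__ == "__main__":
    for label in ("/", "/boot", "SWAP"):
        print(label, "->", encode_label(label))
    # / -> \x2f
    # /boot -> \x2fboot
    # SWAP -> SWAP
```

That matches the three symlink names in the rdsosreport, which suggests the
\x2f entries are the expected encoding of the labels "/" and "/boot" rather
than corruption. The machines that boot have labels "root" and "boot", which
need no escaping.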

        mark