This is happening on anything other than plain vanilla Dell servers. On one R730 with dual Tesla cards, and on one R420 with a fibre card for a RAID device, the boot never switches root. All these systems have Xeons, not AMD CPUs.
We've had this with every one of the 327 kernels. In addition, it seems to happen also with the 229.20.1; the 229.14.1 has no such problem.
From the rdsosreport:
starting at line 126:
  /dev/disk/by-label:
  total 0
  lrwxrwxrwx 1 root 0 10 Jan 27 19:03 SWAP -> ../../sda2
  lrwxrwxrwx 1 root 0 10 Jan 27 19:03 \x2f -> ../../sda3
  lrwxrwxrwx 1 root 0 10 Jan 27 19:03 \x2fboot -> ../../sda1
Then, starting at line 1283:
  [ 3.317027] <servername> systemd[1]: Found device ST500NM0003-9ZM172 /.
  [ 3.317974] <servername> systemd[1]: Starting File System Check on /dev/disk/by-label/\x2f...
  [ 3.320089] <servername> systemd-fsck[590]: Failed to detect device /dev/disk/by-label//
  [ 3.320567] <servername> systemd[1]: systemd-fsck-root.service: main process exited, code=exited, status=1/FAILURE
  [ 3.320972] <servername> systemd[1]: Failed to start File System Check on /dev/disk/by-label/\x2f.
Does *ANYONE* have any clues as to what's going on?
Meanwhile, on a plain vanilla Dell R420, I see:
  ll /dev/disk/by-label/
  total 0
  lrwxrwxrwx. 1 root root 10 Feb 17 10:06 SWAP -> ../../sda2
  lrwxrwxrwx. 1 root root 10 Feb 17 10:06 boot -> ../../sda1
  lrwxrwxrwx. 1 root root 10 Feb 17 10:06 root -> ../../sda3
So, what is this by-label with the x2f, and why can't it find the drives?
Or do I have to file a bug report? This is a true show-stopper.
mark
On Thu, 18 Feb 2016, m.roth@5-cent.us wrote:
<SNIP>
Here are a few related thoughts:
The 'x2f' looks to me very similar to %2F, the URL encoding for the forward slash (/); 2f is the hexadecimal ASCII code for that character.
If you look in /usr/lib/udev/rules.d, you'll see rules like
ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"
where, if ID_FS_LABEL_ENC were equal to "/", then the symlink would be disk/by-label// -- with two trailing slashes, which (perhaps) gets interpreted not as one slash (like cd might do) but as "/\x2f".
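To make the escaping idea concrete, here is a minimal Python sketch of how a label could end up as \x2f in a by-label symlink name. It is only an approximation: the whitelist of safe characters and the helper name encode_label are assumptions made for illustration, not code taken from udev or blkid, and non-ASCII bytes are simply escaped here.

```python
# Rough approximation of the escaping seen in /dev/disk/by-label names.
# Assumption: any byte outside this whitelist is written as \xNN (hex).
import string

SAFE_CHARS = set(string.ascii_letters + string.digits + "#+-.:=@_")


def encode_label(label: str) -> str:
    """Return the label with unsafe bytes replaced by \\xNN escapes."""
    out = []
    for byte in label.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            # "/" is byte 0x2f, so it becomes the four characters \x2f
            out.append("\\x%02x" % byte)
    return "".join(out)


if __name__ == "__main__":
    for label in ("/", "/boot", "SWAP", "root"):
        print(label, "->", encode_label(label))
```

Under these assumptions the labels "/" and "/boot" come out as \x2f and \x2fboot, which matches the names in the rdsosreport listing, while SWAP and root pass through unchanged.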
That's the end of random thought #1.
The second is like it:
A local C7 machine has this root entry in /etc/fstab:
/dev/mapper/vg00-rootdev / xfs defaults 0 0
When I search my system logs for messages like the ones in your original post, I see
systemd: Found device /dev/mapper/vg00-rootdev.
systemd: Starting File System Check on /dev/mapper/vg00-rootdev...
It's only after that's complete that I get device-specific messages like
systemd: Found device ST9600204SS.
So I'm interested to know the content of your /etc/fstab file.
End of thought #2.
Paul Heinlein wrote:
<SNIP>
I just successfully brought up one that consistently failed. And filed a bug report, 0010398.
What I did:
1. in /etc/fstab, I changed LABEL= to /dev/sda*
2. I did rebuild the initramfs with that.
That still didn't do it.
Finally, I did this: from the grub2 boot menu, I edited the kernel line so that instead of reading ... root=LABEL=/, it read root=/dev/sda3, and it booted with zero issues.
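For comparing machines, the root= argument can be pulled out of a kernel command line with a few lines of Python. This is only a sketch: the function name root_spec and the sample strings are made up for illustration (they are not the actual command lines from these servers), and it simply returns the first root= argument it finds.

```python
# Sketch: classify the root= argument of a kernel command line.
# The sample command lines at the bottom are hypothetical.

def root_spec(cmdline: str):
    """Return (kind, value) for the first root= argument, or None."""
    for arg in cmdline.split():
        if arg.startswith("root="):
            value = arg[len("root="):]
            kind, sep, rest = value.partition("=")
            # root=LABEL=/ -> ("LABEL", "/"); root=UUID=... -> ("UUID", ...)
            if sep and kind in ("LABEL", "UUID", "PARTUUID", "PARTLABEL"):
                return kind, rest
            # Anything else is treated as a plain device path
            return "DEVICE", value
    return None


if __name__ == "__main__":
    print(root_spec("ro root=LABEL=/ quiet"))
    print(root_spec("ro root=/dev/sda3 quiet"))
```

On a running system the same function could be fed the contents of /proc/cmdline to see which form was actually used for that boot.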
There is, therefore, a bug in grub2? the handoff to systemd? where it does not handle LABEL correctly.
mark
<SNIP>
One more bit of information, which I added to the bug report: using e2label, I relabeled /boot and / to boot and root, and edited /etc/fstab and /etc/grub2.cfg to reflect that... and it booted with no trouble. I believe that a month ago, I neglected to edit grub2.cfg.
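Since relabeling made the problem go away, it may help to find other machines whose fstab still uses labels that would need escaping. The following is a hedged Python sketch: the sample fstab text is invented for illustration (the real /etc/fstab was not posted), and the set of "safe" characters is an assumption about which characters are left unescaped in by-label names.

```python
# Sketch: list LABEL= entries in fstab text whose label contains
# characters outside an assumed "safe" set (for example a slash).
import re

SAFE_LABEL = re.compile(r"[A-Za-z0-9#+\-.:=@_]+")

# Hypothetical fstab content, for illustration only.
SAMPLE_FSTAB = """\
# device      mountpoint  type  options   dump pass
LABEL=/       /           ext4  defaults  1 1
LABEL=/boot   /boot       ext4  defaults  1 2
LABEL=SWAP    swap        swap  defaults  0 0
"""


def risky_labels(fstab_text: str):
    """Return (label, mountpoint) pairs whose label would need escaping."""
    hits = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("LABEL="):
            continue
        label = fields[0][len("LABEL="):]
        if not SAFE_LABEL.fullmatch(label):
            hits.append((label, fields[1]))
    return hits


if __name__ == "__main__":
    for label, mountpoint in risky_labels(SAMPLE_FSTAB):
        print("label %r on %s contains characters that get escaped" % (label, mountpoint))
```

To check a real machine, pass the text of /etc/fstab instead of SAMPLE_FSTAB; labels such as boot and root are not reported.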
Note that /dev/sdd1 and /dev/sde1, which both have labels that begin with a leading slash, mounted correctly. This, to me, indicates the bug is with grub2's handling of LABEL=.
mark
On Thu, Feb 18, 2016 at 2:25 PM, m.roth@5-cent.us wrote:
Note that /dev/sdd1 and /dev/sde1, which both have labels that begin with a leading slash, mounted correctly. This, to me, indicates the bug is with grub2's handling of LABEL=.
I'm pretty sure grub2 just passes strings to the kernel. Also, if you're able to select an older kernel and the system boots, then the signs probably point to a problem with the kernel's handling of labels.