Hi, folks,
Well, still more fun (for values of fun approaching zero):
1. Went to install CUDA 9.0... well, gee, there is *no* CUDA 9.0. Even though I installed the 9 repo, all that I get is 8. I've used their webform, and am waiting on a reply.
2. I remove all nvidia packages.
3. It appears that the kmod-nvidia is what I need; that's what nvidia-detect says. So I try to install... bzzt, thank you for playing.
a: uname -a:
3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

b: Installing : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64 1/2

Broadcast message from systemd-journald@lyon.cit.nih.gov (Wed 2017-09-27 11:43:12 EDT):

dracut[32409]: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?

Message from syslogd@lyon at Sep 27 11:43:12 ...
dracut:/lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?

Message from syslogd@lyon at Sep 27 11:43:12 ...
dracut: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
Working. This may take some time ...
/lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
/sbin/weak-modules: line 116: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
/sbin/weak-modules: line 132: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
/sbin/weak-modules: line 137: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
Unable to decompress /boot/initramfs-3.10.0-693.el7.x86_64.tmp: Unknown format
/sbin/weak-modules: line 175: /tmp/weak-modules.oC1A7x/new_initramfs.img: No such file or directory
rm: cannot remove '/tmp/weak-modules.oC1A7x/new_initramfs.img': No such file or directory
mv: cannot stat '/boot/initramfs-3.10.0-693.el7.x86_64.tmp': No such file or directory
Done.
Installing : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64 2/2
etckeeper: post transaction commit
Verifying : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64 1/2
Verifying : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64 2/2

Installed: kmod-nvidia.x86_64 0:384.90-1.el7_4.elrepo
Dependency Installed: nvidia-x11-drv.x86_64 0:384.90-1.el7.elrepo
Complete!
Well, no it's not complete, and it's trying to install in the *previous* kernel, not the running one.
mark
m.roth@5-cent.us wrote:
Hi, folks,
Well, still more fun (for values of fun approaching zero):
- Went to install CUDA 9.0... well, gee, there is *no* CUDA 9.0. Even though I installed the 9 repo, all that I get is 8. I've used their webform, and am waiting on a reply.
- I remove all nvidia packages.
- It appears that the kmod-nvidia is what I need; that's what nvidia-detect says. So I try to install... bzzt, thank you for playing.
If your intention is to use current NVIDIA drivers, you could try the download from their website. I've had good success with installing them directly from the download NVIDIA provides.
I know we aren't supposed to do that, but after using that for years and then using distribution-provided NVIDIA drivers, I went back to the NVIDIA package because that was far more trouble-free and continues to be so. When you get a new kernel and when some libraries are updated, you need to reinstall the NVIDIA drivers, but I can live with that.
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
On 27/09/17 16:49, m.roth@5-cent.us wrote:
Well, no it's not complete, and it's trying to install in the *previous* kernel, not the running one.
mark
kmod packages are a special class of package on RHEL that take advantage of the stable kernel ABI in Red Hat Enterprise Linux. When a kmod package is compiled against a kernel, the kernel module will be installed for that kernel and the weak-modules script will then weak link the module against all other kABI-compatible kernels installed on the system. This means that you do not need to rebuild the kernel module for each and every kernel update (or worse, delay updating your kernel whilst you wait for me to rebuild the module for you).
So yes, the module will likely be installed against a previous kernel, and maybe one that isn't even installed on your system. But it will weak link against your current kernel(s) providing none of the kernel symbols used by the module have changed between the kernel the module was built against and the current kernel in question. If you don't understand, just think of it as magic and be grateful you are running an Enterprise Linux kernel and not a Fedora kernel.
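To see the weak linking for yourself, here is a minimal sketch, assuming the usual RHEL/CentOS layout in which weak-modules puts its symlinks under /lib/modules/<kernel>/weak-updates. The function name list_weak_links and its two optional arguments are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# List the weak-updates symlinks for one kernel and where each points.
# list_weak_links is a hypothetical helper; the kernel defaults to the
# running one and the modules root to /lib/modules.
list_weak_links() {
    kver="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    wdir="$root/$kver/weak-updates"
    if [ ! -d "$wdir" ]; then
        echo "no weak-updates directory for $kver"
        return 0
    fi
    # Each symlink found here is a module built against another kernel.
    find "$wdir" -type l | while read -r link; do
        echo "$link -> $(readlink "$link")"
    done
}

list_weak_links
```

If the nvidia module shows up here as a symlink into another kernel's directory, the weak link for the running kernel is in place.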
As to the earlier error messages, have you been playing with depmod? Where is your modules.dep for your installed kernels? Anyway, the magic described above has likely not worked correctly due to missing modules.dep, so I would uninstall the nvidia packages, sort out your kernel(s) / depmod information and try again once you have a sane system.
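A quick way to answer the "where is your modules.dep" question, as a minimal sketch: it assumes each installed kernel has its own directory under /lib/modules. The function name check_modules_dep and its optional directory argument are invented for this example.

```shell
#!/bin/sh
# Report, for every kernel directory under a modules root, whether
# modules.dep exists. check_modules_dep is a hypothetical helper name;
# the root defaults to /lib/modules.
check_modules_dep() {
    root="${1:-/lib/modules}"
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when the root has no subdirectories.
        [ -d "$dir" ] || continue
        kver=$(basename "$dir")
        if [ -f "${dir}modules.dep" ]; then
            echo "$kver: modules.dep present"
        else
            echo "$kver: modules.dep MISSING"
        fi
    done
}

check_modules_dep
```

For a kernel reported as MISSING, running depmod as root with that version, for example `depmod -a 3.10.0-693.el7.x86_64`, should regenerate the file.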
Phil Perry wrote:
kmod packages are a special class of package on RHEL that take advantage of the stable kernel ABI in Red Hat Enterprise Linux. When a kmod package is compiled against a kernel, the kernel module will be installed for that kernel and the weak-modules script will then weak link the module against all other kABI-compatible kernels installed on the system. This means that you do not need to rebuild the kernel module for each and every kernel update (or worse, delay updating your kernel whilst you wait for me to rebuild the module for you).
Ok. I had thought it did.
So yes, the module will likely be installed against a previous kernel, and maybe one that isn't even installed on your system. But it will weak link against your current kernel(s) providing none of the kernel symbols used by the module have changed between the kernel the module was built against and the current kernel in question. If you don't understand, just think of it as magic and be grateful you are running an Enterprise Linux kernel and not a Fedora kernel.
As to the earlier error messages, have you been playing with depmod? Where is your modules.dep for your installed kernels? Anyway, the magic described above has likely not worked correctly due to missing modules.dep, so I would uninstall the nvidia packages, sort out your kernel(s) / depmod information and try again once you have a sane system.
Odd. The original kernel is installed, so I don't know why modules.dep wasn't there. I haven't had to run depmod before.
Btw, about your previous email: nvidia-detect tells me to use kmod-nvidia for the K20c. When I go to the elrepo page about it, and follow the link, for the 340, I don't see it supporting them, but the non-legacy does.
mark
Ok... I've cleaned up, run depmod on the previous/original kernel, and reinstalled kmod-nvidia. Both depmod and the install complained that they couldn't find modules.order and one other file, but the install seemed to go fine.
Now, I see that kmod-nvidia includes the nvidia-uvm-kmod, as well as CUDA libraries. How do I test whether it can see the Tesla cards? It used to be that I'd install CUDA, build the samples, and run enum_gpu. When I rebuilt the other server, with a pair of M2090s, I could build the proprietary install, install CUDA, build the samples, and run bin/deviceQueryDrv.
Is there something I can run to confirm that it sees the cards? I haven't found anything yet.
mark
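One possible answer, as a minimal sketch rather than anything confirmed in the thread: check the PCI bus, then the loaded kernel module, then ask the driver to enumerate the GPUs with `nvidia-smi -L`. It assumes nvidia-smi came with the installed driver packages, which nobody here has verified for this system. The function name check_nvidia is invented for the example.

```shell
#!/bin/sh
# Three independent checks that the NVIDIA driver sees the hardware.
# check_nvidia is a hypothetical helper; each step is skipped when the
# tool or file it relies on is absent.
check_nvidia() {
    # 1. Are the cards on the PCI bus at all?
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -i nvidia || echo "lspci: no NVIDIA devices listed"
    else
        echo "lspci: not installed"
    fi
    # 2. Is the kernel module loaded?
    if [ -r /proc/driver/nvidia/version ]; then
        cat /proc/driver/nvidia/version
    else
        echo "nvidia kernel module: not loaded"
    fi
    # 3. Does the driver enumerate the GPUs?
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L || echo "nvidia-smi: failed to talk to the driver"
    else
        echo "nvidia-smi: not installed"
    fi
}

check_nvidia
```

If step 1 lists the Teslas but step 2 or 3 fails, the problem is on the driver side rather than the hardware side.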
On 27/09/17 20:24, m.roth@5-cent.us wrote:
Btw, about your previous email: nvidia-detect tells me to use kmod-nvidia for the K20c. When I go to the elrepo page about it, and follow the link, for the 340, I don't see it supporting them, but the non-legacy does.
mark
I would trust what nvidia-detect tells you. It is based on the definitive information provided by NVIDIA in their docs:
http://us.download.nvidia.com/XFree86/Linux-x86_64/384.90/README/supportedch...