[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4

Wed Sep 27 19:03:25 UTC 2017
Phil Perry <pperry at elrepo.org>

On 27/09/17 16:49, m.roth at 5-cent.us wrote:
> Hi, folks,
> 
>     Well, still more fun (for values of fun approaching zero):
> 
>     1. Went to install CUDA 9.0... well, gee, there is *no* CUDA 9.0.
>          Even though I installed the 9 repo, all that I get is 8. I've
>          used their webform, and am waiting on a reply.
>     2. I remove all nvidia packages.
>     3. It appears that the kmod-nvidia is what I need; that's what
>          nvidia-detect says. So I try to install... bzzt, thank you
>          for playing.
> 
>        a: uname -a:  3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13
> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>        b:
>    Installing : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64
>   1/2
> 
> Broadcast message from systemd-journald at lyon.cit.nih.gov (Wed 2017-09-27
> 11:43:12 EDT):
> 
> dracut[32409]: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing.
> Did you run depmod?
> 
> 
> Message from syslogd at lyon at Sep 27 11:43:12 ...
>   dracut:/lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did
> you run depmod?
> 
> Message from syslogd at lyon at Sep 27 11:43:12 ...
>   dracut: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did
> you run depmod?
> Working. This may take some time ...
> /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run
> depmod?
> /sbin/weak-modules: line 116: /boot/initramfs-3.10.0-693.el7.x86_64.tmp:
> No such file or directory
> /sbin/weak-modules: line 132: /boot/initramfs-3.10.0-693.el7.x86_64.tmp:
> No such file or directory
> /sbin/weak-modules: line 137: /boot/initramfs-3.10.0-693.el7.x86_64.tmp:
> No such file or directory
> Unable to decompress /boot/initramfs-3.10.0-693.el7.x86_64.tmp: Unknown
> format
> /sbin/weak-modules: line 175: /tmp/weak-modules.oC1A7x/new_initramfs.img:
> No such file or directory
> rm: cannot remove '/tmp/weak-modules.oC1A7x/new_initramfs.img': No such
> file or directory
> mv: cannot stat '/boot/initramfs-3.10.0-693.el7.x86_64.tmp': No such file
> or directory
> Done.
>    Installing : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64
>   2/2
> etckeeper: post transaction commit
>    Verifying  : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64
>   1/2
>    Verifying  : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64
>   2/2
> 
> Installed:
>    kmod-nvidia.x86_64 0:384.90-1.el7_4.elrepo
> 
> Dependency Installed:
>    nvidia-x11-drv.x86_64 0:384.90-1.el7.elrepo
> 
> Complete!
> 
> Well, no it's not complete, and it's trying to install in the *previous*
> kernel, not the running one.
> 
>       mark
> 

kmod packages are a special class of package on RHEL that take advantage 
of the stable kernel ABI in Red Hat Enterprise Linux. When a kmod 
package is compiled against a kernel, the kernel module will be 
installed for that kernel and the weak-modules script will then weak 
link the module against all other kABI-compatible kernels installed on 
the system. This means that you do not need to rebuild the kernel module 
for each and every kernel update (or worse, delay updating your kernel 
whilst you wait for me to rebuild the module for you).

So yes, the module will likely be installed against a previous kernel, 
and maybe one that isn't even installed on your system. But it will weak 
link against your current kernel(s) providing none of the kernel symbols 
used by the module have changed between the kernel the module was built 
against and the current kernel in question. If you don't understand, 
just think of it as magic and be grateful you are running an Enterprise 
Linux kernel and not a Fedora kernel.
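
For anyone who would rather see the idea than take it as magic, here is a 
minimal sketch of the kind of symbol comparison that weak linking depends 
on. It is not the real weak-modules script. The function name 
kabi_compatible and the two-file input format (one "CRC SYMBOL" pair per 
line) are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of weak-modules.
# Usage: kabi_compatible MODSYMS KERNSYMS
#   MODSYMS  - symbols the module needs, one "CRC SYMBOL" per line
#   KERNSYMS - symbols the kernel exports, "CRC SYMBOL ..." per line
# Returns 0 if every (CRC, SYMBOL) pair the module needs is exported
# by the kernel with the same CRC; otherwise prints each mismatch and
# returns 1.
kabi_compatible() {
    awk '
        NF < 2 { next }                              # skip blank/short lines
        FILENAME == ARGV[1] { kern[$2] = $1; next }  # load kernel exports
        !($2 in kern) || kern[$2] != $1 {            # missing or changed CRC
            print "mismatch: " $2
            bad = 1
        }
        END { exit bad + 0 }
    ' "$2" "$1"
}
```

On a real system the two inputs could plausibly come from 
"modprobe --dump-modversions <module.ko>" and 
"zcat /boot/symvers-$(uname -r).gz", but check that both are available on 
your system before relying on that.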

As to the earlier error messages, have you been playing with depmod? 
Where is your modules.dep for your installed kernels? Anyway, the magic 
described above has likely not worked correctly due to the missing 
modules.dep, so I would uninstall the nvidia packages, sort out your 
kernel(s) / depmod information and try again once you have a sane system.
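
As a starting point for that check, here is a small sketch that lists 
which kernel directories under /lib/modules have no modules.dep. The 
function name list_missing_modules_dep and its optional ROOT argument are 
made up for this example; the argument exists only so the function can be 
tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper for illustration.
# Usage: list_missing_modules_dep [ROOT]
# Prints the name of every directory under ROOT/lib/modules that does
# not contain a modules.dep file. ROOT defaults to the real root.
list_missing_modules_dep() {
    root=${1:-}
    for dir in "$root"/lib/modules/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        if [ ! -f "${dir}modules.dep" ]; then
            basename "$dir"
        fi
    done
}

# For a kernel that really is installed, running depmod with that
# kernel version as its argument (as root) may regenerate the file.
```

A directory listed here does not necessarily belong to an installed 
kernel, so compare the output with "rpm -q kernel" before running depmod 
against anything.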