[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4

Wed Sep 27 19:24:57 UTC 2017
m.roth at 5-cent.us <m.roth at 5-cent.us>

Phil Perry wrote:
> On 27/09/17 16:49, m.roth at 5-cent.us wrote:
>> Hi, folks,
>>
>>     Well, still more fun (for values of fun approaching zero):
>>
>>     1. Went to install CUDA 9.0... well, gee, there is *no* CUDA 9.0.
>>          Even though I installed the 9 repo, all that I get is 8. I've
>>          used their webform, and am waiting on a reply.
>>     2. I remove all nvidia packages.
>>     3. It appears that the kmod-nvidia is what I need; that's what
>>          nvidia-detect says. So I try to install... bzzt, thank you
>>          for playing.
>>
>>        a: uname -a:  3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13
>> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>        b:
>>    Installing : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64                 1/2
>>
>> Broadcast message from systemd-journald at lyon.cit.nih.gov (Wed 2017-09-27 11:43:12 EDT):
>>
>> dracut[32409]: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>>
>>
>> Message from syslogd at lyon at Sep 27 11:43:12 ...
>>   dracut:/lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>>
>> Message from syslogd at lyon at Sep 27 11:43:12 ...
>>   dracut: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>> Working. This may take some time ...
>> /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>> /sbin/weak-modules: line 116: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
>> /sbin/weak-modules: line 132: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
>> /sbin/weak-modules: line 137: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
>> Unable to decompress /boot/initramfs-3.10.0-693.el7.x86_64.tmp: Unknown format
>> /sbin/weak-modules: line 175: /tmp/weak-modules.oC1A7x/new_initramfs.img: No such file or directory
>> rm: cannot remove '/tmp/weak-modules.oC1A7x/new_initramfs.img': No such file or directory
>> mv: cannot stat '/boot/initramfs-3.10.0-693.el7.x86_64.tmp': No such file or directory
>> Done.
>>    Installing : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64                2/2
>> etckeeper: post transaction commit
>>    Verifying  : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64                 1/2
>>    Verifying  : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64                2/2
>>
>> Installed:
>>    kmod-nvidia.x86_64 0:384.90-1.el7_4.elrepo
>>
>> Dependency Installed:
>>    nvidia-x11-drv.x86_64 0:384.90-1.el7.elrepo
>>
>> Complete!
>>
>> Well, no it's not complete, and it's trying to install in the *previous*
>> kernel, not the running one.
>>
>
> kmod packages are a special class of package on RHEL that take advantage
> of the stable kernel ABI in Red Hat Enterprise Linux. When a kmod
> package is compiled against a kernel, the kernel module will be
> installed for that kernel and the weak-modules script will then weak
> link the module against all other kABI-compatible kernels installed on
> the system. This means that you do not need to rebuild the kernel module
> for each and every kernel update (or worse, delay updating your kernel
> whilst you wait for me to rebuild the module for you).

Ok. I had thought it did.
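
For anyone following along, here is a rough sketch of the compatibility idea described above, as I understand it. It is purely illustrative and is not how the weak-modules script is actually implemented: it assumes each kernel symbol carries a checksum, and treats a module as usable on another kernel only if every symbol it needs is exported there with the same checksum. The function name, symbol names, and checksum values are all made up for the example.

```python
def is_kabi_compatible(module_symbols, kernel_symbols):
    """Return (ok, mismatches) for a module against one kernel.

    module_symbols: dict of symbol name -> checksum the module was built against.
    kernel_symbols: dict of symbol name -> checksum the kernel exports.
    Both dicts are hypothetical stand-ins for real symbol version data.
    """
    # A symbol is a mismatch if the kernel lacks it or exports a different checksum.
    mismatches = sorted(
        name
        for name, crc in module_symbols.items()
        if kernel_symbols.get(name) != crc
    )
    return (not mismatches, mismatches)


# Made-up data: a module built against one kernel, checked against two others.
module_needs = {"sym_alloc": 0x1A2B, "sym_log": 0x3C4D}
same_abi_kernel = {"sym_alloc": 0x1A2B, "sym_log": 0x3C4D, "sym_other": 0x9999}
changed_kernel = {"sym_alloc": 0xFFFF, "sym_log": 0x3C4D}

print(is_kabi_compatible(module_needs, same_abi_kernel))
print(is_kabi_compatible(module_needs, changed_kernel))
```

In this toy model the first kernel would get a weak link and the second would not, since one symbol the module uses has changed.
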
>
> So yes, the module will likely be installed against a previous kernel,
> and maybe one that isn't even installed on your system. But it will weak
> link against your current kernel(s) providing none of the kernel symbols
> used by the module have changed between the kernel the module was built
> against and the current kernel in question. If you don't understand,
> just think of it as magic and be grateful you are running an Enterprise
> Linux kernel and not a Fedora kernel.
>
> As to the earlier error messages, have you been playing with depmod?
> Where is your modules.dep for your installed kernels? Anyway, the magic
> described above has likely not worked correctly due to missing
> modules.dep, so I would uninstall the nvidia packages, sort out your
> kernel(s) / depmod information and try again once you have a sane system.
>
Odd. The original kernel is installed, so I don't know why modules.dep
wasn't there. I haven't had to run depmod before.
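
A quick way to see which installed kernels are affected might look like the sketch below. It assumes the layout implied by the dracut messages, one directory per kernel version under /lib/modules with modules.dep directly inside it; the function name is my own, and it only reports, it does not change anything.

```python
import os


def kernels_missing_modules_dep(modules_root="/lib/modules"):
    """List kernel version directories under modules_root that lack modules.dep."""
    missing = []
    if not os.path.isdir(modules_root):
        return missing
    for entry in sorted(os.listdir(modules_root)):
        kernel_dir = os.path.join(modules_root, entry)
        # Only directories count as kernel trees; skip stray files.
        if not os.path.isdir(kernel_dir):
            continue
        if not os.path.isfile(os.path.join(kernel_dir, "modules.dep")):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    for version in kernels_missing_modules_dep():
        print(f"{version}: modules.dep is missing")
```

For any version it reports, running `depmod -a <version>` as root should regenerate the file, after which the kmod install can be retried.
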

Btw, about your previous email: nvidia-detect tells me to use kmod-nvidia
for the K20c. When I go to the elrepo page about it and follow the link
for the 340, I don't see it supporting the card, but the non-legacy
driver does.

      mark