[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4

Wed Sep 27 20:59:10 UTC 2017
Phil Perry <pperry at elrepo.org>

On 27/09/17 20:24, m.roth at 5-cent.us wrote:
> Phil Perry wrote:
>> On 27/09/17 16:49, m.roth at 5-cent.us wrote:
>>> Hi, folks,
>>>
>>>      Well, still more fun (for values of fun approaching zero):
>>>
>>>      1. Went to install CUDA 9.0... well, gee, there is *no* CUDA 9.0.
>>>           Even though I installed the 9 repo, all that I get is 8. I've
>>>           used their webform, and am waiting on a reply.
>>>      2. I remove all nvidia packages.
>>>      3. It appears that the kmod-nvidia is what I need; that's what
>>>           nvidia-detect says. So I try to install... bzzt, thank you
>>>           for playing.
>>>
>>>         a: uname -a:  3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>         b:
>>>     Installing : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64    1/2
>>>
>>> Broadcast message from systemd-journald at lyon.cit.nih.gov (Wed 2017-09-27 11:43:12 EDT):
>>>
>>> dracut[32409]: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>>>
>>>
>>> Message from syslogd at lyon at Sep 27 11:43:12 ...
>>>    dracut:/lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>>>
>>> Message from syslogd at lyon at Sep 27 11:43:12 ...
>>>    dracut: /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>>> Working. This may take some time ...
>>> /lib/modules/3.10.0-693.el7.x86_64//modules.dep is missing. Did you run depmod?
>>> /sbin/weak-modules: line 116: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
>>> /sbin/weak-modules: line 132: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
>>> /sbin/weak-modules: line 137: /boot/initramfs-3.10.0-693.el7.x86_64.tmp: No such file or directory
>>> Unable to decompress /boot/initramfs-3.10.0-693.el7.x86_64.tmp: Unknown format
>>> /sbin/weak-modules: line 175: /tmp/weak-modules.oC1A7x/new_initramfs.img: No such file or directory
>>> rm: cannot remove '/tmp/weak-modules.oC1A7x/new_initramfs.img': No such file or directory
>>> mv: cannot stat '/boot/initramfs-3.10.0-693.el7.x86_64.tmp': No such file or directory
>>> Done.
>>>     Installing : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64    2/2
>>> etckeeper: post transaction commit
>>>     Verifying  : kmod-nvidia-384.90-1.el7_4.elrepo.x86_64    1/2
>>>     Verifying  : nvidia-x11-drv-384.90-1.el7.elrepo.x86_64    2/2
>>>
>>> Installed:
>>>     kmod-nvidia.x86_64 0:384.90-1.el7_4.elrepo
>>>
>>> Dependency Installed:
>>>     nvidia-x11-drv.x86_64 0:384.90-1.el7.elrepo
>>>
>>> Complete!
>>>
>>> Well, no it's not complete, and it's trying to install in the *previous*
>>> kernel, not the running one.
>>>
>>
>> kmod packages are a special class of package on RHEL that take advantage
>> of the stable kernel ABI in Red Hat Enterprise Linux. When a kmod
>> package is compiled against a kernel, the kernel module will be
>> installed for that kernel and the weak-modules script will then weak
>> link the module against all other kABI-compatible kernels installed on
>> the system. This means that you do not need to rebuild the kernel module
>> for each and every kernel update (or worse, delay updating your kernel
>> whilst you wait for me to rebuild the module for you).
> 
> Ok. I had thought it did.
>>
>> So yes, the module will likely be installed against a previous kernel,
>> and maybe one that isn't even installed on your system. But it will weak
>> link against your current kernel(s) providing none of the kernel symbols
>> used by the module have changed between the kernel the module was built
>> against and the current kernel in question. If you don't understand,
>> just think of it as magic and be grateful you are running an Enterprise
>> Linux kernel and not a Fedora kernel.
>>
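To make the weak-linking condition above a little more concrete, here is a
minimal Python sketch of the idea. It is not the actual /sbin/weak-modules
logic, only an illustration. It assumes each kernel symbol carries a version
checksum, and it treats a module as compatible with a kernel when every symbol
the module uses is present there with the same checksum. The function name,
the dictionaries and the symbol/checksum values are all made up for the
example.

```python
def is_kabi_compatible(module_symbols, kernel_symbols):
    """Check a module's symbol requirements against one kernel.

    Both arguments map symbol name -> checksum. Returns (ok, mismatches),
    where mismatches lists the symbols that are missing from the kernel or
    whose checksum differs.
    """
    mismatches = sorted(
        name
        for name, checksum in module_symbols.items()
        if kernel_symbols.get(name) != checksum
    )
    return (not mismatches, mismatches)


# Hypothetical data: what the module needs, and what two kernels export.
module_needs = {"printk": 0x27E1A049, "kmalloc": 0xD2B09CE5}
kernel_same = {"printk": 0x27E1A049, "kmalloc": 0xD2B09CE5, "kfree": 0x037A0CBA}
kernel_changed = {"printk": 0x27E1A049, "kmalloc": 0x11111111}

print(is_kabi_compatible(module_needs, kernel_same))     # (True, [])
print(is_kabi_compatible(module_needs, kernel_changed))  # (False, ['kmalloc'])
```

Extra symbols exported by the kernel (kfree here) do not matter; only the ones
the module actually uses are compared.
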
>> As to the earlier error messages, have you been playing with depmod?
>> Where is your modules.dep for your installed kernels? Anyway, the magic
>> described above has likely not worked correctly due to missing
>> modules.dep, so I would uninstall the nvidia packages, sort out your
>> kernel(s) / depmod information and try again once you have a sane system.
>>
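As a rough way to see which kernel trees lack the file dracut complained
about, something along these lines should do. It is only a sketch: it assumes
the usual /lib/modules/<version>/modules.dep layout, and the function name is
my own. It only reports; it changes nothing on the system.

```python
import os


def kernels_missing_modules_dep(modules_root="/lib/modules"):
    """Return the version directories under modules_root without modules.dep."""
    if not os.path.isdir(modules_root):
        return []
    missing = []
    for version in sorted(os.listdir(modules_root)):
        kernel_dir = os.path.join(modules_root, version)
        has_dep = os.path.isfile(os.path.join(kernel_dir, "modules.dep"))
        if os.path.isdir(kernel_dir) and not has_dep:
            missing.append(version)
    return missing


if __name__ == "__main__":
    for version in kernels_missing_modules_dep():
        # depmod accepts a kernel version as its argument.
        print(f"modules.dep missing for {version}; consider: depmod -a {version}")
```

Any version it lists is a candidate for running depmod by hand before
reinstalling the nvidia packages.
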
> Odd. The original kernel is installed, so I don't know why modules.dep
> wasn't there. I haven't had to run depmod before.
> 
> Btw, about your previous email: nvidia-detect tells me to use kmod-nvidia
> for the K20c. When I go to the elrepo page about it and follow the link
> for the 340 driver, I don't see the K20c listed as supported, but the
> non-legacy driver does list it.
> 
>        mark
> 

I would trust what nvidia-detect tells you. It is based on the 
definitive information provided by NVIDIA in their docs:

http://us.download.nvidia.com/XFree86/Linux-x86_64/384.90/README/supportedchips.html