Tru Huynh wrote:
> On Wed, Mar 26, 2014 at 09:40:17AM -0400, m.roth at 5-cent.us wrote:
>> Johnny Hughes wrote:
> ...
>> > Are you connecting to the server to do X related things remotely ...
>> > and therefore need NVIDIA drivers for that?
>> >
>> I think you missed that part of my original post: no X. This box has two
>> Tesla GPUs, and my users are using them for heavy duty scientific
>> computing....
>
> afaik, in order to use your Tesla cards, you need to have the nvidia
> driver loaded, but ymmv.
>
I am aware of that.

Here's the latest in my fight. I've got one server that I'd updated but not
rebooted, so it's still running the 358.18 kernel. I used yum downgrade to
take kmod-nvidia and nvidia-x11-drv back to what it had been running,
325.15-1, and after I reloaded the nvidia driver it's happy as a clam.

BUT, I note that modinfo shows

    /lib/modules/2.6.32-358.18.1.el6.x86_64/weak-updates/nvidia/nvid

which is a link to

    /lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko

I find this... odd.

Now, on the other server, the one that's been rebooted, is running the new
431.5.1 kernel, and is the one I'm still fighting with, I did the same...
and I see

    /lib/modules/2.6.32-431.5.1.el6.x86_64/weak-updates/nvidia/nvidia.ko ->
    /lib/modules/2.6.32-358.el6.x86_64/extra/nvidia/nvidia.ko

THAT does not look right at all. dmesg shows

    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 325.15 Wed Jul 31 18:50:56 PDT 2013
    nvidia 0000:05:00.0: irq 113 for MSI/MSI-X
    NVRM: RmInitAdapter failed! (0x25:0x48:1157)
    NVRM: rm_init_adapter(0) failed
    nvidia 0000:05:00.0: irq 113 for MSI/MSI-X
    NVRM: RmInitAdapter failed! (0x25:0x48:1157)
    NVRM: rm_init_adapter(0) failed

At least I've got one server back. As a last resort, I can reboot to the older
kernel and see if that works with this version of kmod-nvidia, but I'd *REALLY*
like to have the new kernel mark