[CentOS] NVidia, again

Wed Mar 26 13:40:17 UTC 2014
m.roth at 5-cent.us <m.roth at 5-cent.us>

Johnny Hughes wrote:
> On 03/26/2014 08:14 AM, m.roth at 5-cent.us wrote:
>> Johnny Hughes wrote:
>>> On 03/26/2014 07:01 AM, mark wrote:
>>>> On 03/26/14 03:01, Johnny Hughes wrote:
>>>>> On 03/25/2014 04:36 PM, m.roth at 5-cent.us wrote:
>>>>>> Got a HBS (y'know, Honkin' Big Server, one o' them technical terms),
>>>>>> a Dell 720 with two Tesla GPUs. I updated the o/s, 6.5, and I cannot
>>>>>> get the GPUs recognized. As a last resort, I d/l NVidia's proprietary
>>>>>> driver/installer, 325, and it builds fine... I've yum removed the
>>>>>> kmod-nvidia I had on the system, nouveau is blacklisted, and when I
>>>>>> reboot, lsmod shows me nvidia loaded, which modinfo tells me looks
>>>>>> like the one I built.... but enum_gpu, which is from a CUDA group,
>>>>>> builds... but can't enumerate the GPUs (how we wake them up for the
>>>>>> users). I see the /dev/nvidia*, and they're a+r, a+w.... Oh, and
>>>>>> selinux is permissive.
>>>>>>
>>>>>> Anyone got a clue? If I can't get this working, I'm going to have to
>>>>>> downgrade the system several kernels.
>>>>> Do you have an /etc/X11/xorg.conf file or something in
>>>>> /etc/X11/xorg.conf.d/ that actually names nvidia and not nv as the
>>>>> driver?
>>>> Nope - nothing there.
>>> When you run the ./NVIDIA<version> command to build the driver, one of
>>> the last steps is to have it "automatically update your configuration
>>> file" .. select yes for that and it should create an xorg.conf file
>>> that will use the nvidia driver.
>> a) I didn't have that before - did kmod-nvidia handle loading the
>>    correct one *without* an xorg.conf?
>> b) Do you think it'll do the right thing - this *is* a headless server.
>>
>> And a general question: what *does* kmod-nvidia do - is it different
>> than, say, setting up a flag, or a script to notice that you're booting
>> a new kernel, and run the proprietary installer -a -s?
>
> Are you connecting to the server to do X related things remotely ... and
> therefore need NVIDIA drivers for that?
>
I think you missed that part of my original post: no X. This box has two
Tesla GPUs, and my users are using them for heavy duty scientific
computing.... And my problem is that neither their programs, nor the
utility I use (I *think* it's part of the CUDA toolkit -
I didn't set that part up) can enumerate them... meaning that they can't
see or use the GPUs.
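
For reference, the checks described so far (nvidia loaded, nouveau not
loaded, /dev/nvidia* present and a+r, a+w) can be gathered in one place.
This is only a rough Python sketch: the function names are made up for
illustration, and it looks only at the module list and device nodes. It
does not do a CUDA-level enumeration, so passing these checks says nothing
about whether the CUDA runtime can see the GPUs.

```python
import glob
import os


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines() if line.strip()}


def gpu_device_nodes(dev_dir="/dev"):
    """Return per-GPU nodes (nvidia0, nvidia1, ...) sorted by GPU index."""
    nodes = []
    for path in glob.glob(os.path.join(dev_dir, "nvidia*")):
        suffix = os.path.basename(path)[len("nvidia"):]
        if suffix.isdigit():  # skip nvidiactl and other non-numbered nodes
            nodes.append((int(suffix), path))
    return [path for _, path in sorted(nodes)]


def report(proc_modules="/proc/modules", dev_dir="/dev"):
    """Print module status and read/write access for each GPU node."""
    with open(proc_modules) as fh:
        mods = loaded_modules(fh.read())
    print("nvidia module loaded: ", "nvidia" in mods)
    print("nouveau module loaded:", "nouveau" in mods)
    for node in gpu_device_nodes(dev_dir):
        ok = os.access(node, os.R_OK | os.W_OK)
        print(node, "read/write OK" if ok else "NOT readable/writable")
```

Calling report() as the user who runs the CUDA jobs shows whether that
user, and not just root, can open the device nodes.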

> I'll let one of the elrepo guys explain their RPM.

Fair 'nough. I just threw that out as a general question, not expecting
that was yours to answer.
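
To make the "script to notice that you're booting a new kernel" idea
concrete, here is a minimal Python sketch of just the detection half. It
assumes a built module shows up as a file named nvidia.ko (possibly with a
compression suffix) somewhere under /lib/modules/<running kernel>; the
function names are hypothetical, and actually re-running the installer is
deliberately left out.

```python
import os


def has_nvidia_module(kernel_release, modules_root="/lib/modules"):
    """True if an nvidia.ko (optionally compressed) exists for that kernel."""
    tree = os.path.join(modules_root, kernel_release)
    for _dirpath, _dirs, files in os.walk(tree):
        for name in files:
            if name == "nvidia.ko" or name.startswith("nvidia.ko."):
                return True
    return False


def needs_rebuild(modules_root="/lib/modules"):
    """True if the running kernel has no nvidia module on disk."""
    return not has_nvidia_module(os.uname().release, modules_root)
```

A boot-time script could call needs_rebuild() and only then kick off the
installer; whether that is better or worse than what kmod-nvidia does is
exactly the open question.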

       mark