[CentOS] Crash and automatical reboot when using the NVIDIA card

Panruo Wu armiuswu at gmail.com
Fri Nov 22 19:36:56 UTC 2013


David McGiven <davidmcgivenn at ...> writes:

> 
> Hello there,
> 
> I'm running a Supermicro server with the latest CentOS 6.4 updates (kernel
> 2.6.32-358.23.2.el6.x86_64) and the latest NVIDIA driver (331.20).
> 
> Some time after I start using the GPU for HPC calculations, the
> server crashes and reboots itself. This happens every time. I know it
> will reboot, but I don't know when: sometimes it's 20 minutes after
> I start using it, sometimes it's 2 hours.
> 
> If I unplug the GPU card and put some stress on the server, it works OK, so
> I suspect a bug in the kernel or the NVIDIA driver.
> 
> I can't find any messages in /var/log/messages.
> 
> What should I do? Should I file a bug in the CentOS bug tracker?
> Is there any way I can gather more information? The server is in a remote
> location, so I have a hard time accessing the console.
> 
> Thanks.
> 


Hi there,

I have the same problem with all four of my Supermicro machines. I don't
know why it happens, but the NVIDIA driver seems to be the culprit in my case.
I'm using CentOS 6.3 with NVIDIA driver version 304.54 or 319.37.


Best,
Panruo
