I have a CentOS 6 system that has suddenly started slowing down significantly. It runs a Django app with an Apache server and a MySQL server. There is plenty of disk space and no process seems to be hogging the memory or CPU. But operations that used to take 5 minutes are now taking hours and hours.
Coinciding with this slow down I see these messages in /var/log/messages (this is hand typed, as I can't copy/paste or send email from this system, although for a change I do have remote access via 2 VPN hops):
kernel: BUG: soft lockup - CPU#5 stuck for 67s! [khungtaskd]
kernel: BUG: soft lockup - CPU#7 stuck for 67s! [khugepaged]
These messages started appearing today around 7 AM, which is when users started reporting the slowdown. They are still occurring periodically and the system is still slow. Are these messages caused by the slowdown or are they the reason for it? Are they useful in tracking down the problem?
They are what is causing it to appear to run slow.
A soft lockup is reported when a CPU stays in kernel mode for too long without giving other tasks a chance to run. The cause can be any of several things, including overloading, a bug in the kernel, or sometimes a hardware issue.
Are you seeing any other possibly hardware related errors in the logs?
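One way to answer that without reading the whole log by eye is a small script. The sketch below is only an illustration: it counts the soft lockup messages per CPU and task, using the message format quoted in this thread, and collects lines that contain a few keywords that often accompany hardware trouble. The keyword list and the function names are assumptions of mine, not anything standard, so adjust them for your system. It is written for Python 3; the Python that ships with CentOS 6 is older and may need small changes.

```python
import re
from collections import Counter

# Matches the message format quoted in this thread, e.g.
# "kernel: BUG: soft lockup - CPU#5 stuck for 67s! [khungtaskd]"
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+)\]"
)

# Assumed keywords that often show up alongside hardware problems.
# This list is illustrative, not exhaustive, and may match benign lines.
HARDWARE_HINTS = ("machine check", "mce:", "hardware error", "edac", "thermal")


def summarize(lines):
    """Return (lockup counts keyed by (cpu, task), list of hardware-hint lines)."""
    lockups = Counter()
    hardware = []
    for line in lines:
        # finditer copes with two messages run together on one line.
        for match in SOFT_LOCKUP.finditer(line):
            lockups[(int(match.group("cpu")), match.group("task"))] += 1
        lowered = line.lower()
        if any(hint in lowered for hint in HARDWARE_HINTS):
            hardware.append(line.rstrip("\n"))
    return lockups, hardware


def main(path):
    """Print a short summary for the log file at `path`."""
    with open(path, errors="replace") as log:
        lockups, hardware = summarize(log)
    for (cpu, task), count in sorted(lockups.items()):
        print("CPU#%d [%s]: %d soft lockup message(s)" % (cpu, task, count))
    for line in hardware:
        print("possible hardware hint:", line)
```

Calling `main("/var/log/messages")` as root prints the summary. If the lockups are spread across many CPUs and unrelated tasks, that points away from one misbehaving process and toward something system-wide.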
Do you have the 'sensors' package installed and configured - is everything OK? (An overclocked CPU or a low voltage/bad PSU can cause this sort of thing.)
Is the system heavily loaded when you get these errors? Not just with processes, but memory as well. (One of the hung processes is a kernel paging process.)
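If it helps, here is a rough sketch for taking a snapshot of load and memory at the moment the errors appear. It reads the standard /proc/loadavg and /proc/meminfo files; the function names are made up for this example. khugepaged is the kernel thread behind transparent huge pages, so the sketch also prints AnonHugePages when the kernel reports that field. It assumes Python 3.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields (values in kB where a unit is given)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def report():
    """Print a one-shot snapshot of load and memory pressure."""
    with open("/proc/loadavg") as handle:
        load = parse_loadavg(handle.read())
    with open("/proc/meminfo") as handle:
        mem = parse_meminfo(handle.read())
    print("load averages (1/5/15 min): %.2f %.2f %.2f" % load)
    print("MemFree: %d kB of %d kB" % (mem["MemFree"], mem["MemTotal"]))
    print("swap in use: %d kB" % (mem["SwapTotal"] - mem["SwapFree"]))
    # Not every kernel reports this field, so fall back gracefully.
    if "AnonHugePages" in mem:
        print("AnonHugePages: %d kB" % mem["AnonHugePages"])
```

Running `report()` from cron every minute or so and comparing the timestamps with the soft lockup messages would show whether the load builds up before the lockups or only afterwards.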
P.
On Thu, May 18, 2017 at 1:04 PM, Pete Biggs pete@biggs.org.uk wrote:
Are you seeing any other possibly hardware related errors in the logs?
Do you have the 'sensors' package installed and configured - is everything OK? (An overclocked CPU or a low voltage/bad PSU can cause this sort of thing.)
The IMM for the system is reporting a warning on the power system: "Redundancy Degraded for power unit has asserted". Going to contact the HW vendor about this.
Is the system heavily loaded when you get these errors? Not just with processes, but memory as well. (One of the hung processes is a kernel paging process.)
Well, it gets heavily loaded because everything takes longer than it should, so everything backs up.
Thanks!!
Sorry for the delay in responding, but that system was down until today for other reasons. Indeed it was a bad power supply that was making it run slow. Never would have guessed that. Thanks so much for the suggestion. Had it replaced and it's running fine now.