Greetings everyone.
I am experiencing the kworker issue where a kworker is using 100% of a given core.
This behavior was only observed after having installed a new water cooler, to replace a cooler that had died. (Sealed closed loop device from Corsair)
The processor is staying around 40 degrees Celsius as reported by the motherboard's UEFI interface. (ksysguard shows between 29 and 30 degrees when running KDE 4.)
I am unable to isolate a cause, or resolution for getting the kworker usage back to normal.
A very basic description of the specs is as follows:
AMD FX-8350 (not overclocked)
32GB Corsair memory (1866 MHz, not overclocked)
Sabertooth 990FX MB
I have checked the interrupt locations that are suggested online; however, none of the interrupts shows a high count that I could disable.
Here is the result of running the referenced command to view interrupts.
[sysadmin@imaginationland ~]$ grep . -r /sys/firmware/acpi/interrupts/
/sys/firmware/acpi/interrupts/sci: 0
/sys/firmware/acpi/interrupts/error: 0
/sys/firmware/acpi/interrupts/gpe00: 0 invalid
/sys/firmware/acpi/interrupts/gpe01: 0 invalid
/sys/firmware/acpi/interrupts/gpe02: 0 invalid
/sys/firmware/acpi/interrupts/gpe03: 0 disabled
/sys/firmware/acpi/interrupts/gpe04: 0 disabled
/sys/firmware/acpi/interrupts/gpe05: 0 enabled
/sys/firmware/acpi/interrupts/gpe06: 0 invalid
/sys/firmware/acpi/interrupts/gpe07: 0 invalid
/sys/firmware/acpi/interrupts/gpe08: 0 invalid
/sys/firmware/acpi/interrupts/gpe09: 0 invalid
/sys/firmware/acpi/interrupts/gpe10: 0 disabled
/sys/firmware/acpi/interrupts/gpe11: 0 disabled
/sys/firmware/acpi/interrupts/gpe12: 0 disabled
/sys/firmware/acpi/interrupts/gpe13: 0 invalid
/sys/firmware/acpi/interrupts/gpe14: 0 invalid
/sys/firmware/acpi/interrupts/gpe15: 0 invalid
/sys/firmware/acpi/interrupts/gpe16: 0 enabled
/sys/firmware/acpi/interrupts/gpe0A: 0 enabled
/sys/firmware/acpi/interrupts/gpe17: 0 invalid
/sys/firmware/acpi/interrupts/gpe0B: 0 disabled
/sys/firmware/acpi/interrupts/gpe18: 0 disabled
/sys/firmware/acpi/interrupts/gpe0C: 0 invalid
/sys/firmware/acpi/interrupts/gpe19: 0 invalid
/sys/firmware/acpi/interrupts/gpe0D: 0 invalid
/sys/firmware/acpi/interrupts/gpe0E: 0 invalid
/sys/firmware/acpi/interrupts/gpe0F: 0 disabled
/sys/firmware/acpi/interrupts/gpe1A: 0 invalid
/sys/firmware/acpi/interrupts/gpe1B: 0 disabled
/sys/firmware/acpi/interrupts/gpe1C: 0 invalid
/sys/firmware/acpi/interrupts/gpe1D: 0 invalid
/sys/firmware/acpi/interrupts/gpe1E: 0 invalid
/sys/firmware/acpi/interrupts/gpe1F: 0 invalid
/sys/firmware/acpi/interrupts/sci_not: 0
/sys/firmware/acpi/interrupts/ff_pmtimer: 0 invalid
/sys/firmware/acpi/interrupts/ff_rt_clk: 0 disabled
/sys/firmware/acpi/interrupts/gpe_all: 0
/sys/firmware/acpi/interrupts/ff_gbl_lock: 0 enabled
/sys/firmware/acpi/interrupts/ff_pwr_btn: 0 enabled
/sys/firmware/acpi/interrupts/ff_slp_btn: 0 invalid
Respectfully,
Martes G Wigglesworth
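For anyone repeating the interrupt check above, here is a minimal sketch of filtering that grep output down to the counters that are actually non-zero, so a runaway GPE would stand out. The function name nonzero_interrupts is made up for illustration, and the sketch assumes each line has the "path: count [status]" shape shown in the listing.

```shell
# Hypothetical helper (name is an assumption, not an existing tool).
# Reads "path: count [status]" lines on stdin, as printed by
#   grep . -r /sys/firmware/acpi/interrupts/
# and prints only the entries whose counter is greater than zero,
# highest count first, as "count path: [status]".
nonzero_interrupts() {
    awk '$2 + 0 > 0 { printf "%s %s%s\n", $2, $1, ($3 == "" ? "" : " " $3) }' |
        sort -rn
}

# Possible usage on a live system:
#   grep . -r /sys/firmware/acpi/interrupts/ | nonzero_interrupts
```

With the listing posted above, every counter is 0, so this would print nothing, which matches the observation that no interrupt stands out.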
Greetings again everyone.
Can someone please respond to this?
I am not sure what is causing this issue; however, others seem to have experienced it in the past.
Could this be caused by the processor being damaged?
Nothing has changed on the system, except for the heat warning shutoff when the water cooler died, and the system was not restarted until the new one was installed.
Any input would be appreciated; no response at all is just disappointing.
Thanks for any help that can be provided.
Respectfully,
Martes G Wigglesworth
From: "Martes" mailinglistmember@mgwigglesworth.net
To: "centos" centos@centos.org
Sent: Sunday, April 19, 2015 4:46:12 PM
Subject: kworker 100% of single core on mulit-core processor usage inquiry
On Mon, Apr 20, 2015 at 09:00:07AM -0400, Martes wrote:
Greetings again everyone.
Can someone please respond to this?
You just posted yesterday...
Anyway, you didn't give much context. I can see from a Google search why you provided the interrupts output, but judging from what you posted, that's not the problem.
Perhaps you have a CPU task that's running at 100%? Did you try running 'top'?
Also, what version of CentOS are you running? Are you running the latest kernel?
Greetings Johnathan.
Thank you for the reply.
I have run top, iftop, etc.
The only process that is listed is as follows:
Tasks: 272 total, 2 running, 270 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.1 us, 18.3 sy, 0.0 ni, 73.8 id, 0.7 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32679644 total, 402520 free, 9889728 used, 22387396 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 21951812 avail Mem
  PID USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM    TIME+  COMMAND
   91 root  20   0     0    0    0  R  94.1   0.0  1540:35  kworker/7:1
This is the only kworker that is currently taking up nearly 100% of whatever core it is on. (Right now it is on the eighth core, which the kernel numbers as CPU 7, hence the name kworker/7:1.)
My kernel is version 3.10.0-229.1.2.el7.x86_64.
I don't really have very much context to provide other than it wasn't a problem prior to restarting after I installed the new liquid cooler.
If I can provide any further analysis, or information for a more clear view of the issue, then let me know.
Thanks for getting back a reply.
Respectfully,
Martes G Wigglesworth
On Mon, Apr 20, 2015 at 11:03:43AM -0400, Martes wrote:
Tasks: 272 total, 2 running, 270 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.1 us, 18.3 sy, 0.0 ni, 73.8 id, 0.7 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32679644 total, 402520 free, 9889728 used, 22387396 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 21951812 avail Mem
  PID USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM    TIME+  COMMAND
   91 root  20   0     0    0    0  R  94.1   0.0  1540:35  kworker/7:1
I suggest trying out the suggestions listed here: https://lkml.org/lkml/2011/3/31/68
In particular, if you look at the PID of the kworker using nearly 100% of a core, you could run: (using the above output from top)
# cat /proc/91/stack
And you'll see what function(s) it is running. This will give you an idea of what task it is spinning on.
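If the PID is not already known from top, the lookup can be scripted. The following is only a sketch: busiest_kworker is a hypothetical helper name, and it assumes rows of the form "PID %CPU COMMAND", such as those printed by ps -eo pid=,pcpu=,comm= from procps. Note that ps reports CPU use averaged over the thread's lifetime rather than an instantaneous value, which should still single out a kworker that has been spinning for a long time.

```shell
# Hypothetical helper (name is an assumption, not an existing tool).
# Reads "PID %CPU COMMAND" rows on stdin and prints the PID of the
# kworker thread with the highest CPU percentage. Prints nothing if
# no kworker shows CPU use above zero.
busiest_kworker() {
    awk '$3 ~ /^kworker/ && $2 + 0 > max { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}

# Possible usage on a live system (reading the stack needs root):
#   pid=$(ps -eo pid=,pcpu=,comm= | busiest_kworker)
#   [ -n "$pid" ] && sudo cat "/proc/$pid/stack"
```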
Greetings everyone.
I ran cat on the proc file for the referenced PID before I sent the first email; here is the output, since I never posted it.
sudo cat /proc/91/stack
[<ffffffffffffffff>] 0xffffffffffffffff
And when following the information from the closed bugzilla debug info, I get the following:
cat /sys/kernel/debug/tracing/trace_pipe | grep cpu=7
kworker/7:1-91 [007] d.s. 211349.026006: workqueue_queue_work: work struct=ffff88083edcfff0 function=cs_dbs_timer workqueue=ffff88081e008c00 req_cpu=7 cpu=7
kworker/7:1-91 [007] d... 211349.123123: workqueue_queue_work: work struct=ffff88080e2ba2f8 function=nouveau_connector_hotplug_work [nouveau] workqueue=ffff88081e008c00 req_cpu=5120 cpu=7
kworker/7:1-91 [007] d.h. 211349.123156: workqueue_queue_work: work struct=ffff88080e30e448 function=nvkm_connector_hpd_work [nouveau] workqueue=ffff88081e008c00 req_cpu=5120 cpu=7
These messages are cyclically printed to the trace_pipe file, in this order.
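Rather than reading the repeating lines by eye, the trace can be tallied per function. This is a rough sketch only: count_queued_functions is a made-up name, and it assumes the input consists of workqueue_queue_work lines carrying a function=NAME field, as in the output above.

```shell
# Hypothetical helper (name is an assumption, not an existing tool).
# Reads workqueue_queue_work trace lines on stdin and prints
# "count function" pairs, most frequently queued function first
# (ties are ordered by function name).
count_queued_functions() {
    awk '{
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^function=/) {
                     sub(/^function=/, "", $i)
                     n[$i]++
                 }
         }
         END { for (f in n) print n[f], f }' |
        sort -k1,1nr -k2
}

# Possible usage on a live system, sampling the pipe for ten seconds:
#   sudo timeout 10 cat /sys/kernel/debug/tracing/trace_pipe | grep cpu=7 | count_queued_functions
```

A function that dominates the count, especially one tagged with a module name such as [nouveau], is a reasonable first suspect for what keeps the kworker busy.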
I figured that this may stem from the fact that the nouveau driver is being used, instead of the NVIDIA drivers I was using last fall while doing CUDA research.
I had a Titan in there, and reverted to a more basic card just prior to the water cooler dying.
I re-installed the newest driver from NVIDIA, and the kworker issue disappeared.
However, true to Linux form, this action has somehow broken my GUI experience:
1) The graphical window manager fails to start, and the boot just sits at the text-based startup screen saying it is trying to start the GNOME Login Manager, etc.
2) If I remove GNOME and reinstall KDE, the system boots to a text-based login. From there startx will start KDE, but as soon as I try to open Firefox the GUI dies, and the messages say something about ksmserver not being accessible, etc.
However, with KDE running, I had full access to the NVIDIA management tools, so the driver was working, with HDMI audio on the second monitor, etc.
So now I have to fix my GUI desktop environment, for whatever reason. All basic services, such as the VMs and DNS, are still running.
Respectfully,
Martes G Wigglesworth