On 03/09/2016 08:15 PM, Sarah Newman wrote:
I've been running 3.18.25-18.el6.x86_64 + our build of xen 4.4.3-9 on one host for the last couple of weeks and have gotten several soft lockups within the last 24 hours. I am posting here first in case anyone else has experienced the same issue.
Here is mpstat from around the time of the issue:
0:08:56 PM   CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:09:10 PM  all    0.00    0.00   66.67    0.00    0.00   33.33    0.00    0.00    0.00
10:09:11 PM  all    2.17    0.00    5.43   32.61    0.00   58.70    1.09    0.00    0.00
10:09:12 PM  all    0.00    0.00    1.15    0.00    0.00   85.06    0.00    0.00   13.79
10:09:13 PM  all    0.00    0.00    1.08    0.00    0.00   83.87    0.00    0.00   15.05
10:09:14 PM  all    0.00    0.00    1.10    0.00    0.00   83.52    0.00    0.00   15.38
10:09:15 PM  all    1.09    0.00    1.09    0.00    0.00   85.87    0.00    0.00   11.96
10:09:51 PM  all    0.00    0.00    1.09    0.00    0.00   84.78    1.09    0.00   13.04

Message from syslogd at Mar 9 22:09:51 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]

10:10:02 PM  all    0.00    0.00   33.33   50.00    0.00   16.67    0.00    0.00    0.00
10:10:03 PM  all    3.16    0.00   10.53    8.42    0.00    2.11    1.05    0.00   74.74
10:10:04 PM  all    0.00    0.00    3.23   38.71    0.00    1.08    1.08    0.00   55.91
10:10:05 PM  all    0.00    0.00    4.30   11.83    0.00    3.23    1.08    0.00   79.57
Typical load:
10:22:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:22:16 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
10:22:17 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.04    0.00   98.96
10:22:18 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
10:22:19 PM  all    0.00    0.00    1.01    0.00    0.00    1.01    0.00    0.00   97.98
10:22:20 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.02    0.00   98.98
10:22:21 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
10:22:22 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
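For anyone who wants to compare their own captures against the two samples above, here is a minimal sketch that picks out the intervals where %soft dominates. It assumes the mpstat output has been saved with one sample per line and the nine percentage columns shown above. The function names and the 50% threshold are hypothetical, chosen only for illustration, and are not part of mpstat or of the original report.

```python
import re

# Matches an mpstat "all" row: time, AM/PM, the literal "all", then the values.
ROW = re.compile(r"^(\d{1,2}:\d{2}:\d{2}) (AM|PM)\s+all\s+(.*)$")

# Column order as printed in the mpstat header quoted in this post.
COLUMNS = ["usr", "nice", "sys", "iowait", "irq", "soft", "steal", "guest", "idle"]


def parse_mpstat(text):
    """Return a list of (timestamp, {column: value}) for every 'all' row."""
    samples = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header lines, syslog messages, blank lines
        fields = match.group(3).split()
        if len(fields) != len(COLUMNS):
            continue  # unexpected layout; skip rather than guess
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue
        timestamp = f"{match.group(1)} {match.group(2)}"
        samples.append((timestamp, dict(zip(COLUMNS, values))))
    return samples


def high_softirq(samples, threshold=50.0):
    """Return (timestamp, %soft) pairs at or above the (hypothetical) threshold."""
    return [(ts, row["soft"]) for ts, row in samples if row["soft"] >= threshold]


if __name__ == "__main__":
    # Three rows copied from the capture above, plus the header line.
    sample = """\
10:22:15 PM  CPU  %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:09:11 PM  all  2.17 0.00 5.43 32.61 0.00 58.70 1.09 0.00 0.00
10:09:12 PM  all  0.00 0.00 1.15 0.00 0.00 85.06 0.00 0.00 13.79
10:10:03 PM  all  3.16 0.00 10.53 8.42 0.00 2.11 1.05 0.00 74.74
"""
    for ts, soft in high_softirq(parse_mpstat(sample)):
        print(ts, soft)
```

Run against the capture from the lockup, this would flag the rows from 10:09:11 PM through 10:09:51 PM, while none of the rows in the typical-load capture would be flagged.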
I reverted to an older kernel since the older kernel had run for a couple of months without issues.