[CentOS-virt] Soft lockups with Xen4CentOS 3.18.25-18.el6.x86_64

Thu Mar 10 08:05:59 UTC 2016
Sarah Newman <srn at prgmr.com>

On 03/09/2016 08:15 PM, Sarah Newman wrote:
> I've been running 3.18.25-18.el6.x86_64 + our build of xen 4.4.3-9 on one host for the last couple of weeks and have gotten several soft lockups
> within the last 24 hours. I am posting here first in case anyone else has experienced the same issue.
> 

Here is mpstat output from around the time of the issue:

10:08:56 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:09:10 PM  all    0.00    0.00   66.67    0.00    0.00   33.33    0.00    0.00    0.00
10:09:11 PM  all    2.17    0.00    5.43   32.61    0.00   58.70    1.09    0.00    0.00
10:09:12 PM  all    0.00    0.00    1.15    0.00    0.00   85.06    0.00    0.00   13.79
10:09:13 PM  all    0.00    0.00    1.08    0.00    0.00   83.87    0.00    0.00   15.05
10:09:14 PM  all    0.00    0.00    1.10    0.00    0.00   83.52    0.00    0.00   15.38
10:09:15 PM  all    1.09    0.00    1.09    0.00    0.00   85.87    0.00    0.00   11.96
10:09:51 PM  all    0.00    0.00    1.09    0.00    0.00   84.78    1.09    0.00   13.04
Message from syslogd at Mar  9 22:09:51 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
10:10:02 PM  all    0.00    0.00   33.33   50.00    0.00   16.67    0.00    0.00    0.00
10:10:03 PM  all    3.16    0.00   10.53    8.42    0.00    2.11    1.05    0.00   74.74
10:10:04 PM  all    0.00    0.00    3.23   38.71    0.00    1.08    1.08    0.00   55.91
10:10:05 PM  all    0.00    0.00    4.30   11.83    0.00    3.23    1.08    0.00   79.57

Typical load:

10:22:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:22:16 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
10:22:17 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.04    0.00   98.96
10:22:18 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
10:22:19 PM  all    0.00    0.00    1.01    0.00    0.00    1.01    0.00    0.00   97.98
10:22:20 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.02    0.00   98.98
10:22:21 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
10:22:22 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
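
For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch that scans mpstat output for samples with high %soft or with a gap between consecutive timestamps. It is only an illustration: the function names, the 50% threshold and the 5 second gap limit are my own assumptions, and it assumes mpstat was run at a one-second interval with the 12-hour time format shown above.

```python
from datetime import datetime

# A few rows copied from the mpstat output above.
SAMPLE = """\
10:09:14 PM  all    0.00    0.00    1.10    0.00    0.00   83.52    0.00    0.00   15.38
10:09:15 PM  all    1.09    0.00    1.09    0.00    0.00   85.87    0.00    0.00   11.96
10:09:51 PM  all    0.00    0.00    1.09    0.00    0.00   84.78    1.09    0.00   13.04
10:10:02 PM  all    0.00    0.00   33.33   50.00    0.00   16.67    0.00    0.00    0.00
10:10:03 PM  all    3.16    0.00   10.53    8.42    0.00    2.11    1.05    0.00   74.74
"""

COLUMNS = ["usr", "nice", "sys", "iowait", "irq", "soft", "steal", "guest", "idle"]


def parse_mpstat(text):
    """Return a list of (timestamp, {column: percent}) for each data row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have: time, AM/PM, CPU id, then nine percentages.
        if len(parts) != 12 or parts[2] == "CPU":
            continue
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], "%I:%M:%S %p")
            values = [float(p) for p in parts[3:]]
        except ValueError:
            continue  # not a data row (e.g. an interleaved syslog message)
        rows.append((stamp, dict(zip(COLUMNS, values))))
    return rows


def flag_rows(rows, soft_threshold=50.0, max_gap_seconds=5):
    """Flag rows with high %soft or a long gap since the previous sample.

    Both thresholds are arbitrary assumptions; tune them for your host.
    """
    flagged = []
    prev = None
    for stamp, vals in rows:
        reasons = []
        if vals["soft"] >= soft_threshold:
            reasons.append("high %soft")
        if prev is not None and (stamp - prev).total_seconds() > max_gap_seconds:
            reasons.append("sample gap")
        if reasons:
            flagged.append((stamp.strftime("%I:%M:%S %p"), reasons))
        prev = stamp
    return flagged


if __name__ == "__main__":
    for when, reasons in flag_rows(parse_mpstat(SAMPLE)):
        print(when, ", ".join(reasons))
```

The gap check is there because mpstat itself stops reporting while the CPU is stuck, so a missing stretch of samples can be as telling as the %soft figure. It does not handle samples that cross midnight.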


I have reverted to an older kernel, which had run for a couple of months without issues.