On 03/10/2016 12:05 AM, Sarah Newman wrote:
> On 03/09/2016 08:15 PM, Sarah Newman wrote:
>> I've been running 3.18.25-18.el6.x86_64 + our build of xen 4.4.3-9 on one host for the last couple of weeks and have gotten several soft lockups
>> within the last 24 hours. I am posting here first in case anyone else has experienced the same issue.
>>
>
> Here is mpstat from around the time of the issue:
>
>  0:08:56 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
> 10:09:10 PM  all    0.00    0.00   66.67    0.00    0.00   33.33    0.00    0.00    0.00
> 10:09:11 PM  all    2.17    0.00    5.43   32.61    0.00   58.70    1.09    0.00    0.00
> 10:09:12 PM  all    0.00    0.00    1.15    0.00    0.00   85.06    0.00    0.00   13.79
> 10:09:13 PM  all    0.00    0.00    1.08    0.00    0.00   83.87    0.00    0.00   15.05
> 10:09:14 PM  all    0.00    0.00    1.10    0.00    0.00   83.52    0.00    0.00   15.38
> 10:09:15 PM  all    1.09    0.00    1.09    0.00    0.00   85.87    0.00    0.00   11.96
> 10:09:51 PM  all    0.00    0.00    1.09    0.00    0.00   84.78    1.09    0.00   13.04
> Message from syslogd at Mar 9 22:09:51 ...
> kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
> 10:10:02 PM  all    0.00    0.00   33.33   50.00    0.00   16.67    0.00    0.00    0.00
> 10:10:03 PM  all    3.16    0.00   10.53    8.42    0.00    2.11    1.05    0.00   74.74
> 10:10:04 PM  all    0.00    0.00    3.23   38.71    0.00    1.08    1.08    0.00   55.91
> 10:10:05 PM  all    0.00    0.00    4.30   11.83    0.00    3.23    1.08    0.00   79.57
>
> Typical load:
>
> 10:22:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
> 10:22:16 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
> 10:22:17 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.04    0.00   98.96
> 10:22:18 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
> 10:22:19 PM  all    0.00    0.00    1.01    0.00    0.00    1.01    0.00    0.00   97.98
> 10:22:20 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.02    0.00   98.98
> 10:22:21 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
> 10:22:22 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
>
>

I reverted to an older kernel since the older kernel had run for a couple of months without issues. This did not fix it.
I isolated the issue to a vif rate limit of 100Mb/s being applied to one of the guests, and I am now able to reproduce it on a different machine. I will look into whether this has been fixed already; if so, I will submit a pull request for the Xen4CentOS kernel, and if not, I will take it up with the xen-devel list.
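
For anyone who wants to check their own mpstat logs for the same pattern (softirq time dominating just before the watchdog fires), here is a minimal Python sketch. The function name soft_spikes and the 50% threshold are illustrative assumptions and not part of any existing tool. The sample rows are copied from the mpstat output quoted above.

```python
def soft_spikes(mpstat_text, threshold=50.0):
    """Return (timestamp, %soft) pairs for 'all' rows whose %soft exceeds threshold."""
    soft_col = None  # position of %soft counted from the end of the row
    spikes = []
    for line in mpstat_text.splitlines():
        fields = line.split()
        if "%soft" in fields:
            # Header row: store the column as a negative index so that
            # differing timestamp widths do not shift it.
            soft_col = fields.index("%soft") - len(fields)
            continue
        if soft_col is None or len(fields) < -soft_col or "all" not in fields:
            continue  # no header seen yet, short line, or not an aggregate row
        try:
            soft = float(fields[soft_col])
        except ValueError:
            continue  # e.g. an interleaved syslog message
        if soft > threshold:
            timestamp = " ".join(fields[:fields.index("all")])
            spikes.append((timestamp, soft))
    return spikes


# Sample rows taken from the mpstat output in this thread.
SAMPLE = """\
0:08:56 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:09:10 PM all 0.00 0.00 66.67 0.00 0.00 33.33 0.00 0.00 0.00
10:09:11 PM all 2.17 0.00 5.43 32.61 0.00 58.70 1.09 0.00 0.00
10:09:12 PM all 0.00 0.00 1.15 0.00 0.00 85.06 0.00 0.00 13.79
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
10:10:03 PM all 3.16 0.00 10.53 8.42 0.00 2.11 1.05 0.00 74.74
"""
```

Calling soft_spikes(SAMPLE) picks out the 10:09:11 PM and 10:09:12 PM samples and skips the interleaved kernel message. The threshold would need tuning for hosts that normally see heavy network load.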