[CentOS-virt] Soft lockups with Xen4CentOS 3.18.25-18.el6.x86_64
Sarah Newman
srn at prgmr.com
Sat Mar 12 23:47:16 UTC 2016
On 03/10/2016 12:05 AM, Sarah Newman wrote:
> On 03/09/2016 08:15 PM, Sarah Newman wrote:
>> I've been running 3.18.25-18.el6.x86_64 + our build of xen 4.4.3-9 on one host for the last couple of weeks and have gotten several soft lockups
>> within the last 24 hours. I am posting here first in case anyone else has experienced the same issue.
>>
>
> Here is mpstat from around the time of the issue:
>
> 10:08:56 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 10:09:10 PM all 0.00 0.00 66.67 0.00 0.00 33.33 0.00 0.00 0.00
> 10:09:11 PM all 2.17 0.00 5.43 32.61 0.00 58.70 1.09 0.00 0.00
> 10:09:12 PM all 0.00 0.00 1.15 0.00 0.00 85.06 0.00 0.00 13.79
> 10:09:13 PM all 0.00 0.00 1.08 0.00 0.00 83.87 0.00 0.00 15.05
> 10:09:14 PM all 0.00 0.00 1.10 0.00 0.00 83.52 0.00 0.00 15.38
> 10:09:15 PM all 1.09 0.00 1.09 0.00 0.00 85.87 0.00 0.00 11.96
> 10:09:51 PM all 0.00 0.00 1.09 0.00 0.00 84.78 1.09 0.00 13.04
> Message from syslogd at Mar 9 22:09:51 ...
> kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
> 10:10:02 PM all 0.00 0.00 33.33 50.00 0.00 16.67 0.00 0.00 0.00
> 10:10:03 PM all 3.16 0.00 10.53 8.42 0.00 2.11 1.05 0.00 74.74
> 10:10:04 PM all 0.00 0.00 3.23 38.71 0.00 1.08 1.08 0.00 55.91
> 10:10:05 PM all 0.00 0.00 4.30 11.83 0.00 3.23 1.08 0.00 79.57
>
> Typical load:
>
> 10:22:15 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
> 10:22:16 PM all 0.00 0.00 1.02 0.00 0.00 1.02 0.00 0.00 97.96
> 10:22:17 PM all 0.00 0.00 0.00 0.00 0.00 0.00 1.04 0.00 98.96
> 10:22:18 PM all 0.00 0.00 0.00 0.00 0.00 1.01 1.01 0.00 97.98
> 10:22:19 PM all 0.00 0.00 1.01 0.00 0.00 1.01 0.00 0.00 97.98
> 10:22:20 PM all 0.00 0.00 0.00 0.00 0.00 0.00 1.02 0.00 98.98
> 10:22:21 PM all 0.00 0.00 1.02 0.00 0.00 1.02 0.00 0.00 97.96
> 10:22:22 PM all 0.00 0.00 0.00 0.00 0.00 1.01 1.01 0.00 97.98
>
>
> I reverted to an older kernel since the older kernel had run for a couple of months without issues.
Reverting did not fix it. I isolated the issue to a vif rate limit of 100Mb/s applied to one of the guests, and I am now able to reproduce it on a
different machine.
I will look into whether this has already been fixed. If so, I will submit a pull request for the Xen4CentOS kernel; if not, I will take it up with
the xen-devel list.
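
For anyone trying to spot the same pattern in their own logs, here is a rough sketch (not something used in the investigation above) that scans saved mpstat output for samples where %soft dominates, as in the quoted excerpt. The function names and the 50% threshold are arbitrary assumptions for illustration, and the column order is assumed to match the mpstat header shown above.

```python
import re

# Column order as printed in the mpstat header quoted above (assumed fixed).
COLUMNS = ["usr", "nice", "sys", "iowait", "irq", "soft", "steal", "guest", "idle"]

# Matches sample lines such as "10:09:12 PM all 0.00 0.00 1.15 ...".
# Leading "> " quote markers are tolerated; header and syslog lines do not match.
LINE_RE = re.compile(
    r"^(?:>\s*)*(\d{1,2}:\d{2}:\d{2} [AP]M)\s+(\S+)\s+((?:\d+\.\d+\s*){9})$"
)


def parse_mpstat(text):
    """Return one dict per mpstat sample line found in text."""
    samples = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers, blank lines and interleaved syslog messages
        values = [float(v) for v in match.group(3).split()]
        sample = dict(zip(COLUMNS, values))
        sample["time"] = match.group(1)
        sample["cpu"] = match.group(2)
        samples.append(sample)
    return samples


def softirq_heavy(samples, threshold=50.0):
    """Return samples whose %soft is at or above threshold (hypothetical cutoff)."""
    return [s for s in samples if s["soft"] >= threshold]


if __name__ == "__main__":
    excerpt = """\
10:09:11 PM all 2.17 0.00 5.43 32.61 0.00 58.70 1.09 0.00 0.00
10:09:12 PM all 0.00 0.00 1.15 0.00 0.00 85.06 0.00 0.00 13.79
10:10:03 PM all 3.16 0.00 10.53 8.42 0.00 2.11 1.05 0.00 74.74
10:22:16 PM all 0.00 0.00 1.02 0.00 0.00 1.02 0.00 0.00 97.96
"""
    for s in softirq_heavy(parse_mpstat(excerpt)):
        print(s["time"], s["cpu"], s["soft"])
```

Run against the excerpt, this should flag only the samples from the lockup window and none from the typical-load period. A high %soft by itself does not prove the vif rate limit is involved; it only narrows down when to look.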