[CentOS-virt] Soft lockups with Xen4CentOS 3.18.25-18.el6.x86_64
George Dunlap
dunlapg at umich.edu
Tue Mar 15 10:55:52 UTC 2016
On Sat, Mar 12, 2016 at 11:47 PM, Sarah Newman <srn at prgmr.com> wrote:
> On 03/10/2016 12:05 AM, Sarah Newman wrote:
>> On 03/09/2016 08:15 PM, Sarah Newman wrote:
>>> I've been running 3.18.25-18.el6.x86_64 + our build of xen 4.4.3-9 on one host for the last couple of weeks and have gotten several soft lockups
>>> within the last 24 hours. I am posting here first in case anyone else has experienced the same issue.
>>>
>>
>> Here is mpstat from around the time of the issue:
>>
>> 10:08:56 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
>> 10:09:10 PM  all    0.00    0.00   66.67    0.00    0.00   33.33    0.00    0.00    0.00
>> 10:09:11 PM  all    2.17    0.00    5.43   32.61    0.00   58.70    1.09    0.00    0.00
>> 10:09:12 PM  all    0.00    0.00    1.15    0.00    0.00   85.06    0.00    0.00   13.79
>> 10:09:13 PM  all    0.00    0.00    1.08    0.00    0.00   83.87    0.00    0.00   15.05
>> 10:09:14 PM  all    0.00    0.00    1.10    0.00    0.00   83.52    0.00    0.00   15.38
>> 10:09:15 PM  all    1.09    0.00    1.09    0.00    0.00   85.87    0.00    0.00   11.96
>> 10:09:51 PM  all    0.00    0.00    1.09    0.00    0.00   84.78    1.09    0.00   13.04
>> Message from syslogd at Mar 9 22:09:51 ...
>> kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
>> 10:10:02 PM  all    0.00    0.00   33.33   50.00    0.00   16.67    0.00    0.00    0.00
>> 10:10:03 PM  all    3.16    0.00   10.53    8.42    0.00    2.11    1.05    0.00   74.74
>> 10:10:04 PM  all    0.00    0.00    3.23   38.71    0.00    1.08    1.08    0.00   55.91
>> 10:10:05 PM  all    0.00    0.00    4.30   11.83    0.00    3.23    1.08    0.00   79.57
>>
>> Typical load:
>>
>> 10:22:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
>> 10:22:16 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
>> 10:22:17 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.04    0.00   98.96
>> 10:22:18 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
>> 10:22:19 PM  all    0.00    0.00    1.01    0.00    0.00    1.01    0.00    0.00   97.98
>> 10:22:20 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.02    0.00   98.98
>> 10:22:21 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
>> 10:22:22 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
>>
>>
>> I reverted to an older kernel since the older kernel had run for a couple of months without issues.
>
>
> This did not fix it. I isolated the issue to a vif rate limit of 100Mb/s being applied to one of the guests and am now able to reproduce on a
> different machine.
>
> I will look into whether this has been fixed already; if so I will submit a pull request for the Xen4CentOS kernel and if not I will take it up with
> the xen-devel list.
Yes, I was going to suggest posting this to xen-users -- it's not
unlikely someone has already run across this.
-George
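
For anyone comparing their own logs against the mpstat output quoted above, here is a rough Python sketch that picks out the two symptoms visible in it: samples where %soft is unusually high, and jumps in the sample timestamps where mpstat itself apparently did not get to run. The sample rows are copied from the quoted output. The 50% and 5-second thresholds and the function names are my own assumptions for illustration, not anything from the thread or from mpstat.

```python
from datetime import datetime

# Rows copied from the mpstat output quoted earlier in this message.
MPSTAT = """\
10:09:10 PM all 0.00 0.00 66.67 0.00 0.00 33.33 0.00 0.00 0.00
10:09:11 PM all 2.17 0.00 5.43 32.61 0.00 58.70 1.09 0.00 0.00
10:09:12 PM all 0.00 0.00 1.15 0.00 0.00 85.06 0.00 0.00 13.79
10:09:13 PM all 0.00 0.00 1.08 0.00 0.00 83.87 0.00 0.00 15.05
10:09:14 PM all 0.00 0.00 1.10 0.00 0.00 83.52 0.00 0.00 15.38
10:09:15 PM all 1.09 0.00 1.09 0.00 0.00 85.87 0.00 0.00 11.96
10:09:51 PM all 0.00 0.00 1.09 0.00 0.00 84.78 1.09 0.00 13.04
10:10:02 PM all 0.00 0.00 33.33 50.00 0.00 16.67 0.00 0.00 0.00
10:10:03 PM all 3.16 0.00 10.53 8.42 0.00 2.11 1.05 0.00 74.74
10:10:04 PM all 0.00 0.00 3.23 38.71 0.00 1.08 1.08 0.00 55.91
10:10:05 PM all 0.00 0.00 4.30 11.83 0.00 3.23 1.08 0.00 79.57
"""

SOFT_THRESHOLD = 50.0  # percent; assumed cut-off for "suspiciously high" %soft
GAP_THRESHOLD = 5      # seconds; assumed, given the roughly 1-second sampling


def parse(text):
    """Return (label, timestamp, %soft) for every 'all' CPU row."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        # Expect: time, AM/PM, CPU, then nine percentage columns.
        if len(f) != 12 or f[2] != "all":
            continue
        label = f[0] + " " + f[1]
        ts = datetime.strptime(label, "%I:%M:%S %p")
        rows.append((label, ts, float(f[8])))  # f[8] is the %soft column
    return rows


def high_soft(rows, threshold=SOFT_THRESHOLD):
    """Labels of samples whose %soft is at or above the threshold."""
    return [label for label, _, soft in rows if soft >= threshold]


def gaps(rows, threshold=GAP_THRESHOLD):
    """(start, end, seconds) for consecutive samples further apart than threshold."""
    found = []
    for (a_label, a_ts, _), (b_label, b_ts, _) in zip(rows, rows[1:]):
        delta = int((b_ts - a_ts).total_seconds())
        if delta > threshold:
            found.append((a_label, b_label, delta))
    return found


if __name__ == "__main__":
    rows = parse(MPSTAT)
    print("high %soft:", high_soft(rows))
    print("sampling gaps:", gaps(rows))
```

On the quoted data this flags the samples from 10:09:11 PM through 10:09:51 PM as high %soft and reports two sampling gaps, 36 seconds and 11 seconds, on either side of the soft lockup message. It only handles a single day of 12-hour timestamps, so treat it as a starting point.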