[CentOS] soft lockup after setting multicast_router of bridge and its port to 2

Wed Jan 10 10:03:51 UTC 2018
wuzhouhui <wuzhouhui14 at mails.ucas.ac.cn>

Never mind, commit 1a040eaca1a2 (bridge: fix multicast router rlist endless loop)
fixes it.
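
For anyone checking a similar setup, here is a minimal sketch for listing the
current multicast_router value of a bridge and of each of its ports. It only
reads sysfs, since writing to these files is what preceded the lockup on the
unpatched kernel. The function name show_mcast_router and its optional second
argument are made up for this example; it assumes the usual
/sys/class/net/<bridge>/bridge and /sys/class/net/<bridge>/brif/<port> layout.

```shell
#!/bin/sh
# Print the multicast_router value of a bridge and of each of its ports.
# Read-only: nothing is written to sysfs.
#
# Usage: show_mcast_router BRIDGE [NET_ROOT]
# NET_ROOT defaults to /sys/class/net. The second argument is an assumption
# of this sketch, added only so the function can be pointed at a test tree.
show_mcast_router() {
    bridge=$1
    root=${2:-/sys/class/net}

    f="$root/$bridge/bridge/multicast_router"
    if [ ! -r "$f" ]; then
        echo "$bridge: not a bridge, or multicast_router not readable" >&2
        return 1
    fi
    printf '%s (bridge): %s\n' "$bridge" "$(cat "$f")"

    # Each entry under brif/ links to that port's brport directory.
    for p in "$root/$bridge"/brif/*; do
        # Skip entries without the attribute (also covers an empty brif/).
        [ -r "$p/multicast_router" ] || continue
        printf '%s (port): %s\n' "${p##*/}" "$(cat "$p/multicast_router")"
    done
}
```

With the names from the report below, this would be called as
"show_mcast_router eth81" and prints one line for the bridge followed by one
line per port.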

> -----Original Message-----
> From: wuzhouhui <wuzhouhui14 at mails.ucas.ac.cn>
> Sent: 2018-01-10 15:19:09 (Wednesday)
> To: centos at centos.org
> Cc: wuzhouhui14 <wuzhouhui14 at mails.ucas.ac.cn>
> Subject: soft lockup after setting multicast_router of bridge and its port to 2
> 
> OS: CentOS 6.5.
> 
> After I set multicast_router of the bridge and its port to 2, as follows:
>     echo 2 > /sys/devices/virtual/net/eth81/bridge/multicast_router
>     echo 2 > /sys/devices/virtual/net/bond2/brport/multicast_router
> Then a soft lockup occurred:
>     Message from syslogd at node-0 at Jan  9 15:47:12 ...
>      kernel:BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> And the call trace is:
>     RIP: 0010:[<ffffffffa04f3608>]  [<ffffffffa04f3608>] br_multicast_flood+0x88/0x140 [bridge]
>     RSP: 0018:ffff88013bc038f0  EFLAGS: 00000246
>     RAX: ffff88404f816020 RBX: ffff88013bc03940 RCX: ffff88204e40a640
>     RDX: ffff882002b9ce01 RSI: ffff882002b9ce80 RDI: 0000000000000000
>     RBP: ffffffff8100bb93 R08: 0000000000000001 R09: 00000000ff09f4a1
>     R10: ffff88202c884070 R11: 0000000000000000 R12: ffff88013bc03870
>     R13: ffff882002b9ce80 R14: ffff88013bc03860 R15: ffffffff8151b225
>     FS:  0000000000000000(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>     CR2: 00007fa11a942000 CR3: 0000000001a85000 CR4: 00000000001407e0
>     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>     Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
>     Stack:
>      880be7e100028813 ffff882002b9ce80 ffff882002b9ce80 ffffffffa04f3930
>     <d> 00000000880be7e1 ffff882002b9ce80 ffff882002b9ce80 ffff88200286c042
>     <d> ffff88202ae7c6e0 ffff882002b9ceb8 ffff88013bc03950 ffffffffa04f36d5
>     Call Trace:
>      <IRQ> 
>      [<ffffffffa04f3930>] ? __br_forward+0x0/0xd0 [bridge]
>      [<ffffffffa04f36d5>] ? br_multicast_forward+0x15/0x20 [bridge]
>      [<ffffffffa04f4a34>] ? br_handle_frame_finish+0x144/0x2a0 [bridge]
>      [<ffffffffa04fa938>] ? br_nf_pre_routing_finish+0x238/0x350 [bridge]
>      [<ffffffffa04faedb>] ? br_nf_pre_routing+0x48b/0x7b0 [bridge]
>      [<ffffffff8143ba57>] ? __kfree_skb+0x47/0xa0
>      [<ffffffff814734f9>] ? nf_iterate+0x69/0xb0
>      [<ffffffffa04f48f0>] ? br_handle_frame_finish+0x0/0x2a0 [bridge]
>      [<ffffffff814736b6>] ? nf_hook_slow+0x76/0x120
>      [<ffffffffa04f48f0>] ? br_handle_frame_finish+0x0/0x2a0 [bridge]
>      [<ffffffffa04f4d1c>] ? br_handle_frame+0x18c/0x250 [bridge]
>      [<ffffffff81445709>] ? __netif_receive_skb+0x529/0x750
>      [<ffffffff814397da>] ? __alloc_skb+0x7a/0x180
>      [<ffffffff814492f8>] ? netif_receive_skb+0x58/0x60
>      [<ffffffff81449400>] ? napi_skb_finish+0x50/0x70
>      [<ffffffff8144ab79>] ? napi_gro_receive+0x39/0x50
>      [<ffffffffa016887f>] ? bnx2x_rx_int+0x83f/0x1630 [bnx2x]
>      [<ffffffff810608dc>] ? perf_event_task_sched_out+0x4c/0x70
>      [<ffffffffa01698ae>] ? bnx2x_poll+0x23e/0x2f0 [bnx2x]
>      [<ffffffff8144ac93>] ? net_rx_action+0x103/0x2f0
>      [<ffffffff8107a811>] ? __do_softirq+0xc1/0x1e0
>      [<ffffffff810e6b30>] ? handle_IRQ_event+0x60/0x170
>      [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
>      [<ffffffff8100fa75>] ? do_softirq+0x65/0xa0
>      [<ffffffff8107a6c5>] ? irq_exit+0x85/0x90
>      [<ffffffff8151b165>] ? do_IRQ+0x75/0xf0
>      [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
>     <EOI> 
>      [<ffffffff81016627>] ? mwait_idle+0x77/0xd0
>      [<ffffffff815176fa>] ? atomic_notifier_call_chain+0x1a/0x20
>      [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
>      [<ffffffff814f6e3a>] ? rest_init+0x7a/0x80
>      [<ffffffff81c25f70>] ? start_kernel+0x405/0x411
>      [<ffffffff81c2533a>] ? x86_64_start_reservations+0x125/0x129
>      [<ffffffff81c25453>] ? x86_64_start_kernel+0x115/0x124
> 
> Does anyone know the reason?