[CentOS] CentOS 6.4 tcp_fastretrans_alert causes panic

Mon Nov 28 07:45:16 UTC 2016
Phil Wyett <philwyett.hemisphere at gmail.com>

On Mon, 2016-11-28 at 15:29 +0800, Zhang Qiang wrote:
> Hi all,
> 
> Our machines run kernel 2.6.32-358.14.1.x86_64, and recently dozens of
> them have panicked. They had been fine for a long time and the problem
> emerged all of a sudden, so I'm not sure whether an upgrade caused it.
> Here's what I got from the backtrace:
> 
> PID: 8136   TASK: ffff8803341aead0  CPU: 2   COMMAND: ""
>  #0 [ffff880028283610] panic at ffffffff815286b8
>  #1 [ffff880028283690] oops_end at ffffffff8152c8a2
>  #2 [ffff8800282836c0] no_context at ffffffff81046c1b
>  #3 [ffff880028283710] __bad_area_nosemaphore at ffffffff81046ea5
>  #4 [ffff880028283760] bad_area_nosemaphore at ffffffff81046f73
>  #5 [ffff880028283770] __do_page_fault at ffffffff810476d1
>  #6 [ffff880028283890] do_page_fault at ffffffff8152e7be
>  #7 [ffff8800282838c0] page_fault at ffffffff8152bb75
>     [exception RIP: tcp_fastretrans_alert+2754]
>     RIP: ffffffff814aed62  RSP: ffff880028283970  RFLAGS: 00010246
>     RAX: 0000000000000002  RBX: ffff88003d22c940  RCX: 0000000000000002
>     RDX: 0000000000000000  RSI: 0000000000000003  RDI: 0000000000000000
>     RBP: ffff8800282839b0   R8: 000000018033a9ac   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
>     R13: 0000000000000000  R14: 0000000000000d03  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
>  #8 [ffff8800282839b8] tcp_ack at ffffffff814afb2c
>  #9 [ffff880028283a88] tcp_rcv_state_process at ffffffff814b1128
> #10 [ffff880028283b18] tcp_v4_do_rcv at ffffffff814b94f0
> #11 [ffff880028283bb8] tcp_v4_rcv at ffffffff814baf9a
> #12 [ffff880028283c48] ip_local_deliver_finish at ffffffff8149648d
> #13 [ffff880028283c78] ip_local_deliver at ffffffff81496718
> #14 [ffff880028283ca8] ip_rcv_finish at ffffffff81495bbd
> #15 [ffff880028283ce8] ip_rcv at ffffffff81496155
> #16 [ffff880028283d28] __netif_receive_skb at ffffffff8145db5b
> #17 [ffff880028283d88] netif_receive_skb at ffffffff814621b8
> #18 [ffff880028283dc8] virtnet_poll at ffffffffa0130565 [virtio_net]
> #19 [ffff880028283e68] net_rx_action at ffffffff81463193
> #20 [ffff880028283ec8] __do_softirq at ffffffff81078c71
> #21 [ffff880028283f38] call_softirq at ffffffff8100c1cc
> #22 [ffff880028283f50] do_softirq at ffffffff8100de05
> #23 [ffff880028283f70] irq_exit at ffffffff81078a55
> #24 [ffff880028283f80] do_IRQ at ffffffff81532365
> --- <IRQ stack> ---
> #25 [ffff88001e851f58] ret_from_intr at ffffffff8100b9d3
>     RIP: 00007fa080e1a538  RSP: 00007fa0781ec960  RFLAGS: 00000206
>     RAX: 0000000000000001  RBX: 00007fa0781ec9a0  RCX: 000000000001ef8c
>     RDX: 0000000000001000  RSI: 0000000000000006  RDI: 00007fa07c093df8
>     RBP: ffffffff8100b9ce   R8: 0000000000000006   R9: 0000000004000001
>     R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000000
>     R13: 00007fa0710a18f0  R14: 0000000000000120  R15: 0000000000001000
>     ORIG_RAX: ffffffffffffff8e  CS: 0033  SS: 002b
> 
> Disassembling tcp_fastretrans_alert+2754 gives:
> 
> 0xffffffff814aed62 <tcp_fastretrans_alert+2754>:        sub    0x58(%rdi),%r8d
> 
> I know this kernel is a bit old, but since these machines are in a
> production environment, I can't just upgrade them all to test whether the
> old version is the problem. So I need some advice on how to debug this, or
> a pointer to a bug report.
> Thanks.


Hi,

Since these machines are in a production environment, there is all the
more reason to have an upgrade plan in place and to be running the latest
package versions with all the fixes provided.

Is this isolated to one machine or many?

Can it be reproduced and, if so, how?
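
As a first debugging step, the register dump and the disassembly quoted
above can be combined to see which address the faulting instruction tried
to read. The sketch below is only an illustration: the values are copied
from the quoted output, while the helper name effective_address and the
4096-byte page-size threshold are my own assumptions, not anything taken
from the kernel or from crash.

```python
# Values copied from the quoted crash output.
RDI = 0x0000000000000000   # base register at the time of the fault
DISPLACEMENT = 0x58        # from "sub 0x58(%rdi),%r8d"

PAGE_SIZE = 4096           # assumption: standard 4 KiB pages on x86_64


def effective_address(base: int, disp: int) -> int:
    """Address read by an AT&T-syntax memory operand of the form disp(%base)."""
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


fault_addr = effective_address(RDI, DISPLACEMENT)

# A zero base register plus a small displacement lands in the first page,
# which is the usual signature of a NULL pointer dereference.
looks_like_null_deref = RDI == 0 and fault_addr < PAGE_SIZE

print(f"faulting read at {fault_addr:#x}, NULL dereference likely: {looks_like_null_deref}")
```

If that reading is right, the code was handed a NULL pointer in RDI and
tried to read a 32-bit field at offset 0x58 from it. Which structure member
sits at that offset cannot be told from the quoted output alone; the
debuginfo for this exact kernel would be needed for that.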

Regards

Phil

-- 

Google+: https://goo.gl/CPjvNo
Blog: https://philwyett-hemi.blogspot.co.uk/
GitLab: https://gitlab.com/philwyett_hemi/

