Hi all,
We are running kernel 2.6.32-358.14.1.x86_64, and recently dozens of our machines panicked. They had been stable for a long time and the problem appeared all of a sudden, so I'm not sure whether an upgrade caused it. Here's the backtrace I got from the vmcore:
PID: 8136  TASK: ffff8803341aead0  CPU: 2  COMMAND: ""
 #0 [ffff880028283610] panic at ffffffff815286b8
 #1 [ffff880028283690] oops_end at ffffffff8152c8a2
 #2 [ffff8800282836c0] no_context at ffffffff81046c1b
 #3 [ffff880028283710] __bad_area_nosemaphore at ffffffff81046ea5
 #4 [ffff880028283760] bad_area_nosemaphore at ffffffff81046f73
 #5 [ffff880028283770] __do_page_fault at ffffffff810476d1
 #6 [ffff880028283890] do_page_fault at ffffffff8152e7be
 #7 [ffff8800282838c0] page_fault at ffffffff8152bb75
    [exception RIP: tcp_fastretrans_alert+2754]
    RIP: ffffffff814aed62  RSP: ffff880028283970  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffff88003d22c940  RCX: 0000000000000002
    RDX: 0000000000000000  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: ffff8800282839b0  R8:  000000018033a9ac  R9:  0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000d03  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #8 [ffff8800282839b8] tcp_ack at ffffffff814afb2c
 #9 [ffff880028283a88] tcp_rcv_state_process at ffffffff814b1128
#10 [ffff880028283b18] tcp_v4_do_rcv at ffffffff814b94f0
#11 [ffff880028283bb8] tcp_v4_rcv at ffffffff814baf9a
#12 [ffff880028283c48] ip_local_deliver_finish at ffffffff8149648d
#13 [ffff880028283c78] ip_local_deliver at ffffffff81496718
#14 [ffff880028283ca8] ip_rcv_finish at ffffffff81495bbd
#15 [ffff880028283ce8] ip_rcv at ffffffff81496155
#16 [ffff880028283d28] __netif_receive_skb at ffffffff8145db5b
#17 [ffff880028283d88] netif_receive_skb at ffffffff814621b8
#18 [ffff880028283dc8] virtnet_poll at ffffffffa0130565 [virtio_net]
#19 [ffff880028283e68] net_rx_action at ffffffff81463193
#20 [ffff880028283ec8] __do_softirq at ffffffff81078c71
#21 [ffff880028283f38] call_softirq at ffffffff8100c1cc
#22 [ffff880028283f50] do_softirq at ffffffff8100de05
#23 [ffff880028283f70] irq_exit at ffffffff81078a55
#24 [ffff880028283f80] do_IRQ at ffffffff81532365
--- <IRQ stack> ---
#25 [ffff88001e851f58] ret_from_intr at ffffffff8100b9d3
    RIP: 00007fa080e1a538  RSP: 00007fa0781ec960  RFLAGS: 00000206
    RAX: 0000000000000001  RBX: 00007fa0781ec9a0  RCX: 000000000001ef8c
    RDX: 0000000000001000  RSI: 0000000000000006  RDI: 00007fa07c093df8
    RBP: ffffffff8100b9ce  R8:  0000000000000006  R9:  0000000004000001
    R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000000
    R13: 00007fa0710a18f0  R14: 0000000000000120  R15: 0000000000001000
    ORIG_RAX: ffffffffffffff8e  CS: 0033  SS: 002b
Disassembling tcp_fastretrans_alert+2754 gives:
0xffffffff814aed62 <tcp_fastretrans_alert+2754>: sub 0x58(%rdi),%r8d
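Since that instruction reads memory at 0x58(%rdi) and the exception frame above shows RDI = 0, this looks like a NULL pointer dereference at offset 0x58 (88 decimal) into whatever structure %rdi was supposed to hold. In case it helps, here is the rough crash session I plan to use to narrow it down; it assumes the matching kernel-debuginfo vmlinux is available, and the path and struct name below are only my guesses:

  # open the vmcore against the debuginfo vmlinux (example path)
  crash /usr/lib/debug/lib/modules/<kernel-version>/vmlinux vmcore

  # map the faulting RIP back to a source file and line (needs debuginfo)
  crash> dis -l tcp_fastretrans_alert+2754 1

  # dump the full frame contents around the fault for extra context
  crash> bt -f

  # if %rdi is supposed to point at a tcp_sock here (to be confirmed from
  # the source line), check which member sits at offset 88 (0x58)
  crash> struct tcp_sock -o | grep '\[88\]'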
I know this kernel is a bit old, but since these machines are in a production environment I can't simply upgrade them all just to test whether the old version is the problem. Any advice on how to debug this further, or a pointer to an existing bug report, would be appreciated. Thanks.
On Mon, 2016-11-28 at 15:29 +0800, Zhang Qiang wrote:
Hi,
Being in a production environment is all the more reason to have an upgrade plan in place and to be running the latest package version with all the fixes it provides.
Is this isolated to one machine or many?
Can it be reproduced and if so how?
Regards
Phil