Hi all,
Our kernel is 2.6.32-358.14.1.x86_64. Recently, dozens of the machines running it panicked. They had been fine for a long time and the problem appeared all of a sudden, so I'm not sure whether an upgrade caused it. Here's the backtrace:
    PID: 8136   TASK: ffff8803341aead0  CPU: 2   COMMAND: ""
     #0 [ffff880028283610] panic at ffffffff815286b8
     #1 [ffff880028283690] oops_end at ffffffff8152c8a2
     #2 [ffff8800282836c0] no_context at ffffffff81046c1b
     #3 [ffff880028283710] __bad_area_nosemaphore at ffffffff81046ea5
     #4 [ffff880028283760] bad_area_nosemaphore at ffffffff81046f73
     #5 [ffff880028283770] __do_page_fault at ffffffff810476d1
     #6 [ffff880028283890] do_page_fault at ffffffff8152e7be
     #7 [ffff8800282838c0] page_fault at ffffffff8152bb75
        [exception RIP: tcp_fastretrans_alert+2754]
        RIP: ffffffff814aed62  RSP: ffff880028283970  RFLAGS: 00010246
        RAX: 0000000000000002  RBX: ffff88003d22c940  RCX: 0000000000000002
        RDX: 0000000000000000  RSI: 0000000000000003  RDI: 0000000000000000
        RBP: ffff8800282839b0   R8: 000000018033a9ac   R9: 0000000000000000
        R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
        R13: 0000000000000000  R14: 0000000000000d03  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
     #8 [ffff8800282839b8] tcp_ack at ffffffff814afb2c
     #9 [ffff880028283a88] tcp_rcv_state_process at ffffffff814b1128
    #10 [ffff880028283b18] tcp_v4_do_rcv at ffffffff814b94f0
    #11 [ffff880028283bb8] tcp_v4_rcv at ffffffff814baf9a
    #12 [ffff880028283c48] ip_local_deliver_finish at ffffffff8149648d
    #13 [ffff880028283c78] ip_local_deliver at ffffffff81496718
    #14 [ffff880028283ca8] ip_rcv_finish at ffffffff81495bbd
    #15 [ffff880028283ce8] ip_rcv at ffffffff81496155
    #16 [ffff880028283d28] __netif_receive_skb at ffffffff8145db5b
    #17 [ffff880028283d88] netif_receive_skb at ffffffff814621b8
    #18 [ffff880028283dc8] virtnet_poll at ffffffffa0130565 [virtio_net]
    #19 [ffff880028283e68] net_rx_action at ffffffff81463193
    #20 [ffff880028283ec8] __do_softirq at ffffffff81078c71
    #21 [ffff880028283f38] call_softirq at ffffffff8100c1cc
    #22 [ffff880028283f50] do_softirq at ffffffff8100de05
    #23 [ffff880028283f70] irq_exit at ffffffff81078a55
    #24 [ffff880028283f80] do_IRQ at ffffffff81532365
    --- <IRQ stack> ---
    #25 [ffff88001e851f58] ret_from_intr at ffffffff8100b9d3
        RIP: 00007fa080e1a538  RSP: 00007fa0781ec960  RFLAGS: 00000206
        RAX: 0000000000000001  RBX: 00007fa0781ec9a0  RCX: 000000000001ef8c
        RDX: 0000000000001000  RSI: 0000000000000006  RDI: 00007fa07c093df8
        RBP: ffffffff8100b9ce   R8: 0000000000000006   R9: 0000000004000001
        R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000000
        R13: 00007fa0710a18f0  R14: 0000000000000120  R15: 0000000000001000
        ORIG_RAX: ffffffffffffff8e  CS: 0033  SS: 002b
Disassembling tcp_fastretrans_alert+2754 gives:

    0xffffffff814aed62 <tcp_fastretrans_alert+2754>: sub 0x58(%rdi),%r8d
I know this kernel is a bit old, but these machines are in production, so I can't simply upgrade them all to test whether the old version is the problem. I'd appreciate advice on how to debug this further, or a pointer to a relevant bug report. Thanks.