Hi,
I got a panic when running CentOS-6.5:
crash> bt
PID: 106074  TASK: ffff8839c1e32ae0  CPU: 4  COMMAND: "flushd4[cbd-sd-"
 #0 [ffff8839c2a91900] machine_kexec at ffffffff81038fa9
 #1 [ffff8839c2a91960] crash_kexec at ffffffff810c5992
 #2 [ffff8839c2a91a30] oops_end at ffffffff81515c90
 #3 [ffff8839c2a91a60] no_context at ffffffff81049f1b
 #4 [ffff8839c2a91ab0] __bad_area_nosemaphore at ffffffff8104a1a5
 #5 [ffff8839c2a91b00] bad_area_nosemaphore at ffffffff8104a273
 #6 [ffff8839c2a91b10] __do_page_fault at ffffffff8104a9bf
 #7 [ffff8839c2a91c30] do_page_fault at ffffffff81517bae
 #8 [ffff8839c2a91c60] page_fault at ffffffff81514f95
    [exception RIP: rb_next+1]
    RIP: ffffffff81286e21  RSP: ffff8839c2a91d10  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffff88204b501c00  RCX: 0000000000000000
    RDX: ffff88013bc56840  RSI: ffff88013bc568d8  RDI: 0000000000000010
    RBP: ffff8839c2a91d60   R8: 0000000000000001   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8839c2a91d18] pick_next_task_fair at ffffffff81068121
#10 [ffff8839c2a91d68] schedule at ffffffff81511e08
#11 [ffff8839c2a91e28] flushd_run at ffffffffa07a2cbd [cbd]
#12 [ffff8839c2a91ee8] kthread at ffffffff8109acd6
#13 [ffff8839c2a91f48] kernel_thread at ffffffff8100c20a
The [cbd] is a module we developed; I don't think this bug is related to it.
The contents of the rq passed to pick_next_task(struct rq *rq) are as follows (see the attachment for the full contents of struct rq):
struct rq {
  lock = {
    raw_lock = {
      slock = 67109881
    }
  },
  nr_running = 2,
  cpu_load = {0, 5923, 14993, 13888, 9115},
  last_load_update_tick = 4365159236,
  nohz_balance_kick = 0 '\000',
  skip_clock_update = 0,
  load = {
    weight = 2,
    inv_weight = 0
  },
  nr_load_updates = 21530842,
  nr_switches = 148355748,
  cfs = {
    load = {
      weight = 2,
      inv_weight = 0
    },
    nr_running = 1,
    h_nr_running = 2,
    exec_clock = 3309310258875,
    min_vruntime = 1181294560093,
    tasks_timeline = {
      rb_node = 0x0
    },
    rb_leftmost = 0x0,
    tasks = {
      next = 0xffff88013bc568e8,
      prev = 0xffff88013bc568e8
    },
    balance_iterator = 0xffff88013bc568e8,
    curr = 0xffff88204b501e00,
    next = 0x0,
    last = 0x0,
    skip = 0x0,
    nr_spread_over = 5,
    ....
We can see that rq->cfs.nr_running is not zero, but rb_leftmost is NULL. Since cfs.skip is also NULL, this causes a NULL-pointer dereference panic in pick_next_entity(), called from pick_next_task_fair().
Has anyone encountered the same problem, or does anyone have advice?
Thanks