Hi,
I got a panic when running CentOS-6.5:
crash> bt
PID: 106074 TASK: ffff8839c1e32ae0 CPU: 4 COMMAND: "flushd4[cbd-sd-"
#0 [ffff8839c2a91900] machine_kexec at ffffffff81038fa9
#1 [ffff8839c2a91960] crash_kexec at ffffffff810c5992
#2 [ffff8839c2a91a30] oops_end at ffffffff81515c90
#3 [ffff8839c2a91a60] no_context at ffffffff81049f1b
#4 [ffff8839c2a91ab0] __bad_area_nosemaphore at ffffffff8104a1a5
#5 [ffff8839c2a91b00] bad_area_nosemaphore at ffffffff8104a273
#6 [ffff8839c2a91b10] __do_page_fault at ffffffff8104a9bf
#7 [ffff8839c2a91c30] do_page_fault at ffffffff81517bae
#8 [ffff8839c2a91c60] page_fault at ffffffff81514f95
[exception RIP: rb_next+1]
RIP: ffffffff81286e21 RSP: ffff8839c2a91d10 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88204b501c00 RCX: 0000000000000000
RDX: ffff88013bc56840 RSI: ffff88013bc568d8 RDI: 0000000000000010
RBP: ffff8839c2a91d60 R8: 0000000000000001 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff8839c2a91d18] pick_next_task_fair at ffffffff81068121
#10 [ffff8839c2a91d68] schedule at ffffffff81511e08
#11 [ffff8839c2a91e28] flushd_run at ffffffffa07a2cbd [cbd]
#12 [ffff8839c2a91ee8] kthread at ffffffff8109acd6
#13 [ffff8839c2a91f48] kernel_thread at ffffffff8100c20a
The [cbd] module is one we developed ourselves; I don't think this bug is
related to it.
The contents of rq in pick_next_task(struct rq *rq) are as follows (see
the attachment for the full contents of struct rq):
struct rq {
lock = {
raw_lock = {
slock = 67109881
}
},
nr_running = 2,
cpu_load = {0, 5923, 14993, 13888, 9115},
last_load_update_tick = 4365159236,
nohz_balance_kick = 0 '\000',
skip_clock_update = 0,
load = {
weight = 2,
inv_weight = 0
},
nr_load_updates = 21530842,
nr_switches = 148355748,
cfs = {
load = {
weight = 2,
inv_weight = 0
},
nr_running = 1,
h_nr_running = 2,
exec_clock = 3309310258875,
min_vruntime = 1181294560093,
tasks_timeline = {
rb_node = 0x0
},
rb_leftmost = 0x0,
tasks = {
next = 0xffff88013bc568e8,
prev = 0xffff88013bc568e8
},
balance_iterator = 0xffff88013bc568e8,
curr = 0xffff88204b501e00,
next = 0x0,
last = 0x0,
skip = 0x0,
nr_spread_over = 5,
....
We can see that rq->cfs.nr_running is not zero, yet rb_leftmost is NULL.
Since skip is also NULL, this leads to a NULL-pointer dereference panic
in pick_next_entity(), called from pick_next_task_fair().
Has anyone encountered the same problem, or does anyone have any advice?
Thanks