Hi,
I got a panic when running CentOS-6.5:
crash> bt
PID: 106074  TASK: ffff8839c1e32ae0  CPU: 4  COMMAND: "flushd4[cbd-sd-"
 #0 [ffff8839c2a91900] machine_kexec at ffffffff81038fa9
 #1 [ffff8839c2a91960] crash_kexec at ffffffff810c5992
 #2 [ffff8839c2a91a30] oops_end at ffffffff81515c90
 #3 [ffff8839c2a91a60] no_context at ffffffff81049f1b
 #4 [ffff8839c2a91ab0] __bad_area_nosemaphore at ffffffff8104a1a5
 #5 [ffff8839c2a91b00] bad_area_nosemaphore at ffffffff8104a273
 #6 [ffff8839c2a91b10] __do_page_fault at ffffffff8104a9bf
 #7 [ffff8839c2a91c30] do_page_fault at ffffffff81517bae
 #8 [ffff8839c2a91c60] page_fault at ffffffff81514f95
    [exception RIP: rb_next+1]
    RIP: ffffffff81286e21  RSP: ffff8839c2a91d10  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffff88204b501c00  RCX: 0000000000000000
    RDX: ffff88013bc56840  RSI: ffff88013bc568d8  RDI: 0000000000000010
    RBP: ffff8839c2a91d60   R8: 0000000000000001   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8839c2a91d18] pick_next_task_fair at ffffffff81068121
#10 [ffff8839c2a91d68] schedule at ffffffff81511e08
#11 [ffff8839c2a91e28] flushd_run at ffffffffa07a2cbd [cbd]
#12 [ffff8839c2a91ee8] kthread at ffffffff8109acd6
#13 [ffff8839c2a91f48] kernel_thread at ffffffff8100c20a
[cbd] is a module developed by us; I think this bug has nothing to do with it.
The contents of rq in pick_next_task(struct rq *rq) are as follows (see the attachment for the full contents of struct rq):
struct rq {
  lock = {
    raw_lock = {
      slock = 67109881
    }
  },
  nr_running = 2,
  cpu_load = {0, 5923, 14993, 13888, 9115},
  last_load_update_tick = 4365159236,
  nohz_balance_kick = 0 '\000',
  skip_clock_update = 0,
  load = {
    weight = 2,
    inv_weight = 0
  },
  nr_load_updates = 21530842,
  nr_switches = 148355748,
  cfs = {
    load = {
      weight = 2,
      inv_weight = 0
    },
    nr_running = 1,
    h_nr_running = 2,
    exec_clock = 3309310258875,
    min_vruntime = 1181294560093,
    tasks_timeline = {
      rb_node = 0x0
    },
    rb_leftmost = 0x0,
    tasks = {
      next = 0xffff88013bc568e8,
      prev = 0xffff88013bc568e8
    },
    balance_iterator = 0xffff88013bc568e8,
    curr = 0xffff88204b501e00,
    next = 0x0,
    last = 0x0,
    skip = 0x0,
    nr_spread_over = 5,
    ....
We can see that the value of rq->cfs.nr_running is not zero, but rb_leftmost is NULL. Since skip is also NULL, this causes a NULL dereference panic in pick_next_entity(), called from pick_next_task_fair().
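To make that failure path concrete, here is a minimal user-space sketch of it. It is a model, not the kernel source. The struct layouts are simplified assumptions (only the fields needed here), and it assumes that pick_next_entity() compares cfs_rq->skip against the leftmost entity before looking up its successor, as upstream kernels with the skip buddy do. The helper names pick_first_entity() and rb_next_arg() are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; the layouts are assumptions. */
struct rb_node {
    unsigned long parent_color;
    struct rb_node *right;
    struct rb_node *left;
};

struct load_weight {
    unsigned long weight, inv_weight;
};

struct sched_entity {
    struct load_weight load;   /* assumed to precede run_node */
    struct rb_node run_node;
};

struct cfs_rq {
    unsigned long nr_running;
    struct rb_node *rb_leftmost;
    struct sched_entity *skip;
};

/* Hypothetical helper: leftmost entity, or NULL when the tree is empty. */
static struct sched_entity *pick_first_entity(const struct cfs_rq *cfs_rq)
{
    if (!cfs_rq->rb_leftmost)
        return NULL;
    return (struct sched_entity *)((char *)cfs_rq->rb_leftmost -
                                   offsetof(struct sched_entity, run_node));
}

/*
 * Hypothetical helper: the address that would be handed to rb_next(),
 * or 0 if the skip-buddy branch is not taken. Nothing is dereferenced,
 * so the model is safe to run.
 */
static uintptr_t rb_next_arg(const struct cfs_rq *cfs_rq)
{
    struct sched_entity *se = pick_first_entity(cfs_rq);

    /* With an empty tree and no skip buddy, this is NULL == NULL. */
    if (cfs_rq->skip == se)
        return (uintptr_t)se + offsetof(struct sched_entity, run_node);

    return 0;
}
```

With the values from the dump (rb_leftmost = 0x0, skip = 0x0), rb_next_arg() returns the offset of run_node inside the entity. Under the layout assumed above that offset is 0x10 on x86_64, which would be consistent with RDI = 0000000000000010 at the faulting rb_next+1 in the backtrace.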
Has anyone encountered the same problem, or does anyone have any advice?
Thanks
I googled this issue and found that many people have encountered it, but most of the answers just say "the newer kernel doesn't have this problem, so upgrade the kernel". We can't upgrade the kernel easily, so we need to *really* solve this problem.
On 10/18/2017 04:41 PM, John Hodrien wrote:
On Wed, 18 Oct 2017, wuzhouhui wrote:
Has anyone encountered the same problem, or does anyone have any advice?
Expect minimal help when running custom kernel modules on painfully old CentOS kernels?
jh
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
On 18 October 2017 at 09:50, wuzhouhui wuzhouhui14@mails.ucas.ac.cn wrote:
I googled this issue and found that many people have encountered it, but most of the answers just say "the newer kernel doesn't have this problem, so upgrade the kernel". We can't upgrade the kernel easily, so we need to *really* solve this problem.
No, you really need to rebase your work on current CentOS, as you're so far behind on critical security issues that it's just not funny. Staying on such an old release also limits anyone's ability to actually help, and the chance of any fix reaching you.
On 18 October 2017 at 04:50, wuzhouhui wuzhouhui14@mails.ucas.ac.cn wrote:
I googled this issue and found that many people have encountered it, but most of the answers just say "the newer kernel doesn't have this problem, so upgrade the kernel". We can't upgrade the kernel easily, so we need to *really* solve this problem.
If you can't update the kernel, then how can anyone fix the problem? The kernel needs to be changed out in some way. [Yes, there are ways to binary patch a running kernel, but it is a) fraught with danger and b) an experts-only area. People who do that do not offer their services for free for a reason.]
-- Stephen J Smoogen.
Fine, it seems that upgrading the kernel is the only effective solution.
On 18 October 2017 at 15:34, wuzhouhui wuzhouhui14@mails.ucas.ac.cn wrote:
Fine, it seems that upgrading the kernel is the only effective solution.
To be as abundantly clear as possible on the matter ... it is not just the kernel.
You need to do a full update against the CentOS 6 repositories.
Please remove me from your email. I stopped working. Thanks.