Hi all,
I recently had an issue where, running kernel 2.6.32-358.11.1, the box would be up for about five minutes, then would crash and reboot. kdump saved the vmcore files, so I was hoping to run crash against them to see why this was occurring. I copied the vmcores to another machine, installed the kernel-debuginfo package, and gave it a try, but had no success:
$ crash /usr/lib/debug/lib/modules/2.6.32-358.11.1.el6.centos.plus.x86_64/vmlinux vmcore
crash 6.1.0-1.el6
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernels compiled by different gcc versions:
  /usr/lib/debug/lib/modules/2.6.32-358.11.1.el6.centos.plus.x86_64/vmlinux: 4.4.6
  vmcore kernel: 4.4.7

WARNING: kernel version inconsistency between vmlinux and dumpfile

crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c1bbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff81c232a4 type: "tss_struct ist array"
$
And instead of the crash> prompt, I was back to the bash prompt. Does anyone know what I could do to figure out why that would be? Do I need to run crash on the original server (the crash docs seem to imply that's not necessary)? Anything else I could look for that might be useful? As far as I can tell the kernels match:
$ strings vmcore |grep OSRELEASE
OSRELEASE=2.6.32-358.11.1.el6.x86_64
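A side-by-side comparison of the two release strings can be sketched in shell as below. The function names (vmcore_release, vmlinux_release, releases_match) are made up for illustration. The sketch assumes the debuginfo vmlinux sits in a directory named after its kernel release, as in the path used above.

```shell
#!/bin/sh
# Hypothetical helper: print the OSRELEASE value recorded in a vmcore.
# Uses the same "strings | grep OSRELEASE" approach as the command above.
vmcore_release() {
    strings "$1" | grep -m1 '^OSRELEASE=' | cut -d= -f2
}

# Hypothetical helper: derive the release from a debuginfo vmlinux path,
# assuming the layout /usr/lib/debug/lib/modules/<release>/vmlinux.
vmlinux_release() {
    basename "$(dirname "$1")"
}

# Succeeds only when the two release strings are byte-for-byte identical.
releases_match() {
    [ "$1" = "$2" ]
}

# Example usage (commented out because it needs the real files):
# dump=$(vmcore_release vmcore)
# dbg=$(vmlinux_release /usr/lib/debug/lib/modules/2.6.32-358.11.1.el6.centos.plus.x86_64/vmlinux)
# echo "vmcore:  $dump"
# echo "vmlinux: $dbg"
# releases_match "$dump" "$dbg" && echo "match" || echo "MISMATCH"
```

With the path and the OSRELEASE value shown in this message, the vmlinux side comes out as 2.6.32-358.11.1.el6.centos.plus.x86_64 and the vmcore side as 2.6.32-358.11.1.el6.x86_64, so a full-string comparison reports a mismatch. That may be related to the "kernel version inconsistency" warning, though this check alone cannot confirm it.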
--keith
On 07/04/2013 01:20 AM, Keith Keller wrote:
> Hi all,
> I recently had an issue where, running kernel 2.6.32-358.11.1, the box would be up for about five minutes, then would crash and reboot. kdump saved the vmcore files, so I was hoping to run crash against them to see why this was occurring. I copied the vmcores to another machine, installed the kernel-debuginfo package, and gave it a try, but had no success:
That kernel has a potential kernel leak that could enable a DoS attack. For now, revert to an earlier kernel and wait for 2.6.32-358.11.2 or later. Do that even if you do not suspect a DoS attack.
If the problem repeats on an older or later kernel, then you should investigate further.
On 2013-07-03, Ljubomir Ljubojevic <centos@plnet.rs> wrote:
> That kernel has a potential kernel leak that could enable a DoS attack. For now, revert to an earlier kernel and wait for 2.6.32-358.11.2 or later. Do that even if you do not suspect a DoS attack.
Heh, it was DoSing me right from initial boot! :)
It looks like the bug was introduced some time ago: the bug report from Johnny's email a few days ago
https://bugzilla.redhat.com/show_bug.cgi?id=979936
says that the bug was introduced in this kernel:
https://rhn.redhat.com/errata/RHSA-2012-1304.html
Any kernel older than that one predates CentOS 6.4. Maybe trying out Johnny's test kernel is a better option?
--keith