CentOS 5 has been running continuously since 9/21 on my "do everything" home server (with the exception of a kernel update). It's a fairly old Athlon machine that serves as a firewall and runs various servers (dovecot, samba, NFS, dhcp, OpenVPN, etc.).
I connected via OpenVPN about a week ago and got a kernel panic. I've since found that it is very repeatable and happens only after I've been connected via OpenVPN for about 4 hours.
I was able to manually copy the stuff on the console after the panic (see below). I googled "unable to handle kernel paging request" and didn't really find anything useful (to me).
I've tried both kernel version 2.6.18-8.1.14.el5 and 2.6.18-8.1.15.el5 as well as OpenVPN versions 2.1_rc4-1 and 2.0.9 all with the same results.
Not sure where to go with this. Should I post it to a kernel mailing list, or somewhere else?
Call Trace:
 [<C040502c>] dump_trace+0x8c/0x96
 [<c0405046>] show_trace_log_lvl+0x10/0x20
 [<c04050e2>] show_stack_log_lvl+0x8c/0x94
 [<c040520f>] show_registers+0x125/0x191
 [<c0404c3b>] kernel_thread_helper+0x7/0x10
 [<c0405411>] die+0x196/0x296
 [<c05fd73f>] do_page_fault+0x3ea/0x4b8
 [<c0434d15>] kthread+0x0/0xeb
 [<c05fd355>] do_page_fault+0x0/0x4b8
 [<c0404a71>] error_code+0x39/0x40
 [<c0434d15>] kthread+0x0/0xeb
 [<c0404c3b>] kernel_thread_helper+0x7/0x10
BUG: unable to handle kernel paging request at virtual address c0613dbf
Printing eip: c0404c44
*pde = 2f9b5163
Recursive die() failure, output suppressed
<0>Kernel panic - not syncing: Fatal exception
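For anyone reading a trace like this by hand: each entry has the form [<address>] symbol+0xoffset/0xsize, so the start of a function is the address minus the offset. Here is a minimal Python sketch of that arithmetic, using three entries from the trace above. The helper names parse_trace and describe are made up for illustration, and it only knows the symbols it is given. A real lookup would use the kernel's System.map.

```python
import re

# Three entries copied from the console trace above.
TRACE = """
[<c0404c3b>] kernel_thread_helper+0x7/0x10
[<c05fd73f>] do_page_fault+0x3ea/0x4b8
[<c0434d15>] kthread+0x0/0xeb
"""

# Matches "[<addr>] symbol+0xoffset/0xsize".
ENTRY = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_trace(text):
    """Return {symbol: (start_address, size)} for every entry in text."""
    symbols = {}
    for addr, name, off, size in ENTRY.findall(text):
        start = int(addr, 16) - int(off, 16)  # function start = address - offset
        symbols[name] = (start, int(size, 16))
    return symbols


def describe(addr, symbols):
    """Express addr relative to the closest known symbol start at or below it."""
    best = None
    for name, (start, size) in symbols.items():
        if start <= addr and (best is None or start > best[1]):
            best = (name, start, size)
    if best is None:
        return hex(addr)
    name, start, size = best
    return f"{name}+{addr - start:#x}/{size:#x}"


symbols = parse_trace(TRACE)
# The eip value printed in the panic message.
print(describe(0xC0404C44, symbols))
```

With only these three symbols known, the printed eip resolves relative to kernel_thread_helper. That is a rough reading from a partial symbol list, not a diagnosis.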
-- Thanks, Mike
On Nov 28, 2007 11:27 AM, Mike azmr@earthlink.net wrote:
I googled "unable to handle kernel paging request" and didn't really find anything useful (to me).
In my experience this probably means that you have some RAM going bad and you only manage to tickle the problem when the machine becomes loaded enough to need that part of the address space.
Reboot with memtest86 (should be on the centos install media) and look for test failures.
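To give a feel for what a memory tester does, here is a rough Python sketch of the basic idea: write known bit patterns into a buffer, read them back, and report any byte that changed. The function name pattern_test and its defaults are made up for illustration. This is not a substitute for memtest86: a user-space program only sees virtual memory handed out by the kernel, cannot choose which physical RAM it touches, and may be reading from CPU cache rather than DRAM.

```python
def pattern_test(size_bytes=1 << 20, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern across a buffer, read it back, and collect mismatches.

    Returns a list of (offset, expected, actual) tuples; an empty list means
    every byte read back as written.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        # Write phase: fill the whole buffer with the pattern byte.
        buf[:] = bytes([pattern]) * size_bytes
        # Read phase: check every byte against what was written.
        for offset, value in enumerate(buf):
            if value != pattern:
                failures.append((offset, pattern, value))
    return failures


if __name__ == "__main__":
    bad = pattern_test()
    print("mismatches found:", len(bad))
```

The alternating patterns 0x55 and 0xAA flip every bit between passes, which is the same reason real testers use them. memtest86 runs many more patterns from outside the operating system, so it can reach nearly all of physical memory.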
On Wed, 28 Nov 2007, Bart Schaefer wrote:
Reboot with memtest86 (should be on the centos install media) and look for test failures.
Thanks Bart - That makes perfect sense. I've installed memtest and will let it cook overnight.
-- Mike
On Wed, 2007-11-28 at 14:27 -0800, Bart Schaefer wrote:
Reboot with memtest86 (should be on the centos install media) and look for test failures.
JFTR: I chased "random" panics for some time on my Acer AK77-400. Thought bad memory, ran memtest86 and it was confirmed... NOT!
Turns out that although the board supports DDR.../333/400 (PCwhatchamacallit/2700/...) and has three slots, there is not enough bandwidth to run @ 400 with all three slots populated. At 333, all memory tested good.
*After* I was made aware of this niggling little inconvenience, I had to make the tough choice between faster or more memory. *sigh*.
I had found a post about it somewhere, but I can't locate it now. I hope this is not your problem... or maybe it is better than bad memory?
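For reference, the PC2700 and PC3200 module names come from the theoretical peak transfer rate of DDR-333 and DDR-400. A small Python sketch of that arithmetic, assuming the standard 64-bit (8-byte) DIMM data bus; the function name is made up for illustration. It only reproduces the per-module peak figures and does not model why a particular board cannot hold 400 with every slot populated.

```python
BUS_WIDTH_BYTES = 8  # assumption: standard 64-bit DDR DIMM data bus


def peak_bandwidth_mb_s(mega_transfers_per_second):
    """Theoretical peak rate in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_second * BUS_WIDTH_BYTES


# DDR-333 runs a 166.67 MHz clock with two transfers per cycle (about 333.33 MT/s);
# DDR-400 runs a 200 MHz clock (400 MT/s).
for name, rate in (("DDR-333", 1000 / 3), ("DDR-400", 400.0)):
    print(f"{name}: about {peak_bandwidth_mb_s(rate):.0f} MB/s")
```

The results round to the familiar labels: about 2667 MB/s for PC2700 and 3200 MB/s for PC3200.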
<snip sig stuff>
-- Bill
On Wed, 28 Nov 2007, Bart Schaefer wrote:
Reboot with memtest86 (should be on the centos install media) and look for test failures.
That was it! Replaced the failing memory, now OpenVPN has been up for ~16 hours.
-- Thanks, Mike
On Nov 30, 2007 10:07 AM, Mike azmr@earthlink.net wrote:
On Wed, 28 Nov 2007, Bart Schaefer wrote:
Reboot with memtest86 (should be on the centos install media) and look for test failures.
That was it! Replaced the failing memory, now OpenVPN has been up for ~16 hours.
Glad to hear it. I sometimes wonder, given how often I've seen memory failures occur only under load, whether unused RAM is more likely to go bad than RAM that's kept busy all the time.