Been having this problem, and posted about it before. Thinking it was a memory issue, I've replaced all memory on the server. However, the problem has continued.
Server is a ProLiant DL380 (8GB RAM, 2 Xeon CPUs), running CentOS 3.6, all patches up-to-date. Kernel is 2.4.21-40.ELsmp (the problem seems to have first manifested on kernel 2.4.21-37.0.1.ELsmp). Disk is CCISS hardware RAID-5, straight partitioning (no LVM).
This server runs an Oracle 10g instance, async I/O enabled to a NetApp filer where most data is stored.
Traceback of latest failure follows (two crashes this morning). Anyone read these things well enough to tell me if there's any insight in this? There are no nVidia drivers loaded, only stock kernel modules.
Thanks in advance for any insight. -Alan
Apr 10 05:32:22 db01-01 kernel: page not mapped. erroring out.
Apr 10 05:32:22 db01-01 kernel: Page has mapping still set. This is a serious situation. However if you
Apr 10 05:32:22 db01-01 kernel: are using the NVidia binary only module please report this bug to
Apr 10 05:32:22 db01-01 kernel: NVidia and not to the linux kernel mailinglist.
Apr 10 05:32:22 db01-01 kernel: ------------[ cut here ]------------
Apr 10 05:32:22 db01-01 kernel: kernel BUG at page_alloc.c:225!
Apr 10 05:32:22 db01-01 kernel: invalid operand: 0000
Apr 10 05:32:22 db01-01 kernel: sg nfs lockd sunrpc tg3 microcode keybdev mousedev hid input ehci-hcd usb-uhci usbcore ext3 jbd cciss sd_mod scsi_mod
Apr 10 05:32:22 db01-01 kernel: CPU: 1
Apr 10 05:32:22 db01-01 kernel: EIP: 0060:[<c0159560>] Not tainted
Apr 10 05:32:22 db01-01 kernel: EFLAGS: 00010286
Apr 10 05:32:22 db01-01 kernel:
Apr 10 05:32:22 db01-01 kernel: EIP is at __free_pages_ok [kernel] 0x3e0 (2.4.21-40.ELsmp/i686)
Apr 10 05:32:22 db01-01 kernel: eax: 00000033 ebx: c797dd38 ecx: 00000001 edx: c0387e98
Apr 10 05:32:22 db01-01 kernel: esi: f4402880 edi: 00000000 ebp: 00000000 esp: cd7d5ec8
Apr 10 05:32:22 db01-01 kernel: ds: 0068 es: 0068 ss: 0068
Apr 10 05:32:22 db01-01 kernel: Process keventd (pid: 6, stackpage=cd7d5000)
Apr 10 05:32:22 db01-01 kernel: Stack: c02c1ea8 00000363 c000a750 ff0ea000 c0440280 00000000 cdbac000 efc21f00
Apr 10 05:32:22 db01-01 kernel: 00000000 00000001 00000001 00000086 dab95054 00000001 f4402880 00000000
Apr 10 05:32:22 db01-01 kernel: 00000000 c014cf3e 00000001 00000000 00000000 cd7d4000 00000000 00000e00
Apr 10 05:32:22 db01-01 kernel: Call Trace: [<c014cf3e>] __iodesc_free [kernel] 0xde (0xcd7d5f0c)
Apr 10 05:32:22 db01-01 kernel: [<c0161e9c>] kmap_high [kernel] 0x5c (0xcd7d5f28)
Apr 10 05:32:22 db01-01 kernel: [<c014d87b>] __iodesc_read_finish [kernel] 0x22b (0xcd7d5f38)
Apr 10 05:32:22 db01-01 kernel: [<c01302ca>] __run_task_queue [kernel] 0x6a (0xcd7d5f74)
Apr 10 05:32:22 db01-01 kernel: [<c013c9ad>] context_thread [kernel] 0x13d (0xcd7d5f8c)
Apr 10 05:32:22 db01-01 kernel: [<c013c870>] context_thread [kernel] 0x0 (0xcd7d5fe0)
Apr 10 05:32:22 db01-01 kernel: [<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xcd7d5ff0)
Apr 10 05:32:22 db01-01 kernel:
Apr 10 05:32:22 db01-01 kernel: Code: 0f 0b e1 00 33 17 2c c0 e9 6c fc ff ff 9c 5a fa f0 fe 0d 70
Apr 10 05:32:22 db01-01 kernel:
Apr 10 05:32:22 db01-01 kernel: Kernel panic: Fatal exception
=========== Alan Sparks, UNIX/Linux Systems Administrator asparks@doublesparks.net
Anyone? Bueller?
Just had another of these crashes, after moving the disks and the new memory to a new DL380 G4 box. With both the memory and the chassis now replaced, this is starting to look very much like a kernel problem, but I do not know the best way to approach such a debugging task.
Any advice is appreciated. -Alan
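
As a first step in comparing crashes, it may help to reduce each oops in the syslog to its BUG site, the faulting function, and the call trace. The following is only a minimal sketch in Python, assuming the oops text has the same layout as the trace posted above. The names `summarize_oops` and `SAMPLE` are made up for illustration, and `SAMPLE` is a shortened excerpt of that trace, not a new log.

```python
import re

# One call-trace frame as printed in the trace above, e.g.
#   [<c014cf3e>] __iodesc_free [kernel] 0xde (0xcd7d5f0c)
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[(\S+)\]\s+(0x[0-9a-f]+)")
# The "kernel BUG at file:line!" marker.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# The "EIP is at function [module] offset" line.
EIP_RE = re.compile(r"EIP is at (\S+) \[(\S+)\] (0x[0-9a-f]+)")

# Shortened excerpt of the posted trace (hypothetical variable name).
SAMPLE = """\
Apr 10 05:32:22 db01-01 kernel: kernel BUG at page_alloc.c:225!
Apr 10 05:32:22 db01-01 kernel: EIP: 0060:[<c0159560>] Not tainted
Apr 10 05:32:22 db01-01 kernel: EIP is at __free_pages_ok [kernel] 0x3e0 (2.4.21-40.ELsmp/i686)
Apr 10 05:32:22 db01-01 kernel: Call Trace: [<c014cf3e>] __iodesc_free [kernel] 0xde (0xcd7d5f0c)
Apr 10 05:32:22 db01-01 kernel: [<c0161e9c>] kmap_high [kernel] 0x5c (0xcd7d5f28)
Apr 10 05:32:22 db01-01 kernel: [<c014d87b>] __iodesc_read_finish [kernel] 0x22b (0xcd7d5f38)
"""


def summarize_oops(text):
    """Return the BUG site, faulting function, and call-trace frames of one oops."""
    bug = BUG_RE.search(text)
    eip = EIP_RE.search(text)
    # Keep (symbol, module, offset) per frame; the raw address is dropped
    # because it can differ between kernel builds.
    frames = [(sym, mod, off) for _addr, sym, mod, off in FRAME_RE.findall(text)]
    return {
        "bug_site": (bug.group(1), int(bug.group(2))) if bug else None,
        "faulting_function": eip.group(1) if eip else None,
        "call_trace": frames,
    }


if __name__ == "__main__":
    summary = summarize_oops(SAMPLE)
    print("BUG at:", summary["bug_site"])
    print("EIP in:", summary["faulting_function"])
    for sym, mod, off in summary["call_trace"]:
        print("  %s [%s] %s" % (sym, mod, off))
```

Running this over each crash in turn shows whether every failure reaches `__free_pages_ok` through the same `__iodesc_*` path, which is the kind of detail worth including in a bug report.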