I've been getting kernel crashes on a new machine since it went into production a few weeks ago. How can I troubleshoot this? I suspect a hardware problem, but I know too little about the kernel and kernel debugging on Linux to know how to narrow this down with debugging software.
To show what kind of oopses these are, I quote the last two. The symptom is that the machine stops responding but can still be pinged. After a reset all is well until the next crash, roughly ten days later.
Apr 29 15:05:07 nx05 kernel: c014802d
Apr 29 15:05:07 nx05 kernel: Modules linked in: nls_utf8 cifs smbfs ipt_REJECT ipt_limit ipt_state ipt_LOG iptable_filter ip_tables ip_conntrack_ftp ip_conntrack md5 ipv6 autofs4 sunrpc dm_mirror dm_mod button battery ac 8139too mii ext3 jbd ata_piix libata sd_mod scsi_mod
Apr 29 15:05:07 nx05 kernel: CPU: 0
Apr 29 15:05:07 nx05 kernel: EIP: 0060:[<c014802d>] Not tainted VLI
Apr 29 15:05:07 nx05 kernel: EFLAGS: 00010006 (2.6.9-34.EL)
Apr 29 15:05:07 nx05 kernel: EIP is at find_get_page+0x73/0xdd
Apr 29 15:05:07 nx05 kernel: eax: 00000200 ebx: df002bf4 ecx: 00000200 edx: 00000200
Apr 29 15:05:07 nx05 kernel: esi: df002bf4 edi: 00000000 ebp: c1b64d1c esp: c1b64cf0
Apr 29 15:05:07 nx05 kernel: ds: 007b es: 007b ss: 0068
Apr 29 15:05:07 nx05 kernel: Process httpd (pid: 24805, threadinfo=c1b64000 task=d308ad50)
Apr 29 15:05:07 nx05 kernel: Stack: 00000000 c016a705 0029005a 00000000 00000001 00000000 00000000 df002b14
Apr 29 15:05:07 nx05 kernel: 00000000 0029005a 00000000 df002a80 c016bdb7 00001000 00000000 df73b800
Apr 29 15:05:07 nx05 kernel: 0029005a 00000000 df002a80 c016bde6 00001000 df73b800 cfc14bf4 00000000
Apr 29 15:05:07 nx05 kernel: Call Trace:
Apr 29 15:05:07 nx05 kernel: [<c016a705>] __find_get_block_slow+0x4b/0x1c6
Apr 29 15:05:07 nx05 kernel: [<c016bdb7>] __find_get_block+0x89/0xa5
Apr 29 15:05:07 nx05 kernel: [<c016bde6>] __getblk+0x13/0x49
Apr 29 15:05:07 nx05 kernel: [<e09298e3>] ext3_get_inode_loc+0x4f/0x223 [ext3]
Apr 29 15:05:07 nx05 kernel: [<e0929b45>] ext3_read_inode+0x38/0x309 [ext3]
Apr 29 15:05:07 nx05 kernel: [<c030fbf0>] __cond_resched+0x14/0x3b
Apr 29 15:05:07 nx05 kernel: [<e092e1b7>] ext3_alloc_inode+0xf/0x46 [ext3]
Apr 29 15:05:07 nx05 kernel: [<c0185596>] alloc_inode+0xf6/0x17f
Apr 29 15:05:07 nx05 kernel: [<c018662e>] get_new_inode_fast+0xa5/0x1e9
Apr 29 15:05:07 nx05 kernel: [<e092b9a0>] ext3_lookup+0x55/0x87 [ext3]
Apr 29 15:05:07 nx05 kernel: [<c0177d32>] real_lookup+0x73/0xde
Apr 29 15:05:07 nx05 kernel: [<c0178062>] do_lookup+0x56/0x8f
Apr 29 15:05:07 nx05 kernel: [<c0178ad4>] __link_path_walk+0xa39/0xd98
Apr 29 15:05:07 nx05 kernel: [<c0178e74>] link_path_walk+0x41/0xb9
Apr 29 15:05:07 nx05 kernel: [<c011e867>] autoremove_wake_function+0x0/0x2d
Apr 29 15:05:07 nx05 kernel: [<c017916c>] path_lookup+0x104/0x135
Apr 29 15:05:07 nx05 kernel: [<c01792b1>] __user_walk+0x21/0x51
Apr 29 15:05:07 nx05 kernel: [<c01734b8>] vfs_stat+0x14/0x3a
Apr 29 15:05:07 nx05 kernel: [<c011e867>] autoremove_wake_function+0x0/0x2d
Apr 29 15:05:07 nx05 kernel: [<c0173ac1>] sys_stat64+0xf/0x23
Apr 29 15:05:07 nx05 kernel: [<c0168b76>] vfs_read+0xda/0xe2
Apr 29 15:05:07 nx05 kernel: [<c0168d65>] sys_read+0x3c/0x62
Apr 29 15:05:07 nx05 kernel: [<c0311443>] syscall_call+0x7/0xb
Apr 29 15:05:07 nx05 kernel: [<c031007b>] rwsem_down_read_failed+0x19f/0x204
Apr 29 15:05:07 nx05 kernel: Code: c0 e8 77 8d fd ff c7 43 14 01 00 00 00 8d 43 04 c7 43 20 54 1f 32 c0 c7 43 24 0e 02 00 00 e8 1e c5 09 00 85 c0 89 c1 74 0f 89 c2 <8b> 00 f6 c4 80 74 03 8b 51 0c ff 42 04 81 7b 10 3c 4b 24 1d 74

Apr 20 03:55:31 nx05 kernel: c0167d59
Apr 20 03:55:31 nx05 kernel: Modules linked in: smbfs ipt_REJECT ipt_limit ipt_state ipt_LOG ip_conntrack_ftp ip_conntrack iptable_filter ip_tables parport_pc lp parport md5 ipv6 autofs4 sunrpc dm_mirror dm_mod button battery ac 8139too mii ext3 jbd ata_piix libata sd_mod scsi_mod
Apr 20 03:55:31 nx05 kernel: CPU: 0
Apr 20 03:55:31 nx05 kernel: EIP: 0060:[<c0167d59>] Not tainted VLI
Apr 20 03:55:31 nx05 kernel: EFLAGS: 00010286 (2.6.9-34.EL)
Apr 20 03:55:31 nx05 kernel: EIP is at __dentry_open+0x62/0x16a
Apr 20 03:55:31 nx05 kernel: eax: a093ffa0 ebx: dfab0680 ecx: 0000000d edx: dfe51100
Apr 20 03:55:31 nx05 kernel: esi: cd1dd78c edi: dfe51100 ebp: 00000000 esp: dda19f30
Apr 20 03:55:31 nx05 kernel: ds: 007b es: 007b ss: 0068
Apr 20 03:55:31 nx05 kernel: Process mysqld (pid: 12922, threadinfo=dda19000 task=dee2edd0)
Apr 20 03:55:31 nx05 kernel: Stack: cae8a978 00000000 dfab0680 00008000 00000000 c0167c96 dfab0680 caeac000
Apr 20 03:55:31 nx05 kernel: cae8a978 dfe51100 dda19f58 dec82680 dda19f88 00000101 00000001 00000000
Apr 20 03:55:31 nx05 kernel: 00001000 b0d4e780 dda19f80 c030fbf0 caeac000 c01e67f2 00000000 00000039
Apr 20 03:55:31 nx05 kernel: Call Trace:
Apr 20 03:55:31 nx05 kernel: [<c0167c96>] filp_open+0x5c/0x70
Apr 20 03:55:31 nx05 kernel: [<c030fbf0>] __cond_resched+0x14/0x3b
Apr 20 03:55:31 nx05 kernel: [<c01e67f2>] direct_strncpy_from_user+0x3e/0x5d
Apr 20 03:55:31 nx05 kernel: [<c016819f>] sys_open+0x31/0x7d
Apr 20 03:55:31 nx05 kernel: [<c0311443>] syscall_call+0x7/0xb
Apr 20 03:55:31 nx05 kernel: [<c031007b>] rwsem_down_read_failed+0x19f/0x204
Apr 20 03:55:31 nx05 kernel: Code: dc 00 00 00 89 83 a0 00 00 00 8b 04 24 89 7b 0c c7 43 24 00 00 00 00 89 43 08 c7 43 28 00 00 00 00 8b 86 d0 00 00 00 85 c0 74 19 <8b> 00 85 c0 74 0b 83 38 02 74 0e ff 80 00 01 00 00 8b 86 d0 00
Kai