 
            On Tue, 2005-09-06 at 10:00 -0700, Mark Elam wrote:
Hey all,
Long-time user of CentOS, I really love what you guys are doing here. I have a farm of 50 CentOS 4.1 machines (originally 4.0, updated with yum to the current 4.1). Ever since I updated to the 2.6.9-11 kernel I have been getting a lot of kernel panics. 7 machines suffered kernel panics over the weekend. Funny that they were the only ones booted into the new kernel! The rest haven't been rebooted yet, so they are still on the 2.6.9-5 kernel. They all have similar messages in the logs, as shown below. Any ideas on where to look for the problem? Has anyone else seen this?
Machine info (typical of all 50 machines):
P4 3 GHz
2 GB RAM
U320 SCSI disk w/ LSI SCSI controller
Intel Workstation boards
Nvidia graphics
Verify that all the boards have the latest BIOS update (check whether one is available, install it, and see if it corrects the issue).
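With 50 boards, it may help to see which BIOS versions are actually installed before flashing anything. The following is only a rough sketch: it assumes passwordless root ssh to each node and a dmidecode new enough to support the `-s bios-version` option, and the helper names `bios_version` and `group_by_version` are made up for illustration.

```python
import subprocess
from collections import defaultdict


def bios_version(host):
    """Ask one host for its BIOS version string.

    Assumption: root ssh access to `host`, and a dmidecode that
    understands `-s bios-version`.
    """
    result = subprocess.run(
        ["ssh", host, "dmidecode", "-s", "bios-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def group_by_version(versions):
    """Turn {host: version} into {version: [hosts, sorted]}."""
    groups = defaultdict(list)
    for host, version in versions.items():
        groups[version].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}
```

Calling `group_by_version({h: bios_version(h) for h in hosts})` with your own host list would show at a glance whether the seven panicking machines share a BIOS revision that the others do not.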
All machines are exactly the same, installed w/ kickstart w/ these packages:
%packages
@ office
@ legacy-software-development
@ editors
@ system-tools
@ base-x
@ gnome-software-development
@ graphics
@ smb-server
@ development-tools
@ printing
@ text-internet
@ kde-software-development
@ kde-desktop
@ x-software-development
@ mail-server
@ legacy-network-server
@ sound-and-video
@ gnome-desktop
@ ftp-server
@ network-server
@ graphical-internet
vnc
telnet-server
rsh-server
kernel-smp
grub
kernel
xemacs
rusers-server
/var/log/messages:
Sep 3 04:03:10 qu015 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000054
Sep 3 04:03:10 qu015 kernel: printing eip:
Sep 3 04:03:10 qu015 kernel: c016c583
Sep 3 04:03:10 qu015 kernel: *pde = 0bb6d001
Sep 3 04:03:10 qu015 kernel: Oops: 0000 [#1]
Sep 3 04:03:10 qu015 kernel: SMP
Sep 3 04:03:10 qu015 kernel: Modules linked in: nvidia(U) vmnet(U) vmmon(U) nfs nfsd exportfs lockd sunrpc md5 ipv6 parport_pc lp parport autofs4 sr_mod ide_scsi dm_mod button battery ac joydev uhci_hcd ehci_hcd snd_maestro3 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd soundcore e1000 floppy ext3 jbd mptscsih mptbase sd_mod scsi_mod
Sep 3 04:03:10 qu015 kernel: CPU: 1
Sep 3 04:03:10 qu015 kernel: EIP: 0060:[<c016c583>] Tainted: P VLI
Sep 3 04:03:10 qu015 kernel: EFLAGS: 00010202 (2.6.9-11.ELsmp)
Sep 3 04:03:10 qu015 kernel: EIP is at iput+0x25/0x61
Sep 3 04:03:10 qu015 kernel: eax: 00000040 ebx: f3a85b74 ecx: f8c7bbb9 edx: f3a85b74
Sep 3 04:03:10 qu015 kernel: esi: f222b89c edi: f222b8a4 ebp: 0000006b esp: f7ceeeec
Sep 3 04:03:10 qu015 kernel: ds: 007b es: 007b ss: 0068
Sep 3 04:03:10 qu015 kernel: Process kswapd0 (pid: 43, threadinfo=f7cee000 task=f7d1b7b0)
Sep 3 04:03:10 qu015 kernel: Stack: f3a85b74 c016a1d8 00000000 00000092 00000000 f7ffe9e0 c016a553 c0144e2c
Sep 3 04:03:10 qu015 kernel: 00d70a00 00000000 00000061 00000000 00023313 000000d0 00000020 c031ad80
Sep 3 04:03:10 qu015 kernel: 00000002 c031ad80 0000000c c01460b8 c02c5604 00023313 f7ceef9c 00000000
Sep 3 04:03:10 qu015 kernel: Call Trace:
Sep 3 04:03:10 qu015 kernel: [<c016a1d8>] prune_dcache+0x13f/0x18e
Sep 3 04:03:10 qu015 kernel: [<c016a553>] shrink_dcache_memory +0x14/0x2b
Sep 3 04:03:10 qu015 kernel: [<c0144e2c>] shrink_slab+0xf8/0x161
Sep 3 04:03:10 qu015 kernel: [<c01460b8>] balance_pgdat+0x1d2/0x2f8
Sep 3 04:03:10 qu015 kernel: [<c02c5604>] schedule+0x844/0x87a
Sep 3 04:03:10 qu015 kernel: [<c01462a8>] kswapd+0xca/0xcc
Sep 3 04:03:10 qu015 kernel: [<c011f6ee>] autoremove_wake_function +0x0/0x2d
Sep 3 04:03:10 qu015 kernel: [<c02c7296>] ret_from_fork+0x6/0x14
Sep 3 04:03:10 qu015 kernel: [<c011f6ee>] autoremove_wake_function +0x0/0x2d
Sep 3 04:03:10 qu015 kernel: [<c01461de>] kswapd+0x0/0xcc
Sep 3 04:03:10 qu015 kernel: [<c01041f1>] kernel_thread_helper+0x5/0xb
Sep 3 04:03:10 qu015 kernel: Code: ff e9 fa fe ff ff 53 85 c0 89 c3 74 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 4c 04 3d e7 2d c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba 70 fc 31 c0 e8 59
Sep 3 04:03:10 qu015 kernel: <0>Fatal exception: panic in 5 seconds
Thanks!
Looks to me as if this is either an ACPI issue with the disk or a SCSI controller driver issue that changed with the new kernel.
Both of these issues can be fixed with a motherboard BIOS upgrade (if the controller is built onto the board).
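
Before chasing either theory, it is worth confirming that all seven panics really die in the same place (here `iput`, called from `prune_dcache` in kswapd). Below is a minimal sketch of a script that pulls the faulting function and call trace out of syslog lines in the format quoted above. The function name `summarise_oops` and the regular expressions are my own and only tested against that format, so treat it as a starting point.

```python
import re

# Syslog prefix, e.g. "Sep 3 04:03:10 qu015 kernel: <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s*(?P<msg>.*)$"
)
# "EIP is at iput+0x25/0x61" -> faulting symbol
EIP_RE = re.compile(r"EIP is at (?P<sym>[\w.]+)\+0x")
# "[<c016a1d8>] prune_dcache+0x13f/0x18e" -> call trace frame
# (\s* tolerates the stray space some mailers insert before "+0x")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<sym>[\w.]+)\s*\+0x")


def summarise_oops(lines):
    """Return one dict per oops: host, faulting symbol, call trace symbols."""
    reports = []
    current = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        host, msg = match.group("host"), match.group("msg")
        if msg.startswith("Oops:"):
            # A new oops report begins here.
            current = {"host": host, "eip": None, "trace": []}
            reports.append(current)
            continue
        if current is None:
            continue
        eip = EIP_RE.search(msg)
        if eip:
            current["eip"] = eip.group("sym")
            continue
        frame = FRAME_RE.search(msg)
        if frame:
            current["trace"].append(frame.group("sym"))
    return reports
```

Feeding it the lines of /var/log/messages from each crashed box (for example `summarise_oops(open(path))`) and comparing the `eip` and `trace` fields should show quickly whether the crashes share one signature or vary from machine to machine.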