[CentOS] Kernel Panic!

Tue Sep 6 17:15:08 UTC 2005
Johnny Hughes <mailing-lists at hughesjr.com>

On Tue, 2005-09-06 at 10:00 -0700, Mark Elam wrote:
> Hey all,
> 
> Long time user of CentOS, I really love what you guys are doing here.  I
> have a farm of 50 CentOS 4.1 machines (originally 4.0, updated with yum
> up to the current 4.1).  Ever since I updated to the 2.6.9-11 kernel I
> have been getting a lot of kernel panics.  7 machines suffered kernel
> panics over the weekend.  Funny that they were the only ones that are
> booted into the new kernel!  The rest haven't been rebooted yet so they
> are still at the 2.6.9-5 kernel.  They all have similar messages in the
> logs, as shown below.  Any ideas on where to look for the problem?  Has
> anyone else seen this?
> 
> Machine info:  Typical of all 50 machines:
> 
> P4 3 GHz
> 2 GB RAM
> U320 SCSI disk w/ lsi scsi controller
> Intel Workstation boards
> Nvidia graphics
> 

Verify that all the boards have the latest BIOS: check whether a newer
release is available, install it, and see if it corrects the issue.
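
With 50 identical boards, it may help to confirm that they really all run
the same BIOS release before and after flashing. Below is a minimal sketch
that pulls the Version field out of the "BIOS Information" section of
dmidecode output and groups hosts by it. It assumes you have already
captured the dmidecode text from each machine (how you collect it is left
out), and that the output follows the usual "BIOS Information" /
"Version:" layout. The host qu016, the vendor name, and the version
strings are made up for illustration.

```python
from collections import defaultdict


def bios_version(dmidecode_text):
    """Return the Version field of the "BIOS Information" section, or None."""
    in_bios_section = False
    for line in dmidecode_text.splitlines():
        stripped = line.strip()
        if stripped == "BIOS Information":
            in_bios_section = True
        elif in_bios_section and not stripped:
            # A blank line ends the BIOS section.
            in_bios_section = False
        elif in_bios_section and stripped.startswith("Version:"):
            return stripped.split(":", 1)[1].strip()
    return None


def group_hosts_by_bios(outputs):
    """Map each BIOS version to the sorted list of hosts reporting it."""
    groups = defaultdict(list)
    for host, text in outputs.items():
        groups[bios_version(text)].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


# Hypothetical captured output; the version strings are invented.
SAMPLE_OLD = """\
Handle 0x0000, DMI type 0, 20 bytes
BIOS Information
\tVendor: Example Vendor
\tVersion: EXAMPLE.0001
\tRelease Date: 01/01/2004

Handle 0x0001, DMI type 1, 25 bytes
System Information
\tVersion: Not Specified
"""
SAMPLE_NEW = SAMPLE_OLD.replace("EXAMPLE.0001", "EXAMPLE.0002")

print(group_hosts_by_bios({"qu015": SAMPLE_OLD, "qu016": SAMPLE_NEW}))
```

Any host that ends up in a group of its own, or under None, is worth a
closer look before comparing panic behaviour across the farm.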

> All machines exactly the same, installed w/ kickstart w/ these packages:
> 
> %packages
> @ office
> @ legacy-software-development
> @ editors
> @ system-tools
> @ base-x
> @ gnome-software-development
> @ graphics
> @ smb-server
> @ development-tools
> @ printing
> @ text-internet
> @ kde-software-development
> @ kde-desktop
> @ x-software-development
> @ mail-server
> @ legacy-network-server
> @ sound-and-video
> @ gnome-desktop
> @ ftp-server
> @ network-server
> @ graphical-internet
> vnc
> telnet-server
> rsh-server
> kernel-smp
> grub
> kernel
> xemacs
> rusers-server
> 
> 
> /var/log/messages:
> 
> Sep  3 04:03:10 qu015 kernel: Unable to handle kernel NULL pointer
> dereference at virtual address 00000054
> Sep  3 04:03:10 qu015 kernel:  printing eip:
> Sep  3 04:03:10 qu015 kernel: c016c583
> Sep  3 04:03:10 qu015 kernel: *pde = 0bb6d001
> Sep  3 04:03:10 qu015 kernel: Oops: 0000 [#1]
> Sep  3 04:03:10 qu015 kernel: SMP
> Sep  3 04:03:10 qu015 kernel: Modules linked in: nvidia(U) vmnet(U)
> vmmon(U) nfs nfsd exportfs lockd sunrpc md5 ipv6 parport_pc lp parport
> autofs4 sr_mod ide_scsi dm_mod button battery ac joydev uhci_hcd
> ehci_hcd snd_maestro3 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
> snd_timer snd_page_alloc snd soundcore e1000 floppy ext3 jbd mptscsih
> mptbase sd_mod scsi_mod
> Sep  3 04:03:10 qu015 kernel: CPU:    1
> Sep  3 04:03:10 qu015 kernel: EIP:    0060:[<c016c583>]    Tainted: P
> VLI
> Sep  3 04:03:10 qu015 kernel: EFLAGS: 00010202   (2.6.9-11.ELsmp)
> Sep  3 04:03:10 qu015 kernel: EIP is at iput+0x25/0x61
> Sep  3 04:03:10 qu015 kernel: eax: 00000040   ebx: f3a85b74   ecx:
> f8c7bbb9   edx: f3a85b74
> Sep  3 04:03:10 qu015 kernel: esi: f222b89c   edi: f222b8a4   ebp:
> 0000006b   esp: f7ceeeec
> Sep  3 04:03:10 qu015 kernel: ds: 007b   es: 007b   ss: 0068
> Sep  3 04:03:10 qu015 kernel: Process kswapd0 (pid: 43,
> threadinfo=f7cee000 task=f7d1b7b0)
> Sep  3 04:03:10 qu015 kernel: Stack: f3a85b74 c016a1d8 00000000 00000092
> 00000000 f7ffe9e0 c016a553 c0144e2c
> Sep  3 04:03:10 qu015 kernel:        00d70a00 00000000 00000061 00000000
> 00023313 000000d0 00000020 c031ad80
> Sep  3 04:03:10 qu015 kernel:        00000002 c031ad80 0000000c c01460b8
> c02c5604 00023313 f7ceef9c 00000000
> Sep  3 04:03:10 qu015 kernel: Call Trace:
> Sep  3 04:03:10 qu015 kernel:  [<c016a1d8>] prune_dcache+0x13f/0x18e
> Sep  3 04:03:10 qu015 kernel:  [<c016a553>] shrink_dcache_memory
> +0x14/0x2b
> Sep  3 04:03:10 qu015 kernel:  [<c0144e2c>] shrink_slab+0xf8/0x161
> Sep  3 04:03:10 qu015 kernel:  [<c01460b8>] balance_pgdat+0x1d2/0x2f8
> Sep  3 04:03:10 qu015 kernel:  [<c02c5604>] schedule+0x844/0x87a
> Sep  3 04:03:10 qu015 kernel:  [<c01462a8>] kswapd+0xca/0xcc
> Sep  3 04:03:10 qu015 kernel:  [<c011f6ee>] autoremove_wake_function
> +0x0/0x2d
> Sep  3 04:03:10 qu015 kernel:  [<c02c7296>] ret_from_fork+0x6/0x14
> Sep  3 04:03:10 qu015 kernel:  [<c011f6ee>] autoremove_wake_function
> +0x0/0x2d
> Sep  3 04:03:10 qu015 kernel:  [<c01461de>] kswapd+0x0/0xcc
> Sep  3 04:03:10 qu015 kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
> Sep  3 04:03:10 qu015 kernel: Code: ff e9 fa fe ff ff 53 85 c0 89 c3 74
> 58 83 bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 0f 0b 4c 04 3d
> e7 2d c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c ba 70
> fc 31 c0 e8 59
> Sep  3 04:03:10 qu015 kernel:  <0>Fatal exception: panic in 5 seconds
> 
> Thanks!  

Looks to me as if this is either an ACPI issue with the disk or a SCSI
controller driver issue that changed with the new kernel.

Both of these issues can be fixed with a motherboard BIOS upgrade (if
the controller is built onto the board).
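
Since seven machines panicked, it may also be worth checking whether they
all died in the same place. The following is a rough sketch, not a
definitive tool, that reduces an oops like the one quoted above to a few
fields: kernel version, faulting function, process, taint flag, and call
trace. It assumes the lines come straight from /var/log/messages, so they
are not wrapped the way the mail client wrapped them here. The function
name summarize_oops and the regular expressions are my own, written
against the format of this one log.

```python
import re

KERNEL_RE = re.compile(r"EFLAGS: [0-9a-f]+\s+\(([^)]+)\)")
EIP_RE = re.compile(r"EIP is at (\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")
PROCESS_RE = re.compile(r"Process (\S+) \(pid: \d+")
TAINT_RE = re.compile(r"Tainted: (\S+)")
TRACE_RE = re.compile(r"\[<[0-9a-f]+>\] (\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log_text):
    """Extract the key fields of a 2.6-style i386 oops from syslog text."""
    summary = {"kernel": None, "eip": None, "process": None,
               "tainted": None, "trace": []}
    for line in log_text.splitlines():
        for key, pattern in (("kernel", KERNEL_RE), ("eip", EIP_RE),
                             ("process", PROCESS_RE), ("tainted", TAINT_RE)):
            match = pattern.search(line)
            if match and summary[key] is None:
                summary[key] = match.group(1)
        match = TRACE_RE.search(line)
        if match:
            # Call-trace entries look like " [<c016a1d8>] prune_dcache+0x13f/0x18e".
            summary["trace"].append(match.group(1))
    return summary


# Lines taken from the log above, with the mail wrapping undone.
SAMPLE_OOPS = """\
Sep  3 04:03:10 qu015 kernel: EIP:    0060:[<c016c583>]    Tainted: P      VLI
Sep  3 04:03:10 qu015 kernel: EFLAGS: 00010202   (2.6.9-11.ELsmp)
Sep  3 04:03:10 qu015 kernel: EIP is at iput+0x25/0x61
Sep  3 04:03:10 qu015 kernel: Process kswapd0 (pid: 43, threadinfo=f7cee000 task=f7d1b7b0)
Sep  3 04:03:10 qu015 kernel: Call Trace:
Sep  3 04:03:10 qu015 kernel:  [<c016a1d8>] prune_dcache+0x13f/0x18e
Sep  3 04:03:10 qu015 kernel:  [<c016a553>] shrink_dcache_memory+0x14/0x2b
Sep  3 04:03:10 qu015 kernel:  [<c0144e2c>] shrink_slab+0xf8/0x161
"""

print(summarize_oops(SAMPLE_OOPS))
```

Running this over the logs from each of the seven machines and comparing
the results would show whether every panic is in iput from kswapd0 or
whether the failures are scattered.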