[CentOS] Centos-4 Kernel panic

Tue Apr 12 20:36:01 UTC 2005
Brian Trudeau <btrudeau at eastek-intl.com>

Bob Pierce wrote:

> Hi all,
>
> We are running a new Centos-4 server, and it has kernel panicked on us 
> 4 times in the last month. After the first kernel panic we hooked up a 
> serial console to the server and captured the output in order to have 
> a record of what happens. I've included the error messages from the 
> last time it locked up… but it doesn't really mean much to me. Anybody 
> have any ideas what might be causing this server lock up?
>
> Server description:
> -Dell PE1750 - dual 2.8GHz Xeon (with Hyper-Threading on) - 2GB DDR 
> RAM - Perc4-DI onboard RAID using 3 SCSI drives in RAID-5 configuration
>
> -ext3 file system
> -kernel-smp-2.6.9-5.0.3.EL
> -mysql - from distribution
> -2 postfix instances rebuilt with MySQL support
> -amavisd-new
> -clamav
> -spamassassin
> -rbldnsd
> -bind
>
>
> Here's the captured output from a serial console connected to the 
> server at time of fault.
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> f8872da8
> *pde = 35562001
> Oops: 0000 [#1]
> SMP
> Modules linked in: md5 ipv6 autofs4 sunrpc dm_mod button battery ac 
> ohci_hcd tg3 floppy sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
>
> CPU: 1
> EIP: 0060:[<f8872da8>] Not tainted VLI
> EFLAGS: 00010246 (2.6.9-5.0.3.ELsmp)
> EIP is at __journal_file_buffer+0x1b/0x221 [jbd]
> eax: 00000000 ebx: d2fff26c ecx: 00000008 edx: c2327680
> esi: c2327680 edi: 00000008 ebp: 00000000 esp: f7533dd4
> ds: 007b es: 007b ss: 0068
> Process kjournald (pid: 210, threadinfo=f7533000 task=f75825b0)
> Stack: 00000000 00000000 f148fad8 f7f66200 d2fff26c c2327680 f887351b 00000286
> 00000000 00000000 00000000 00000000 00000000 d2517e6c f7f66200 caa4c50c
> 00001f18 00000000 f75825b0 c011e8d2 f7533e44 f7533e44 f750c054 f8836f24
> Call Trace:
> [<f887351b>] journal_commit_transaction+0x310/0xfb1 [jbd]
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<f8836f24>] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<c011bcd5>] finish_task_switch+0x30/0x66
> [<c02c4363>] schedule+0x833/0x869
> [<c0127e62>] del_timer_sync+0x7a/0x9c
> [<f8875e6d>] kjournald+0xc7/0x215 [jbd]
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<c011bd1d>] schedule_tail+0x12/0x55
> [<f8875da0>] commit_timeout+0x0/0x5 [jbd]
> [<f8875da6>] kjournald+0x0/0x215 [jbd]
> [<c01041f1>] kernel_thread_helper+0x5/0xb
> Code: 14 ba 01 00 00 00 83 c4 10 89 d0 5b 5e 5f 5d c3 55 31 ed 57 89 
> cf 56 89 d6 53 53 53 89 c3 c7 44 24 04 00 00 00 00 8b 00 89 04 24 <8b> 
> 00 a9 00 00 08 00 75 29 68 d4 85 87 f8 68 9b 07 00 00 68 55
>
>
It looks to me like there is a problem with the RAID. I'm not too
familiar with LSI's OEM controllers for Dell (I'm guessing it's LSI,
since megaraid shows up in the module list and the call trace, and I'm
too lazy to google it), but I'm guessing that the Perc4-DI is a host
RAID? If it is, I would look into it and really think about getting a
hardware RAID card. I've had nothing but problems with onboard host
RAID myself; I gave up on it and went with LVM's software RAID, which
actually performs much better. I've even seen benchmarks saying the
same thing. We are still switching to hardware RAID, though, because
restoring is much easier.
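
For anyone who wants to see quickly which modules are involved in an
oops like the one quoted above, here is a minimal Python sketch. It is
not from the original report: the function name summarize_oops and both
regular expressions are my own assumptions, written against the
2.6-era i386 oops layout shown in the quoted output. Keep in mind that
a call trace may include stale stack entries, so a module showing up
in it is a hint, not proof.

```python
import re

# Assumed format: "EIP is at __journal_file_buffer+0x1b/0x221 [jbd]"
EIP_RE = re.compile(
    r"EIP is at (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)
# Assumed format: "[<f887351b>] journal_commit_transaction+0x310/0xfb1 [jbd]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{8}>\] (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)

# Short excerpt copied from the oops quoted in this message.
SAMPLE = """\
EIP is at __journal_file_buffer+0x1b/0x221 [jbd]
Call Trace:
[<f887351b>] journal_commit_transaction+0x310/0xfb1 [jbd]
[<c011e8d2>] autoremove_wake_function+0x0/0x2d
[<f8836f24>] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
"""


def summarize_oops(text):
    """Return (fault, frames) for an oops dump.

    fault  -- (symbol, module) from the "EIP is at" line, or None
    frames -- list of (symbol, module) for each call-trace frame
    Symbols with no "[module]" suffix are reported as "kernel".
    """
    eip = EIP_RE.search(text)
    fault = (eip.group(1), eip.group(2) or "kernel") if eip else None
    frames = [(sym, mod or "kernel") for sym, mod in FRAME_RE.findall(text)]
    return fault, frames


if __name__ == "__main__":
    fault, frames = summarize_oops(SAMPLE)
    print("faulting symbol:", fault)
    for symbol, module in frames:
        print(f"  {module:15s} {symbol}")
```

Because the patterns are searched anywhere in a line, the same function
also works on text that still carries the "> " quote prefix.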

-- 
Brian Trudeau,   I.T., Q.A. Inspector
Eastek International Corporation
330 Hastings Drive,   Buffalo Grove, IL 60089
Tel: (847) 353-8300 Ext. 213   Fax: (847) 353-8900
Web: http://www.eastek-intl.com   Email: btrudeau at eastek-intl.com