[CentOS] stack overflow

Wed Jan 14 20:49:40 UTC 2009
Peter Doherty <doherty at crystal.harvard.edu>

Hi,

I've got a fileserver that runs CentOS 5.2. It's been otherwise stable
for maybe a year or more, and now it's crashed three times since
Saturday. The first two times the computer was completely unresponsive,
there was nothing on the console, and there was nothing in the logs. I
was beginning to suspect hardware (esp. RAM, PSU, or maybe a failed fan).
Last night it locked up again, but this time there was something in
/var/log/messages (see below).
So far my searches indicate that increasing to 8K kernel stacks would
fix this. The server has a couple of 3ware SATA RAID cards, and I'm
running XFS. The server does nightly disk-to-disk backups of a few
dozen workstations, and it always seems to crash overnight.
If the problem really is the kernel stack size, it's odd that it
just started all of a sudden.
I thought I'd post here before I looked into compiling a new kernel
with 8K stacks. Thanks for any advice!
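Before rebuilding, it may be worth confirming what the running kernel was actually built with. A minimal sketch, assuming the distro ships the kernel's build config as /boot/config-<release> and that the CONFIG_4KSTACKS option is what selects 4K stacks on i386 (8K being the default when it is unset); the function name stack_size_from_config is hypothetical:

```python
import os
import platform


def stack_size_from_config(config_text):
    """Return "4K" if the kernel config enables CONFIG_4KSTACKS, else "8K".

    Assumption: on 32-bit x86 kernels of this vintage, CONFIG_4KSTACKS=y
    selects 4K kernel stacks and leaving it unset gives the 8K default.
    """
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_4KSTACKS=y":
            return "4K"
    return "8K"


if __name__ == "__main__":
    # Assumption: the build config is installed as /boot/config-<uname -r>.
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as f:
            print(path, "->", stack_size_from_config(f.read()), "kernel stacks")
    else:
        print("no kernel config found at", path)
```

Run it on the server itself so platform.release() matches the running kernel (here, 2.6.18-53.1.21.el5).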

--Peter


2.6.18-53.1.21.el5 #1 SMP Tue May 20 09:34:18 EDT 2008 i686 i686 i386 GNU/Linux


Jan 14 03:06:21 fs2 kernel: do_IRQ: stack overflow: 464
Jan 14 03:06:21 fs2 kernel: [<c04073bd>] do_IRQ+0x5c/0xae
Jan 14 03:06:21 fs2 kernel: BUG: unable to handle kernel paging request at virtual address fc7ff0aa
Jan 14 03:06:21 fs2 kernel: printing eip:
Jan 14 03:06:21 fs2 kernel: c0606da4
Jan 14 03:06:21 fs2 kernel: *pde = 00000000
Jan 14 03:06:21 fs2 kernel: BUG: unable to handle kernel paging request at virtual address fc7ff226
Jan 14 03:06:21 fs2 kernel: printing eip:
Jan 14 03:06:21 fs2 kernel: c060704e
Jan 14 03:06:21 fs2 kernel: *pde = 00000000


----clip for repeating info------


Jan 14 03:06:21 fs2 kernel: BUG: unable to handle kernel paging request at virtual address fc7ff226
Jan 14 03:06:21 fs2 kernel: printing eip:
Jan 14 03:06:21 fs2 kernel: c060704e
Jan 14 03:06:21 fs2 kernel: *pde = 00000000
Jan 14 03:06:21 fs2 kernel: Oops: 0002 [#1]
Jan 14 03:06:21 fs2 kernel: SMP
Jan 14 03:06:21 fs2 kernel: last sysfs file: /devices/pci0000:00/0000:00:1e.0/0000:04:04.0/irq
Jan 14 03:06:21 fs2 kernel: Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc xfs(U) dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac lp i2c_i801 e7xxx_edac floppy edac_mc i2c_core ide_cd serio_raw e100 parport_pc cdrom e1000 mii parport intel_rng sg pcspkr dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata 3w_9xxx sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Jan 14 03:06:21 fs2 kernel: CPU: -1065923552
Jan 14 03:06:21 fs2 kernel: EIP: 0060:[<c060704e>] Not tainted VLI
Jan 14 03:06:21 fs2 kernel: EFLAGS: 00010046 (2.6.18-53.1.21.el5 #1)
Jan 14 03:06:21 fs2 kernel: EIP is at do_page_fault+0x3c5/0x4b8
Jan 14 03:06:21 fs2 kernel: eax: 00000013 ebx: 00000000 ecx: c0626aae edx: 00006dbf
Jan 14 03:06:21 fs2 kernel: esi: fc7ff226 edi: fc7ff026 ebp: 00000002 esp: f0b69170
Jan 14 03:06:21 fs2 kernel: ds: 007b es: 007b ss: 0068
Jan 14 03:06:21 fs2 kernel: BUG: unable to handle kernel paging request at virtual address 000100ca
Jan 14 03:06:21 fs2 kernel: printing eip:
Jan 14 03:06:21 fs2 kernel: c0606da4
Jan 14 03:06:21 fs2 kernel: *pde = 707f0067
Jan 14 03:06:21 fs2 kernel: BUG: unable to handle kernel paging request at virtual address 826ef258
Jan 14 03:06:21 fs2 kernel: printing eip:
Jan 14 03:06:21 fs2 kernel: c041fa82
Jan 14 03:06:21 fs2 kernel: *pde = 00000000
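To check whether the do_IRQ warning also appears in older or rotated logs, a small script can pull every such line out of the syslog. This is a sketch that assumes the stock syslog timestamp prefix and the default /var/log/messages path; the function name overflow_warnings is made up for illustration. I'm reading the number after "stack overflow:" (464 above) as the bytes of kernel stack still free when the interrupt arrived, but treat that interpretation as an assumption.

```python
import re
import sys

# Matches the warning printed by the kernel's do_IRQ debug check,
# e.g. "... kernel: do_IRQ: stack overflow: 464".
OVERFLOW_RE = re.compile(r"do_IRQ: stack overflow: (\d+)")


def overflow_warnings(lines):
    """Return (timestamp, value) pairs for every do_IRQ stack-overflow line.

    Assumes the classic syslog prefix "Mon DD HH:MM:SS", which is always
    15 characters wide (single-digit days are space-padded).
    """
    found = []
    for line in lines:
        match = OVERFLOW_RE.search(line)
        if match:
            # value: assumed to be the bytes of stack left free
            found.append((line[:15], int(match.group(1))))
    return found


if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(log_path, errors="replace") as f:
            for stamp, value in overflow_warnings(f):
                print(stamp, "stack overflow:", value)
    except OSError as err:
        print("could not read", log_path, "-", err)
```

Pass a rotated file (for example one of the older messages files) as the first argument to scan it instead of the current log.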