[CentOS] Centos4 SMP Kernel OOM

Wed Jun 1 12:34:07 UTC 2005
Johnny Hughes <mailing-lists at hughesjr.com>

On Wed, 2005-06-01 at 00:17 +0200, Maciej Żenczykowski wrote:
> Hello,
> 
> I've just run out of memory on a dual xeon with 5GB ram,
> considering there should have been around 4GB free (not counting 
> buffers and cache)... this is unusual.
> 
> Now after it OOM'ed I tried running top and memory usage was fine
> (around 1GB of 5, no swap usage of 12GB).
> 
> So I thought it was a temporary thing, but processes kept on
> OOM'ing for no understandable reason...
> while swap was empty and memory continued to show 3GB free
> (and look at the weird log messages...)
> 
> A reboot helped though, but still... :)
> 
> [this is the normal CentOS4 i686 SMP kernel 2.6.9-5.0.5.ELsmp]
> 
> May 31 22:31:25 tcs kernel: oom-killer: gfp_mask=0xd0
> May 31 22:31:25 tcs kernel: DMA per-cpu:
> May 31 22:31:25 tcs kernel: cpu 0 hot: low 2, high 6, batch 1
> May 31 22:31:25 tcs kernel: cpu 0 cold: low 0, high 2, batch 1
> May 31 22:31:25 tcs kernel: cpu 1 hot: low 2, high 6, batch 1
> May 31 22:31:25 tcs kernel: cpu 1 cold: low 0, high 2, batch 1
> May 31 22:31:25 tcs kernel: cpu 2 hot: low 2, high 6, batch 1
> May 31 22:31:25 tcs kernel: cpu 2 cold: low 0, high 2, batch 1
> May 31 22:31:25 tcs kernel: cpu 3 hot: low 2, high 6, batch 1
> May 31 22:31:25 tcs kernel: cpu 3 cold: low 0, high 2, batch 1
> May 31 22:31:25 tcs kernel: Normal per-cpu:
> May 31 22:31:25 tcs kernel: cpu 0 hot: low 32, high 96, batch 16
> May 31 22:31:25 tcs kernel: cpu 0 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs postfix: Process did not exit cleanly, returned 0 with signal 9
> May 31 22:31:26 tcs kernel: cpu 1 hot: low 32, high 96, batch 16
> May 31 22:31:26 tcs kernel: cpu 1 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs kernel: cpu 2 hot: low 32, high 96, batch 16
> May 31 22:31:26 tcs kernel: cpu 2 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs kernel: cpu 3 hot: low 32, high 96, batch 16
> May 31 22:31:26 tcs kernel: cpu 3 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs kernel: HighMem per-cpu:
> May 31 22:31:26 tcs kernel: cpu 0 hot: low 32, high 96, batch 16
> May 31 22:31:26 tcs kernel: cpu 0 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs kernel: cpu 1 hot: low 32, high 96, batch 16
> May 31 22:31:26 tcs kernel: cpu 1 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs kernel: cpu 2 hot: low 32, high 96, batch 16
> May 31 22:31:26 tcs kernel: cpu 2 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs kernel: cpu 3 hot: low 32, high 96, batch 16
> May 31 22:31:26 tcs kernel: cpu 3 cold: low 0, high 32, batch 16
> May 31 22:31:26 tcs kernel:
> May 31 22:31:26 tcs kernel: Free pages:     3579344kB (3578432kB HighMem)
> May 31 22:31:26 tcs kernel: Active:96900 inactive:20481 dirty:0 writeback:0 unstable:0 free:894836 slab:212200 mapped:86012 pagetables:1438
> May 31 22:31:26 tcs kernel: DMA free:16kB min:16kB low:32kB high:48kB active:16kB inactive:0kB present:16384kB
> May 31 22:31:26 tcs kernel: protections[]: 0 0 0
> May 31 22:31:26 tcs kernel: Normal free:896kB min:936kB low:1872kB high:2808kB active:240kB inactive:416kB present:901120kB
> May 31 22:31:26 tcs kernel: protections[]: 0 0 0
> May 31 22:31:26 tcs kernel: HighMem free:3578432kB min:512kB low:1024kB high:1536kB active:387344kB inactive:81508kB present:4325372kB
> May 31 22:31:26 tcs kernel: protections[]: 0 0 0
> May 31 22:31:26 tcs kernel: DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16kB
> May 31 22:31:26 tcs kernel: Normal: 0*4kB 0*8kB 2*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 896kB
> May 31 22:31:26 tcs kernel: HighMem: 1560*4kB 5190*8kB 4683*16kB 1632*32kB 2304*64kB 1512*128kB 781*256kB 349*512kB 157*1024kB 66*2048kB 583*4096kB = 3578432kB
> May 31 22:31:26 tcs kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
> May 31 22:31:26 tcs kernel: Out of Memory: Killed process 11250 (MailScanner).
> 
> and again moments later...
> 
> May 31 22:31:30 tcs kernel: oom-killer: gfp_mask=0xd0
> ...
> 
> and repeat two dozen or more times for different processes.

OK ... there are known memory leak issues on that kernel.

Your problem is similar to, but not exactly like:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=150971

I can give you a kernel (a beta release of the U1 kernel), OR you can
install a beta kernel from the Red Hat developer to test whether it
fixes your issue:

http://people.redhat.com/davej/kernels/RHEL4/RPMS.kernel/

(the kernel I have built is 2.6.9-6.37 ... it has its own issues.  The
kernel issues seem to be what is holding up Update 1)
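
For anyone reading the oom-killer report quoted above, here is a rough
sketch (Python, not any official tool) that checks each zone's free
memory against its "min" watermark and converts the slab page count to
kB. The zone lines and the slab figure are copied from the report; the
4 kB page size is an assumption, though it agrees with the report's own
numbers (free:894836 pages = 3579344 kB). The helper names are made up
for illustration.

```python
import re

# Zone summary lines copied from the quoted oom-killer report
# (mail-wrapped lines rejoined, syslog prefix dropped).
LOG = """\
DMA free:16kB min:16kB low:32kB high:48kB active:16kB inactive:0kB present:16384kB
Normal free:896kB min:936kB low:1872kB high:2808kB active:240kB inactive:416kB present:901120kB
HighMem free:3578432kB min:512kB low:1024kB high:1536kB active:387344kB inactive:81508kB present:4325372kB
"""

PAGE_KB = 4          # assumption: 4 kB pages on i686
SLAB_PAGES = 212200  # "slab:212200" from the same report

ZONE_RE = re.compile(r"^(\w+) free:(\d+)kB min:(\d+)kB .* present:(\d+)kB$")


def parse_zones(text):
    """Return {zone: {"free", "min", "present"}} with all values in kB."""
    zones = {}
    for line in text.splitlines():
        m = ZONE_RE.match(line.strip())
        if m:
            name, free, minimum, present = m.groups()
            zones[name] = {
                "free": int(free),
                "min": int(minimum),
                "present": int(present),
            }
    return zones


def starved(zones):
    """Names of zones whose free memory is at or below the min watermark."""
    return [name for name, z in zones.items() if z["free"] <= z["min"]]


zones = parse_zones(LOG)
slab_kb = SLAB_PAGES * PAGE_KB

for name, z in zones.items():
    print(f"{name:8s} free={z['free']}kB min={z['min']}kB present={z['present']}kB")
print("at or below min:", starved(zones))
print(f"slab: {slab_kb}kB vs Normal present: {zones['Normal']['present']}kB")
```

On these numbers DMA and Normal are at or below their minimum while
HighMem still has about 3.5GB free, and slab alone accounts for 848800
kB, which is close to the whole 901120 kB Normal zone. That fits a
kernel-side leak in low memory better than a userspace one, which is
why "3GB free" in top does not help here.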
