[CentOS] Re: OOM condition with file-4.17-9.0.1.el5
John
jknappers-argentia at hotmail.com
Sat May 17 12:41:19 UTC 2008
Hello Filipe,
I forgot the word NOT in this part of my message.
> For me, the problem was solved, and I had enough proof for myself to say
> I assume file(1) was the problem. I agree this is not enough to convince
> some developers, and the problem file is NOT easily found. But I believe
> the problem file should be somewhere in my Amavisd-new directory. But at
> this moment I don't have a test system to run the complete directory
> through file(1).
John
"John" <jknappers-argentia at hotmail.com> wrote
in message news:g0mjh8$hah$1 at ger.gmane.org...
>
> "Filipe Brandenburger"
> <filbranden at gmail.com> wrote in message
> news:e814db780805161718oeb2763ja80120d58b3abc93 at mail.gmail.com...
>> On Fri, May 16, 2008 at 8:30 AM, John
>> <jknappers-argentia at hotmail.com> wrote:
>>> We just encountered several serious out-of-memory problems caused by the
>>> file-4.17-9.0.1.el5 utility on our mail gateway. CentOS 5.1 kept itself
>>> alive by killing its parent process, Amavisd-new. Manually restarting
>>> Amavisd-new and flushing the mail queue caused the same problem.
>>> Disabling file for the moment caused Amavisd-new to fail, and also defer
>>> all mail.
>>
>> I cannot understand it. In which situations would the "file" utility
>> be causing OOM issues? "file" is not a daemon or something that runs
>> all the time. And to generate an OOM situation, you would have to
>> exhaust all the memory on the machine.
>>
>> Could you give more details here?
>>
>> How much memory does the machine have?
>> How much swap space does the machine have?
>> Why do you believe "file" is the offending application?
>> Who is using "file" on a regular basis?
>> What types of files are being tested for type with the "file" utility?
>> Can you produce a specific file which causes "file" to allocate a lot of
>> memory?
>> Can you produce the output of "ps" or "top" which shows "file" using a
>> lot of memory just before a crash?
>>
>> If you don't give us such details, it's really hard to believe that
>> "file" is what is causing your issue. Please back up your statement.
>>
>> Filipe
>
> Ok, I will try.
>
> The machine config:
> The machine is an HP DL360G3, 2 GB RAM, 2 GB swap, running CentOS 5.1 x86.
>
> The purpose of file(1) and how it is used:
> The machine is our mail gateway, running Postfix / Amavisd-new / Clamd /
> Perl SpamAssassin.
> All incoming and outgoing mail is handed to Amavisd-new, which extracts
> the mail and runs it through, among other things, file(1).
> So every incoming / outgoing mail passes through file(1), where file(1) is
> used to enforce mail policy (no exe, com, hta, pif etc. gets in, even if
> the extension is changed).
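> The policy check that amavisd-new delegates to file(1) amounts to
> classifying each extracted part by content rather than by name. A minimal
> sketch of that idea (the MZ magic bytes and the blocked-type list here are
> illustrative assumptions, not amavisd-new's actual rules):

```shell
#!/bin/sh
# Classify an attachment by content, not by its name. A temp file starting
# with the DOS/PE "MZ" magic bytes stands in for a renamed .exe attachment.
attachment=$(mktemp /tmp/attach.XXXXXX)
printf 'MZ' > "$attachment"

filetype=$(file --brief "$attachment")
case "$filetype" in
  *executable*|*MS-DOS*|*PE32*) echo "blocked: $filetype" ;;
  *)                            echo "allowed: $filetype" ;;
esac
rm -f "$attachment"
```

> This is why renaming evil.exe to evil.txt does not get it past the policy:
> the magic bytes, not the extension, decide the type.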
>
> I cannot provide an example causing file(1) to consume all memory, because
> I had to get the mail system running again. Thinking about it now, it
> would have been wise to make a copy of the postfix deferred queue before
> starting the system again.
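> For the record, a snapshot like this before flushing would have preserved
> the evidence (the paths assume the stock CentOS Postfix layout; verify
> with "postconf -h queue_directory"):

```shell
#!/bin/sh
# Copy the deferred queue aside before restarting and flushing, so the
# message that triggers the runaway file(1) can be replayed on a test box.
queue_dir=/var/spool/postfix                 # stock CentOS layout (assumed)
backup_dir=/root/deferred-$(date +%Y%m%d-%H%M%S)
cp -a "$queue_dir/deferred" "$backup_dir"
echo "saved deferred queue to $backup_dir"
```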
>
> The kernel logged the OOM condition and killed the parent Amavisd-new to
> keep itself alive.
> I saved that part and am adding it here at the end of the mail.
>
> This is what happened just before, in the maillog:
>
> May 16 13:16:56 gw-0001 amavis[6058]: (06058-01) FWD via SMTP:
> <696e666f726e6c2c4e4c312c33322c31303334 at bounce.msg1svc.net>
> -> <john.xxx at xxxxx.nl>,BODY=8BITMIME 250 2.6.0 Ok,
> id=06058-01, from $
> May 16 13:16:56 gw-0001 amavis[6059]: (06059-02-2) (!!)TROUBLE in
> check_mail: parts_decode_ext FAILED: file(1) utility (/usr/bin/file)
> error: run_command (open pipe): Can't fork at
> /usr/lib/perl5/5.8.8/i386-li$
> May 16 13:16:56 gw-0001 amavis[6059]: (06059-02-2) (!)PRESERVING EVIDENCE
> in /var/amavis/tmp/amavis-20080516T131638-06059
> May 16 13:16:56 gw-0001 amavis[6058]: (06058-01) Passed CLEAN, [backup-mx]
> [66.179.205.75]
> <696e666f726e6c2c4e4c312c33322c31303334 at bounce.msg1svc.net>
> -> <john.xxxx at xxxxx.nl>, Message-ID: <1$
> May 16 13:16:56 gw-0001 postfix/lmtp[7373]: 8196E1380A6:
> to=<john.xxxx at xxxxxx.nl>,
> relay=localhost[127.0.0.1]:10024, delay=12071, delays=12052/0.01/0.03/19,
> dsn=2.0.0, status=sent (250 2.0.0 Ok:$
>
> Maybe Amavisd-new saved the problem message.
> This is the first entry where file failed. The saved message contains a
> normal HTML message. Nothing strange there.
>
> A bit later in the maillog:
> May 16 13:17:59 gw-0001 postfix/smtpd[7385]: connect from
> localhost.localdomain[127.0.0.1]
> May 16 13:18:02 gw-0001 amavis[5639]: (!)Net::Server: 2008/05/16-13:18:02
> Bad fork [Cannot allocate memory]\n at line 168 in file
> /usr/lib/perl5/vendor_perl/5.8.8/Net/Server/PreForkSimple.pm
> May 16 13:18:02 gw-0001 amavis[5639]: Net::Server: 2008/05/16-13:18:02
> Server closing!
>
> This, combined with the kernel logging at the end, and the fact that after
> repeatedly restarting Amavisd-new and flushing the postfix queue it ran
> dead into the same problem, made me start to believe there might be a
> problem in file(1). After checking with "yum list file", I concluded that
> there were no CentOS updates for file.
>
> The easiest way to get a newer version of file was to rebuild the SRPM of
> FC9's file-4.23-05 on CentOS 5. It compiled without problems, and after
> installing it, starting Amavisd-new, and flushing the queue, all deferred
> mail was delivered without any problems.
>
> For me, the problem was solved, and I had enough proof for myself to say
> I assume file(1) was the problem. I agree this is not enough to convince
> some developers, and the problem file is easily found. But I believe the
> problem file should be somewhere in my Amavisd-new directory. But at this
> moment I don't have a test system to run the complete directory through
> file(1).
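> Even without a spare machine, the preserved directory could be tested
> against file(1) with a per-invocation memory cap, so a runaway allocation
> fails fast instead of taking the box down. A sketch, using the evidence
> path that amavisd-new logged above:

```shell
#!/bin/sh
# Run file(1) on every preserved part with a 200 MB address-space limit
# (ulimit -v takes KB). An invocation that dies under the cap flags the
# suspect file instead of exhausting system memory.
suspect_dir=/var/amavis/tmp/amavis-20080516T131638-06059
find "$suspect_dir" -type f | while read -r f; do
  ( ulimit -v 204800; file "$f" >/dev/null 2>&1 ) || echo "SUSPECT: $f"
done
```

> The subshell keeps the ulimit from affecting the parent shell, so the loop
> continues after a failed invocation.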
>
> I hope this provides some of the information you wanted to see.
>
> Regards,
> John
>
>
> May 16 13:17:53 gw-0001 kernel: cmasasd invoked oom-killer:
> gfp_mask=0x201d2, order=0, oomkilladj=0
> May 16 13:17:53 gw-0001 kernel: [<c0457d36>] out_of_memory+0x72/0x1a4
> May 16 13:17:53 gw-0001 kernel: [<c0459161>] __alloc_pages+0x201/0x282
> May 16 13:17:53 gw-0001 kernel: [<c045a3bf>]
> __do_page_cache_readahead+0xc4/0x1c6
> May 16 13:17:53 gw-0001 kernel: [<c0454a21>] sync_page+0x0/0x3b
> May 16 13:17:53 gw-0001 kernel: [<c044fbe5>]
> __delayacct_blkio_end+0x32/0x35
> May 16 13:17:53 gw-0001 kernel: [<c0604c24>] __wait_on_bit_lock+0x4b/0x52
> May 16 13:17:53 gw-0001 kernel: [<c04549a4>] __lock_page+0x51/0x57
> May 16 13:17:53 gw-0001 kernel: [<c04572d8>] filemap_nopage+0x151/0x315
> May 16 13:17:53 gw-0001 kernel: [<c045fda3>]
> __handle_mm_fault+0x172/0x87b
> May 16 13:17:53 gw-0001 kernel: [<c04b43cf>] do_msgrcv+0x241/0x290
> May 16 13:17:53 gw-0001 kernel: [<c0606c2b>] do_page_fault+0x20a/0x4b8
> May 16 13:17:53 gw-0001 kernel: [<c0606a21>] do_page_fault+0x0/0x4b8
> May 16 13:17:53 gw-0001 kernel: [<c0405a71>] error_code+0x39/0x40
> May 16 13:17:53 gw-0001 kernel: =======================
> May 16 13:17:53 gw-0001 kernel: Mem-info:
> May 16 13:17:53 gw-0001 kernel: DMA per-cpu:
> May 16 13:17:53 gw-0001 kernel: cpu 0 hot: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 0 cold: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 1 hot: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 1 cold: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 2 hot: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 2 cold: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 3 hot: high 0, batch 1 used:0
> May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 0, batch 1 used:0
> May 16 13:17:54 gw-0001 kernel: DMA32 per-cpu: empty
> May 16 13:17:54 gw-0001 kernel: Normal per-cpu:
> May 16 13:17:54 gw-0001 kernel: cpu 0 hot: high 186, batch 31 used:161
> May 16 13:17:54 gw-0001 kernel: cpu 0 cold: high 62, batch 15 used:14
> May 16 13:17:54 gw-0001 kernel: cpu 1 hot: high 186, batch 31 used:44
> May 16 13:17:54 gw-0001 kernel: cpu 1 cold: high 62, batch 15 used:47
> May 16 13:17:54 gw-0001 kernel: cpu 2 hot: high 186, batch 31 used:70
> May 16 13:17:54 gw-0001 kernel: cpu 2 cold: high 62, batch 15 used:15
> May 16 13:17:54 gw-0001 kernel: cpu 3 hot: high 186, batch 31 used:99
> May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 62, batch 15 used:29
> May 16 13:17:54 gw-0001 kernel: HighMem per-cpu:
> May 16 13:17:54 gw-0001 kernel: cpu 0 hot: high 186, batch 31 used:29
> May 16 13:17:54 gw-0001 kernel: cpu 0 cold: high 62, batch 15 used:42
> May 16 13:17:54 gw-0001 kernel: cpu 1 hot: high 186, batch 31 used:26
> May 16 13:17:54 gw-0001 kernel: cpu 1 cold: high 62, batch 15 used:53
> May 16 13:17:54 gw-0001 kernel: cpu 2 hot: high 186, batch 31 used:23
> May 16 13:17:54 gw-0001 kernel: cpu 2 cold: high 62, batch 15 used:56
> May 16 13:17:54 gw-0001 kernel: cpu 3 hot: high 186, batch 31 used:168
> May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 62, batch 15 used:42
> May 16 13:17:54 gw-0001 kernel: Free pages: 47492kB (464kB HighMem)
> May 16 13:17:54 gw-0001 kernel: Active:248505 inactive:243082 dirty:0
> writeback:0 unstable:0 free:11873 slab:4798 mapped-file:1037
> mapped-anon:491633 pagetables:2940
> May 16 13:17:54 gw-0001 kernel: DMA free:8160kB min:36kB low:44kB
> high:52kB active:2340kB inactive:1364kB pre
> sent:16384kB pages_scanned:6857 all_unreclaimable? yes
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 880 2031
> May 16 13:17:54 gw-0001 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
> active:0kB inactive:0kB present:0kB p
> ages_scanned:0 all_unreclaimable? no
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 880 2031
> May 16 13:17:54 gw-0001 kernel: Normal free:38868kB min:2008kB low:2508kB
> high:3012kB active:397440kB inactiv
> e:409920kB present:901120kB pages_scanned:1894767 all_unreclaimable? yes
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 0 9215
> May 16 13:17:54 gw-0001 kernel: HighMem free:464kB min:512kB low:1168kB
> high:1828kB active:594240kB inactive:
> 561044kB present:1179624kB pages_scanned:4549828 all_unreclaimable? yes
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 0 0
> May 16 13:17:54 gw-0001 kernel: DMA: 0*4kB 0*8kB 126*16kB 60*32kB 10*64kB
> 0*128kB 2*256kB 0*512kB 1*1024kB 1*
> 2048kB 0*4096kB = 8160kB
> May 16 13:17:54 gw-0001 kernel: DMA32: empty
> May 16 13:17:54 gw-0001 kernel: Normal: 1*4kB 28*8kB 2059*16kB 120*32kB
> 1*64kB 0*128kB 1*256kB 1*512kB 1*1024
> kB 0*2048kB 0*4096kB = 38868kB
> May 16 13:17:54 gw-0001 kernel: HighMem: 4*4kB 8*8kB 0*16kB 2*32kB 1*64kB
> 2*128kB 0*256kB 0*512kB 0*1024kB 0*
> 2048kB 0*4096kB = 464kB
> May 16 13:17:54 gw-0001 kernel: Swap cache: add 591115, delete 590893,
> find 5038951/5045583, race 0+568
> May 16 13:17:54 gw-0001 kernel: Free swap = 0kB
> May 16 13:17:54 gw-0001 kernel: Total swap = 2097096kB
> May 16 13:17:54 gw-0001 kernel: Free swap: 0kB
> May 16 13:17:54 gw-0001 kernel: 524282 pages of RAM
> May 16 13:17:54 gw-0001 kernel: 294906 pages of HIGHMEM
> May 16 13:17:54 gw-0001 kernel: 5445 reserved pages
> May 16 13:17:54 gw-0001 kernel: 357914 pages shared
> May 16 13:17:54 gw-0001 kernel: 222 pages swap cached
> May 16 13:17:54 gw-0001 kernel: 0 pages dirty
> May 16 13:17:54 gw-0001 kernel: 0 pages writeback
> May 16 13:17:54 gw-0001 kernel: 1037 pages mapped
> May 16 13:17:54 gw-0001 kernel: 4798 pages slab
> May 16 13:17:54 gw-0001 kernel: 2940 pages pagetables
> May 16 13:17:54 gw-0001 kernel: Out of memory: Killed process 6054
> (amavisd).
>
>
> Versions used:
>
> uname -a
> Linux gw-0001.xxx.yyy.com 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:36:49 EST
> 2008 i686 i686 i386 GNU/Linux
>
> rpm -q postfix       -> postfix-2.4.7-3.el5.mysql_pgsql.jk
> rpm -q amavisd-new   -> amavisd-new-2.5.4-5.el5.jk
> rpm -q spamassassin  -> spamassassin-3.2.4-1.el5.rf
> rpm -q clamd         -> clamd-0.93-2.el5.rf
> rpm -q file          -> file-4.23-5.el5.jk (this is the backported
> version; the problem version is 4.17-9.0.1)