"Filipe Brandenburger" <filbranden at gmail.com> schreef in bericht news:e814db780805161718oeb2763ja80120d58b3abc93 at mail.gmail.com... > On Fri, May 16, 2008 at 8:30 AM, John > <jknappers-argentia at hotmail.com> wrote: >> We just encountered several serious Out of Memory Problems caused by the >> file 4.17-9.0.1.EL5 utility on our mail Gateway. CentOS 5.1 kept it self >> alive by killing it parent process Amavisd-new. Manually restarting >> Amavisd-new and flushing the mail queue caused same problem. Disabling >> file >> for the moment, caused Amavisd-new to fail, and also defer all mail. > > I cannot understand it. On which situations would the "file" utility > be causing OOM issues? "file" is not a daemon or something that is > running all the time. And to generate an OOM situation, you have to > exhaust all the memory on your machine. > > Could you give more details here? > > How much memory does the machine have? > How much swap space does the machine have? > Why do you believe "file" is the offending application? > Who is using "file" on a regular basis? > What types of files are being tested for type with the "file" utility? > Can you produce a specific file which causes "file" to allocate a lot of > memory? > Can you produce the output of "ps" or "top" which shows "file" using a > lot of memory just before a crash? > > If you don't give us such details, it's really hard to believe that > "file" is what is causing your issue. Please back your statement. > > Filipe Ok, I will try. The machine config Machine is a HP DL360G3, 2gb Mem, 2Gb swap running CentOS5.1 x86 The purpose of and how file(1) is used. The machine is our mail gateway, Running Postfix / Amavisd-new / Clamd / Perl Spamassassin Al incoming and outgoing mail is handed to Amavisd which extract the mail and ran it against for example file(1) So every incoming / outgoing mail passes file(1), where file(1) is used to enforce mail policy (No exe, com, hta, pif etc is coming in, even if the extension is changed). I cannot provide a example causing file(1) to consume all memory, because, I had to get the mail system running again. Now thinking, It had been wise to make a copy off the postfix deferred queue before starting the system again. The kernel has logged the OOM condition and killed the parent Amavisd-new to keep itself alive. I saved that part, and adding it here at the end of the mail This is what happend just before in the maillog May 16 13:16:56 gw-0001 amavis[6058]: (06058-01) FWD via SMTP: <696e666f726e6c2c4e4c312c33322c31303334 at bounce.msg1svc.net> -> <john.xxx at xxxxx.nl>,BODY=8BITMIME 250 2.6.0 Ok, id=06058-01, from $ May 16 13:16:56 gw-0001 amavis[6059]: (06059-02-2) (!!)TROUBLE in check_mail: parts_decode_ext FAILED: file(1) utility (/usr/bin/file) error: run_command (open pipe): Can't fork at /usr/lib/perl5/5.8.8/i386-li$ May 16 13:16:56 gw-0001 amavis[6059]: (06059-02-2) (!)PRESERVING EVIDENCE in /var/amavis/tmp/amavis-20080516T131638-06059 May 16 13:16:56 gw-0001 amavis[6058]: (06058-01) Passed CLEAN, [backup-mx] [66.179.205.75] <696e666f726e6c2c4e4c312c33322c31303334 at bounce.msg1svc.net> -> <john.xxxx at xxxxx.nl>, Message-ID: <1$ May 16 13:16:56 gw-0001 postfix/lmtp[7373]: 8196E1380A6: to=<john.xxxx at xxxxxx.nl>, relay=localhost[127.0.0.1]:10024, delay=12071, delays=12052/0.01/0.03/19, dsn=2.0.0, status=sent (250 2.0.0 Ok:$ Maybe Amavisd-new saved the problem message, This is the first entry where file failed. The saved message contains a normal html message. Nothing strange here. A bit later in the maillog: ay 16 13:17:59 gw-0001 postfix/smtpd[7385]: connect from localhost.localdomain[127.0.0.1] May 16 13:18:02 gw-0001 amavis[5639]: (!)Net::Server: 2008/05/16-13:18:02 Bad fork [Cannot allocate memory]\n at line 168 in file /usr/lib/perl5/vendor_perl/5.8.8/Net/Server/PreForkSimple.pm May 16 13:18:02 gw-0001 amavis[5639]: Net::Server: 2008/05/16-13:18:02 Server closing! This combined with the Kernel logging at the end, and after repeatable restarting Amavisd-new and flushing the postfix queue, it ran dead in the same problem, I started to believe that there may be a problem in file(1). After checking with yum list file, I concluded that there were no CentOS updates for file. The easied way to get a newer version of file was to rebuild the SRPM of FC9's file-4.23-05 on CentOS 5. It compiled without problems and after installing, starting Amavisd-new and flushing the queue al deferred mail was delivered without any problems. For me, the problem was solved, and I had enough proof for my self to say, I assume file(1) was the problem. I agree, this is not enough for some developers to be convinced, and the problem file is easily found. But I believe the problem file should be somewhere in my Amavisd-new directory. But at this moment I don't have a test system to test the complete directory against file(1) I hope I provide some of the information you wanted to see. Regards, John May 16 13:17:53 gw-0001 kernel: cmasasd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0 May 16 13:17:53 gw-0001 kernel: [<c0457d36>] out_of_memory+0x72/0x1a4 May 16 13:17:53 gw-0001 kernel: [<c0459161>] __alloc_pages+0x201/0x282 May 16 13:17:53 gw-0001 kernel: [<c045a3bf>] __do_page_cache_readahead+0xc4/0x1c6 May 16 13:17:53 gw-0001 kernel: [<c0454a21>] sync_page+0x0/0x3b May 16 13:17:53 gw-0001 kernel: [<c044fbe5>] __delayacct_blkio_end+0x32/0x35 May 16 13:17:53 gw-0001 kernel: [<c0604c24>] __wait_on_bit_lock+0x4b/0x52 May 16 13:17:53 gw-0001 kernel: [<c04549a4>] __lock_page+0x51/0x57 May 16 13:17:53 gw-0001 kernel: [<c04572d8>] filemap_nopage+0x151/0x315 May 16 13:17:53 gw-0001 kernel: [<c045fda3>] __handle_mm_fault+0x172/0x87b May 16 13:17:53 gw-0001 kernel: [<c04b43cf>] do_msgrcv+0x241/0x290 May 16 13:17:53 gw-0001 kernel: [<c0606c2b>] do_page_fault+0x20a/0x4b8 May 16 13:17:53 gw-0001 kernel: [<c0606a21>] do_page_fault+0x0/0x4b8 May 16 13:17:53 gw-0001 kernel: [<c0405a71>] error_code+0x39/0x40 May 16 13:17:53 gw-0001 kernel: ======================= May 16 13:17:53 gw-0001 kernel: Mem-info: May 16 13:17:53 gw-0001 kernel: DMA per-cpu: May 16 13:17:53 gw-0001 kernel: cpu 0 hot: high 0, batch 1 used:0 May 16 13:17:53 gw-0001 kernel: cpu 0 cold: high 0, batch 1 used:0 May 16 13:17:53 gw-0001 kernel: cpu 1 hot: high 0, batch 1 used:0 May 16 13:17:53 gw-0001 kernel: cpu 1 cold: high 0, batch 1 used:0 May 16 13:17:53 gw-0001 kernel: cpu 2 hot: high 0, batch 1 used:0 May 16 13:17:53 gw-0001 kernel: cpu 2 cold: high 0, batch 1 used:0 May 16 13:17:53 gw-0001 kernel: cpu 3 hot: high 0, batch 1 used:0 May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 0, batch 1 used:0 May 16 13:17:54 gw-0001 kernel: DMA32 per-cpu: empty May 16 13:17:54 gw-0001 kernel: Normal per-cpu: May 16 13:17:54 gw-0001 kernel: cpu 0 hot: high 186, batch 31 used:161 May 16 13:17:54 gw-0001 kernel: cpu 0 cold: high 62, batch 15 used:14 May 16 13:17:54 gw-0001 kernel: cpu 1 hot: high 186, batch 31 used:44 May 16 13:17:54 gw-0001 kernel: cpu 1 cold: high 62, batch 15 used:47 May 16 13:17:54 gw-0001 kernel: cpu 2 hot: high 186, batch 31 used:70 May 16 13:17:54 gw-0001 kernel: cpu 2 cold: high 62, batch 15 used:15 May 16 13:17:54 gw-0001 kernel: cpu 3 hot: high 186, batch 31 used:99 May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 62, batch 15 used:29 May 16 13:17:54 gw-0001 kernel: HighMem per-cpu: May 16 13:17:54 gw-0001 kernel: cpu 0 hot: high 186, batch 31 used:29 May 16 13:17:54 gw-0001 kernel: cpu 0 cold: high 62, batch 15 used:42 May 16 13:17:54 gw-0001 kernel: cpu 1 hot: high 186, batch 31 used:26 May 16 13:17:54 gw-0001 kernel: cpu 1 cold: high 62, batch 15 used:53 May 16 13:17:54 gw-0001 kernel: cpu 2 hot: high 186, batch 31 used:23 May 16 13:17:54 gw-0001 kernel: cpu 2 cold: high 62, batch 15 used:56 May 16 13:17:54 gw-0001 kernel: cpu 3 hot: high 186, batch 31 used:168 May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 62, batch 15 used:42 May 16 13:17:54 gw-0001 kernel: Free pages: 47492kB (464kB HighMem) May 16 13:17:54 gw-0001 kernel: Active:248505 inactive:243082 dirty:0 writeback:0 unstable:0 free:11873 slab: May 16 13:17:54 gw-0001 kernel: Active:248505 inactive:243082 dirty:0 writeback:0 unstable:0 free:11873 slab: 4798 mapped-file:1037 mapped-anon:491633 pagetables:2940 May 16 13:17:54 gw-0001 kernel: DMA free:8160kB min:36kB low:44kB high:52kB active:2340kB inactive:1364kB pre sent:16384kB pages_scanned:6857 all_unreclaimable? yes May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 880 2031 May 16 13:17:54 gw-0001 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB p ages_scanned:0 all_unreclaimable? no May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 880 2031 May 16 13:17:54 gw-0001 kernel: Normal free:38868kB min:2008kB low:2508kB high:3012kB active:397440kB inactiv e:409920kB present:901120kB pages_scanned:1894767 all_unreclaimable? yes May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 0 9215 May 16 13:17:54 gw-0001 kernel: HighMem free:464kB min:512kB low:1168kB high:1828kB active:594240kB inactive: 561044kB present:1179624kB pages_scanned:4549828 all_unreclaimable? yes May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 0 0 May 16 13:17:54 gw-0001 kernel: DMA: 0*4kB 0*8kB 126*16kB 60*32kB 10*64kB 0*128kB 2*256kB 0*512kB 1*1024kB 1* 2048kB 0*4096kB = 8160kB May 16 13:17:54 gw-0001 kernel: DMA32: empty May 16 13:17:54 gw-0001 kernel: Normal: 1*4kB 28*8kB 2059*16kB 120*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024 kB 0*2048kB 0*4096kB = 38868kB May 16 13:17:54 gw-0001 kernel: HighMem: 4*4kB 8*8kB 0*16kB 2*32kB 1*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0* 2048kB 0*4096kB = 464kB May 16 13:17:54 gw-0001 kernel: Swap cache: add 591115, delete 590893, find 5038951/5045583, race 0+568 May 16 13:17:54 gw-0001 kernel: Free swap = 0kB May 16 13:17:54 gw-0001 kernel: Total swap = 2097096kB May 16 13:17:54 gw-0001 kernel: Free swap: 0kB May 16 13:17:54 gw-0001 kernel: 524282 pages of RAM May 16 13:17:54 gw-0001 kernel: 294906 pages of HIGHMEM May 16 13:17:54 gw-0001 kernel: 5445 reserved pages May 16 13:17:54 gw-0001 kernel: 357914 pages shared May 16 13:17:54 gw-0001 kernel: 222 pages swap cached May 16 13:17:54 gw-0001 kernel: 0 pages dirty May 16 13:17:54 gw-0001 kernel: 0 pages writeback May 16 13:17:54 gw-0001 kernel: 1037 pages mapped May 16 13:17:54 gw-0001 kernel: 4798 pages slab May 16 13:17:54 gw-0001 kernel: 2940 pages pagetables May 16 13:17:54 gw-0001 kernel: Out of memory: Killed process 6054 (amavisd). Version ussed: unane -a Linux gw-0001.xxx.yyy.com 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:36:49 EST 2008 i686 i686 i386 GNU/Linux rpm -q postfix postfix-2.4.7-3.el5.mysql_pgsql.jk rpm -q amvisd-new amavisd-new-2.5.4-5.el5.jk rpm -q perl-spamassassin spamassassin-3.2.4-1.el5.rf rpm -q clamd clamd-0.93-2.el5.rf rpm -q file file-4.23-5.el5.jk (This is the backported version, it the problem version is 4.17-9.0.1