Hello Filipe, I forgot the word NOT in this part of my message:

> For me, the problem was solved, and I had enough proof for myself to say
> I assume file(1) was the problem. I agree, this is not enough for some
> developers to be convinced, and the problem file is NOT easily found.
> But I believe the problem file should be somewhere in my Amavisd-new
> directory. But at this moment I don't have a test system to test the
> complete directory against file(1).

John

"John" <jknappers-argentia at hotmail.com> wrote in message news:g0mjh8$hah$1 at ger.gmane.org...
>
> "Filipe Brandenburger" <filbranden at gmail.com> wrote in message
> news:e814db780805161718oeb2763ja80120d58b3abc93 at mail.gmail.com...
>> On Fri, May 16, 2008 at 8:30 AM, John <jknappers-argentia at hotmail.com> wrote:
>>> We just encountered several serious Out of Memory problems caused by the
>>> file 4.17-9.0.1.EL5 utility on our mail gateway. CentOS 5.1 kept itself
>>> alive by killing its parent process, Amavisd-new. Manually restarting
>>> Amavisd-new and flushing the mail queue caused the same problem.
>>> Disabling file for the moment caused Amavisd-new to fail and also defer
>>> all mail.
>>
>> I cannot understand it. In which situations would the "file" utility
>> be causing OOM issues? "file" is not a daemon or something that is
>> running all the time. And to generate an OOM situation, you have to
>> exhaust all the memory on your machine.
>>
>> Could you give more details here?
>>
>> How much memory does the machine have?
>> How much swap space does the machine have?
>> Why do you believe "file" is the offending application?
>> Who is using "file" on a regular basis?
>> What types of files are being tested for type with the "file" utility?
>> Can you produce a specific file which causes "file" to allocate a lot of
>> memory?
>> Can you produce the output of "ps" or "top" which shows "file" using a
>> lot of memory just before a crash?
>>
>> If you don't give us such details, it's really hard to believe that
>> "file" is what is causing your issue. Please back up your statement.
>>
>> Filipe
>
> OK, I will try.
>
> The machine config:
> The machine is an HP DL360 G3, 2 GB RAM, 2 GB swap, running CentOS 5.1 x86.
>
> The purpose of file(1) and how it is used:
> The machine is our mail gateway, running Postfix / Amavisd-new / Clamd /
> Perl SpamAssassin.
> All incoming and outgoing mail is handed to Amavisd-new, which extracts
> the mail and runs it through, for example, file(1).
> So every incoming/outgoing mail passes through file(1), where file(1) is
> used to enforce mail policy (no exe, com, hta, pif etc. gets in, even if
> the extension is changed).
>
> I cannot provide an example that causes file(1) to consume all memory,
> because I had to get the mail system running again. Thinking about it
> now, it would have been wise to make a copy of the Postfix deferred
> queue before starting the system again.
>
> The kernel logged the OOM condition and killed the parent Amavisd-new
> process to keep itself alive. I saved that part and am adding it at the
> end of this mail.
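To make the point about file(1) and the mail policy a bit more concrete: Amavisd-new hands every extracted part to file(1), and the policy matches on the content type file(1) reports, not on the file name. A quick scratch test shows why renaming an executable does not help (the file names here are just illustrative, and the output wording differs a bit between file versions):

    cp /bin/ls /tmp/harmless.jpg   # an ELF binary hiding behind an innocent extension
    file /tmp/harmless.jpg         # reports something like: ELF 32-bit LSB executable, Intel 80386, ...
    rm /tmp/harmless.jpg

So a renamed .exe is still recognised by its content and blocked.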
> This is what happened just before it in the maillog:
>
> May 16 13:16:56 gw-0001 amavis[6058]: (06058-01) FWD via SMTP:
> <696e666f726e6c2c4e4c312c33322c31303334 at bounce.msg1svc.net>
> -> <john.xxx at xxxxx.nl>,BODY=8BITMIME 250 2.6.0 Ok, id=06058-01, from $
> May 16 13:16:56 gw-0001 amavis[6059]: (06059-02-2) (!!)TROUBLE in
> check_mail: parts_decode_ext FAILED: file(1) utility (/usr/bin/file)
> error: run_command (open pipe): Can't fork at /usr/lib/perl5/5.8.8/i386-li$
> May 16 13:16:56 gw-0001 amavis[6059]: (06059-02-2) (!)PRESERVING EVIDENCE
> in /var/amavis/tmp/amavis-20080516T131638-06059
> May 16 13:16:56 gw-0001 amavis[6058]: (06058-01) Passed CLEAN, [backup-mx]
> [66.179.205.75] <696e666f726e6c2c4e4c312c33322c31303334 at bounce.msg1svc.net>
> -> <john.xxxx at xxxxx.nl>, Message-ID: <1$
> May 16 13:16:56 gw-0001 postfix/lmtp[7373]: 8196E1380A6:
> to=<john.xxxx at xxxxxx.nl>, relay=localhost[127.0.0.1]:10024,
> delay=12071, delays=12052/0.01/0.03/19, dsn=2.0.0, status=sent (250 2.0.0 Ok:$
>
> Maybe Amavisd-new saved the problem message; this is the first entry
> where file(1) failed. The saved message contains a normal HTML message.
> Nothing strange here.
>
> A bit later in the maillog:
>
> May 16 13:17:59 gw-0001 postfix/smtpd[7385]: connect from
> localhost.localdomain[127.0.0.1]
> May 16 13:18:02 gw-0001 amavis[5639]: (!)Net::Server: 2008/05/16-13:18:02
> Bad fork [Cannot allocate memory]\n at line 168 in file
> /usr/lib/perl5/vendor_perl/5.8.8/Net/Server/PreForkSimple.pm
> May 16 13:18:02 gw-0001 amavis[5639]: Net::Server: 2008/05/16-13:18:02
> Server closing!
>
> This, combined with the kernel logging at the end, and the fact that
> repeatedly restarting Amavisd-new and flushing the Postfix queue ran
> into the same problem every time, made me start to believe that there
> might be a problem in file(1). After checking with "yum list file", I
> concluded that there were no CentOS updates for file.
>
> The easiest way to get a newer version of file was to rebuild the SRPM
> of FC9's file-4.23-5 on CentOS 5. It compiled without problems, and
> after installing it, starting Amavisd-new, and flushing the queue, all
> deferred mail was delivered without any problems.
>
> For me, the problem was solved, and I had enough proof for myself to say
> I assume file(1) was the problem. I agree, this is not enough for some
> developers to be convinced, and the problem file is easily found. But I
> believe the problem file should be somewhere in my Amavisd-new
> directory. But at this moment I don't have a test system to test the
> complete directory against file(1).
>
> I hope I provided some of the information you wanted to see.
>
> Regards,
> John
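For completeness, the rebuild itself was nothing special. From memory, after downloading the file src.rpm from a Fedora 9 mirror, it was roughly the following (the exact src.rpm file name and the RPMS output path depend on your rpmbuild setup; I also tag my own packages, which is where the .el5.jk in the version below comes from):

    rpmbuild --rebuild file-4.23-5.fc9.src.rpm                  # rebuild the FC9 source package on CentOS 5
    rpm -Uvh /usr/src/redhat/RPMS/i386/file-4.23-5*.i386.rpm    # default output dir when building as root
    rpm -q file                                                 # confirm the new version is installed

The kernel OOM logging I mentioned above follows below.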
>
> May 16 13:17:53 gw-0001 kernel: cmasasd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
> May 16 13:17:53 gw-0001 kernel: [<c0457d36>] out_of_memory+0x72/0x1a4
> May 16 13:17:53 gw-0001 kernel: [<c0459161>] __alloc_pages+0x201/0x282
> May 16 13:17:53 gw-0001 kernel: [<c045a3bf>] __do_page_cache_readahead+0xc4/0x1c6
> May 16 13:17:53 gw-0001 kernel: [<c0454a21>] sync_page+0x0/0x3b
> May 16 13:17:53 gw-0001 kernel: [<c044fbe5>] __delayacct_blkio_end+0x32/0x35
> May 16 13:17:53 gw-0001 kernel: [<c0604c24>] __wait_on_bit_lock+0x4b/0x52
> May 16 13:17:53 gw-0001 kernel: [<c04549a4>] __lock_page+0x51/0x57
> May 16 13:17:53 gw-0001 kernel: [<c04572d8>] filemap_nopage+0x151/0x315
> May 16 13:17:53 gw-0001 kernel: [<c045fda3>] __handle_mm_fault+0x172/0x87b
> May 16 13:17:53 gw-0001 kernel: [<c04b43cf>] do_msgrcv+0x241/0x290
> May 16 13:17:53 gw-0001 kernel: [<c0606c2b>] do_page_fault+0x20a/0x4b8
> May 16 13:17:53 gw-0001 kernel: [<c0606a21>] do_page_fault+0x0/0x4b8
> May 16 13:17:53 gw-0001 kernel: [<c0405a71>] error_code+0x39/0x40
> May 16 13:17:53 gw-0001 kernel: =======================
> May 16 13:17:53 gw-0001 kernel: Mem-info:
> May 16 13:17:53 gw-0001 kernel: DMA per-cpu:
> May 16 13:17:53 gw-0001 kernel: cpu 0 hot: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 0 cold: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 1 hot: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 1 cold: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 2 hot: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 2 cold: high 0, batch 1 used:0
> May 16 13:17:53 gw-0001 kernel: cpu 3 hot: high 0, batch 1 used:0
> May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 0, batch 1 used:0
> May 16 13:17:54 gw-0001 kernel: DMA32 per-cpu: empty
> May 16 13:17:54 gw-0001 kernel: Normal per-cpu:
> May 16 13:17:54 gw-0001 kernel: cpu 0 hot: high 186, batch 31 used:161
> May 16 13:17:54 gw-0001 kernel: cpu 0 cold: high 62, batch 15 used:14
> May 16 13:17:54 gw-0001 kernel: cpu 1 hot: high 186, batch 31 used:44
> May 16 13:17:54 gw-0001 kernel: cpu 1 cold: high 62, batch 15 used:47
> May 16 13:17:54 gw-0001 kernel: cpu 2 hot: high 186, batch 31 used:70
> May 16 13:17:54 gw-0001 kernel: cpu 2 cold: high 62, batch 15 used:15
> May 16 13:17:54 gw-0001 kernel: cpu 3 hot: high 186, batch 31 used:99
> May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 62, batch 15 used:29
> May 16 13:17:54 gw-0001 kernel: HighMem per-cpu:
> May 16 13:17:54 gw-0001 kernel: cpu 0 hot: high 186, batch 31 used:29
> May 16 13:17:54 gw-0001 kernel: cpu 0 cold: high 62, batch 15 used:42
> May 16 13:17:54 gw-0001 kernel: cpu 1 hot: high 186, batch 31 used:26
> May 16 13:17:54 gw-0001 kernel: cpu 1 cold: high 62, batch 15 used:53
> May 16 13:17:54 gw-0001 kernel: cpu 2 hot: high 186, batch 31 used:23
> May 16 13:17:54 gw-0001 kernel: cpu 2 cold: high 62, batch 15 used:56
> May 16 13:17:54 gw-0001 kernel: cpu 3 hot: high 186, batch 31 used:168
> May 16 13:17:54 gw-0001 kernel: cpu 3 cold: high 62, batch 15 used:42
> May 16 13:17:54 gw-0001 kernel: Free pages: 47492kB (464kB HighMem)
> May 16 13:17:54 gw-0001 kernel: Active:248505 inactive:243082 dirty:0 writeback:0 unstable:0 free:11873 slab:4798 mapped-file:1037 mapped-anon:491633 pagetables:2940
> May 16 13:17:54 gw-0001 kernel: DMA free:8160kB min:36kB low:44kB high:52kB active:2340kB inactive:1364kB present:16384kB pages_scanned:6857 all_unreclaimable? yes
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 880 2031
> May 16 13:17:54 gw-0001 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 880 2031
> May 16 13:17:54 gw-0001 kernel: Normal free:38868kB min:2008kB low:2508kB high:3012kB active:397440kB inactive:409920kB present:901120kB pages_scanned:1894767 all_unreclaimable? yes
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 0 9215
> May 16 13:17:54 gw-0001 kernel: HighMem free:464kB min:512kB low:1168kB high:1828kB active:594240kB inactive:561044kB present:1179624kB pages_scanned:4549828 all_unreclaimable? yes
> May 16 13:17:54 gw-0001 kernel: lowmem_reserve[]: 0 0 0 0
> May 16 13:17:54 gw-0001 kernel: DMA: 0*4kB 0*8kB 126*16kB 60*32kB 10*64kB 0*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 8160kB
> May 16 13:17:54 gw-0001 kernel: DMA32: empty
> May 16 13:17:54 gw-0001 kernel: Normal: 1*4kB 28*8kB 2059*16kB 120*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 38868kB
> May 16 13:17:54 gw-0001 kernel: HighMem: 4*4kB 8*8kB 0*16kB 2*32kB 1*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 464kB
> May 16 13:17:54 gw-0001 kernel: Swap cache: add 591115, delete 590893, find 5038951/5045583, race 0+568
> May 16 13:17:54 gw-0001 kernel: Free swap = 0kB
> May 16 13:17:54 gw-0001 kernel: Total swap = 2097096kB
> May 16 13:17:54 gw-0001 kernel: Free swap: 0kB
> May 16 13:17:54 gw-0001 kernel: 524282 pages of RAM
> May 16 13:17:54 gw-0001 kernel: 294906 pages of HIGHMEM
> May 16 13:17:54 gw-0001 kernel: 5445 reserved pages
> May 16 13:17:54 gw-0001 kernel: 357914 pages shared
> May 16 13:17:54 gw-0001 kernel: 222 pages swap cached
> May 16 13:17:54 gw-0001 kernel: 0 pages dirty
> May 16 13:17:54 gw-0001 kernel: 0 pages writeback
> May 16 13:17:54 gw-0001 kernel: 1037 pages mapped
> May 16 13:17:54 gw-0001 kernel: 4798 pages slab
> May 16 13:17:54 gw-0001 kernel: 2940 pages pagetables
> May 16 13:17:54 gw-0001 kernel: Out of memory: Killed process 6054 (amavisd).
>
> Versions used:
>
> uname -a
> Linux gw-0001.xxx.yyy.com 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:36:49 EST 2008 i686 i686 i386 GNU/Linux
>
> rpm -q postfix         postfix-2.4.7-3.el5.mysql_pgsql.jk
> rpm -q amavisd-new     amavisd-new-2.5.4-5.el5.jk
> rpm -q spamassassin    spamassassin-3.2.4-1.el5.rf
> rpm -q clamd           clamd-0.93-2.el5.rf
> rpm -q file            file-4.23-5.el5.jk (this is the backported version; the problem version is 4.17-9.0.1.EL5)
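If I do get a test system (or a quiet moment on the gateway), my plan for finding the specific problem file is something along these lines. The directory name is the one from the "PRESERVING EVIDENCE" line above; the rest is a sketch from memory, so paths and details may need adjusting. The ulimit keeps a runaway file(1) from taking the box down again:

    #!/bin/sh
    # Cap this shell and its children at ~256 MB of virtual memory, so a
    # misbehaving file(1) fails with an allocation error instead of
    # triggering the kernel OOM killer again.
    ulimit -v 262144
    # Run file(1) over every preserved part and report the ones it cannot handle.
    find /var/amavis/tmp/amavis-20080516T131638-06059 -type f | while read f; do
        file "$f" > /dev/null 2>&1 || echo "file(1) failed on: $f"
    done

Whatever that prints would be the specific file Filipe asked for.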