Quoting Daniel de Kok <danieldk at pobox.com>:

> On Wed, 2006-08-16 at 15:12 -0500, Aleksandar Milivojevic wrote:
>> <flame mode="on">
>> Now, I wouldn't call this kind of thing "stable" operating system or
>> "stable" file system. If application asks for too much memory it
>> should get killed (btw, system had 1 gig of RAM and application asked
>> for like 600 meg, plus there was plenty of swap space free too -- so I
>> wouldn't call this a case of app asking too much). You definitely
>> don't end up with corrupted file system.
>> </flame>
>
> - Did you enforce process limits?

Hm, no. There was no need for that. Even if I had, they would be higher
than what the app was using (because the system had enough resources).

> - Was the memory fragmented, and how does the applications allocate
>   memory?

Well, it was a Perl script, and only God knows how Perl allocates memory
;-). It allocated almost all of those 600 megs on startup (probably in
smaller chunks), then happily worked on it. Somewhere in the middle, the
OOM and file system corruption happened. BTW, some half an hour after
the ext3 error, the app happily (and uninterrupted) finished its job.

> - I suppose that vm.oom-kill is still set to 1?

Hmmm... Any downside to setting it to 0?

> Oh, and there's always bad karma (or semi-random errors if you
> prefer) ;).

Bad karma is having bad memory or an overheated processor. Not
applicable to my case ;-)

There was a bunch of errors logged.
Here are just a few of them that seem most relevant:

Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 HighMem: empty
Swap cache: add 1634047, delete 1515588, find 13048002/13194900, race 0+22
Free swap: 844384kB
261856 pages of RAM
5646 reserved pages
108709 pages shared
118455 pages swap cached
do_get_write_access: OOM for frozen_buffer
ext3_splice_branch: aborting transaction: Out of memory in __ext3_journal_get_write_access
EXT3-fs error (device dm-2) in ext3_ordered_writepage: Out of memory
Aborting journal on device dm-2.
ext3_abort called.
EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-2) in ext3_ordered_writepage: IO failure
last message repeated 3 times
__journal_remove_journal_head: freeing b_frozen_data
last message repeated 10 times
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_frozen_data

-- 
NOTICE: If you are not intended recipient, you are hereby notified
that by reading this message you agreed not to disturb frogs during
mating season. For more info, visit http://www.8-P.ca/
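
For anyone wanting to sanity-check the figures in the log excerpt above, here is a minimal Python sketch. It assumes 4 kB pages (an assumption on my part, though consistent with the smallest block size on the "Node 0 Normal" line), and the helper names buddy_free_kb and pages_to_kb are made up purely for illustration.

```python
import re

# Assumption: 4 kB pages, matching the smallest buddy block size in the log.
PAGE_KB = 4


def buddy_free_kb(line: str) -> int:
    """Sum the count*size pairs of a buddy-allocator line, in kB.

    Example input fragment: '3*4kB 1*8kB' means three free 4 kB blocks
    and one free 8 kB block.
    """
    pairs = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in pairs)


def pages_to_kb(pages: int, page_kb: int = PAGE_KB) -> int:
    """Convert a page count to kB under the assumed page size."""
    return pages * page_kb


# The "Node 0 Normal" line copied from the log excerpt.
normal_zone = (
    "0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB"
)

print(buddy_free_kb(normal_zone))  # free kB in the Normal zone
print(pages_to_kb(261856))         # "pages of RAM" in kB
print(pages_to_kb(118455))         # "pages swap cached" in kB
```

Under that assumption, 261856 pages works out to 1047424 kB, which agrees with the 1 gig of RAM mentioned earlier, and the Normal zone line sums to 0 kB free, matching the "= 0kB" the kernel printed. So the zone had no free blocks of any size at that moment, even though 844384 kB of swap was still free.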