[CentOS] XFS and CentOS 4.3

Wed Aug 16 21:08:20 UTC 2006
Aleksandar Milivojevic <alex at milivojevic.org>

Quoting Daniel de Kok <danieldk at pobox.com>:

> On Wed, 2006-08-16 at 15:12 -0500, Aleksandar Milivojevic wrote:
>> <flame mode="on">
>> Now, I wouldn't call this kind of thing a "stable" operating system or
>> "stable" file system.  If an application asks for too much memory it
>> should get killed (btw, the system had 1 gig of RAM and the application
>> asked for like 600 meg, plus there was plenty of swap space free too --
>> so I wouldn't call this a case of an app asking for too much).  You
>> definitely shouldn't end up with a corrupted file system.
>> </flame>
>
> - Did you enforce process limits?

Hm, no.  There was no need for that.  Even if I had, the limits would
have been higher than what the app was using (because the system had
enough resources).

> - Was the memory fragmented, and how does the application allocate
> memory?

Well, it was a Perl script, and only God knows how Perl allocates memory
;-).  It allocated almost all of those 600 megs on startup (probably in
smaller chunks), then happily worked on it.  Somewhere in the middle,
the OOM and file system corruption happened.  BTW, some half an hour
after the ext3 error, the app happily (and uninterrupted) finished its
job.

> - I suppose that vm.oom-kill is still set to 1?

Hmmm...  Any downside to setting it to 0?

> Oh, and there's always bad karma (or semi-random errors if you
> prefer) ;).

Bad karma is having bad memory or an overheated processor.  Not
applicable to my case ;-)

There were a bunch of errors logged.  Here are just a few of them that
seem most relevant:

Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 HighMem: empty
Swap cache: add 1634047, delete 1515588, find 13048002/13194900, race 0+22
Free swap:       844384kB
261856 pages of RAM
5646 reserved pages
108709 pages shared
118455 pages swap cached
do_get_write_access: OOM for frozen_buffer
ext3_splice_branch: aborting transaction: Out of memory in __ext3_journal_get_write_access
EXT3-fs error (device dm-2) in ext3_ordered_writepage: Out of memory
Aborting journal on device dm-2.
ext3_abort called.
EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device dm-2) in ext3_ordered_writepage: IO failure
last message repeated 3 times
__journal_remove_journal_head: freeing b_frozen_data
last message repeated 10 times
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_frozen_data
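
To put the numbers from the log in more familiar units, here is a minimal
Python sketch.  It assumes 4 kB pages (which the log does not state
explicitly), and the function names are made up for illustration:

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages; not stated in the log itself


def buddy_free_kb(line):
    """Sum the 'count*sizekB' terms of a per-zone free-list line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


def pages_to_mb(pages, page_kb=PAGE_KB):
    """Convert a page count to megabytes (1 MB = 1024 kB here)."""
    return pages * page_kb / 1024


# The Normal zone line from the log, joined onto one logical line.
normal = ("Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB")

print("Normal zone free kB:", buddy_free_kb(normal))
print("Total RAM in MB:    ", pages_to_mb(261856))
print("Free swap in MB:    ", 844384 / 1024)
```

Under that page-size assumption, 261856 pages is roughly the 1 gig of RAM
mentioned above, the Normal zone has no free block of any size, and more
than 800 MB of swap is still reported free.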

-- 
NOTICE: If you are not intended recipient, you are hereby notified
that by reading this message you agreed not to disturb frogs during
mating season.  For more info, visit http://www.8-P.ca/