On 2013-01-09 16:50, fred smith wrote:
On Wed, Jan 09, 2013 at 04:05:26PM +0100, Paul Bijnens wrote:
Inspecting /proc/PID/smaps of such a large process may reveal something?
well, there's a LOT of stuff dumped when one cats the file. but I have no adequate expertise to figure out what it all means.
What I know about it...
Here are some lines from my running system (first 3 blocks only):
00400000-00414000 r-xp 00000000 fd:00 190374 /usr/libexec/clock-applet
Size:               80 kB
Rss:                72 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:      72 kB
Private_Dirty:       0 kB
Swap:                0 kB
Pss:                72 kB
00614000-0061b000 rw-p 00014000 fd:00 190374 /usr/libexec/clock-applet
Size:               28 kB
Rss:                16 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:      12 kB
Private_Dirty:       4 kB
Swap:                0 kB
Pss:                16 kB
14c9e000-14ea3000 rw-p 14c9e000 00:00 0 [heap]
Size:             2068 kB
Rss:              2064 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:       0 kB
Private_Dirty:    2064 kB
Swap:                0 kB
Pss:              2064 kB
That shows 3 memory blocks.
The first two blocks are from the program file itself "/usr/libexec/clock-applet".
The first of those is not writable ("r-xp"): those are the executable instructions. The second one is writable ("rw-p"): that is the writable data (variables etc.).
We see several sizes: the total size, and the RSS (resident set size, i.e. what is actually in memory now). The RSS is split up again into the categories listed below it (Shared/Private, Clean/Dirty).
The first block is the code and thus non-writable, so its pages can be dropped from memory at any time and paged back in from the file itself. It should always have 0 dirty pages. The second block is writable and has a mix of clean and dirty pages. Dirty pages cannot simply be dropped: they must stay in memory or be written out to swap first. Dirty pages can also be shared with other processes, e.g. shmem segments.
Next we see the heap (memory from malloc etc.). As expected, it is all private and dirty.
When a page is shared between many processes, this is reflected in the PSS value (proportional set size): 64 kB RSS, all shared between 4 processes, will show 16 kB PSS.
Example:

2aaaad317000-2aaaad346000 r-xp 00000000 fd:00 189693 /usr/lib64/libgsf-1.so.114.0.1
Size:              188 kB
Rss:                56 kB
Shared_Clean:       48 kB
Shared_Dirty:        0 kB
Private_Clean:       8 kB
Private_Dirty:       0 kB
Swap:                0 kB
Pss:                21 kB
# lsof /usr/lib64/libgsf-1.so.114.0.1
COMMAND     PID USER  FD  TYPE DEVICE SIZE/OFF   NODE NAME
gnome-pan  5826 paulb mem REG   253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
stickynot  5932 paulb mem REG   253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
clock-app  5950 paulb mem REG   253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
nautilus  23427 paulb mem REG   253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
Apparently not all of the 56 kB is shared among all 4 of these processes, but proportionally the clock applet accounts for 21 kB of it.
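The proportional accounting can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the kernel implements it: the function name pss_kb is made up here, and a 4 kB page size is assumed. Each resident page contributes its size divided by the number of processes that map it.

```python
def pss_kb(sharers_per_page, page_kb=4):
    """Proportional set size in kB.

    sharers_per_page: for each resident page, the number of processes
    mapping that page (1 means private). page_kb=4 is an assumption.
    """
    return sum(page_kb / sharers for sharers in sharers_per_page)


# 64 kB resident (16 pages), every page shared by 4 processes
print(pss_kb([4] * 16))            # 16.0

# 8 kB private (2 pages) plus 48 kB (12 pages) shared by 4 processes
print(pss_kb([1] * 2 + [4] * 12))  # 20.0
```

The second call uses the libgsf numbers under the assumption that every shared page is mapped by all 4 processes. That gives 20 kB, slightly below the reported Pss of 21 kB, which is consistent with some of the shared pages being mapped by fewer than 4 processes.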
ps and top show the sum of several of those values:

  VIRT: sum of Size
  RSS : sum of Rss
  SHR : sum of Shared_*

(Approximately, at least: when I do the additions myself, I get in the neighbourhood, but not the exact figures. I do not know why...)
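To redo those additions without a calculator, something like the following Python sketch could be used. The names sum_smaps and report are hypothetical helpers, and the parsing assumes the "Name: value kB" layout shown in the dumps above; it adds up every such field over all blocks.

```python
import re
from collections import Counter

# Matches field lines such as "Rss:   72 kB"; block header lines do not match.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps(text):
    """Return per-field totals (in kB) over all blocks in smaps text."""
    totals = Counter()
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            totals[m.group(1)] += int(m.group(2))
    return totals


def report(pid="self"):
    """Read /proc/PID/smaps (Linux only) and return the three sums."""
    with open(f"/proc/{pid}/smaps") as f:
        t = sum_smaps(f.read())
    return {
        "VIRT": t["Size"],
        "RSS": t["Rss"],
        "SHR": t["Shared_Clean"] + t["Shared_Dirty"],
    }

# Example use on a Linux box: print(report("5950"))
```

Reading another user's smaps file normally needs root, and the process keeps running while the file is read, so small differences from what top shows are to be expected.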
So to debug our problem, we should look through those blocks and see where the memory is going. Maybe there is one large block? (e.g. some enormous font file -- unicode font -- that somehow is all marked as dirty) Or many smaller ones (heap -> mostly malloc and friends, and thus suspecting a memory leak there).
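A rough way to do that search is to total one field per mapped file and sort the result. The Python sketch below does this for Private_Dirty by default. The function name usage_by_mapping is hypothetical, anonymous blocks are lumped together under a made-up "[anon]" label, and the header pattern assumes the block header layout shown in the dumps above.

```python
import re
from collections import defaultdict

# Block header: "start-end perms offset dev inode [name]"
HEADER = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def usage_by_mapping(text, field="Private_Dirty"):
    """Sum one smaps field per mapping name, largest first (kB)."""
    usage = defaultdict(int)
    name = None
    for line in text.splitlines():
        line = line.strip()
        h = HEADER.match(line)
        if h:
            # Blocks without a file name are grouped as "[anon]".
            name = h.group(1) or "[anon]"
            continue
        f = FIELD.match(line)
        if f and name is not None and f.group(1) == field:
            usage[name] += int(f.group(2))
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)

# Example use on a Linux box:
#   with open("/proc/5950/smaps") as f:
#       for name, kb in usage_by_mapping(f.read())[:10]:
#           print(f"{kb:8d} kB  {name}")
```

One huge entry for a single file points at the "one large block" case; a dominant [heap] or "[anon]" entry points more towards malloc and a possible leak.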