On Wed, Jan 09, 2013 at 07:07:15PM +0100, Paul Bijnens wrote:
On 2013-01-09 16:50, fred smith wrote:
On Wed, Jan 09, 2013 at 04:05:26PM +0100, Paul Bijnens wrote:
Inspecting /proc/PID/smaps of such a large process may reveal something?
Well, there's a LOT of stuff dumped when one cats the file, but I don't have the expertise to figure out what it all means.
What I know about it...
Here are some lines from my running system (first 3 blocks only):
00400000-00414000 r-xp 00000000 fd:00 190374 /usr/libexec/clock-applet
Size:               80 kB
Rss:                72 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:      72 kB
Private_Dirty:       0 kB
Swap:                0 kB
Pss:                72 kB

00614000-0061b000 rw-p 00014000 fd:00 190374 /usr/libexec/clock-applet
Size:               28 kB
Rss:                16 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:      12 kB
Private_Dirty:       4 kB
Swap:                0 kB
Pss:                16 kB

14c9e000-14ea3000 rw-p 14c9e000 00:00 0 [heap]
Size:             2068 kB
Rss:              2064 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:       0 kB
Private_Dirty:    2064 kB
Swap:                0 kB
Pss:              2064 kB
That shows 3 memory blocks.
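The smaps text is easier to dig through once each block is split into its header and its kB counters. A minimal Python sketch, assuming the layout quoted above (newer kernels add more counters, which it simply collects, and lines such as VmFlags that carry no kB value, which it skips); parse_smaps is a hypothetical helper name:

```python
import re

# Header line of one mapping, e.g.
# "00400000-00414000 r-xp 00000000 fd:00 190374 /usr/libexec/clock-applet"
# Groups: start address, end address, permissions, optional name.
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)
# Counter line inside a mapping, e.g. "Private_Dirty:        4 kB".
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def parse_smaps(text):
    """Split /proc/PID/smaps text into one dict per mapping (sizes in kB)."""
    maps = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            start, end, perms, name = header.groups()
            maps.append({"start": int(start, 16), "end": int(end, 16),
                         "perms": perms, "name": name})
            continue
        field = FIELD.match(line)
        if field and maps:
            maps[-1][field.group(1)] = int(field.group(2))
        # Anything else (e.g. "VmFlags: rd ex mr mw me") has no kB value
        # and is ignored.
    return maps
```

For example, parse_smaps(open("/proc/self/smaps").read()) lists the mappings of the Python process itself; substitute the PID of the process you are chasing.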
Here are what I think are the equivalent blocks from my system:
08048000-0805b000 r-xp 00000000 fd:00 4685290 /usr/libexec/clock-applet
Size:               76 kB
Rss:                68 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:      68 kB
Private_Dirty:       0 kB
Swap:                0 kB
Pss:                68 kB

0805b000-0805c000 rw-p 00012000 fd:00 4685290 /usr/libexec/clock-applet
Size:                4 kB
Rss:                 4 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:       0 kB
Private_Dirty:       4 kB
Swap:                0 kB
Pss:                 4 kB

09681000-1072a000 rw-p 09681000 00:00 0 [heap]
Size:           115364 kB
Rss:            115288 kB
Shared_Clean:        0 kB
Shared_Dirty:        0 kB
Private_Clean:       0 kB
Private_Dirty:  115288 kB
Swap:                0 kB
Pss:            115288 kB
I note that the heap shows 115288 kB, or around 115 megabytes. Am I right in reading what you said (below) to mean that in this example clock-applet alone is responsible for all 115288 kB, since Pss and Rss are the same?
top currently shows:
  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP CODE DATA nFLT nDRT COMMAND
23350 fredex   16   0  240m 124m 9.9m S  0.7  3.2  1:25.71  115m   76 125m    1    0 clock-applet
Note that I've enabled some extra memory-related columns here. It may be interesting that the SWAP value appears to be the same as (or at least close to) the Size/Rss/Private_Dirty/Pss entries of the [heap] section shown above.
The other sections with large-ish Size values are all in the 2x,xxx kB range or smaller, mostly considerably smaller.
Another thing that might be worth trying is to kill clock-applet, leave it dead for a while, and see whether top starts attributing that memory to some other app(let).
The first two blocks are mapped from the program file itself, /usr/libexec/clock-applet.
The first block is not writable ("r-xp"): those are the executable instructions. The second one is writable ("rw-p"): that is the writable data (variables etc.).
We see several sizes: the total size (Size) and the resident size (Rss, i.e. what is actually in memory right now). Rss is in turn split into the categories listed below it (Shared/Private, Clean/Dirty).
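A quick sanity check of that split, assuming (as holds for every block quoted in this thread) that Rss is exactly the sum of the four Shared/Private Clean/Dirty counters; rss_breakdown is a hypothetical helper working on one block's fields:

```python
# The four categories smaps splits Rss into.
CATEGORIES = ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")


def rss_breakdown(fields):
    """Return one block's Rss categories (kB) and whether they add up to Rss."""
    parts = {name: fields.get(name, 0) for name in CATEGORIES}
    return parts, sum(parts.values()) == fields.get("Rss", 0)


# Second clock-applet block quoted above: 16 kB Rss = 12 clean + 4 dirty.
data_block = {"Size": 28, "Rss": 16, "Shared_Clean": 0, "Shared_Dirty": 0,
              "Private_Clean": 12, "Private_Dirty": 4}
parts, consistent = rss_breakdown(data_block)
```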
The first block is code, so it is not writable; its pages can therefore be dropped from memory at any time and paged back in from the file itself, and it should never have any dirty pages. The second block is writable and has a mix of clean and dirty pages. Dirty pages cannot simply be dropped: they must stay in memory or be written out to swap first. Dirty pages can also be shared with other processes, e.g. shmem segments.
Next comes the heap (memory from malloc() etc.), which, as expected, is entirely private and dirty.
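The clean/dirty distinction can be expressed per block as below. This is a sketch assuming the clean pages are file-backed, as in the two clock-applet blocks; split_clean_dirty is a hypothetical name:

```python
def split_clean_dirty(fields):
    """Split one smaps block's resident kB into (clean, dirty).

    Assumes file-backed clean pages: those can be dropped and re-read from
    the file later. Dirty pages cannot just be dropped; they have to stay
    in RAM or be written out (to swap, for private pages) first.
    """
    clean = fields.get("Shared_Clean", 0) + fields.get("Private_Clean", 0)
    dirty = fields.get("Shared_Dirty", 0) + fields.get("Private_Dirty", 0)
    return clean, dirty


code_block = {"Rss": 72, "Private_Clean": 72}      # r-xp clock-applet code
heap_block = {"Rss": 2064, "Private_Dirty": 2064}  # [heap]
print(split_clean_dirty(code_block))  # (72, 0)
print(split_clean_dirty(heap_block))  # (0, 2064)
```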
When a page is shared between many processes, this is reflected in the PSS value (proportional set size): 64 kB RSS, all shared between 4 processes, will show 16 kB PSS.
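PSS charges each resident page to a process divided by the number of processes mapping it. A minimal sketch of that arithmetic, with pages grouped into hypothetical (size_kb, sharers) pairs; the kernel does this per page in fixed-point arithmetic, so the values it prints can differ slightly by rounding:

```python
def pss_kb(pages):
    """Proportional set size in kB.

    pages: iterable of (size_kb, sharers) pairs, where sharers is the number
    of processes mapping those pages. Each page counts size / sharers.
    """
    return sum(size_kb / sharers for size_kb, sharers in pages)


print(pss_kb([(64, 4)]))   # 64 kB shared by 4 processes -> 16.0
print(pss_kb([(72, 1)]))   # private pages count in full -> 72.0
```

This also answers the Pss == Rss case: the two are equal exactly when every resident page has a single sharer.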
Example:

2aaaad317000-2aaaad346000 r-xp 00000000 fd:00 189693 /usr/lib64/libgsf-1.so.114.0.1
Size:              188 kB
Rss:                56 kB
Shared_Clean:       48 kB
Shared_Dirty:        0 kB
Private_Clean:       8 kB
Private_Dirty:       0 kB
Swap:                0 kB
Pss:                21 kB
# lsof /usr/lib64/libgsf-1.so.114.0.1
COMMAND     PID  USER  FD  TYPE DEVICE SIZE/OFF   NODE NAME
gnome-pan  5826 paulb mem   REG  253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
stickynot  5932 paulb mem   REG  253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
clock-app  5950 paulb mem   REG  253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
nautilus  23427 paulb mem   REG  253,0   210712 189693 /usr/lib64/libgsf-1.so.114.0.1
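Apparently not all of the 56 kB is shared with all four processes, but proportionally the clock applet is charged 21 kB of it. The 8 kB private part counts in full; if all 48 kB of shared pages were shared by all four processes, the PSS would be 8 + 48/4 = 20 kB, so some of those pages are apparently mapped by fewer than four.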
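The same reasoning as a bounds check, assuming lsof lists every process that maps the library: private kB count in full, and each shared page is split between 2 and 4 processes. pss_bounds is a hypothetical helper:

```python
def pss_bounds(private_kb, shared_kb, max_sharers):
    """Smallest and largest PSS a mapping can have, in kB.

    Private pages count in full. A shared page has between 2 and
    max_sharers sharers, so it contributes between 1/max_sharers and 1/2.
    """
    low = private_kb + shared_kb / max_sharers
    high = private_kb + shared_kb / 2
    return low, high


# libgsf block: Private_Clean 8 kB, Shared_Clean 48 kB, 4 processes in lsof.
low, high = pss_bounds(8, 48, 4)
observed = 21  # Pss reported by smaps
print(low, high)               # 20.0 32.0
print(low < observed <= high)  # True: some shared pages have < 4 sharers
```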
ps and top show sums of several of these values:

  VIRT: sum of Size
  RES:  sum of Rss (ps calls it RSS)
  SHR:  sum of Shared_*

(approximately, at least: when I do the additions myself I get into the neighbourhood, but not an exact match. I do not know why...)
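Summing the per-block fields as described can be sketched as follows; top_like_totals is a hypothetical helper that takes blocks already parsed into dicts, and the result is only an approximation of what top reports:

```python
def top_like_totals(maps):
    """Approximate top's VIRT/RES/SHR (kB) from parsed smaps blocks.

    VIRT ~ sum of Size, RES ~ sum of Rss, SHR ~ sum of Shared_Clean and
    Shared_Dirty. Only an approximation of what top actually reports.
    """
    return {
        "VIRT": sum(m.get("Size", 0) for m in maps),
        "RES": sum(m.get("Rss", 0) for m in maps),
        "SHR": sum(m.get("Shared_Clean", 0) + m.get("Shared_Dirty", 0)
                   for m in maps),
    }


# The three clock-applet blocks quoted at the start of the thread.
clock_applet = [
    {"Size": 80, "Rss": 72, "Shared_Clean": 0, "Shared_Dirty": 0},
    {"Size": 28, "Rss": 16, "Shared_Clean": 0, "Shared_Dirty": 0},
    {"Size": 2068, "Rss": 2064, "Shared_Clean": 0, "Shared_Dirty": 0},
]
print(top_like_totals(clock_applet))
# {'VIRT': 2176, 'RES': 2152, 'SHR': 0}
```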
So to debug our problem, we should find which part(s) account for the memory. Is there one large part (e.g. some enormous Unicode font file that somehow ends up entirely marked dirty)? Or many smaller parts (the heap, i.e. mostly malloc() and friends, which would point to a memory leak there)?
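To tell one large part from many small ones, the blocks can be ranked by a chosen field. A minimal sketch, assuming the smaps layout quoted in this thread; largest_mappings is a hypothetical helper, and it ranks individual mappings rather than summing those that share a name:

```python
import re

# Mapping header; the capture group is the (possibly empty) mapping name.
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")


def largest_mappings(lines, field="Private_Dirty", n=5):
    """Return [name, kB] for the n mappings with the largest `field` value."""
    rows = []  # one [name, kB] entry per mapping, in file order
    prefix = field + ":"
    for line in lines:
        header = HEADER.match(line)
        if header:
            rows.append([header.group(1).strip() or "[anon]", 0])
        elif line.startswith(prefix) and rows:
            rows[-1][1] = int(line.split()[1])
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

For the clock-applet in the top output above, that would be something like `with open("/proc/23350/smaps") as fh: print(largest_mappings(fh))`; passing field="Rss" ranks by resident size instead.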