Dear All,
We recently reinstalled our computing cluster. It was running CentOS 5.3 (32-bit); it is now on CentOS 6.3 (64-bit), installed from the CentOS 6.2 x64 CD and then upgraded to 6.3.
We are having issues with the memory requirements of our jobs: they need much more than before. Part of this may be due to the switch from 32-bit to 64-bit, but that alone cannot explain the whole difference.
Here are our investigations.
We used the following simple benchmark:
1. Run a Python script and check the memory it requires (the "VIRT" field of the "top" command); see the measurement sketch after this list. The script is:
----
import time
time.sleep(30)
print("done")
----
2. Similarly, run and check the memory of a simple bash script:
----
#!/bin/bash
sleep 30
echo "done"
----
3. Open an R session and check the memory it uses.
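A rough sketch of how such measurements can be scripted instead of read off top by hand (the measure helper, the 2-second settle delay and the exact interpreter invocations are illustrative assumptions, not the procedure the users followed; ps reports vsz/rss in KiB, matching top's VIRT/RES):
----
#!/bin/bash
# Sketch: start each test program in the background, then read its
# VIRT (ps "vsz") and RES (ps "rss") while it sleeps.
measure() {
    "$@" &                        # launch the test program
    local pid=$!
    sleep 2                       # give it a moment to finish loading
    echo -n "$* -> VIRT/RES (KiB): "
    ps -o vsz=,rss= -p "$pid"     # vsz is top's VIRT, rss is top's RES
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
}

measure bash   -c 'sleep 30; echo done'
measure python -c 'import time; time.sleep(30); print("done")'
measure R --no-save -e 'Sys.sleep(30)'
----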
I asked 10 of our users to run these three tests on their personal PCs. They run various distributions (mainly Ubuntu and Slackware); half of them use a 32-bit system, the other half a 64-bit one. Here is a summary of the results (the values are top's VIRT field, in KiB):
Bash script:
             Avg     Min     Max
  32-bit    5400    4192    9024
  64-bit   12900   10000   16528

Python script:
             Avg     Min     Max
  32-bit    8500    5004   11132
  64-bit   32800   30000   36336

R:
             Avg     Min     Max
  32-bit   26900   21000   33452
  64-bit  100200   93008   97496
(As a side remark, the difference between 32-bit and 64-bit is surprisingly big to me...)
Then we ran the same tests on our CentOS cluster and got surprisingly high results. To be sure that no other component of the cluster was playing a role, I installed a machine from scratch from the CentOS 6.2 x64 CD. On this freshly installed machine I get the following results:
  SH:     103 MB
  PYTHON: 114 MB
  R:      200 MB
So, compared to the highest values among our 64-bit users, the ratios are roughly 7, 3 and 2, respectively.
This is very problematic for us: many jobs can no longer run properly because they lack memory on most of our computing nodes, so we really cannot leave things as they are.
Do you see any reason for this? Do you have suggestions?
Sincerely,
Jérémie
On 09/26/12 19:14, Jérémie Dubois-Lacoste wrote:
Dear All,
Hi!
We recently reinstalled our computing cluster. It was running CentOS 5.3 (32-bit); it is now on CentOS 6.3 (64-bit), installed from the CentOS 6.2 x64 CD and then upgraded to 6.3.
We are having issues with the memory requirements of our jobs: they need much more than before. Part of this may be due to the switch from 32-bit to 64-bit, but that alone cannot explain the whole difference.
It would seem that this is a malloc (glibc) behaviour. I have seen advice on another list to use:
  export MALLOC_ARENA_MAX=1
  export MALLOC_MMAP_THRESHOLD=131072
in order to decrease the memory used.
HTH, Adrian
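A sketch of one way to A/B-test Adrian's suggestion on a single node (the run_case helper and the Python test program are illustrative; note also that mallopt(3) documents the mmap threshold variable with a trailing underscore, MALLOC_MMAP_THRESHOLD_, so trying both spellings costs nothing):
----
#!/bin/bash
# Compare VIRT/RES for the same test program with and without the
# suggested glibc malloc tunables.
run_case() {
    env "$@" python -c 'import time; time.sleep(30)' &
    local pid=$!
    sleep 2
    echo -n "env: ${*:-<none>}  VIRT/RES (KiB): "
    ps -o vsz=,rss= -p "$pid"
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
}

run_case                                                     # baseline
run_case MALLOC_ARENA_MAX=1 MALLOC_MMAP_THRESHOLD=131072     # as suggested
run_case MALLOC_ARENA_MAX=1 MALLOC_MMAP_THRESHOLD_=131072    # underscore variant
----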
Hm, interesting suggestion, but it didn't change anything. :( Thanks anyway,
Jérémie
Jérémie Dubois-Lacoste wrote:
Dear All,
We recently reinstalled our computing cluster. It was running CentOS 5.3 (32-bit); it is now on CentOS 6.3 (64-bit), installed from the CentOS 6.2 x64 CD and then upgraded to 6.3.
We are having issues with the memory requirements of our jobs: they need much more than before. Part of this may be due to the switch from 32-bit to 64-bit, but that alone cannot explain the whole difference.
Why not? The numbers you post, unless I'm misreading them, are about twice the 32-bit ones. <snip>
- Open an R session and check the memory used
<snip>
Bash script:
             Avg     Min     Max
  32-bit    5400    4192    9024
  64-bit   12900   10000   16528

5400 * 2 = 10800
4192 * 2 = 8384
9024 * 2 = 18048

Python script:
             Avg     Min     Max
  32-bit    8500    5004   11132
  64-bit   32800   30000   36336

8500 * 2  = 17000
5004 * 2  = 10008
11132 * 2 = 22264

So that ranges from 2 to 2.5 times larger.

R:
             Avg     Min     Max
  32-bit   26900   21000   33452
  64-bit  100200   93008   97496

Same here, about 2 to 2.5 times larger. More, and larger, variables. <snip>
Then we ran the same tests on our CentOS cluster and got surprisingly high results. To be sure that no other component of the cluster was playing a role, I installed a machine from scratch from the CentOS 6.2 x64 CD. On this freshly installed machine I get the following results:
  SH:     103 MB
  PYTHON: 114 MB
  R:      200 MB
So, compared to the highest values among our 64-bit users, the ratios are roughly 7, 3 and 2, respectively.
This is very problematic for us: many jobs can no longer run properly because they lack memory on most of our computing nodes, so we really cannot leave things as they are.
Do you see any reason for this? Do you have suggestions?
First, what kind of compute cluster is this? Are you using something like torque, or what? Second, how much memory do you have in each of the nodes? And how many cores?
mark
On Wed, 26 Sep 2012, m.roth@5-cent.us wrote:
Jérémie Dubois-Lacoste wrote:
Python script:
             Avg     Min     Max
  32-bit    8500    5004   11132
  64-bit   32800   30000   36336

8500 * 2  = 17000
5004 * 2  = 10008
11132 * 2 = 22264

So that ranges from 2 to 2.5 times larger.
Huh?
3 * 8500  = 25500 < 32800
3 * 5004  = 15012 < 30000
3 * 11132 = 33396 < 36336

R:
             Avg     Min     Max
  32-bit   26900   21000   33452
  64-bit  100200   93008   97496

3 * 26900   = 80700 < 100200
4 * 21000   = 84000 < 93008
2.9 * 33452 = 97011 < 97496
You may have misunderstood. The detailed numbers I gave were obtained on distributions other than CentOS, and OK, the factor between 32-bit and 64-bit can make sense. But on CentOS 6.2 64-bit I obtain:
  SH:     103 MB
  PYTHON: 114 MB
  R:      200 MB
This is from a freshly installed CentOS 6.2 machine with nothing else on it, so the other components of our cluster are not involved here. The machine has 16 cores. These values are MUCH higher than on other 64-bit distributions: as I wrote, the ratios are between 2 and 7.
On 09/26/2012 09:14 AM, Jérémie Dubois-Lacoste wrote:
- Run a python script and check the memory that
it requires (field "VIRT" of the "top" command).
Don't use VIRT as a reference for memory used. RES is a better indication, but even that won't tell you anything useful about shared memory, and will lead you to believe that a process is using more memory than it is.
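For reference, both numbers can also be read straight from /proc rather than from top; a minimal sketch (VmSize is what top labels VIRT, VmRSS is what it labels RES):
----
#!/bin/bash
# Print the VIRT/RES figures for a given PID (default: the current shell).
pid=${1:-$$}
grep -E '^Vm(Size|RSS):' "/proc/$pid/status"
----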
We have a computing cluster running Sun Grid Engine, which uses this value to decide whether a process exceeds its memory limit. So, like it or not, I am bound to consider it.
I installed a machine from scratch with CentOS 6.2 x64 and nothing else; I open a terminal, run the simple bash script, and VIRT goes beyond 100 MB for it. I understand this metric may not be very precise, but I still don't understand the difference compared to other x64 distributions: under CentOS the value is 7 times higher!
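To make the constraint concrete, a hypothetical sketch (assuming the limit being checked is Grid Engine's usual h_vmem resource, which is compared against virtual size): until the underlying cause is addressed, the only knob on the submission side is to request enough headroom for the extra ~100 MB of VIRT observed above.
----
# Hypothetical example: the job script and the sizes are made up.
qsub -l h_vmem=1G   my_job.sh   # a request that used to be enough on CentOS 5 (32-bit)
qsub -l h_vmem=1.2G my_job.sh   # the same job with headroom for the extra mappings
----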
On 09/27/2012 01:57 AM, Jérémie Dubois-Lacoste wrote:
We have a computing cluster running Sun Grid Engine, which uses this value to decide whether a process exceeds its memory limit. So, like it or not, I am bound to consider it.
I installed a machine from scratch with CentOS 6.2 x64 and nothing else; I open a terminal, run the simple bash script, and VIRT goes beyond 100 MB for it. I understand this metric may not be very precise, but I still don't understand the difference compared to other x64 distributions: under CentOS the value is 7 times higher!
OK, so compare the memory map of the process on the two systems. For example, I've included the output of 'pmap $$' on two CentOS 6 systems below: one is 32-bit, the other 64-bit. VIRT on the 32-bit system is ~5 MB; on the 64-bit system it is ~116 MB.
Why? Well, most of the VIRT size is an mmapped file, /usr/lib/locale/locale-archive. That file is mostly not resident in memory, but because the process addresses its contents as if they were memory, rather than as a file, it is accounted as part of the VIRT size.
My login shell on the 64 bit host is only using 3.5MB of RAM total (including libraries shared with other processes), but its VIRT size is ~116MB.
If your vendor is using VIRT to make decisions about available resources, they're doing their job profoundly wrong. RES would be better, but still would represent far more memory use than is accurate, especially on systems with many active processes.
64-bit system:

[gordon@vagabond:~]$ pmap $$
23966:   bash
0000000000400000    872K r-x--  /usr/bin/bash
00000000006d9000      4K r----  /usr/bin/bash
00000000006da000     36K rw---  /usr/bin/bash
00000000006e3000     24K rw---    [ anon ]
00000000008e2000     36K rw---  /usr/bin/bash
000000000122d000   1836K rw---    [ anon ]
000000383c000000    128K r-x--  /usr/lib64/ld-2.15.so
000000383c21f000      4K r----  /usr/lib64/ld-2.15.so
000000383c220000      4K rw---  /usr/lib64/ld-2.15.so
000000383c221000      4K rw---    [ anon ]
000000383c400000   1712K r-x--  /usr/lib64/libc-2.15.so
000000383c5ac000   2048K -----  /usr/lib64/libc-2.15.so
000000383c7ac000     16K r----  /usr/lib64/libc-2.15.so
000000383c7b0000      8K rw---  /usr/lib64/libc-2.15.so
000000383c7b2000     20K rw---    [ anon ]
000000383cc00000     12K r-x--  /usr/lib64/libdl-2.15.so
000000383cc03000   2044K -----  /usr/lib64/libdl-2.15.so
000000383ce02000      4K r----  /usr/lib64/libdl-2.15.so
000000383ce03000      4K rw---  /usr/lib64/libdl-2.15.so
000000384d000000    148K r-x--  /usr/lib64/libtinfo.so.5.9
000000384d025000   2044K -----  /usr/lib64/libtinfo.so.5.9
000000384d224000     16K r----  /usr/lib64/libtinfo.so.5.9
000000384d228000      4K rw---  /usr/lib64/libtinfo.so.5.9
00007f8ae82aa000     48K r-x--  /usr/lib64/libnss_files-2.15.so
00007f8ae82b6000   2044K -----  /usr/lib64/libnss_files-2.15.so
00007f8ae84b5000      4K r----  /usr/lib64/libnss_files-2.15.so
00007f8ae84b6000      4K rw---  /usr/lib64/libnss_files-2.15.so
00007f8ae84b7000 102580K r----  /usr/lib/locale/locale-archive
00007f8aee8e4000     16K rw---    [ anon ]
00007f8aee8f5000      8K rw---    [ anon ]
00007f8aee8f7000     28K r--s-  /usr/lib64/gconv/gconv-modules.cache
00007f8aee8fe000      4K rw---    [ anon ]
00007fff5010a000    132K rw---    [ stack ]
00007fff501ff000      4K r-x--    [ anon ]
 total           115904K
32-bit:

enetics@firewall:~$ pmap $$
32629:   -bash
00291000      4K r-x--    [ anon ]
00952000    120K r-x--  /lib/ld-2.12.so
00970000      4K r----  /lib/ld-2.12.so
00971000      4K rw---  /lib/ld-2.12.so
00978000   1584K r-x--  /lib/libc-2.12.so
00b04000      8K r----  /lib/libc-2.12.so
00b06000      4K rw---  /lib/libc-2.12.so
00b07000     12K rw---    [ anon ]
00b0c000     12K r-x--  /lib/libdl-2.12.so
00b0f000      4K r----  /lib/libdl-2.12.so
00b10000      4K rw---  /lib/libdl-2.12.so
00b39000     48K r-x--  /lib/libnss_files-2.12.so
00b45000      4K r----  /lib/libnss_files-2.12.so
00b46000      4K rw---  /lib/libnss_files-2.12.so
00b7d000     88K r-x--  /lib/libtinfo.so.5.7
00b93000     12K rw---  /lib/libtinfo.so.5.7
08047000    836K r-x--  /bin/bash
08118000     20K rw---  /bin/bash
0811d000     20K rw---    [ anon ]
08b43000    232K rw---    [ anon ]
b7682000     28K r--s-  /usr/lib/gconv/gconv-modules.cache
b7689000   2048K r----  /usr/lib/locale/locale-archive
b7889000      8K rw---    [ anon ]
b788f000     12K rw---    [ anon ]
bff3f000     84K rw---    [ stack ]
 total     5204K
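A small sketch that makes the same point numerically from /proc (the awk one-liner is illustrative): the locale-archive mapping dominates the virtual size but contributes almost nothing to the resident set.
----
#!/bin/bash
# For a given PID (default: the current shell), show the mapped size versus
# the resident size of the locale-archive mapping, from /proc/PID/smaps.
pid=${1:-$$}
awk '/locale-archive/ {grab = 1}
     grab && /^Size:/ {size = $2}
     grab && /^Rss:/  {printf "locale-archive: Size %s kB, Rss %s kB\n", size, $2; grab = 0}' \
    "/proc/$pid/smaps"
----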
On Thu, Sep 27, 2012 at 10:46 AM, Gordon Messmer yinyang@eburg.com wrote:
I understand this metric may not be very precise, but I still don't understand the difference compared to other x64 distributions: under CentOS the value is 7 times higher!
This might explain it: https://bugzilla.redhat.com/show_bug.cgi?id=156477
The mmapped locale-archive contains all languages, even though only the ones you use are accessed from it. Other distros split them up, and their installers install only what you want.
64-bit system:
00007f8ae84b7000 102580K r----  /usr/lib/locale/locale-archive

32-bit:
b7689000           2048K r----  /usr/lib/locale/locale-archive
That's an interesting difference on its own, since the underlying files are about 95M and 54M respectively. Does the 32-bit kernel use some tricks to sparsely map files where the 64-bit one does it directly with page tables?
On 09/27/2012 09:34 AM, Les Mikesell wrote:
That's an interesting difference on its own, since the underlying files are about 95M and 54M respectively. Does the 32-bit kernel use some tricks to sparsely map files where the 64-bit one does it directly with page tables?
No, it's because glibc maps the whole file into memory on systems with address sizes greater than 32 bits.
http://illiterat.livejournal.com/4615.html?nojs=1 This blog discusses the topic briefly, but his description of M_MMAP_THRESHOLD is a little off.
Anyway, with a 64-bit address space there's no reason not to mmap the entire file. In a smaller address space, mapping the entire file could consume a significant portion of the process's address space, say 5-10% on systems with a big locale-archive file, which is a big cost for the feature. On a 64-bit system, where address space is nearly unlimited, there's no such cost. On BOTH systems, mapping the file alone doesn't actually consume physical memory.
Anyone using VIRT to make decisions about resource utilization is completely ignorant of its function.
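A quick way to see the effect Gordon describes (a sketch; it assumes the default locale on the box is one served from locale-archive, e.g. en_US.UTF-8): the C/POSIX locale is built into glibc, so with LC_ALL=C the archive is never mapped, and VIRT collapses while RES barely moves.
----
# Default locale: VmSize (VIRT) includes the ~100 MB locale-archive mapping.
bash -c 'grep -E "^Vm(Size|RSS):" /proc/$$/status; :'
# C locale: no locale-archive mapping, so VmSize drops to a few MB.
LC_ALL=C bash -c 'grep -E "^Vm(Size|RSS):" /proc/$$/status; :'
# (The trailing ":" keeps bash from exec()ing grep directly, so the values
#  read from /proc/$$/status are those of the bash process itself.)
----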
Anyone using VIRT to make decisions about resource utilization is completely ignorant of its function.
I agree. I don't understand why Sun Grid Engine does exactly that. I posted a full explanation of this problem and the solution we used here: https://www.centos.org/modules/newbb/viewtopic.php?topic_id=39499
Thanks for your feedback; it was very useful in isolating the problem.
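For readers hitting the same problem: one commonly suggested mitigation (a sketch, and not necessarily the solution described in the forum post above) is to shrink /usr/lib/locale/locale-archive so that the mapping, and with it VIRT, becomes small again. Back the file up first, and note that a glibc-common update may restore the full archive.
----
# List the locales currently in the archive.
localedef --list-archive

# Remove everything except the locales actually needed (keeping only en_US
# here is just an example; adjust the pattern to your site).
localedef --list-archive | grep -v '^en_US' | xargs localedef --delete-from-archive
----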