[CentOS] High memory needs

Wed Sep 26 16:14:55 UTC 2012
Jérémie Dubois-Lacoste <jeremie.dl at gmail.com>

Dear All,

We recently reinstalled our computing cluster.  It previously ran
CentOS 5.3 (32-bit) and now runs CentOS 6.3 (64-bit), installed from
the CentOS 6.2 x64 CD and then upgraded to 6.3.

We have some issues with the memory needs of our running jobs. They
require much more memory than before. The switch from 32 to 64 bits
may account for part of this, but to me it cannot explain the whole
difference.

Here are our investigations.

We used the following simple benchmark:

1. Run a python script and check the memory that
it requires (field "VIRT" of the "top" command).
This script is:
import time

2. Similarly, run and check the memory of a simple
bash script:
sleep 30
echo "done"

3. Open an R session and check the memory used.
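
For anyone wanting to repeat the measurement without reading it off
top by hand, here is a minimal sketch. It assumes a Linux /proc
filesystem, where the VmSize field of /proc/<pid>/status is the
virtual size that top shows as VIRT and VmRSS is the resident part.
The function names parse_status and memory_of are made up for
illustration.

```python
import os

def parse_status(text):
    """Return the VmSize and VmRSS fields (in kB) from /proc/<pid>/status text."""
    sizes = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmSize:     12345 kB".
            sizes[key] = int(value.split()[0])
    return sizes

def memory_of(pid):
    """Read the status file of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())

# Report the figures for this interpreter itself when /proc is available.
if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    print(memory_of(os.getpid()))
```

Comparing VmSize with VmRSS shows how much of the virtual size is
actually resident in RAM.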

I asked 10 of our users to run these three tests on their personal
PCs. They run different distributions (mainly Ubuntu and Slackware);
half of them use a 32-bit system, the other half a 64-bit one.  Here
is a summary of the results (VIRT as shown by top, in KB):

Bash script:
               Avg      Min      Max
32 bits       5400     4192     9024
64 bits      12900    10000    16528

Python script:
               Avg      Min      Max
32 bits       8500     5004    11132
64 bits      32800    30000    36336

R session:
               Avg      Min      Max
32 bits      26900    21000    33452
64 bits     100200    93008    97496

(As a side remark, the difference between 32 and 64 bits is
surprisingly big to me.)

Then we ran the same things on our CentOS cluster, getting
surprisingly high results. I installed a machine from scratch with the
CentOS CD (6.2 x64) to be sure another component of the cluster was
not playing a role. On this freshly installed machine I get the
following results:
SH:    103 MB
R:     200 MB

So, compared to the highest values among our 64-bit users, we see
ratios of about 7, 3 and 2, respectively.
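
The arithmetic behind these ratios can be sketched as follows. This
assumes that "SH" refers to the bash script, that the user figures are
in KB and that 1 MB = 1024 KB. The Python figure for the fresh install
does not appear in the list above, so it is left out.

```python
# Maximum VIRT among the 64-bit user machines, in KB (from the tables).
user_max_kb = {"bash": 16528, "R": 97496}
# VIRT on the freshly installed CentOS 6.2 x64 machine, in MB.
centos_mb = {"bash": 103, "R": 200}

def ratio(name, kb_per_mb=1024):
    """How many times larger the CentOS figure is than the user maximum."""
    return centos_mb[name] * kb_per_mb / user_max_kb[name]

for name in user_max_kb:
    print(f"{name}: {ratio(name):.1f}x")
```

Under these assumptions this prints roughly 6.4x for bash and 2.1x
for R.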

This is very problematic for us: many jobs can no longer run
properly because they run out of memory on most of our computing
nodes. We really cannot stay in this situation...

Do you see any reason for this? Do you have suggestions?