[CentOS] strange memory issues with CentOS 6.2 on VPS

Hi all,

today we encountered some quite strange memory allocation issues on
one of our VPSes running CentOS 6.2. So far I've been unable to determine
what's causing them - hopefully someone here will know what's up.

The VPS is a "small" machine - just 512MB of RAM, 1 CPU, running CentOS
6.2 with the current kernel (2.6.32-220.4.2.el6.x86_64, but I've tried
2.6.32-71.el6.x86_64 too). There's a fairly standard stack installed:
apache, php, postgresql, postfix, dovecot, memcached and ssh. Basically
nothing exotic, everything from the official repos (except for postfix
and postgresql). The machine is not heavily used.

More detailed logs than those posted here are available at:

  http://pastebin.com/vYxRUyUX

We've been hitting some I/O utilization issues (caused by other VPS
instances on the same hw), so we migrated to different physical hw.
After the migration, the VPS started failing because of memory
allocation issues - the services fail either at startup or when
processing requests - although there's enough free memory:

[root@vps audit]# free
         total       used       free     shared    buffers     cached
Mem:    502728     294224     208504          0      18604     163608
-/+ buffers/cache: 112012     390716
Swap:        0          0          0

i.e. about 200 MB of free memory, but apache either fails to fork a
child process or the child segfaults:

  [16:49:51 2012] [error] (12)Cannot allocate memory: fork: Unable to
                          fork new process
  [16:51:17 2012] [notice] child pid 2577 exit signal Segmentation
                           fault (11)

or when processing requests:

  [26 16:30:16 2012] [error] [client 66.249.72.1] PHP Fatal error:  Out
  of memory (allocated 262144) (tried to allocate 523800 bytes) in
  Unknown on line 0

The memory_limit in PHP is set to 32MB, so that limit is not what's
being hit here. Similar issues happen with PostgreSQL:

  16:42:01 CET pid=2504 db=xxxxxx-drupal user=xxxxxx FATAL:  out of
               memory
  16:42:01 CET pid=2504 db=xxxxxx-drupal user=xxxxxx DETAIL:  Failed on
               request of size 2488.
  16:42:01 CET pid=2438 db= user= LOG:  could not fork new process for
               connection: Nelze alokovat paměť
  16:42:01 CET pid=2438 db= user= 4f4a5247.986:21 LOG:  could not fork
               new process for connection: cannot allocate memory

I have absolutely no clue what's causing this or how to fix it. According
to free/vmstat there's about 200MB of free RAM all the time, so I have
no idea why the allocations fail.
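
One thing free/vmstat do not show is the kernel's commit accounting, which
matters with vm.overcommit_memory = 2 (see the sysctl settings further
down). A minimal sketch for checking it, assuming the CommitLimit and
Committed_AS fields of /proc/meminfo; the function name is just something
made up for this example:

```shell
# commit_headroom_kb [FILE]
# Prints CommitLimit - Committed_AS in kB, read from a meminfo-style file
# (defaults to /proc/meminfo). A small or negative value would suggest that
# new allocations can be refused in strict overcommit mode, no matter how
# much memory "free" reports as unused.
commit_headroom_kb() {
    awk '/^CommitLimit:/  { limit = $2 }
         /^Committed_AS:/ { used  = $2 }
         END { print limit - used }' "${1:-/proc/meminfo}"
}
```

Running commit_headroom_kb without arguments on the VPS would show how
much address space can still be committed.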

What makes it even more puzzling is that after adding a swapfile, all
the issues suddenly disappear, although the swapfile is not used at all
... and then it's not possible to disable it again, because swapoff
itself fails with a memory allocation error.

  # dd if=/dev/zero of=swap.img bs=1024 count=409600
  # mkswap swap.img
  # swapon swap.img

  ... now the services are starting fine ...

  # swapon -s

    Filename             Type        Size    Used    Priority
    /root/swap.img       file        399992  0       -1

  # free
           total     used     free   shared  buffers   cached
    Mem:  503412   294192   209220        0    11740    99980
    -/+ buffers/cache: 182472   320940
    Swap: 399992        0   399992

  # swapoff swap.img
    swapoff: swap.img: swapoff selhal: Nelze alokovat paměť
    (i.e. "swapoff failed: Cannot allocate memory")

Any ideas what might cause this?

The reason I hadn't noticed these issues before the migration is
probably a swap file - I had manually added it during maintenance and
forgot to remove it afterwards, but it disappeared when the machine was
rebooted during the migration.
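
As a stopgap, an /etc/fstab entry should keep the swap file active across
reboots. A minimal sketch, assuming the file stays at /root/swap.img (the
path listed by swapon -s above); persist_swap is a made-up helper name and
the second argument exists only so it can be tried on a scratch file first:

```shell
# persist_swap SWAPFILE [FSTAB]
# Appends a swap entry for SWAPFILE to FSTAB (default /etc/fstab),
# unless a line starting with that path is already there.
persist_swap() {
    fstab="${2:-/etc/fstab}"
    grep -q "^$1[[:space:]]" "$fstab" 2>/dev/null ||
        printf '%s none swap sw 0 0\n' "$1" >> "$fstab"
}
```

After persist_swap /root/swap.img, running swapon -a should activate
everything of type swap listed in fstab.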

SELinux is enabled, but I doubt it's causing the issues - there's
nothing in the audit logs except a record that a segfault occurred.
Nothing suspicious.

Otherwise it's just a standard CentOS install; the only things I had to
tune a bit were the kernel limits (in sysctl.conf) related to shared
memory (because of the database). Currently there's

  kernel.shmmax = 68719476736
  kernel.shmall = 134217728
  vm.swappiness = 0
  vm.overcommit_memory = 2

which should be fine IMHO ... any ideas?
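
One setting that might be worth a second look is vm.overcommit_memory = 2.
As far as I understand the kernel documentation, in that mode the total
committed address space is capped at roughly swap + RAM *
vm.overcommit_ratio / 100. The sketch below just does that arithmetic with
the totals from the two free outputs above; the ratio of 50 is an
assumption (the kernel default, it is not set in my sysctl.conf), huge
pages are ignored, and the function name is made up:

```shell
# commit_limit_kb TOTAL_KB SWAP_KB RATIO
# Approximates the commit limit (in kB) applied when
# vm.overcommit_memory = 2: swap + RAM * overcommit_ratio / 100.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# Totals taken from the "free" outputs above, ratio assumed to be 50
commit_limit_kb 502728 0 50        # without swap   -> 251364
commit_limit_kb 503412 399992 50   # with swap file -> 651698
```

If that reading is right, only about 245MB could be committed without
swap, regardless of what free reports, and the swap file raises the limit
even when nothing is ever swapped out. That would at least be consistent
with what I'm seeing, but I have not confirmed it.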

regards
Tomáš