Hi all,
today we've encountered quite strange issues with memory allocation on one of our VPS running CentOS 6.2. So far I've been unable to determine what's causing it - hopefully someone here will know what's up.
The VPS is a "small" machine - just 512MB of RAM, 1 CPU, running CentOS 6.2 with the current kernel (2.6.32-220.4.2.el6.x86_64, but I've tried 2.6.32-71.el6.x86_64 too). It has a fairly standard stack installed - apache, php, postgresql, postfix, dovecot, memcached and ssh. Basically nothing exotic, everything from official repos (except for postfix and postgresql). The machine is not heavily used.
More detailed logs (than posted here) are available here:
We've been hitting some I/O utilization issues (caused by other VPS instances on the same hardware), so we migrated to different physical hardware. After the migration, the VPS started failing because of memory allocation issues - the services fail either at startup or when processing requests - although there's enough free memory:
[root@vps audit]# free
             total       used       free     shared    buffers     cached
Mem:        502728     294224     208504          0      18604     163608
-/+ buffers/cache:     112012     390716
Swap:            0          0          0
i.e. about 200 MB of free memory, but apache fails because of segfaults when forking a child process:
[16:49:51 2012] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[16:51:17 2012] [notice] child pid 2577 exit signal Segmentation fault (11)
or when processing requests:
[26 16:30:16 2012] [error] [client 66.249.72.1] PHP Fatal error: Out of memory (allocated 262144) (tried to allocate 523800 bytes) in Unknown on line 0
The memory_limit in PHP is set to 32MB, so that's not the cause. Similar issues happen with PostgreSQL:
16:42:01 CET pid=2504 db=xxxxxx-drupal user=xxxxxx FATAL: out of memory
16:42:01 CET pid=2504 db=xxxxxx-drupal user=xxxxxx DETAIL: Failed on request of size 2488.
16:42:01 CET pid=2438 db= user= LOG: could not fork new process for connection: Nelze alokovat paměť
16:42:01 CET pid=2438 db= user= 4f4a5247.986:21 LOG: could not fork new process for connection: cannot allocate memory
I have absolutely no clue what's causing this / how to fix it. According to free/vmstat there's about 200MB of free RAM all the time, so I have no idea why the alloc calls fail.
What makes it even more puzzling is that after adding a swapfile, all the issues suddenly disappear, although the swapfile is not used at all ... and then it's not possible to disable it again, because swapoff itself fails with a memory allocation error.
# dd if=/dev/zero of=swap.img bs=1024 count=409600
# mkswap swap.img
# swapon swap.img
... now the services are starting fine ...
# swapon -s
Filename                Type    Size    Used    Priority
/root/swap.img          file    399992  0       -1
# free
             total       used       free     shared    buffers     cached
Mem:        503412     294192     209220          0      11740      99980
-/+ buffers/cache:     182472     320940
Swap:       399992          0     399992
# swapoff swap.img
swapoff: swap.img: swapoff selhal: Nelze alokovat paměť
(i.e. "swapoff failed: Cannot allocate memory")
Any ideas what might cause this?
The reason I hadn't noticed these issues before the migration is probably a swap file - I added it manually during maintenance and forgot to remove it afterwards, but it disappeared when the machine was rebooted during the migration.
SELinux is enabled, but I doubt it's causing the issues - there's nothing in the audit logs except a record that there was a segfault. Nothing suspicious.
Otherwise it's just a standard CentOS install, the only thing I had to tune a bit were kernel limits (in sysctl.conf) related to shared memory (because of the database). Currently there's
kernel.shmmax = 68719476736
kernel.shmall = 134217728
vm.swappiness = 0
vm.overcommit_memory = 2
which should be fine IMHO ... any ideas?
regards Tomáš
On Sunday 26 February 2012 19.59.07 Tomas Vondra wrote: ...
i.e. about 200 MB of free memory, but apache fails because of segfaults when forking a child process:
[16:49:51 2012] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[16:51:17 2012] [notice] child pid 2577 exit signal Segmentation fault (11)
In general things can get quite bad with relatively high memory pressure and no swap.
That said, one thing that comes to mind is stack size. When forking, the Linux kernel needs whatever the current stack size is to be available as (free + free swap).
Also, just because you see Y bytes free doesn't mean you can successfully malloc that much (fragmentation, memory zones, etc.).
/Peter
On 27 February 2012, 11:26, Peter Kjellström wrote:
On Sunday 26 February 2012 19.59.07 Tomas Vondra wrote: ...
i.e. about 200 MB of free memory, but apache fails because of segfaults when forking a child process:
[16:49:51 2012] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[16:51:17 2012] [notice] child pid 2577 exit signal Segmentation fault (11)
In general things can get quite bad with relatively high memory pressure and no swap.
Sure, but there's no such pressure. There was almost 200MB of "free" memory (used for page cache, not dirty thus easy to drop).
That said, one thing that comes to mind is stack size. When forking, the Linux kernel needs whatever the current stack size is to be available as (free + free swap).
Also, just because you see Y bytes free doesn't mean you can successfully malloc that much (fragmentation, memory zones, etc.).
Yup, I'm aware of that. But it's rather improbable, especially given the other symptoms.
Update: After submitting the original post, I noticed that these issues probably started about a week ago, after upgrading the kernel and several related packages. I had swap there at the time and the issues were not as severe, so I hadn't noticed them before. I do remember I got an OOM error during that upgrade and I thought I'd dealt with it properly, but maybe not. So I've reinstalled (remove+install) all those packages, rebooted and the problems disappeared. I will check again in the evening, but hopefully it's fixed.
kind regards
On 27.2.2012 12:57, Tomas Vondra wrote:
Update: After submitting the original post, I noticed that these issues probably started about a week ago, after upgrading the kernel and several related packages. I had swap there at the time and the issues were not as severe, so I hadn't noticed them before. I do remember I got an OOM error during that upgrade and I thought I'd dealt with it properly, but maybe not. So I've reinstalled (remove+install) all those packages, rebooted and the problems disappeared. I will check again in the evening, but hopefully it's fixed.
Well, I've found the actual issue. It clearly was my stupidity as I was messing with overcommit_memory without fully understanding it.
What I did was that I set (as mentioned in the original post)
vm.overcommit_memory = 2
which limits the amount of available memory to
swap + RAM * vm.overcommit_ratio / 100
where vm.overcommit_ratio=50 by default, so you can allocate swap + 1/2 the physical memory. This is just fine if you have a swap - for example if you have swap size equal to RAM, this means 150% of RAM is available for processes.
The issues start when you disable swap (as I did) - then it effectively limits the available memory to 50% of physical RAM, and allocations beyond that fail with "Cannot allocate memory". This is exactly what happened to me :-(
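As a rough check of the arithmetic, here is a minimal shell sketch of that formula. The function name commit_limit_kb is made up for illustration, and it assumes no huge pages are reserved (the kernel subtracts those from RAM before applying the ratio). The input numbers are the totals from the free output in the original post.

```shell
#!/bin/sh
# Hypothetical helper: approximate commit limit in kB when
# vm.overcommit_memory = 2.
# Arguments: RAM in kB, swap in kB, vm.overcommit_ratio (a percentage).
# Assumption: no huge pages are reserved.
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=$3
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

commit_limit_kb 502728 0 50        # no swap, default ratio
commit_limit_kb 502728 399992 50   # with the 400 MB swap file
commit_limit_kb 502728 0 100       # no swap, ratio raised to 100
```

That prints 251364, 651356 and 502728, i.e. roughly 245 MB of committable memory without swap versus roughly 636 MB with the swap file, which fits the observation that the errors went away as soon as swap was added even though it was never used.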
So what I did was that I set
vm.overcommit_ratio = 100
which gives me 100% of RAM. I know this will give me an OOM if I use all the physical RAM, but that's expected - I don't want to use swap on a virtual machine with poor I/O (and the services are set accordingly).
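For anyone who wants to check whether they are running into the same limit: the kernel reports the limit and the amount currently committed as CommitLimit and Committed_AS in /proc/meminfo. Below is a small sketch that compares the two. The function names meminfo_kb and commit_headroom_kb are only illustrative, not standard tools, and I haven't run this on the affected machine.

```shell
#!/bin/sh
# Hypothetical helper: print the kB value of one field from meminfo-style
# text on stdin (lines look like "CommitLimit:   251364 kB").
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Hypothetical helper: remaining commit headroom in kB
# (negative means already over the limit).
# Argument: path to a meminfo-style file, defaulting to /proc/meminfo.
commit_headroom_kb() {
    file=${1:-/proc/meminfo}
    limit=$(meminfo_kb CommitLimit < "$file")
    used=$(meminfo_kb Committed_AS < "$file")
    echo $(( limit - used ))
}

# Only meaningful on Linux, so guard the live call.
if [ -r /proc/meminfo ]; then
    echo "commit headroom: $(commit_headroom_kb) kB"
fi
```

With vm.overcommit_memory = 2, once that headroom gets close to zero, fork() and malloc() start failing with "Cannot allocate memory" regardless of how much memory free reports as available.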
So the moral is don't mess with something you don't fully understand.
kind regards Tomas