Any suggestions to help me troubleshoot a "killed by signal 11" problem with Postfix? I've Googled and fiddled, but cannot figure it out. I have no idea where to look/start.
Based on this post, Wietse suggests this type of problem happens when the programs are compiled with different libraries. I ran ldd on various programs and they all seemed to line up, but I know little about this. http://www.security-express.com/archives/postfix/2005-12/0152.html
Logs:
[root@srv log]# grep 'killed by signal 11' maillog*
maillog:Aug 19 13:50:21 srv postfix/master[2719]: warning: process /usr/libexec/postfix/smtpd pid 3996 killed by signal 11
maillog.1:Aug 12 07:01:18 srv postfix/master[20954]: warning: process /usr/libexec/postfix/local pid 25006 killed by signal 11
maillog.1:Aug 13 06:22:38 srv postfix/master[25310]: warning: process /usr/libexec/postfix/cleanup pid 1195 killed by signal 11
maillog.1:Aug 13 18:02:37 srv postfix/master[25310]: warning: process /usr/libexec/postfix/pipe pid 6419 killed by signal 11
maillog.1:Aug 14 07:37:41 srv postfix/master[11557]: warning: process /usr/libexec/postfix/smtp pid 12580 killed by signal 11
maillog.1:Aug 16 14:18:04 srv postfix/master[11557]: warning: process /usr/libexec/postfix/smtpd pid 5323 killed by signal 11
maillog.1:Aug 17 02:37:19 srv postfix/master[11557]: warning: process /usr/libexec/postfix/anvil pid 9575 killed by signal 11
maillog.1:Aug 18 03:02:46 srv postfix/master[11909]: warning: process /usr/libexec/postfix/proxymap pid 19937 killed by signal 11
maillog.1:Aug 19 03:38:28 srv postfix/master[22278]: warning: process /usr/libexec/postfix/proxymap pid 28403 killed by signal 11
maillog.2:Aug 6 07:13:42 srv postfix/master[20954]: warning: process /usr/libexec/postfix/cleanup pid 26753 killed by signal 11
maillog.2:Aug 6 17:45:42 srv postfix/master[20954]: warning: process /usr/libexec/postfix/smtpd pid 31635 killed by signal 11
maillog.3:Jul 31 22:40:54 srv postfix/master[20149]: warning: process /usr/libexec/postfix/smtpd pid 21654 killed by signal 11
maillog.3:Aug 3 00:57:53 srv postfix/master[20149]: warning: process /usr/libexec/postfix/cleanup pid 10409 killed by signal 11
maillog.4:Jul 28 03:37:04 srv postfix/master[3457]: warning: process /usr/libexec/postfix/cleanup pid 14690 killed by signal 11
maillog.4:Jul 28 09:05:51 srv postfix/master[17435]: warning: process /usr/libexec/postfix/smtp pid 17490 killed by signal 11
System: CentOS 5.0 - updated
postfix-pflogsumm-2.3.3-2.el5.centos.mysql_pgsql
postfix-2.3.3-2.el5.centos.mysql_pgsql
pam_mysql-0.6.2-4
cyrus-sasl-plain-2.1.22-4
cyrus-sasl-2.1.22-4
cyrus-sasl-md5-2.1.22-4
cyrus-sasl-sql-2.1.22-4
cyrus-imapd-utils-2.3.8-9
cyrus-imapd-devel-2.3.8-9
cyrus-sasl-lib-2.1.22-4
cyrus-sasl-devel-2.1.22-4
cyrus-sasl-gssapi-2.1.22-4
cyrus-imapd-perl-2.3.8-9
cyrus-imapd-2.3.8-9
sqlgrey-1.7.6-1
clamav-db-0.91.1-1.el5.rf
amavisd-new-2.5.2-1.el5.rf
clamav-0.91.1-1.el5.rf
mysql-5.0.22-2.1
mysql-server-5.0.22-2.1
mysql-devel-5.0.22-2.1
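The ldd comparison mentioned in the post above can be automated. The following is only a rough sketch, not anything taken from the thread: the function names `parse_ldd` and `find_mismatches` are made up for illustration, and it assumes the usual `name => /path (0xaddr)` layout of ldd output. It runs ldd on each binary given on the command line and reports any shared library that resolves to more than one path across them.

```python
import re
import subprocess
import sys
from collections import defaultdict

# Assumed ldd line layout: "libfoo.so.1 => /lib/libfoo.so.1 (0x00b6c000)".
# Lines with no resolved path (vdso, "not found", the loader itself) are skipped.
LDD_RE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")


def parse_ldd(output):
    """Return {library name: resolved path} from ldd's text output."""
    libs = {}
    for line in output.splitlines():
        m = LDD_RE.match(line)
        if m:
            libs[m.group(1)] = m.group(2)
    return libs


def find_mismatches(per_binary):
    """Given {binary: {lib: path}}, return libs that resolve to more than one path."""
    seen = defaultdict(set)
    for libs in per_binary.values():
        for name, path in libs.items():
            seen[name].add(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    # Example: python lddcheck.py /usr/libexec/postfix/*
    per_binary = {}
    for binary in sys.argv[1:]:
        result = subprocess.run(["ldd", binary], capture_output=True, text=True)
        per_binary[binary] = parse_ldd(result.stdout)
    for name, paths in sorted(find_mismatches(per_binary).items()):
        print(name, "->", ", ".join(sorted(paths)))
```

No output means every library name resolved to a single path across the binaries checked. That is the same "they all line up" result as the manual check, and it does not by itself rule out a library problem.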
On Sun, Aug 19, 2007 at 02:55:11PM -0700, John Thomas wrote:
Any suggestions to help me troubleshoot a "killed by signal 11" problem with Postfix? I've Googled and fiddled, but cannot figure it out. I have no idea where to look/start.
Try running a memory test (memtest86) on the machine. One of the standard "this machine has hardware problems" indicators is gcc dying with signal 11. Maybe your postfix instance is being affected.
Stephen Harris said the following on 08/19/2007 03:45 PM:
Try running a memory test (memtest86) on the machine. One of the standard "this machine has hardware problems" indicators is gcc dying with signal 11. Maybe your postfix instance is being affected.
Thank you for the thought. Does this still hold if the box is in fact a virtual machine under VMware Server 1.03, and neither the other virtual machines nor the host server have had trouble?
On Sun, Aug 19, 2007 at 04:08:56PM -0700, John Thomas wrote:
Thank you for the thought. Does this still hold if the box is in fact a virtual machine under VMware Server 1.03, and neither the other virtual machines nor the host server have had trouble?
It's still _possible_ but less likely to be the problem, I would have thought.
John Thomas wrote:
Any suggestions to help me troubleshoot a "killed by signal 11" problem with Postfix? I've Googled and fiddled, but cannot figure it out. I have no idea where to look/start.
Debug with gdb. Enable in main.cf
debugger_command = PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin gdb $daemon_directory/$process_name $process_id 2>&1 > $config_directory/$process_name.$process_id.log & sleep 5
debug_peer_level = 2
#debug_peer_list = some.domain
Uncomment debug_peer_list and set some.domain to the IP address of a host from which you can telnet to port 25 of the Postfix server.
This will create one or more log files in /etc/postfix.
Feizhou said the following on 08/19/2007 07:22 PM:
Debug with gdb. Enable in main.cf
<snip>
This sounds interesting. The "crash" happens very rarely, once every day or two, and I have not seen a pattern with domain, etc. Will your suggestion help given this randomness (i.e. I cannot seem to make it crash)?
John Thomas wrote:
This sounds interesting. The "crash" happens very rarely, once every day or two, and I have not seen a pattern with domain, etc. Will your suggestion help given this randomness (i.e. I cannot seem to make it crash)?
Hmm... I just had a good look at the dates in the log. It could be just RAM issues, as suggested. You could try the script here too: http://people.redhat.com/dledford/memtest.html
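To see the pattern in the logs more easily, the "killed by signal" warnings can be tallied per daemon. This is only a sketch: the regular expression assumes the master warning format shown in the maillog excerpt earlier in the thread, and the function name `tally_crashes` is invented for illustration.

```python
import re
import sys
from collections import Counter

# Assumes the warning format from the maillog excerpt:
# "... warning: process /usr/libexec/postfix/smtpd pid 3996 killed by signal 11"
CRASH_RE = re.compile(r"process (\S+) pid \d+ killed by signal (\d+)")


def tally_crashes(lines):
    """Count crashes per (daemon name, signal number)."""
    counts = Counter()
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            daemon = m.group(1).rsplit("/", 1)[-1]  # keep only the program name
            counts[(daemon, int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Example: python tally_crashes.py /var/log/maillog*
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            totals.update(tally_crashes(fh))
    for (daemon, sig), n in totals.most_common():
        print(f"{daemon:10s} signal {sig:2d}  {n}")
```

In the excerpt posted above, the crashes are spread over seven different programs (smtpd, cleanup, smtp, proxymap, local, pipe, anvil) rather than concentrated in one. A crash pattern scattered across unrelated programs fits a memory or hardware fault better than a bug in a single daemon, though it does not prove it.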