Hello list,
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
I see a lot of this in my maillog:
Nov 7 00:25:13 servername postfix/smtpd[16727]: lost connection after CONNECT from unknown[201.251.x.x]
Nov 7 00:25:13 servername postfix/smtpd[16715]: disconnect from unknown[201.251.x.x]
Nov 7 00:25:13 servername postfix/smtpd[16727]: disconnect from unknown[201.251.x.x]
Nov 7 00:25:13 servername postfix/smtpd[16723]: warning: connect to private/anvil: Resource temporarily unavailable
Nov 7 00:25:13 servername postfix/smtpd[16723]: warning: problem talking to server private/anvil: Resource temporarily unavailable
Nov 7 00:25:13 servername postfix/smtpd[16723]: lost connection after CONNECT from unknown[201.251.x.x]
Nov 7 00:25:13 servername postfix/smtpd[16723]: disconnect from unknown[201.251.x.x]
Nov 7 00:27:22 servername postfix/smtpd[16184]: warning: timeout on private/anvil while reading input attribute name
Nov 7 00:27:22 servername postfix/smtpd[16184]: warning: problem talking to server private/anvil: Connection timed out
Nov 7 00:27:22 servername postfix/smtpd[16910]: warning: timeout on private/anvil while reading input attribute name
Nov 7 00:27:22 servername postfix/smtpd[16910]: warning: problem talking to server private/anvil: Connection timed out
Nov 7 00:27:22 servername postfix/smtpd[16726]: warning: timeout on private/anvil while reading input attribute name
Nov 7 00:27:22 servername postfix/smtpd[16726]: warning: problem talking to server private/anvil: Connection timed out
Nov 7 00:27:23 servername postfix/smtpd[16184]: warning: connect to private/anvil: Resource temporarily unavailable
Nov 7 00:27:23 servername postfix/smtpd[16184]: warning: problem talking to server private/anvil: Resource temporarily unavailable
Nov 7 00:27:23 servername postfix/smtpd[16184]: lost connection after CONNECT from unknown[201.251.x.x]
Nov 7 00:27:23 servername postfix/smtpd[16184]: disconnect from unknown[201.251.x.x]
Nov 7 00:27:23 servername postfix/smtpd[16910]: warning: connect to private/anvil: Resource temporarily unavailable
Nov 7 00:27:23 servername postfix/smtpd[16910]: warning: problem talking to server private/anvil: Resource temporarily unavailable
Nov 7 00:27:23 servername postfix/smtpd[16910]: lost connection after CONNECT from unknown[86.123.x.x]
Nov 7 00:27:23 servername postfix/smtpd[16910]: disconnect from unknown[86.123.x.x]
Nov 7 00:27:23 servername postfix/smtpd[16726]: warning: connect to private/anvil: Resource temporarily unavailable
Nov 7 00:27:23 servername postfix/smtpd[16726]: warning: problem talking to server private/anvil: Resource temporarily unavailable
Nov 7 00:27:23 servername postfix/smtpd[16726]: lost connection after CONNECT from cpe-024-x-x-x.carolina.x.x.com[24.74.x.x]
Nov 7 00:27:23 servername postfix/smtpd[16726]: disconnect from cpe-024-074-x-x.carolina.x.x.com[24.74.x.x]
TIA for any suggestions.
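To see whether the anvil warnings in a log like the one above cluster at a particular time of night, something along these lines might help. It is only a sketch: the function name `anvil_warnings_per_hour` is made up for illustration, and it assumes the usual syslog line layout (month, day, time as the first three fields).

```shell
# Hypothetical helper: count anvil-related smtpd warnings per hour.
# Reads syslog-format mail log lines on stdin and prints one line per
# hour, in the order the hours first appear in the log.
anvil_warnings_per_hour() {
  awk '/warning: .*private\/anvil/ {
         key = $1 " " $2 " " substr($3, 1, 2) ":00"
         if (!(key in count)) order[++n] = key
         count[key]++
       }
       END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}

# Usage (the log path is an assumption; adjust to your syslog setup):
#   anvil_warnings_per_hour < /var/log/maillog
```

If the counts jump at the same hour every night, that points at something scheduled (cron jobs, backups) rather than at random breakage.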
Paul Norton wrote:
Hello list,
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
Anything in audit.log ?
Florin Andrei wrote:
Paul Norton wrote:
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
Anything in audit.log ?
I don't see anything for postfix in my audit.log
Paul Norton wrote:
Hello list,
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
Which process is it? smtpd? master?
I see a lot of this in my maillog:
Nov 7 00:25:13 servername postfix/smtpd[16723]: warning: connect to private/anvil: Resource temporarily unavailable
Nov 7 00:25:13 servername postfix/smtpd[16723]: warning: problem talking to server private/anvil: Resource temporarily unavailable
TIA for any suggestions.
I cannot work out when postfix 2.3.3 was released... there was a bugfix related to anvil in Dec 2005, and I am using 2.3.5 without this particular problem. Speaking of which, it is time for me to upgrade to 2.3.13 or the latest 2.4.x. You might want to turn on debugging to find out what is going on with anvil.
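For the debugging suggestion above: the Postfix debugging documentation describes making a single daemon more verbose by appending `-v` to its command in master.cf and then running `postfix reload`. A rough sketch of that edit is below. The function name is hypothetical, and it deliberately prints the modified file to stdout instead of editing in place, so the result can be reviewed first.

```shell
# Hypothetical helper: print master.cf with "-v" appended to the anvil
# service line. Reads master.cf on stdin, writes the result to stdout.
# Commented-out lines and lines that already end in "-v" are left alone.
add_anvil_verbose() {
  awk '$1 == "anvil" && $NF == "anvil" { print $0 " -v"; next }
       { print }'
}

# Usage (paths are assumptions for a stock install):
#   add_anvil_verbose < /etc/postfix/master.cf > /tmp/master.cf.new
#   diff /etc/postfix/master.cf /tmp/master.cf.new
#   # after reviewing and copying the file into place:
#   postfix reload
```

Verbose logging is noisy, so it is probably worth removing the `-v` again once the failure has been captured.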
Christopher Chan wrote:
Paul Norton wrote:
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
Which process is it? smtpd? master?
Master
Paul Norton wrote:
Christopher Chan wrote:
Paul Norton wrote:
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
Which process is it? smtpd? master?
Master
Do you sometimes find anvil missing? I wonder if you can strace master and see what it is doing or waiting for...
Christopher Chan wrote:
Paul Norton wrote:
Christopher Chan wrote:
Paul Norton wrote:
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
Which process is it? smtpd? master?
Master
Do you sometimes find anvil missing? I wonder if you can strace master and see what it is doing or waiting for...
I'll have to check the next time postfix dies. I'll run an strace on the process too. If anyone else has any suggestions, I would really appreciate it.
Christopher Chan wrote:
Paul Norton wrote:
Christopher Chan wrote:
Paul Norton wrote:
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
In the morning (after noticing it died) I try to run `service postfix stop` and I get a failed start. Running `ps ax | grep postfix` I can see one process still running for postfix. After killing this, I am able to run `service postfix start`.
Which process is it? smtpd? master?
Master
Do you sometimes find anvil missing? I wonder if you can strace master and see what it is doing or waiting for...
This happened again this morning. I see the anvil process died -
22815 ? Z 0:13 [anvil] <defunct>
# strace -p 22799
Process 22799 attached - interrupt to quit
futex(0xb7bc2bec, FUTEX_WAIT, 2, NULL <unfinished ...>
That's all there was from strace.
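A defunct entry like the one above only says that the process has exited and its parent has not yet collected it. To see which parent each zombie is waiting on, a filter over `ps` output along these lines could be used. This is a sketch that assumes the procps `ps` shipped with Linux distributions, and the function name `zombies` is made up here.

```shell
# Hypothetical helper: list zombie processes together with their parent
# PID. Expects the output of "ps -eo pid,ppid,stat,comm" on stdin
# (header line first, then one process per line).
zombies() {
  awk 'NR > 1 && $3 ~ /^Z/ {
         print "zombie pid " $1 " (" $4 "), parent pid " $2
       }'
}

# Usage:
#   ps -eo pid,ppid,stat,comm | zombies
```

The parent PID it reports is the process worth attaching strace to, since a zombie stays around until its parent reaps it.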
Do you sometimes find anvil missing? I wonder if you can strace master and see what it is doing or waiting for...
This happened again this morning. I see the anvil process died -
22815 ? Z 0:13 [anvil] <defunct>
# strace -p 22799
Process 22799 attached - interrupt to quit
futex(0xb7bc2bec, FUTEX_WAIT, 2, NULL <unfinished ...>
That's all there was from strace.
...maybe I better build a new postfix package and drop it on the centos list if there is not one there already.
I do not have any such issues with 2.3.5, but then I am using that on CentOS 4.x.
Do you sometimes find anvil missing? I wonder if you can strace master and see what it is doing or waiting for...
This happened again this morning. I see the anvil process died -
22815 ? Z 0:13 [anvil] <defunct>
# strace -p 22799
Process 22799 attached - interrupt to quit
futex(0xb7bc2bec, FUTEX_WAIT, 2, NULL <unfinished ...>
That's all there was from strace.
I saw your post on the postfix list. Do you really have problems with number of file handles? IIRC file-max is automatically adjusted? On my box it is pretty high at 48520 without any tuning on my part.
What does 'sysctl fs.file-nr' say?
Christopher Chan wrote:
Do you sometimes find anvil missing? I wonder if you can strace master and see what it is doing or waiting for...
This happened again this morning. I see the anvil process died -
22815 ? Z 0:13 [anvil] <defunct>
# strace -p 22799
Process 22799 attached - interrupt to quit
futex(0xb7bc2bec, FUTEX_WAIT, 2, NULL <unfinished ...>
That's all there was from strace.
I saw your post on the postfix list. Do you really have problems with number of file handles? IIRC file-max is automatically adjusted? On my box it is pretty high at 48520 without any tuning on my part.
I don't believe so. I don't see errors like this anywhere else on the system.
What does 'sysctl fs.file-nr' say?
fs.file-nr = 2240 0 205964
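For anyone reading along, the three numbers in `fs.file-nr` are, per the kernel documentation, the number of allocated file handles, the number of allocated but unused handles, and the system-wide maximum. A small sketch that turns that line into a usage figure (the function name is made up for illustration):

```shell
# Hypothetical helper: summarise "sysctl fs.file-nr" output.
# Fields after "fs.file-nr =" are: allocated, allocated-but-unused, maximum.
file_handle_usage() {
  awk '$1 == "fs.file-nr" {
         printf "%d of %d file handles allocated (%.1f%%)\n", $3, $5, 100 * $3 / $5
       }'
}

# Usage:
#   sysctl fs.file-nr | file_handle_usage
```

With the values posted above this comes out at roughly 1% of the limit, which fits the impression that the system-wide file handle limit is not the problem here.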
Paul Norton wrote:
Christopher Chan wrote:
Do you sometimes find anvil missing? I wonder if you can strace master and see what it is doing or waiting for...
This happened again this morning. I see the anvil process died -
22815 ? Z 0:13 [anvil] <defunct>
# strace -p 22799
Process 22799 attached - interrupt to quit
futex(0xb7bc2bec, FUTEX_WAIT, 2, NULL <unfinished ...>
That's all there was from strace.
I saw your post on the postfix list. Do you really have problems with number of file handles? IIRC file-max is automatically adjusted? On my box it is pretty high at 48520 without any tuning on my part.
I don't believe so. I don't see errors like this anywhere else on the system.
What does 'sysctl fs.file-nr' say?
fs.file-nr = 2240 0 205964
Heh, yours is even higher :-)
May I suggest trying postfix 2.3.13?
Paul Norton said the following on 11/07/2007 10:57 AM:
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
I have had a similar problem, but Postfix continues to operate. I am running CentOS 5.0 with all updates as a virtual machine inside the free VMware Server on a CentOS 5.0 machine.
[root@srv log]# egrep "signal|startup" maillog
Nov 4 12:56:21 srv postfix/master[9672]: warning: process /usr/libexec/postfix/anvil pid 23893 killed by signal 11
Nov 4 12:56:21 srv postfix/master[9672]: warning: /usr/libexec/postfix/anvil: bad command startup -- throttling
Nov 7 11:20:02 srv postfix/master[9672]: warning: process /usr/libexec/postfix/smtp pid 24327 killed by signal 11
Nov 7 11:20:02 srv postfix/master[9672]: warning: /usr/libexec/postfix/smtp: bad command startup -- throttling
Nov 9 07:02:49 srv postfix/master[9672]: warning: process /usr/libexec/postfix/pipe pid 13133 killed by signal 11
Nov 9 07:02:49 srv postfix/master[9672]: warning: /usr/libexec/postfix/pipe: bad command startup -- throttling
[root@srv log]# rpm -qa | grep postfix
postfix-2.3.3-2.el5.centos.mysql_pgsql
postfix-pflogsumm-2.3.3-2.el5.centos.mysql_pgsql
[root@srv log]# rpm -qa | grep -i vmware
VMwareTools-6532-56528
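To pull just the crashes out of a log like the one above, with one line per daemon that master reported as killed, a filter along these lines might be handy. It is a sketch with a made-up function name, and it assumes the "process <path> pid <n> killed by signal <n>" wording shown in the excerpt.

```shell
# Hypothetical helper: one line per Postfix daemon that master reported
# as killed by a signal. Reads syslog-format mail log lines on stdin and
# prints the timestamp, the daemon name, and the signal number.
crashed_daemons() {
  awk '/killed by signal/ {
         path = ""
         for (i = 1; i < NF; i++) if ($i == "process") path = $(i + 1)
         sub(/.*\//, "", path)   # keep only the program name
         print $1, $2, $3, path, "signal", $NF
       }'
}

# Usage (the log path is an assumption):
#   crashed_daemons < /var/log/maillog
```

Signal 11 is SIGSEGV on Linux. Seeing it across several unrelated daemons (anvil, smtp, pipe), as in the excerpt, may hint at something below Postfix itself, though the log alone cannot confirm that.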
Postfix is dying on one of my servers almost nightly. This system
is running CentOS5 with postfix-2.3.3-2.
What does the IO wait look like on the system? Is this system under high load?
You might want to check to make sure that syslog is not calling sync every time that it writes to file.
/etc/syslog.conf
There should be a "-" in front of /var/log/messages.
Just a thought.
Joshua Gimer
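As a quick way to check the syslog.conf point above, the following sketch lists every log file target that lacks the leading "-", i.e. the files that sysklogd syncs after each write. The function name is made up, and it assumes the classic syslog.conf layout of a selector followed by an action on each line.

```shell
# Hypothetical helper: print file targets in syslog.conf that are written
# synchronously (path not prefixed with "-"). Reads the config on stdin.
# Comment lines and non-file actions (such as "*" or "@host") are skipped.
sync_log_targets() {
  awk 'NF >= 2 && $1 !~ /^#/ && $NF ~ /^\// { print $NF }'
}

# Usage:
#   sync_log_targets < /etc/syslog.conf
```

Any path it prints can be switched to buffered writes by putting "-" in front of it and restarting syslog, at the cost of possibly losing the last few messages if the machine crashes.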
Joshua Gimer wrote:
Postfix is dying on one of my servers almost nightly. This system is
running CentOS5 with postfix-2.3.3-2.
What does the IO wait look like on the system? Is this system under high load?
Only late at night, when the backup scripts are running. That is also when I start seeing the errors from postfix.
You might want to check to make sure that syslog is not calling sync every time that it writes to file.
/etc/syslog.conf
There should be a "-" in front of /var/log/messages.
Thanks. I have set this to -/var/log/messages now.
John Thomas wrote:
Paul Norton said the following on 11/07/2007 10:57 AM:
Postfix is dying on one of my servers almost nightly. This system is running CentOS5 with postfix-2.3.3-2.
I have had a similar problem, but Postfix continues to operate. I am running CentOS 5.0 with all updates as a virtual machine inside the free VMWare server on a CentOS 5.0 machine.
[root@srv log]# egrep "signal|startup" maillog
Nov 4 12:56:21 srv postfix/master[9672]: warning: process /usr/libexec/postfix/anvil pid 23893 killed by signal 11
Nov 4 12:56:21 srv postfix/master[9672]: warning: /usr/libexec/postfix/anvil: bad command startup -- throttling
Nov 7 11:20:02 srv postfix/master[9672]: warning: process /usr/libexec/postfix/smtp pid 24327 killed by signal 11
Nov 7 11:20:02 srv postfix/master[9672]: warning: /usr/libexec/postfix/smtp: bad command startup -- throttling
Nov 9 07:02:49 srv postfix/master[9672]: warning: process /usr/libexec/postfix/pipe pid 13133 killed by signal 11
Nov 9 07:02:49 srv postfix/master[9672]: warning: /usr/libexec/postfix/pipe: bad command startup -- throttling
Xen on Centos 5.0 is flaky.