[CentOS] Re: slow SMTP-AUTH

Thu Jul 26 13:22:52 UTC 2007
Ugo Bellavance <ugob at lubik.ca>

Ray Van Dolson wrote:
> On Wed, Jul 25, 2007 at 07:35:58PM -0400, Ugo Bellavance wrote:
>> Ugo Bellavance wrote:
>>> Brett Schroeder wrote:
>>>> Ugo Bellavance wrote:
>>>>> Hi,
>>>>>
>>>>>    I installed sendmail on a centos-4 openvz VM.  I set up SMTP-AUTH
>>>>> and when I try to send mail using this server, there is a big delay in
>>>>> which my e-mail client (Mozilla Thunderbird) just hangs.  The server, on
>>>>> the other hand, is idle at the same time.  It does that no matter if I
>>>>> use no encryption, TLS, or SSL.  When the client is 'hung' sendmail, on
>>>>> the server side haven't logged anything yet for this connection.
>>>>>
>>>>> Any idea?
>>>> Since nothing is logged on the server, it sounds like there is a 
>>>> firewall (or
>>>> other networking problem) in the way. You should use WireShark (or 
>>>> tcpdump) on
>>>> the client to capture the network traffic.
>> I disabled the firewall on the server, no luck.  I tried sniffing, I see 
>> nothing weird.
>>
>>   5.512292 74.59.221.180 -> 72.55.149.51 TCP 17274 > submission [PSH, 
>> ACK] Seq=350 Ack=10923 Win=17461 Len=53
>>   5.512984 72.55.149.51 -> 74.59.221.180 TCP submission > 17274 [PSH, 
>> ACK] Seq=10923 Ack=403 Win=7504 Len=277
>>   5.724172 74.59.221.180 -> 72.55.149.51 TCP 17274 > submission [ACK] 
>> Seq=403 Ack=11200 Win=17184 Len=0
>>  22.884013 74.59.221.180 -> 72.55.149.51 TCP 17274 > submission [PSH, 
>> ACK] Seq=403 Ack=11200 Win=17184 Len=53
>>  22.884918 72.55.149.51 -> 74.59.221.180 TCP submission > 17274 [PSH, 
>> ACK] Seq=11200 Ack=456 Win=7504 Len=85
>>  22.970714 74.59.221.180 -> 72.55.149.51 TLSv1 Application Data
>>  22.971120 72.55.149.51 -> 74.59.221.180 TLSv1 Application Data
>>
> 
> Can you run strace on sendmail during a connection and see what it's
> doing?  DNS is often the culprit when delays are concerned, so look for
> anything along those lines.
> 
> Ray

Here is an interesting part, but I can't tell why it took 2.292863 
seconds to do that:

09:07:47 time(NULL)                     = 1185455267 <0.000007>
09:07:47 rt_sigprocmask(SIG_UNBLOCK, [ALRM], [ALRM], 8) = 0 <0.000008>
09:07:47 select(8, [4 5 6 7], NULL, NULL, {5, 0}) = 1 (in [6], left {2, 
707000}) <2.292863>
09:07:49 accept(6, {sa_family=AF_INET, sin_port=htons(25919), 
sin_addr=inet_addr("74.59.221.180")}, [16]) = 9 <0.000023>
09:07:49 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0 <0.000010>
09:07:49 time(NULL)                     = 1185455269 <0.000007>
09:07:49 pipe([10, 11])                 = 0 <0.000021>
09:07:49 rt_sigprocmask(SIG_BLOCK, [CHLD], [ALRM], 8) = 0 <0.000008>
09:07:49 clone(Process 16312 attached

Some operations are unfinished like this one:

[pid 31853] 09:07:49 select(8, [4 5 6 7], NULL, NULL, {5, 0} <unfinished 
...>


There is a 5-second gap here (from one PID to the other, weird):

[pid 16312] 09:07:49 time(NULL)         = 1185455269 <0.000010>
[pid 16312] 09:07:49 read(9, 0x8968260, 5) = -1 EAGAIN (Resource 
temporarily unavailable) <0.000012>
[pid 16312] 09:07:49 time(NULL)         = 1185455269 <0.000009>
[pid 16312] 09:07:49 select(10, [9], NULL, [9], {3600, 0} <unfinished ...>
[pid 31853] 09:07:54 <... select resumed> ) = 0 (Timeout) <4.999631>
[pid 31853] 09:07:54 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0 <0.000008>
[pid 31853] 09:07:54 time(NULL)         = 1185455274 <0.000008>
[pid 31853] 09:07:54 open("/proc/loadavg", O_RDONLY) = 9 <0.000037>
[pid 31853] 09:07:54 fstat64(9, {st_mode=S_IFREG|0444, st_size=0, ...}) 
= 0 <0.000008>
[pid 31853] 09:07:54 read(9, "0.01 0.02 0.00 3/49 16395\n", 1024) = 26 
<0.000015>
[pid 31853] 09:07:54 close(9)           = 0 <0.000013>
[pid 31853] 09:07:54 time(NULL)         = 1185455274 <0.000007>
[pid 31853] 09:07:54 open("/proc/loadavg", O_RDONLY) = 9 <0.000015>