[CentOS] Fetchmail multiple instances increasing load average

Sun Jan 4 22:12:17 UTC 2015
Gordon Messmer <gordon.messmer at gmail.com>

On 12/29/2014 01:16 AM, Anshul Chauhan wrote:
> I’m running centos 5.7 with sendmail-8.13.8-8.1.el5_7,
> fetchmail-6.3.6-4.el5 and procmail-3.22-17.1.el5.centos. My server is
> having around 2000 mailboxes and this server used to fetch mails for all
> these users using fetchmail from the other MX server.

Honestly, the best solution to your problem is to fix this setup. The 
standard configuration for a backup MX is to configure it to accept mail 
for your domain, but treat those messages as non-local. That is, the 
server should queue that mail and then deliver it to the primary servers 
via SMTP.  You should not be using fetchmail for this purpose.  Using 
fetchmail means that all of your users' passwords are stored in plain 
text somewhere on your primary server for no good reason.  The setup
you've got is bad for security and, as you are seeing, bad for
reliability.

> But the load average for the server goes high automatically after every 4
> to 5 hours due to multiple fetchmail instances and after that I’ve to
> either kill all the fetchmail jobs or restart the server for making the
> system up again.

Load average is roughly the number of processes that are either runnable
or in uninterruptible sleep (usually waiting on disk IO), averaged over
time.  By itself, it is not an indication of a problem.  The first
thing you have to do is identify the actual problem.

Start with "ps ax" and look at the STAT column.  Processes whose state
begins with R (runnable) or D (uninterruptible sleep) are the ones that
count toward load.  From there, you have to figure out why they aren't
sleeping.
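
As a rough sketch of that first check, something like the following
filters ps output down to the processes in R or D state.  The function
name busy_procs is made up for illustration, and the ps format string
assumes the procps version of ps.

```shell
# Keep only processes whose STAT begins with R (runnable) or
# D (uninterruptible sleep).
# Input: lines of "PID STAT COMMAND..." with no header line.
busy_procs() {
    awk '$2 ~ /^[RD]/'
}

# The trailing "=" on each column name suppresses the header.
ps -eo pid=,stat=,args= | busy_procs
```

Run it a few times while the load is high.  A process that shows up in
D state every time is usually stuck waiting on IO.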

Is there a large number of processes eating a lot of CPU?  Use "top" or 
some variant to find out.  How many processes are there?  Is it actually 
fetchmail, or something else?  How much CPU time is each one using?
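
If you'd rather have numbers than watch top, a small sketch like this
counts processes by command name and then shows CPU usage for the
fetchmail ones.  count_by_command is a hypothetical helper name, and
the ps options assume procps.

```shell
# Count how many processes share each command name, largest first.
# Input: one command name per line.
count_by_command() {
    sort | uniq -c | sort -rn
}

# Which commands have the most processes running?
ps -eo comm= | count_by_command | head

# State, CPU percentage and accumulated CPU time per fetchmail process.
ps -C fetchmail -o pid,stat,pcpu,time,args || echo "no fetchmail processes found"
```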

Is there a lot of disk IO?  Use "iostat -x" to find out.  Which disks 
are seeing a lot of IO?
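
To pick the busy disks out of that output, a sketch along these lines
may help.  It assumes %util is the last column of the device table,
which is how the sysstat versions I know print it.  The busy_disks
name and the 80 percent threshold are arbitrary choices for
illustration.

```shell
# Print devices whose %util (last column) is at or above a threshold.
# Input: "iostat -x" output.  Only lines in the device table (after a
# "Device" header, up to the next blank line) are considered.
busy_disks() {
    awk -v limit="${1:-80}" '
        /^Device/ { in_dev = 1; next }
        NF == 0   { in_dev = 0; next }
        in_dev && $NF + 0 >= limit { print $1, $NF }
    '
}

# Three reports, two seconds apart.  The first report covers the time
# since boot, so the later ones are the interesting ones.
iostat -x 2 3 | busy_disks 80
```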

Fetchmail already uses a lock file, so overlapping instances are probably
not the issue.  I've never seen fetchmail cause a high load, so I can't
say for certain what is wrong, but my first guess, based on what little
you've said about your setup, is that you have a mail loop somewhere.
Either one user has fetchmail configured to check the local server, so
that fetchmail is pulling messages from the local server, feeding them
back in, and then fetching them again in an endless loop, or one user is
forwarding email in such a way that it heads out to your backup MX and
loops that way.  You're going to have to go over your logs to see if the
high-load events happen when a particular user is checked, or if there's
some other common trigger.
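
One way to look for a loop in the logs, as a sketch only: sendmail
normally records a msgid= field for each message it accepts, so a
Message-ID that shows up many times is a good candidate for a looping
message.  This assumes the mail log is /var/log/maillog, and
repeated_msgids is a made-up helper name.

```shell
# List Message-IDs that appear more than once, most frequent first.
# Input: mail log lines on stdin.  Output: "count msgid=<...>".
repeated_msgids() {
    grep -o 'msgid=<[^>]*>' | sort | uniq -c | sort -rn | awk '$1 > 1'
}

repeated_msgids < /var/log/maillog | head
```

Once you have a suspect Message-ID, grep the log for it to see which
addresses and relays the message keeps passing through.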