Greetings, I run my own email server for some domains I administer, on a CentOS VPS with a very small number of users.
The only services are SMTP, IMAP/POP and webmail.
Everything was running without problems until this morning. I left home for 1-2 hours, and when I came back everything had become about 100x slower (seriously!). The services were, and still are, all up and running, but practically unusable (even mutt in my ssh session is almost frozen).
I have not changed or updated anything in the last 1-2 weeks. I have already done a few checks with the VPS provider, and it looks like:
- network and hardware are OK
- there seem to be no strange processes running. I didn't manage to save the output of "top", but it didn't show anything that (AFAICT, of course) should not be there on an email server
- however, there is something that is using "much more memory than normal" (see the comment below from the hosting sysadmin after he checked user_beancounters). Initially we thought it was Apache, but even switching it off didn't change anything. What now?
Any help to understand what the heck happened, and find out what exactly _started_ to cause this problem is very welcome!
TIA, Marco
root@vps [/etc/sysconfig]# cat /proc/user_beancounters
Version: 2.5
       uid  resource            held     maxheld     barrier       limit  failcnt
      712:  kmemsize        17208298   162267136  2147483646  2147483646        0
            lockedpages            0           8      999999      999999        0
            privvmpages        64694      262143      262144      262144       40
            shmpages              14        2366      131072      131072        0
            dummy                  0           0           0           0        0
            numproc               63         237      999999      999999        0
            physpages          79977      262204           0      262144        0
            vmguarpages            0           0      131072  2147483647        0
            oomguarpages       30261       44087      131072  2147483647        0
            numtcpsock            31         243     7999992     7999992        0
            numflock               9          20      999999      999999        0
            numpty                 1           1      500000      500000        0
            numsiginfo             0          27      999999      999999        0
            tcpsndbuf         545000     7915960   214748160   396774400        0
            tcprcvbuf         507904     3981312   214748160   396774400        0
            othersockbuf       21832     1229736   214748160   396774400        0
            dgramrcvbuf            0      118400   214748160   396774400        0
            numothersock          56         356     7999992     7999992        0
            dcachesize      10775271   154640329  2147483646  2147483646        0
            numfile              772        1155    23999976    23999976        0
            dummy                  0           0           0           0        0
            dummy                  0           0           0           0        0
            dummy                  0           0           0           0        0
            numiptent             57          57      999999      999999        0
As you can see, there are some failures (failcnt 40) for privvmpages. This means your VPS tried to use more RAM than is available (i.e. more than 1 GB). If you are only running some basic mail services on your VPS, that's definitely not normal and you should investigate accordingly. We have fully checked everything for hardware and network problems and everything is working flawlessly. In combination with the RAM shortage errors, it is safe to conclude that there's something within your VPS itself that's malfunctioning.
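The failcnt column is the quickest thing to read in that output: any non-zero value means the container ran into a limit. A minimal Python sketch of that check follows. It assumes the usual column order of /proc/user_beancounters (resource, held, maxheld, barrier, limit, failcnt) and a 4 KiB page size; the function names are illustrative, and SAMPLE is a shortened copy of the figures posted above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86

# Shortened copy of the output posted in this thread.
SAMPLE = """\
Version: 2.5
       uid  resource            held     maxheld     barrier       limit  failcnt
      712:  kmemsize        17208298   162267136  2147483646  2147483646        0
            privvmpages        64694      262143      262144      262144       40
            dummy                  0           0           0           0        0
            physpages          79977      262204           0      262144        0
            numproc               63         237      999999      999999        0
"""


def parse_beancounters(text):
    """Return {resource: {held, maxheld, barrier, limit, failcnt}}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Drop the leading container-id column ("712:") when present.
        if fields and fields[0].endswith(":") and fields[0][:-1].isdigit():
            fields = fields[1:]
        # A data row is a resource name followed by five integers.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        name = fields[0]
        if name == "dummy":  # placeholder rows carry no information
            continue
        keys = ("held", "maxheld", "barrier", "limit", "failcnt")
        counters[name] = dict(zip(keys, map(int, fields[1:])))
    return counters


def failing(counters):
    """Names of the resources whose failcnt is non-zero."""
    return [name for name, row in counters.items() if row["failcnt"] > 0]


def pages_to_mib(pages):
    """Convert a page count to MiB, using the assumed PAGE_SIZE."""
    return pages * PAGE_SIZE / 2**20


if __name__ == "__main__":
    counters = parse_beancounters(SAMPLE)
    for name in failing(counters):
        row = counters[name]
        print(name, "failcnt", row["failcnt"],
              "limit", pages_to_mib(row["limit"]), "MiB")
```

With these figures privvmpages is the only counter that reports failures, and its limit of 262144 pages works out to 1024 MiB, which matches the 1 GB mentioned by the hosting sysadmin. Note that privvmpages accounts for allocated private memory, so it can hit its limit before that much RAM is actually in use.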
On Thu, Sep 6, 2012 at 12:14 PM, Marco Fioretti marco.fioretti@gmail.com wrote:
> Any help to understand what the heck happened, and find out what exactly _started_ to cause this problem is very welcome!
One thing to check is that the DNS servers in /etc/resolv.conf are answering quickly (dig some_name.domain @server_ip). Mail services use DNS extensively and if the first server fails there is a timeout before trying the 2nd choice. Things will still work but slower and you may end up with enough processes running to run out of RAM and start swapping. Also check your outbound mail queue in case some spam attempt has succeeded in generating bounces.
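The same check can be scripted so that every nameserver listed in /etc/resolv.conf is timed in turn. The following is only a sketch using the Python standard library: it sends one hand-built A query over UDP to each server and measures the round trip. The function names, the 5-second timeout and the fixed query id are assumptions made for illustration, and only IPv4 nameservers are handled.

```python
import socket
import struct
import time


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of a resolv.conf file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split("#", 1)[0].split()  # strip trailing comments
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_query(server, hostname, timeout=5.0):
    """Return the round-trip time in milliseconds, or None if no reply came."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(hostname), (server, 53))
            sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable servers
            return None
        return (time.monotonic() - start) * 1000.0
```

A typical use would be to read /etc/resolv.conf, pass its text to parse_nameservers() and call time_query(server, "some_name.domain") for each address. A first server that returns None or takes seconds to answer is the situation described above, where every lookup waits for a timeout before the second server is tried.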
> One thing to check is that the DNS servers in /etc/resolv.conf are answering quickly (dig some_name.domain @server_ip).
The server runs no DNS server itself.
I ran dig www.google.it @213.179.193.200 (i.e. the complete real IP of my primary DNS server as listed in /etc/resolv.conf) and this is the result:

[root@vps728 ~]# dig www.google.it @213.179.193.200

; <<>> DiG 9.2.4 <<>> www.google.it @213.179.193.200
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37012
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.it.                 IN      A

;; ANSWER SECTION:
www.google.it.          300     IN      A       173.194.35.151
www.google.it.          300     IN      A       173.194.35.152
www.google.it.          300     IN      A       173.194.35.159

;; AUTHORITY SECTION:
google.it.              10800   IN      NS      ns2.google.com.
google.it.              10800   IN      NS      ns3.google.com.
google.it.              10800   IN      NS      ns4.google.com.
google.it.              10800   IN      NS      ns1.google.com.

;; Query time: 2011 msec
;; SERVER: 213.179.193.200#53(213.179.193.200)
;; WHEN: Thu Sep 6 13:41:43 2012
;; MSG SIZE rcvd: 161
As far as the queue goes, it was empty. I ran postsuper -d ALL and postsuper -d ALL deferred, just in case, but no change.
Thanks, Marco
On 09/06/2012 01:58 PM, Marco Fioretti wrote:
> 2011 msec

Pretty slow. My dig to the same server ran in 113 msec:
dig www.google.it @213.179.193.200

; <<>> DiG 9.7.4-P1-RedHat-9.7.4-2.P1.fc14 <<>> www.google.it @213.179.193.200
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17288
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.it.                 IN      A

;; ANSWER SECTION:
www.google.it.          230     IN      A       173.194.35.152
www.google.it.          230     IN      A       173.194.35.159
www.google.it.          230     IN      A       173.194.35.151

;; AUTHORITY SECTION:
google.it.              10079   IN      NS      ns3.google.com.
google.it.              10079   IN      NS      ns4.google.com.
google.it.              10079   IN      NS      ns1.google.com.
google.it.              10079   IN      NS      ns2.google.com.

;; Query time: 113 msec
;; SERVER: 213.179.193.200#53(213.179.193.200)
;; WHEN: Thu Sep 6 14:07:38 2012
;; MSG SIZE rcvd: 161
On 2012-09-06 19:14, Marco Fioretti wrote:
> however, there is something that is using "much more memory than normal" (see the comment below from the hosting sysadmin after he checked user_beancounters). Initially we thought it was apache, but even switching it off didn't change anything. What now?
My 2 cents. You probably checked a lot of things.
- a filesystem (almost) full?
- did you check the logs? any errors?
- a user sending/receiving a large e-mail? what is the maximum size of an e-mail in your MTA settings?
- more mail-services-related processes?
- did you try to stop mail services to see if the server becomes usable again?
- do you run antispam and/or antivirus on incoming/outgoing e-mails?
- to help with DNS, you can probably enable nscd or set up dnsmasq, which would reduce the number of queries sent to the DNS servers.
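For the question about extra mail-related processes, one way to see what is piling up is to total memory per command name rather than reading top line by line. The sketch below sums the VSZ and RSS columns (both in KiB) of `ps -eo vsz,rss,comm` output and sorts by VSZ, on the assumption that allocated virtual memory is what counts against privvmpages. The function name and the sample figures are hypothetical and do not come from the server in this thread.

```python
from collections import defaultdict

# Hypothetical sample in the layout printed by `ps -eo vsz,rss,comm`.
SAMPLE = """\
   VSZ   RSS COMMAND
 10000  2000 smtpd
 12000  2500 smtpd
 50000 20000 clamd
  8000  1000 sshd
"""


def memory_by_command(ps_output):
    """Return (command, total_vsz_kib, total_rss_kib, count), largest VSZ first."""
    totals = defaultdict(lambda: [0, 0, 0])  # vsz, rss, process count
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header and anything that is not "number number name".
        if len(fields) != 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        entry = totals[fields[2].strip()]
        entry[0] += int(fields[0])
        entry[1] += int(fields[1])
        entry[2] += 1
    rows = [(comm, vsz, rss, count) for comm, (vsz, rss, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for comm, vsz, rss, count in memory_by_command(SAMPLE):
        print(f"{comm:<10} x{count:<3} VSZ {vsz:>8} KiB  RSS {rss:>8} KiB")
```

On a live system the text could come from subprocess.check_output(["ps", "-eo", "vsz,rss,comm"], text=True). An unusually high process count for one daemon would fit the scenario of connections piling up while they wait on slow DNS lookups.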
On Thu, September 6, 2012 7:14 pm, Marco Fioretti wrote:
> Everything was running without problems until this morning. I left home for 1-2 hours, and when I came back everything had become about 100x slower (seriously!).
This morning everything is back to normal (for now at least, strongly crossing my fingers!), as mysteriously as it had frozen yesterday. I am able to write from my main email address just because of that.
If anything, I am even more puzzled than I was yesterday.
I do have the feeling, after the exchange we had yesterday, that the DNS servers my VPS provider told me to use had some problem now fixed, but nothing more. Of course, the idea that I don't know for sure what happened and it may happen again doesn't make me happy, but I honestly wouldn't know what/where to investigate at this point. Further comments are welcome.
Thanks to all who helped, Marco
On 2012-09-07 at 11:09:09 +0200, M. Fioretti wrote:
> I do have the feeling, after the exchange we had yesterday, that the DNS servers my VPS provider told me to use had some problem now fixed, but nothing more. Of course, the idea that I don't know for sure what happened and it may happen again doesn't make me happy, but I honestly wouldn't know what/where to investigate at this point. Further comments are welcome.
If it's a DNS server problem, it's helpful to have the IP addresses of some other DNS servers. It makes sense to look for alternate DNS servers now; it is much less fun to look for them when the one you use isn't working properly.
Someone mentioned Google's public DNS server. Is it advisable to use a DNS server provided by a company that does nothing but collect data about its users? I'm sceptical.
Regards, Reinhard
On Friday, September 07, 2012 05:09:09 AM M. Fioretti wrote:
> I do have the feeling, after the exchange we had yesterday, that the DNS servers my VPS provider told me to use had some problem now fixed, but nothing more. Of course, the idea that I don't know for sure what happened and it may happen again doesn't make me happy, but I honestly wouldn't know what/where to investigate at this point. Further comments are welcome.
Did you say 'VPS'?
Other VPS instances on the same host may be affecting your VPS's performance, and there may be nothing you can do about it.