We have several servers in the same rack, all behind the firewall, running CentOS versions from 4.x to 5.x. Sometimes connections to them are very slow (compared with servers in other racks that are also behind the firewall).
We discussed it with the network engineer, who asked us to check with "ping" and "traceroute". Both tools show good response times, but when we connect through an application such as a web browser, a database client, or the DELL OPMN tool, the response time is very, very slow.
This situation normally lasts 4 to 5 hours and then goes back to normal. Is there a way to measure the "real" network response time so we can show it to the network engineer? Otherwise they always say "no problem".
Thanks.
On Tue, 2011-03-29 at 05:13 +0800, mcclnx mcc wrote:
This situation normally lasts 4 to 5 hours and then goes back to normal. Is there a way to measure the "real" network response time so we can show it to the network engineer? Otherwise they always say "no problem".
Use mii-tool and ethtool.
Check that you're running at the right speed; sometimes a bad cable makes the negotiation go wrong, and sometimes you're simply connected to the wrong switch port. Also note whether the switch has hard-set negotiation values such as duplex settings, and make sure they match on both ends. You should be at full duplex unless there's a very, very good network reason not to be.
Lastly, check the ifconfig output and make sure no errors are reported. If you have a high error count, it's a good bet that the cable is bad or the sync settings are wrong.
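To make the error-counter check repeatable during a slowdown, here is a minimal Python sketch. It assumes a Linux host where the kernel exposes per-interface counters in /proc/net/dev (the same kind of counters ifconfig prints); the function names parse_net_dev and suspect_interfaces are made up for this example. Treat non-zero collision or carrier counts as a hint to look at duplex and cabling, not as proof.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: error counters}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a counter row
        values = [int(v) for v in fields[:16]]
        # Columns 0-7 are receive counters, 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "rx_frame": values[5],
            "tx_errs": values[10],
            "tx_drop": values[11],
            "collisions": values[13],
            "carrier": values[14],
        }
    return stats


def suspect_interfaces(stats):
    """Return the names of interfaces that have any non-zero counter."""
    return sorted(name for name, c in stats.items() if any(c.values()))


if __name__ == "__main__":
    import os

    if os.path.exists("/proc/net/dev"):
        with open("/proc/net/dev") as handle:
            current = parse_net_dev(handle.read())
        for iface in suspect_interfaces(current):
            print(iface, current[iface])
```

Running it once while things are normal and again during a slowdown shows whether the counters are actually growing, which matters more than their absolute values.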
I don't think it is a cable problem. The reasons are:
1. It happens about once every 3 to 4 weeks.
2. The problem lasts 4 to 5 hours and then goes back to normal.
3. It is not just one server; several servers in that rack all have the problem at the same time.
--- On Mon, 2011/3/28, Peter Larsen plarsen@famlarsen.homelinux.com wrote:
From: Peter Larsen plarsen@famlarsen.homelinux.com Subject: Re: [CentOS] centos server network speed check??? To: centos@centos.org Date: Monday, March 28, 2011, 5:41 PM
This may seem overly obvious, but have you simply run a netstat while the slowdown occurs to see what the box(es) are doing at that point in time?
Alex
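One way to capture that netstat snapshot from a script is to count TCP connections by state. The sketch below is only an illustration: it assumes a Linux host that exposes IPv4 sockets in /proc/net/tcp, and count_tcp_states is a name invented here. A pile-up of SYN_SENT or CLOSE_WAIT entries during the slowdown would be worth showing to the network engineer.

```python
from collections import Counter

# Kernel TCP state codes as they appear (in hex) in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per TCP state in /proc/net/tcp-style text."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    import os

    if os.path.exists("/proc/net/tcp"):
        with open("/proc/net/tcp") as handle:
            for state, number in sorted(count_tcp_states(handle.read()).items()):
                print(state, number)
```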
On 3/28/11 7:32 PM, mcclnx mcc wrote:
It is not just one server that has this problem; several servers in that rack all have it at the same time.
Is this slowness in sustained throughput (big ftp, etc.) or in establishing connections? If the latter, you could have a DNS problem, like your first-choice resolver being down or slow.
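To separate the two cases, one option is to time name resolution and the TCP handshake independently. This is a minimal Python sketch, not a complete diagnostic; time_dns, time_connect and measure are illustrative names. If the DNS figure is large while the connect figure stays small, the resolver is the more likely suspect.

```python
import socket
import time


def time_dns(host, port):
    """Return (seconds, (ip, port)) for resolving host to an IPv4 address."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    elapsed = time.perf_counter() - start
    return elapsed, infos[0][4]


def time_connect(address, timeout=5.0):
    """Return the seconds taken to complete a TCP handshake with address."""
    start = time.perf_counter()
    sock = socket.create_connection(address, timeout=timeout)
    elapsed = time.perf_counter() - start
    sock.close()
    return elapsed


def measure(host, port):
    """Time DNS lookup and TCP connect separately for one host and port."""
    dns_seconds, address = time_dns(host, port)
    connect_seconds = time_connect(address)
    return {
        "address": address,
        "dns_seconds": dns_seconds,
        "connect_seconds": connect_seconds,
    }
```

Calling measure() with the hostname and port of one of the slow services, once from a good rack and once from the bad one, gives two numbers per host that can be compared directly.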
On 3/29/11, mcclnx mcc mcclnx@yahoo.com.tw wrote:
It is not just one server that has this problem; several servers in that rack all have it at the same time.
Did you note down the exact dates and times of the past few occurrences? Have you checked crontab to see whether anything is scheduled to run during those times, e.g. a monthly backup set for the wrong time, such as 2pm instead of 2am?
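Checking many crontabs by eye is error-prone, so here is a small Python sketch that lists the entries whose hour field overlaps a given window. It assumes standard five-field crontab lines with numeric hour fields and simply skips anything else (environment assignments, @monthly-style shortcuts). The names expand_field and jobs_in_hours are invented for this example.

```python
def expand_field(field, low, high):
    """Expand a numeric cron field such as '*', '2', '1-5' or '*/6'."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def jobs_in_hours(crontab_text, hours):
    """Return crontab lines that can run during any of the given hours."""
    wanted = set(hours)
    hits = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # environment assignment or malformed line
        try:
            job_hours = expand_field(fields[1], 0, 23)
        except ValueError:
            continue  # non-numeric hour field, not handled in this sketch
        if job_hours & wanted:
            hits.append(line)
    return hits
```

For a slowdown seen between 1pm and 5pm, jobs_in_hours(text, range(13, 18)) would list the candidates on one server; the same check has to be repeated on every machine that shares the switch.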
If you think the issue is network latency, try using smokeping. If all of the servers are in the same rack, the switch could be overloaded; check the other servers connected to the switch when the slowdown happens.
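If installing smokeping is not an option, the same idea (send a burst of probes at regular intervals, then record loss and the spread of round-trip times) can be roughed out in Python. This is a simplified stand-in, not how smokeping itself is implemented; tcp_probe, run_round and summarize are hypothetical names, and the probe uses a TCP connect instead of ICMP so that it needs no root privileges.

```python
import socket
import statistics
import time


def tcp_probe(host, port, timeout=2.0):
    """Return the TCP connect time in seconds, or None if the probe failed."""
    start = time.perf_counter()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None
    sock.close()
    return time.perf_counter() - start


def run_round(probe, count=20, pause=0.0):
    """Call probe() count times and collect the results."""
    samples = []
    for _ in range(count):
        samples.append(probe())
        if pause:
            time.sleep(pause)
    return samples


def summarize(samples):
    """Summarize one round: probes sent, loss percentage, min/median/max."""
    answered = sorted(s for s in samples if s is not None)
    lost = len(samples) - len(answered)
    summary = {
        "sent": len(samples),
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
    }
    if answered:
        summary["min"] = answered[0]
        summary["median"] = statistics.median(answered)
        summary["max"] = answered[-1]
    return summary
```

Logging one summary per minute with a timestamp, from a server in the affected rack and from one in a healthy rack, gives a record that covers the whole 4 to 5 hour window and not just a single ping.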
On 3/29/11, mcclnx mcc mcclnx@yahoo.com.tw wrote:
We discussed it with the network engineer, who asked us to check with "ping" and "traceroute". Both tools show good response times, but when we connect through an application such as a web browser, a database client, or the DELL OPMN tool, the response time is very, very slow.
Are you connecting to the servers from within the same network/building or via Internet?
If the ping times are good, it could be that the servers are too heavily loaded, whether from too many users or from inefficient applications. In that case they would respond very slowly to requests even though there is no network problem.
To answer your questions:
1. The network is an intranet, not the Internet.
2. The servers' CPU and I/O loads are very light. We used "sar -u" and "sar -b" to check.
3. It is NOT only one server that has this slow-network problem; at least 4 to 5 servers in that rack all report slowness. It is NOT possible that all servers in that rack are heavily loaded at once.
--- On Mon, 2011/3/28, Emmanuel Noobadmin centos.admin@gmail.com wrote:
From: Emmanuel Noobadmin centos.admin@gmail.com Subject: Re: [CentOS] centos server network speed check??? To: "CentOS mailing list" centos@centos.org Date: Monday, March 28, 2011, 5:45 PM
On 29/03/11 11:29, mcclnx mcc wrote:
It is NOT only one server that has this slow-network problem; at least 4 to 5 servers in that rack all report slowness. It is NOT possible that all servers in that rack are heavily loaded at once.
Try using 'nethogs' on the suspect machines while the network is slow. It will show you which processes on the local machine are hogging the network bandwidth! I don't know why I explained that last part; it is self-explanatory :)
I was most likely talking out loud, so forgive me!
Ak.
On 29 March 2011 11:32, Anthony akcentos@anroet.com wrote:
On 29/03/11 11:29, mcclnx mcc wrote:
It is NOT only one server that has this slow-network problem; at least 4 to 5 servers in that rack all report slowness. It is NOT possible that all servers in that rack are heavily loaded at once.
Expanding on Tym's reply, I would also check the counters on the switch interfaces that serve that rack for packet drops or buffer problems at the time of the slowdown. Also look at the CPU utilisation on the switch in question, in case it is being overwhelmed with control-plane traffic. That would seem to be the most likely explanation for 4 or 5 servers going slow at the same time.
mike
On Monday, March 28, 2011 08:29:58 pm mcclnx mcc wrote:
It is NOT only one server that has this slow-network problem; at least 4 to 5 servers in that rack all report slowness. It is NOT possible that all servers in that rack are heavily loaded at once.
I have seen issues in the past with certain Broadcom gigabit ethernet NICs and the tg3 Linux kernel driver. Occasionally the NIC would just go into 'molasses' mode and get really slow. I haven't seen the problem in quite a while, though, so I don't know if that issue has been fixed or not. Note that not seeing the problem doesn't mean the problem didn't occur, of course. I never saw a correlation between multiple servers with that NIC going slow at the same time, however.
The next thing I would check is the network switch these servers are attached to. Many switches, especially Cisco Catalyst switches with hardware-assisted forwarding at mixed layers, tend to provide multiple physical connections on a single ASIC; the networking people can check in the Cisco IOS command line and see if the ASIC is throwing errors and such; the particular commands vary by ASIC, by switch model, and by operating system.
I had an older Catalyst 2900XL (I did say 'older' after all) where a certain set of ports would hang and go slow for minutes at a time; I plugged the devices into ports served by a different ASIC, and things got better. I then put a home-made permaplug into each of the bad ports. (A permaplug, something I make a few dozen of every so often, is an RJ45 with no contacts, where the latch release has been cut off and the back end of the plug filled with red silicone; it'll go in, but it takes some work to get back out. I have been known to epoxy them into bad ports to keep people from trying to use them.) It was the ASIC; on that switch each ASIC serves eight ports.
And the last problem I had was related to a new IP security camera that had multicast features; note to self: always check to make sure multicast is set to OFF if the entire subnet that camera is on is not carried by multicast-aware switches. I had lots of devices just give up under the sustained 5Mb/s multicast load. Multicast traffic also doesn't necessarily show up in the usual places for checking network traffic; you need Wireshark running on a SPAN port to catch it most of the time. Since I wasn't aware that multicast was on by default, it took an inordinate amount of time to find the issue; I had switches giving up, losing BPDUs, causing spanning-tree loops, etc. It was not a pleasant day. My console terminal servers for devices, SitePlayer Telnets, all stopped responding completely after an hour of that sort of traffic. Like I say, it was not a pleasant day.
I have revisited the multicast filtering features of many of my switches in the days since that issue.
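Once a capture from the SPAN port is available, the multicast share of the traffic is easy to estimate. The Python sketch below assumes the capture has already been exported as (destination IP, length in bytes) pairs; that export format and the function names are assumptions made for illustration, not something Wireshark produces by default.

```python
import ipaddress


def multicast_share(packets):
    """Return the fraction of bytes addressed to multicast destinations.

    packets is an iterable of (destination_ip, length_in_bytes) pairs.
    """
    total = 0
    multicast = 0
    for destination, length in packets:
        total += length
        if ipaddress.ip_address(destination).is_multicast:
            multicast += length
    return multicast / total if total else 0.0


def rate_mbps(byte_count, seconds):
    """Convert a byte count over an interval into megabits per second."""
    return byte_count * 8 / seconds / 1e6
```

As a sanity check on scale, 6,250,000 multicast bytes seen in a 10 second capture works out to 5 Mb/s, the kind of sustained load described above.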
On Tuesday, March 29, 2011 11:25:26 am Les Mikesell wrote:
On 3/29/2011 9:50 AM, Lamar Owen wrote:
I had lots of devices just give up under the sustained 5Mb/s multicast load.
I assume that was back in 10Mb/s ethernet days?
No; but the devices in question have 10Mb/s NIC chips. That's all that's needed for a single-port RS-232 terminal server, or a remote weather station. Just have to filter multicast and rate limit broadcasts. The backbone is gigabit, and the switches all have either 10/100/1000 or 10/100 ports.
No, that event happened two weeks ago, thanks to a misbehaving megapixel IP camera.
On Mon, Mar 28, 2011 at 5:13 PM, mcclnx mcc mcclnx@yahoo.com.tw wrote:
This situation normally lasts 4 to 5 hours and then goes back to normal. Is there a way to measure the "real" network response time so we can show it to the network engineer? Otherwise they always say "no problem".
Thanks.
If you run 'iperf' between the 2 servers you will be able to check the bandwidth at that time. It could be that another application is using all the bandwidth, or that the network has slowed down for some other reason. You could also try using Cacti to monitor the bandwidth through the switch.
Also install the sysstat package on all servers and let it run every minute. You can graph the data using ksar on your local system. The graphs may show you that something else is going on.
Check the crontabs as well. Does the time window line up with when you are running backups or something else over the network?
Ping and traceroute are going to be mostly useless for this problem. They can help determine connectivity, but even a 56k modem can be "connected" and pass. Iperf will stress the bandwidth portion of the equation.
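To illustrate what a bulk-transfer test measures that ping does not, here is a very rough Python stand-in for the idea behind iperf. It is not iperf and makes no claim to match its numbers; run_sink and measure_throughput are names invented for this sketch. One side accepts a connection and discards everything it receives, the other side pushes a fixed amount of data and times it.

```python
import socket
import time


def run_sink(server_sock):
    """Accept one connection and discard all data until the sender is done."""
    conn, _ = server_sock.accept()
    with conn:
        while conn.recv(65536):
            pass


def measure_throughput(host, port, total_bytes=8 * 1024 * 1024, chunk=65536):
    """Send at least total_bytes to a sink and return the rate in Mb/s."""
    payload = b"\0" * chunk
    sent = 0
    with socket.create_connection((host, port), timeout=10) as sock:
        start = time.perf_counter()
        while sent < total_bytes:
            sock.sendall(payload)
            sent += chunk
        # Signal end of data, then wait for the sink to close its side,
        # so the timing covers delivery and not just local buffering.
        sock.shutdown(socket.SHUT_WR)
        sock.recv(1)
        elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6
```

In practice the sink would run on one server in the affected rack and the sender on another machine; comparing the figure during a slowdown with the figure on a normal day gives the network engineer something more concrete than ping times.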