On Fri, Jan 22, 2010 at 12:23:07 +0000, Karanbir Singh wrote:
On 01/22/2010 12:11 PM, Prof. P. Sriram wrote:
I believe a correction might be in order - we have made it unusable for those that have one IP address and want to download at a rate exceeding 5 active connections per minute. Do you know of any such organizations?
yes, lots! including almost every office environment in the SME sector. Many people run development and testing VMs / machines inside their offices - and almost all have a small set of ADSL links coming in (in the EU and US at least) that they use for all outbound internet connectivity behind a NAT setup. In many cases, yum-cron-like jobs will kick off at very similar times across an organisation.
I agree that 5 connections per IP is a little slim. However, given that you apply such limits only to the files where it matters (large files), it's certainly reasonable to expect such companies or regional setups to run a mirror of their own, in their own best interest.
In my experience, though, it wasn't necessary to go lower than 20 with this kind of restriction, and I'm quite sure that leaves enough headroom for the type of setups you mention.
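For concreteness, a cap like the 20-connection figure above could be expressed with the third-party mod_limitipconn Apache module. This is only a hedged sketch - the directory path is illustrative, and the directive names follow that module's documentation rather than anything actually deployed on the mirrors being discussed:

```apache
# mod_limitipconn needs mod_status bookkeeping to see per-IP state.
ExtendedStatus On

<IfModule mod_limitipconn.c>
    # Apply the per-IP cap only where it matters: the large images.
    # The path below is a hypothetical example.
    <Location /centos/5/isos>
        MaxConnPerIP 20
    </Location>
    # Everything else (repomd.xml, small metadata) stays unrestricted.
</IfModule>
```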
If it really poses a problem, it would be trivial to enhance the Apache module to look not only at IP + number of connections, but to key this to URL+User-Agent.
Yes, I'm postulating that these excessive parallel connections are _not_ the result of some evil mind, but purely the result of misinformation on the side of users (naively tweaking the knob that is "supposed" to make things faster, and which does work to some extent for them). In fact, there'll be a good amount of desperation involved that causes people to try out these kinds of extreme settings. Nobody in the better-connected world would ever see the need to do so. But Chinese users have to squeeze through the eye of a needle...
(I have virtually never seen, and virtually never heard of, deliberate DoS attacks against open source mirrors; most issues seem to be misconfiguration or broken software; anyone else?)
Shouldn't they be enhancing their connectivity?
an example - ADSL2+ brings in approx 16 Mbps downstream; that's plenty of connectivity for most offices with <= 50 employees who mostly only do :80/:443 sort of traffic, with some other things like :22 and maybe rsync. They should perhaps consider setting up local repos within their facility, but many lack the resources to do so.
If you know of any package that provides this enhanced functionality, I would be happy to implement that instead of our current scheme.
I personally don't. But if it's a case of watching a log file, it should not be hard to implement. However, the problem of things like repomd.xml etc. still persists.
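As a sketch of the "watch a log file" idea: the snippet below is purely illustrative - the regex, class names, and thresholds are my own invention, not code from any mirror. It parses Apache combined-format access log lines and flags (IP, User-Agent) pairs that exceed a hit limit within a sliding window, which is roughly the IP + URL + User-Agent keying suggested earlier in the thread:

```python
import re
from collections import defaultdict, deque
from time import time

# Matches the Apache "combined" log format:
#   IP ident user [time] "METHOD URL PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'\d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return (ip, url, user_agent) or None if the line doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    return m.group("ip"), m.group("url"), m.group("agent")

class RateWatcher:
    """Flag (ip, agent) keys with more than `limit` hits in `window` seconds."""
    def __init__(self, limit=20, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent hits

    def observe(self, ip, agent, now=None):
        now = time() if now is None else now
        q = self.hits[(ip, agent)]
        q.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.limit  # True -> candidate for throttling

# Example: one IP hammering a large file with a download accelerator.
watcher = RateWatcher(limit=5, window=60.0)
line = ('1.2.3.4 - - [22/Jan/2010:12:23:07 +0000] '
        '"GET /centos/5/os/x86_64/images/boot.iso HTTP/1.1" '
        '200 4096 "-" "FlashGet"')
ip, url, agent = parse_line(line)
flagged = False
for i in range(7):  # 7 hits within one minute, limit is 5
    flagged = watcher.observe(ip, agent, now=100.0 + i)
print(flagged)  # True: exceeded 5 hits within the window
```

A real deployment would of course feed this from the live log (and hand flagged IPs to a firewall or to the web server), and - as noted above - exempting small metadata like repomd.xml is just a matter of filtering on the parsed URL.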
I doubt that the negative effects of massively parallel connections occur with repomd.xml files -- so there wouldn't be a need to impose the limitation on them in the first place, or what do you think?
How about turning off 'Range' requests in httpd? Is that an option?
I recommend against it: even though HTTP/1.1 does not require servers to support Range requests, they are so universally supported (in default configurations) that there is a certain expectation on the client side that servers _will_ support them. If a server doesn't, it can lead to ugly surprises on the client side (pulling gigabytes of data instead of a small chunk into memory), and the server may also end up delivering more data than it otherwise would.
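That "ugly surprise" is easy to demonstrate: a client that sends a Range header must check for a 206 status before assuming it received only the requested chunk. The sketch below is a stand-alone illustration (not code from the thread): it runs a tiny local server that ignores Range - just as a mirror with ranges switched off would - and shows the client receiving the full body with status 200:

```python
import http.server
import threading
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    BODY = b"x" * 1000  # stand-in for a 1000-byte file

    def do_GET(self):
        # Ignore any Range header, like a server with ranges disabled:
        # always answer 200 with the full body, never 206.
        self.send_response(200)
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()
        self.wfile.write(self.BODY)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/file",
    headers={"Range": "bytes=0-9"},  # the client only wants 10 bytes
)
with urllib.request.urlopen(req) as resp:
    status = resp.status
    data = resp.read()

server.shutdown()
print(status, len(data))  # 200 1000: the whole file, not the 10-byte chunk
```

A naive client that blindly appends `data` to an existing partial download - instead of noticing the 200 - would corrupt the file and waste the full transfer, which is exactly the failure mode described above.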
Some people have claimed that Range requests have a negative effect on buffer caches (and proposed to switch them off for that reason), but from what I see this doesn't seem to pose a real-world problem for mirrors.
Peter