[CentOS-mirror] New Mirror Traffic Expectations

Fri Jun 8 18:38:07 UTC 2018
David Richardson <david.richardson at utah.edu>

On Fri, 8 Jun 2018, Jonathan Wright wrote:

> Hi,
>
> I'm in the process of setting up 1-4 mirrors and I'm wondering what sort of 
> traffic stats I should expect in general, and then surrounding new releases.  
> All 4 mirrors would have dedicated gigabit links and be in the following 
> locations:
>
> Baltimore, MD, USA
> Dallas, TX, USA
> Seattle, WA, USA
> Amsterdam, NL
>
> I'm just trying to get an idea of what is actually going to be consumed.  Am 
> I looking at pushing the pipe size a lot of the time or should I expect 
> 10-20Mbps or somewhere in between?  If anyone in similar geographic locations 
> could share some stats that would be great.


Hi Jonathan,

I run mirror.chpc.utah.edu in Salt Lake City, UT, USA.

I also mirror the CentOS vault, EPEL, Fedora, and Arch, and perform 
installations of local machines from my mirror, so my data isn't going to 
exactly correlate to what you'll see for a CentOS-only mirror, but maybe 
it'll give you an idea of what to expect.

My average link usage is 70 Mbit/s over the last day, 150 Mbit/s over the
last week, and 275 Mbit/s over the last month. Normally when a new version
comes out, my traffic hovers around 400 Mbit/s for a couple of days, with
some spikes to 850-900 Mbit/s.

Every day, I generate around 700 gigabytes of traffic. Some days it drops 
to 500, some days it spikes to 900; breaking a terabyte is rare. The day 
of the CentOS 7.5 release, I sent nearly 3 terabytes of traffic.
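
For planning purposes, daily volume and average link rate can be converted
into each other. The sketch below is only a rough cross-check of the figures
above: it assumes decimal units (1 gigabyte = 10^9 bytes) and traffic spread
evenly over 24 hours, which real traffic is not. The function names are made
up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day_to_mbits(gb_per_day):
    """Average rate in Mbit/s for a daily volume given in decimal gigabytes."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


def mbits_to_gb_per_day(mbits):
    """Daily volume in decimal gigabytes for a sustained rate in Mbit/s."""
    bytes_per_day = mbits * 1e6 * SECONDS_PER_DAY / 8
    return bytes_per_day / 1e9


if __name__ == "__main__":
    # Daily volumes quoted in the message (release day taken as 3 TB).
    days = [("quiet day", 500), ("typical day", 700),
            ("busy day", 900), ("7.5 release day", 3000)]
    for label, gb in days:
        rate = gb_per_day_to_mbits(gb)
        print(f"{label}: {gb} GB/day is about {rate:.0f} Mbit/s on average")

    # Upper bound: what a fully saturated gigabit link could move in a day.
    print(f"1000 Mbit/s sustained is about {mbits_to_gb_per_day(1000):.0f} GB/day")
```

By this arithmetic, 700 gigabytes a day is about 65 Mbit/s on average, which
is in line with the 70 Mbit/s daily average quoted above, and a gigabit link
running flat out for 24 hours would move about 10.8 terabytes.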

My mirror is not a very impressive machine by current standards. It's a
Dell 2900 with 8 cores (2x 2.66GHz Xeon E5430s), 48 GB of RAM, and 8x 1TB
7200rpm drives in a software RAID 6. I have a 1 Gbit/s link to the server
(my org has 100 Gbit/s to Internet2 and multiple 10 Gbit/s links to other
providers).



There is one bit of tuning advice I recommend. In sysctl.conf, I set:

    vm.vfs_cache_pressure = 10

vm.vfs_cache_pressure controls how readily the kernel reclaims memory used
for cached file metadata (directory entries and inodes) relative to cached
file contents. The default value is 100. Lower values make the kernel hold
on to metadata; higher values make it give metadata up sooner. (The docs I
found on this said that if you set it to zero, the kernel never reclaims
that cache, so it will eventually OOM and you'll crash.)
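
To see what a given machine is currently using, the value can be read from
procfs. This is a minimal sketch, assuming a Linux host where the tunable
is exposed at /proc/sys/vm/vfs_cache_pressure; the helper names are
hypothetical, and the descriptions only restate the behaviour explained
above.

```python
from pathlib import Path

# Standard procfs location for this tunable on Linux.
PROC_PATH = Path("/proc/sys/vm/vfs_cache_pressure")


def read_cache_pressure(path=PROC_PATH):
    """Return the current value as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(value):
    """Turn a vfs_cache_pressure value into a short human-readable note."""
    if value is None:
        return "vfs_cache_pressure is not available on this system"
    if value == 0:
        return "0: metadata is never reclaimed (risk of running out of memory)"
    if value < 100:
        return f"{value}: kernel prefers to keep directory and inode caches"
    if value == 100:
        return "100: kernel default"
    return f"{value}: kernel reclaims directory and inode caches more eagerly"


if __name__ == "__main__":
    print(describe(read_cache_pressure()))
```

After editing sysctl.conf, running `sysctl -p` as root reloads the file so
the new value takes effect without a reboot.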

By setting this lower, I keep the file metadata (permissions, directory
structure, etc.) in memory. The big benefit comes when something needs to
walk the tree (like when I rsync from upstream or someone rsyncs from me).
Because the directory structure is in memory, rsync doesn't have to churn
the disk to discover changes.

When there are no new updates, I can sync from the CentOS masters in under
10 seconds. Before I found this setting, it took several minutes.
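
One way to get a feel for this on your own hardware is to time a
metadata-only walk of the mirror tree, which is roughly the part of an
rsync run that benefits from cached metadata. The sketch below is an
illustration under that assumption, not a model of what rsync actually
does; the function name is made up, and the root directory is whatever
path you pass on the command line.

```python
import os
import sys
import time


def walk_metadata(root):
    """Stat every entry under root without reading any file contents.

    Returns (number_of_entries, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # The second pass shows how fast the walk is once metadata is cached.
    for attempt in ("first pass", "second pass"):
        entries, seconds = walk_metadata(root)
        print(f"{attempt}: {entries} entries in {seconds:.2f} s")
```

If the second pass is much faster than the first, the metadata is being
served from memory. Running it again after the machine has handled a lot of
other traffic gives an idea of how much of that cache has survived.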

Thanks,
DR

-- 
David Richardson <david.richardson at utah.edu>
Center for High Performance Computing
University of Utah