I wanted to note that I have worked out a system that lets me mirror the entire Fedora repository, about 12TB in (I believe) eleven million files, with a polling interval of ten minutes. A typical update (when there are changes to mirror), including the mirrormanager checkin, takes about four minutes, most of it spent waiting for mirrormanager. A poll when there are no changes takes six seconds. The load on the server during a poll is rsync startup time plus a handful of stat calls; a full tree traversal on the server is not required. (It may still be required on the client, but that's no worse than plain rsync.)
The software which handles this is at https://pagure.io/quick-fedora-mirror
It involves a server-side component (written in Python 2 with limited dependencies) that generates file lists in a useful format, and a client-side component (currently written in zsh) that fetches the file lists, processes them, calls rsync with a list of changed files, and does a mirrormanager checkin. (The mirrormanager client is not required.)
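As a rough illustration of the client-side flow described above, here is a minimal Python sketch that compares a previously fetched file list with a new one and builds an rsync invocation for just the changed files. The list format used here (one "mtime<TAB>size<TAB>path" entry per line) and the function names are assumptions made for the example, not the format or code the actual software uses; only rsync's --files-from option is real.

```python
def parse_file_list(text):
    """Parse 'mtime<TAB>size<TAB>path' lines (hypothetical format) into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        mtime, size, path = line.split("\t", 2)
        entries[path] = (int(mtime), int(size))
    return entries


def changed_paths(old, new):
    """Paths that are new, or whose mtime or size differs from the old list."""
    return sorted(p for p, meta in new.items() if old.get(p) != meta)


def deleted_paths(old, new):
    """Paths present in the old list but missing from the new one."""
    return sorted(set(old) - set(new))


def rsync_command(files_from, source, dest):
    """Build an rsync call that transfers only the paths listed in files_from."""
    return ["rsync", "-a", "--files-from=" + files_from, source, dest]
```

When the two lists are identical, changed_paths and deleted_paths both come back empty and rsync does not need to be started at all, which is where the cheap no-change poll comes from.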
A tiered setup (mirrors pulling from other mirrors) works fine; only the master mirror ever needs to generate the file lists. None of this limits the ability of clients to mirror in any other way.
Hardlinks are copied as hardlinks, provided that the file lists for all cross-linked rsync modules are regenerated at the same time. When that doesn't happen, there's an included client-side hardlinker which uses the file list data to hardlink a repository more quickly.
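To show why file list data can speed up hardlinking, here is a minimal Python sketch. It assumes the list yields (path, size, mtime) tuples and treats files sharing a basename, size, and mtime as link candidates; that grouping rule and both function names are assumptions for illustration, not a description of the included hardlinker. Candidates are compared byte for byte before linking, so a wrong guess costs time but never data.

```python
import filecmp
import os
from collections import defaultdict


def hardlink_candidates(entries):
    """Group (path, size, mtime) entries that share basename, size, and mtime."""
    groups = defaultdict(list)
    for path, size, mtime in entries:
        groups[(os.path.basename(path), size, mtime)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def link_group(root, paths):
    """Hardlink byte-identical files to the first path; return links made."""
    keep = os.path.join(root, paths[0])
    made = 0
    for rel in paths[1:]:
        target = os.path.join(root, rel)
        if os.path.samefile(keep, target):
            continue  # already the same inode
        if not filecmp.cmp(keep, target, shallow=False):
            continue  # same metadata, different content: leave it alone
        tmp = target + ".hardlink-tmp"
        os.link(keep, tmp)       # create the new link beside the target
        os.replace(tmp, target)  # atomically swap it into place
        made += 1
    return made
```

Only the candidate groups are ever opened and compared, so the rest of the tree is never read.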
A form of exclude lists is supported in the client.
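The general idea of client-side excludes can be sketched as a filter applied to the list of paths before it is handed to rsync. The shell-style glob syntax and the function name below are assumptions for the example; the actual client's exclude syntax may differ.

```python
from fnmatch import fnmatchcase


def filter_excluded(paths, patterns):
    """Drop every path matching any of the (assumed) shell-style glob patterns."""
    return [p for p in paths if not any(fnmatchcase(p, pat) for pat in patterns)]
```

For instance, under these assumptions a pattern such as "*/debug/*" would drop every path containing a debug directory, since fnmatch's "*" also matches across "/".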
We're also working on a no-polling setup using the Fedora message bus, with mirrors automatically waking up and fetching when new content is pushed out.
While the default configurations, the above message bus stuff, and maybe the mirrormanager checkin are Fedora-specific, I do believe the software will work for any rsync server willing to run the file list generator. If any of this interests your project, please let me know.