[CentOS-devel] mock using yum .repo file?

Sun Jul 24 19:37:01 UTC 2011
Jeff Johnson <n3npq at mac.com>

On Jul 24, 2011, at 3:14 PM, Ljubomir Ljubojevic wrote:

>> 
>> Eeek! You are already well beyond my expertise: that's a whole lotta repos.
>> 
>> You are likely paying a significant performance cost carrying around
>> that number of repositories. Can you perhaps estimate how much
>> that performance cost is? Say, how long does it take to do some single
>> package update with
>> 	only CentOS repositories configured
>> 	all of the above configured
>> I'm just interested in a data point to calibrate my expectations
>> of how yum behaves with lots of repositories. You're one of the
>> few and the brave with that number of repositories …
> 
> Take notice that only 16 are enabled, and ~24 are disabled by default 
> and used only if I do not find what I am looking for.
> 

I can tell there are already yum performance problems scaling to that
number, because you (like any rational person) are choosing to intervene
manually, enabling and disabling repositories as needed.

> Performance is not much of an issue, since the contributing factor is
> the number of packages inside those repositories. The biggest of the
> third-party repos are repoforge and repoforge-dag.
> 

You are correct that the scaling depends on the number of packages,
not the number of repositories.

However, the solution to a distributed lookup scaling problem *does* depend
on the number of places that have to be searched, as well as the cost of a
failed lookup. If you have to look in a large number of repositories to ensure
that some package does NOT exist anywhere, well, there are ways to do that efficiently.

And none of the right solutions to the increasing cost of a failed lookup
are implemented in yum, AFAIK.
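
For the record, here is a minimal sketch (in Python, purely illustrative,
and not anything yum actually implements) of one such solution: a small
Bloom filter per repository, so that a failed lookup costs a handful of
hash probes per repo instead of a search through the full metadata.

# Hypothetical sketch: per-repository Bloom filters for cheap
# negative lookups. Not yum code; the repo names are invented.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            h = hashlib.sha256(("%d:%s" % (i, key)).encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably there".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# One filter per repository, built once from its package list:
repo_filters = {"base": BloomFilter(), "repoforge": BloomFilter()}
repo_filters["base"].add("bash")

def candidate_repos(package):
    # Only repos whose filter says "maybe" need a real metadata lookup,
    # so a package that exists nowhere is rejected in O(repos) probes.
    return [r for r, f in repo_filters.items() if f.might_contain(package)]

print(candidate_repos("bash"))         # ['base']
print(candidate_repos("no-such-pkg"))  # almost certainly []

The trade-off is a tunable false-positive rate: the filter may say
"maybe" for a package that isn't there, but it never says "no" for one
that is, which is exactly the property you want for cheap negative
answers.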

I was hoping to get an estimate of how bad the scaling problem actually
is from an objective, seat-of-the-pants wall-clock measurement.
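
Something along these lines would get that number (a rough sketch; the
repo id's are examples only, substitute your own, and note that a warm
metadata cache will understate the difference):

# Rough timing harness for the measurement I'm asking about.
# The repo id's are examples; "yum clean expire-cache" between runs
# avoids comparing a warm cache against a cold one.
import subprocess, time

def timed(cmd):
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

centos_only = ["yum", "--disablerepo=*",
               "--enablerepo=base,updates,extras", "check-update"]
everything = ["yum", "check-update"]

subprocess.run(["yum", "clean", "expire-cache"])
print("CentOS repos only: %.1fs" % timed(centos_only))
subprocess.run(["yum", "clean", "expire-cache"])
print("all repositories:  %.1fs" % timed(everything))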

Meanwhile I'm happy that you've found a workable solution for
your purposes. I'm rather more interested in what happens when
there are hundreds of repositories and tens of thousands of packages
that MUST be searched.

I suspect that yum will melt into a puddle if/when faced with depsolving
on that scale. Not that anyone needs depsolving across hundreds of
repos and tens of thousands of packages in the "real world", but that
isn't a proper justification for failing to consider the cost of a
failed lookup carefully. From what you are telling me, you are already
seeing that cost, and dealing with it by enabling/disabling repositories
and by inserting a high-priority repository that also acts as a de facto
cache and "working set" for the most useful packages.


>> 
>> … again no fault intended: I am seriously interested in the objective
>> number for "engineering" and development purposes, not in criticizing.
>> 
> <snip>
>> 	Prefer answers from the same repository.
>> A "nearness" rather than a "priority" metric starts to scale better. E.g.
>> with a "priority" metric, adding a few more repositories likely forces
>> an adjustment in *all* the priorities. There's some chance (I haven't
>> looked) that a "nearness" metric would be more localized and that
>> a "first found" search on a simple repository order might be
>> sufficient to mostly get the right answer without the additional artifact
>> of attaching a "priority" score to every package.
>> 
> 
> This is why I chose to create plnet-downloaded. Versions of useful
> packages are copied and frozen against stable releases, then updated
> in bulk in a controlled way. It might be easier to just repack them
> and create a separate repository.
> 

Presumably this is the high-priority (and hence searched-first)
repository that is acting as a de facto cache, thereby avoiding the
failed-lookup scaling issues I've just alluded to.
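
To make the "first found" idea from the quoted text concrete, here is a
toy sketch (repository contents invented, names illustrative) of a
simple ordered search with your cache repository at the front:

# Toy illustration of "first found" on a fixed repository order.
# Repo contents are invented for the example.
REPO_ORDER = ["plnet-downloaded", "base", "updates", "epel", "repoforge"]

# Stand-in for per-repo metadata: repo id -> set of package names.
REPO_CONTENTS = {
    "plnet-downloaded": {"httpd", "php"},
    "base": {"bash", "httpd"},
    "updates": {"bash"},
    "epel": {"htop"},
    "repoforge": {"mplayer"},
}

def first_found(package):
    # "Nearness" is just position: earlier repos win outright, so
    # appending a new repo never forces re-ranking the existing ones,
    # unlike a global "priority" score attached to every package.
    for repo in REPO_ORDER:
        if package in REPO_CONTENTS[repo]:
            return repo
    return None

print(first_found("httpd"))    # 'plnet-downloaded' shadows 'base'
print(first_found("mplayer"))  # 'repoforge' -- the costly full scan

Note the asymmetry: a hit in the front repository is nearly free, while
a package that lives only in the last repository (or nowhere at all)
still pays for the full scan, which is the failed-lookup cost again.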

73 de Jeff