Hi. I hope this is not too off-topic.
I am setting up and learning mock to build packages for CentOS 6, and I have a couple of questions for the experts.
1. Can I somehow use the .repo files from /etc/yum.repos.d/? I use 20-30 .repo files on and off (each repository (folder) in its own file, for easier manipulation without editing files), and editing repo configs in two places is not too appealing. I have seen this:
$MOCK --copyin /etc/yum.repos.d/texlive-mock.repo /etc/yum.repos.d/
is this what I am looking for?
2. Will the cleaning mentioned in the manual delete all installed rpms (dependencies) or just the build folder? If I want to compile the same package (or a different one using the same dependencies), how do I keep the dependency packages around for a faster compile? I have ccache = 4G, the yum cache at 15 days, and the root cache at 10 days. Will this do?
3. How do yum priorities work in mock? I have them set for every repo; if I can use the .repo files, will mock honor the priorities?
If .repo files cannot be used, do I have to sort repos based on their priority values, or can I use priority= ?
Thanks in advance.
I am also interested in building packages and I do not know where to start. Is there a howto with the basics on the subject?
AS
On 07/24/2011 10:02 AM, Ljubomir Ljubojevic wrote:
Hi. I hope this is not too off-topic.
I am setting up and learning mock to build packages for CentOS 6, and I have a couple of questions for the experts.
- Can I somehow use the .repo files from /etc/yum.repos.d/? I use 20-30 .repo files on and off (each repository (folder) in its own file, for easier manipulation without editing files), and editing repo configs in two places is not too appealing. I have seen this:
$MOCK --copyin /etc/yum.repos.d/texlive-mock.repo /etc/yum.repos.d/
is this what I am looking for?
- Will the cleaning mentioned in the manual delete all installed rpms (dependencies) or just the build folder? If I want to compile the same package (or a different one using the same dependencies), how do I keep the dependency packages around for a faster compile? I have ccache = 4G, the yum cache at 15 days, and the root cache at 10 days. Will this do?
- How do yum priorities work in mock? I have them set for every repo; if I can use the .repo files, will mock honor the priorities?
If .repo files cannot be used, do I have to sort repos based on their priority values, or can I use priority= ?
Thanks in advance.
Alberto Sentieri wrote:
I am also interested in building packages and I do not know where to start from. Is there a howto with the basics on the subject?
[1]: http://www.rpm.org/max-rpm/
[2]: http://fedoraproject.org/wiki/Projects/Mock
[3]: http://fedoraproject.org/wiki/Extras/MockTricks
On Jul 24, 2011, at 12:59 PM, Ljubomir Ljubojevic wrote:
Alberto Sentieri wrote:
I am also interested in building packages and I do not know where to start from. Is there a howto with the basics on the subject?
(aside) Um, you *could* be a little bit more verbose and helpful.
You can build packages using rpmbuild or you can attempt to use mock.
The choice between rpmbuild <-> mock will depend on what platform you are building on and what your intent is in building packages.
If you are just getting started building, then rpmbuild is likely the easier tool to learn. The link [1] isn't bad, but max-rpm assumes you are writing recipes from scratch, and no one (that I know of) has been doing that for almost a decade. Instead, what is typically done is to find a closely similar *.src.rpm and edit the *.spec recipe.
Starting with Big Packages like kernel/glibc/X11/python isn't advised; there's a fairly steep learning curve there.
If your intent is contributing a package to Fedora, then you will need to learn to use mock. One of the impediments to using mock on CentOS will be setting up entries in /etc/mock/* for CentOS.
(aside) I'm not sure whether CentOS adds those to mock or not. Consider it an RFE if not, and blame me for ignorance, because these days I use Serentos not CentOS, largely … because … well … you know why.
The benefit of mock over rpmbuild is that it will set up most of the build environment for you (when configured correctly).
The benefit of rpmbuild over mock is that it's a simpler tool to understand (too much automation isn't always the best way to learn).
hth
73 de Jeff
--
Ljubomir Ljubojevic
(Love is in the Air)
PL Computers
Serbia, Europe
Google is the Mother, Google is the Father, and traceroute is your trusty Spiderman...
StarOS, Mikrotik and CentOS/RHEL/Linux consultant
_______________________________________________
CentOS-devel mailing list
CentOS-devel@centos.org
http://lists.centos.org/mailman/listinfo/centos-devel
Jeff Johnson wrote:
On Jul 24, 2011, at 12:59 PM, Ljubomir Ljubojevic wrote:
Alberto Sentieri wrote:
I am also interested in building packages and I do not know where to start from. Is there a howto with the basics on the subject?
(aside) Um, you *could* be a little bit more verbose and helpful.
Too many postings and not much free time meant more errors, and that got me a lot of warnings on the centos-users ML. I finally unsubscribed, and for now I am limiting my help to what other people do not respond to on the rest of the comm media.
That and this is basically what I read to learn from when I started a few years ago.
On Jul 24, 2011, at 1:31 PM, Ljubomir Ljubojevic wrote:
Jeff Johnson wrote:
On Jul 24, 2011, at 12:59 PM, Ljubomir Ljubojevic wrote:
Alberto Sentieri wrote:
I am also interested in building packages and I do not know where to start from. Is there a howto with the basics on the subject?
(aside) Um, you *could* be a little bit more verbose and helpful.
Too many postings and not much free time meant more errors, and that got me a lot of warnings on the centos-users ML. I finally unsubscribed, and for now I am limiting my help to what other people do not respond to on the rest of the comm media.
Understood.
Oddly, I'm on a mock learning curve as well today, but with mock running on Lion and koji+mock running on armv5te dreamplugs and armv7hl panda boards.
That and this is basically what I read to learn from when I started a few years ago.
Those aren't bad pointers, I merely tried to supply some context.
Meanwhile (while I'm here) here's the answers to your questions:
1) Yes, mock uses yum, which will use repo files. You likely need to configure /etc/mock/* somehow … checking … yes, stanzas like this are merely yum repositories written differently:

[fedora]
name=fedora
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-15&arch=i386
failovermethod=priority

See if someone hasn't already configured mock for CentOS: that was the point of my aside: it's often quite mysterious how to find the right URI. I have difficulties all the time.
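Since mock's /etc/mock/*.cfg files are Python and embed the chroot's yum configuration as a single string, one way to avoid maintaining repo configs in two places is to generate that string from the existing /etc/yum.repos.d/ tree. A minimal sketch, assuming mock's usual config_opts['yum.conf'] convention; the directory path, helper names, and the [main] stub are illustrative placeholders, not taken from any real config:

```python
# Sketch: splice local .repo files into a mock chroot config.
# ASSUMPTIONS: mock reads config_opts['yum.conf'] as the chroot's yum.conf;
# repos_dir and the [main] stub below are illustrative placeholders.
import glob
import os

def build_yum_conf(repos_dir, base_conf="[main]\ncachedir=/var/cache/yum\n"):
    """Concatenate every .repo file in repos_dir onto a base yum.conf body."""
    parts = [base_conf]
    for path in sorted(glob.glob(os.path.join(repos_dir, "*.repo"))):
        with open(path) as f:
            parts.append(f.read())
    return "\n".join(parts)

def write_mock_cfg(cfg_path, yum_conf):
    """Emit the yum.conf stanza of a /etc/mock/<target>.cfg fragment."""
    with open(cfg_path, "w") as f:
        f.write("config_opts['yum.conf'] = \"\"\"\n%s\"\"\"\n" % yum_conf)
```

Rerunning such a script after editing /etc/yum.repos.d/ would keep the mock target in sync, which is exactly the "don't edit repo configs in two places" goal.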
2) Cleaning is likely to be mapped onto yum cache cleaning (mock goes to some lengths to preserve a build system image and not download needlessly). The mock->yum connection is moderately expensive, but the intent is to 1) (initially) populate the chroot with packages and 2) (normally) update the chroot (and the image) with later packages. The options in the man page are more to work around issues afaict: try and see; the normal operations SHOULD just work, and if not, think carefully about what isn't working for your purposes.
3) Priorities don't work too well generally (but do work for specific usage cases). The typical usage case for priorities is to set up a precedence order if/when there are multiple choices for a package to install. I would always use whatever is recommended by the add-on repositories, and (if your intent is to become an add-on repository using preferences) well, Dag's repository uses priorities nicely and flawlessly (from first-hand experience) and is worthy of study. But you shouldn't have to do too much with priorities; you will absolutely know when there is a need to use them, because your updates will be breaking, and your preference (which will determine the priority field) will be clearer.
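The yum-plugin-priorities rule under discussion is simply "lowest priority number wins" when several repositories carry the same package. A toy sketch of that tie-breaking (an illustration of the rule, not yum's actual code; the repo names are made up):

```python
# Toy model of yum-plugin-priorities tie-breaking: among repositories
# offering the same package, the one with the LOWEST priority number wins;
# candidates from higher-numbered repos are excluded.
def pick_repo(candidates):
    """candidates: list of (repo_name, priority) pairs for one package."""
    return min(candidates, key=lambda pair: pair[1])[0]
```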
hth
73 de Jeff
On 07/24/2011 08:52 PM, Jeff Johnson wrote:
- Yes, mock uses yum, which will use repo files. You likely need to configure /etc/mock/* somehow … checking … yes, stanzas like this are merely yum repositories written differently:

[fedora]
name=fedora
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-15&arch=i386
failovermethod=priority

See if someone hasn't already configured mock for CentOS: that was the point of my aside: it's often quite mysterious how to find the right URI. I have difficulties all the time.
mock from EPEL comes preconfigured to use CentOS repos when building for RHEL 4 and RHEL 5. As the last versions of mock (both for epel-5 and epel-6) were released before CentOS 6 was out, the config files for RHEL 6 are preconfigured to use the RH beta (5.90). However, it's quite trivial to adjust the two lines to point to the CentOS repos instead.
Why on Earth mock's maintainer decided to point the config files for PPC to CentOS (given that there is no CentOS for PPC) is, however, an enigma to me.
73 de wolfy
On Jul 24, 2011, at 2:16 PM, Manuel Wolfshant wrote:
On 07/24/2011 08:52 PM, Jeff Johnson wrote:
- Yes, mock uses yum, which will use repo files. You likely need to configure /etc/mock/* somehow … checking … yes, stanzas like this are merely yum repositories written differently:

[fedora]
name=fedora
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-15&arch=i386
failovermethod=priority

See if someone hasn't already configured mock for CentOS: that was the point of my aside: it's often quite mysterious how to find the right URI. I have difficulties all the time.
mock from EPEL comes preconfigured to use CentOS repos when building for RHEL 4 and RHEL 5. As the last versions of mock (both for epel-5 and epel-6) were released before CentOS 6 was out, the config files for RHEL 6 are preconfigured to use the RH beta (5.90). However, it's quite trivial to adjust the two lines to point to the CentOS repos instead.
Why on Earth mock's maintainer decided to point the config files for PPC to CentOS (given that there is no CentOS for PPC) is, however, an enigma to me.
Brain fart likely … the real flaw in /etc/mock/* is too many notes and not enough music.
What I mean by that is this: it's all cookie-cutter cut-n-paste of .ini files, with way too many details and complexity for not much purpose.
A better approach (if anyone is listening and uses mock) would be to write a script that generates the necessary information as needed for mock, instead of distributing every possible (and conceivable) configuration, typos and all.
73 de wolfy
;-)
73 de Jeff
Manuel Wolfshant wrote:
Why on Earth mock's maintainer decided to point the config files for PPC to CentOS (given that there is no CentOS for PPC) is, however, an enigma to me.
Maybe there was a rumor there would be a PPC version.
Jeff Johnson wrote:
- Yes, mock uses yum, which will use repo files. You likely need to configure /etc/mock/* somehow … checking … yes, stanzas like this are merely yum repositories written differently:

[fedora]
name=fedora
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-15&arch=i386
failovermethod=priority

See if someone hasn't already configured mock for CentOS: that was the point of my aside: it's often quite mysterious how to find the right URI. I have difficulties all the time.
- Priorities don't work too well generally (but do work for specific usage cases).
The typical usage case for priorities is to set up a precedence order if/when there are multiple choices for a package to install. I would always use whatever is recommended by the add-on repositories, and (if your intent is to become an add-on repository using preferences) well, Dag's repository uses priorities nicely and flawlessly (from first-hand experience) and is worthy of study. But you shouldn't have to do too much with priorities; you will absolutely know when there is a need to use them, because your updates will be breaking, and your preference (which will determine the priority field) will be clearer.
hth
73 de Jeff
I use the following repos:
playonlinux-on.repo plc-adobe-linux-on.repo plc-atrpms-stable-off.repo plc-atrpms-testing-off.repo plc-c6-testing-off.repo plc-centosplus-on.repo plc-elrepo-extras-off.repo plc-elrepo-fasttrack-off.repo plc-elrepo-kernel.repo plc-elrepo-on.repo plc-elrepo-testing-off.repo plc-epel-on.repo plc-extras-on.repo plc-fasttrack-off.repo plc-kb-el6-ext-off.repo plc-kb-el6-ext-test-off.repo plc-kb-el6-misc-off.repo plc-kb-el6-misc-test-off.repo plc-os-on.repo plc-releases-on.repo plc-remi-off.repo plc-remi-test-off.repo plc-repoforge-buildtools-off.repo plc-repoforge-dag-off.repo plc-repoforge-extras-off.repo plc-repoforge-on.repo plc-rpmfusion-free-updates-off.repo plc-rpmfusion-free-updates-testing-off.repo plc-rpmfusion-nonfree-updates-off.repo plc-rpmfusion-nonfree-updates-testing-off.repo plc-sernet-samba-off.repo plc-updates-on.repo plc-virtualbox-on.repo plc-virtualmin-universal-on.repo plnet-archive-off.repo plnet-compiled-on.repo plnet-downloaded-on.repo plnet-releases-on.repo plnet-replace-off.repo plnet-test-off.repo
plc-os-on.repo:
name=Spec CentOS-$releasever - os - $releasever - $basearch
baseurl=http://xxx.wwwww.rs/mrepo/plc-centos6-$basearch/RPMS.os/
gpgcheck=0
enabled=1
priority=1
exclude=*releases
All priorities in the yum .repo files are carefully adjusted not to mess with the repos of higher value to me, but also to provide all available packages.
plnet-downloaded, for example, is a high-priority repo populated with carefully selected packages from other, lower-priority repos that conflict (like ATrpms and RepoForge), and with packages not available via regular repositories but from download web pages (skype, shorewall, etc.).
"-off" at the end marks disabled repos. Those are all local repos and I have the URIs. But I am still modifying and selecting repositories and packages. That is why I would like to use the .repo files from /etc/yum.repos.d/, so I do not have to worry about whether there were changes.
On Jul 24, 2011, at 2:24 PM, Ljubomir Ljubojevic wrote:
Jeff Johnson wrote:
- Yes, mock uses yum, which will use repo files. You likely need to configure /etc/mock/* somehow … checking … yes, stanzas like this are merely yum repositories written differently:

[fedora]
name=fedora
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-15&arch=i386
failovermethod=priority

See if someone hasn't already configured mock for CentOS: that was the point of my aside: it's often quite mysterious how to find the right URI. I have difficulties all the time.
- Priorities don't work too well generally (but do work for specific usage cases).
The typical usage case for priorities is to set up a precedence order if/when there are multiple choices for a package to install. I would always use whatever is recommended by the add-on repositories, and (if your intent is to become an add-on repository using preferences) well, Dag's repository uses priorities nicely and flawlessly (from first-hand experience) and is worthy of study. But you shouldn't have to do too much with priorities; you will absolutely know when there is a need to use them, because your updates will be breaking, and your preference (which will determine the priority field) will be clearer.
hth
73 de Jeff
I use following repos:
playonlinux-on.repo plc-adobe-linux-on.repo plc-atrpms-stable-off.repo plc-atrpms-testing-off.repo plc-c6-testing-off.repo plc-centosplus-on.repo plc-elrepo-extras-off.repo plc-elrepo-fasttrack-off.repo plc-elrepo-kernel.repo plc-elrepo-on.repo plc-elrepo-testing-off.repo plc-epel-on.repo plc-extras-on.repo plc-fasttrack-off.repo plc-kb-el6-ext-off.repo plc-kb-el6-ext-test-off.repo plc-kb-el6-misc-off.repo plc-kb-el6-misc-test-off.repo plc-os-on.repo plc-releases-on.repo plc-remi-off.repo plc-remi-test-off.repo plc-repoforge-buildtools-off.repo plc-repoforge-dag-off.repo plc-repoforge-extras-off.repo plc-repoforge-on.repo plc-rpmfusion-free-updates-off.repo plc-rpmfusion-free-updates-testing-off.repo plc-rpmfusion-nonfree-updates-off.repo plc-rpmfusion-nonfree-updates-testing-off.repo plc-sernet-samba-off.repo plc-updates-on.repo plc-virtualbox-on.repo plc-virtualmin-universal-on.repo plnet-archive-off.repo plnet-compiled-on.repo plnet-downloaded-on.repo plnet-releases-on.repo plnet-replace-off.repo plnet-test-off.repo
Eeek! You are already well beyond my expertise: that's a whole lotta repos.
You are likely paying a significant performance cost carrying around that number of repositories. Can you perhaps estimate how much that performance cost is? Say, how long does it take to do a single package update with only the CentOS repositories configured vs. all of the above configured? I'm just interested in a data point to calibrate my expectations of how yum behaves with lots of repositories. You're one of the few and the brave with that number of repositories …
… again no fault intended: I am seriously interested in the objective number for "engineering" and development purposes, not in criticizing.
plc-os-on.repo:
name=Spec CentOS-$releasever - os - $releasever - $basearch
baseurl=http://xxx.wwwww.rs/mrepo/plc-centos6-$basearch/RPMS.os/
gpgcheck=0
enabled=1
priority=1
exclude=*releases
All priorities in the yum .repo files are carefully adjusted not to mess with the repos of higher value to me, but also to provide all available packages.
plnet-downloaded, for example, is a high-priority repo populated with carefully selected packages from other, lower-priority repos that conflict (like ATrpms and RepoForge), and with packages not available via regular repositories but from download web pages (skype, shorewall, etc.).
This is the right approach: get the priority metric approximately right, and prepare a layer to deal with the inevitable flaws (that come from using a per-repository, not per-package, "priority", a design flaw imho).
(aside) There's a better metric than "priority" that SHOULD be used. An integer priority is fine if you just need some general means to order choices and do tie-breaking. The better metric would be "nearness", where the usual per-package choice would be to prefer answers from the same repository. A "nearness" rather than a "priority" metric starts to scale better. E.g. with a "priority" metric, adding a few more repositories likely forces an adjustment in *all* the priorities. There's some chance (I haven't looked) that a "nearness" metric would be more localized, and that a "first found" search on a simple repository order might be sufficient to mostly get the right answer without the additional artifact of attaching a "priority" score to every package.
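The "first found" search just described can be written down in a few lines (the repo names and contents below are invented for illustration):

```python
# "Nearness" as a first-found search: repositories are consulted in a
# fixed order (nearest first), and the first one carrying the package
# wins, with no global priority score attached to every package.
def first_found(pkg, ordered_repos):
    """ordered_repos: list of (name, set_of_package_names), nearest first."""
    for name, pkgs in ordered_repos:
        if pkg in pkgs:
            return name
    return None  # failed lookup: every repository had to be consulted
```

Note the failure case: proving a package exists nowhere still costs a scan of every repository, which is exactly the failed-lookup cost problem.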
"-off" at the end marks disabled repos. Those are all local repos and I have the URIs. But I am still modifying and selecting repositories and packages. That is why I would like to use the .repo files from /etc/yum.repos.d/, so I do not have to worry about whether there were changes.
That's a sane idea (and related to "too many notes and not enough music" with yum repositories and mock build targets being essentially the same info but specified in multiple places).
73 de Jeff
Jeff Johnson wrote:
On Jul 24, 2011, at 2:24 PM, Ljubomir Ljubojevic wrote:
I use following repos:
playonlinux-on.repo plc-adobe-linux-on.repo plc-atrpms-stable-off.repo plc-atrpms-testing-off.repo plc-c6-testing-off.repo plc-centosplus-on.repo plc-elrepo-extras-off.repo plc-elrepo-fasttrack-off.repo plc-elrepo-kernel.repo plc-elrepo-on.repo plc-elrepo-testing-off.repo plc-epel-on.repo plc-extras-on.repo plc-fasttrack-off.repo plc-kb-el6-ext-off.repo plc-kb-el6-ext-test-off.repo plc-kb-el6-misc-off.repo plc-kb-el6-misc-test-off.repo plc-os-on.repo plc-releases-on.repo plc-remi-off.repo plc-remi-test-off.repo plc-repoforge-buildtools-off.repo plc-repoforge-dag-off.repo plc-repoforge-extras-off.repo plc-repoforge-on.repo plc-rpmfusion-free-updates-off.repo plc-rpmfusion-free-updates-testing-off.repo plc-rpmfusion-nonfree-updates-off.repo plc-rpmfusion-nonfree-updates-testing-off.repo plc-sernet-samba-off.repo plc-updates-on.repo plc-virtualbox-on.repo plc-virtualmin-universal-on.repo plnet-archive-off.repo plnet-compiled-on.repo plnet-downloaded-on.repo plnet-releases-on.repo plnet-replace-off.repo plnet-test-off.repo
Eeek! You are already well beyond my expertise: that's a whole lotta repos.
You are likely paying a significant performance cost carrying around that number of repositories. Can you perhaps estimate how much that performance cost is? Say, how long does it take to do a single package update with only the CentOS repositories configured vs. all of the above configured? I'm just interested in a data point to calibrate my expectations of how yum behaves with lots of repositories. You're one of the few and the brave with that number of repositories …
Take notice that only 16 are enabled; ~24 are disabled by default and used only if I do not find what I am looking for.
Performance is not much of an issue, since the contributing factor is the number of packages inside those repositories. The biggest of the third-party repos are repoforge and repoforge-dag.
… again no fault intended: I am seriously interested in the objective number for "engineering" and development purposes, not in criticizing.
<snip>
Prefer answers from the same repository. A "nearness" rather than a "priority" metric starts to scale better. E.g. with a "priority" metric, adding a few more repositories likely forces an adjustment in *all* the priorities. There's some chance (I haven't looked) that a "nearness" metric would be more localized and that a "first found" search on a simple repository order might be sufficient to mostly get the right answer without the additional artifact of attaching a "priority" score to every package.
This is why I chose to create plnet-downloaded. Versions of useful packages are copied and frozen at stable releases, then updated in bulk, under control. It might be easier to just repack them and create a separate repository.
On Jul 24, 2011, at 3:14 PM, Ljubomir Ljubojevic wrote:
Eeek! You are already well beyond my expertise: that's a whole lotta repos.
You are likely paying a significant performance cost carrying around that number of repositories. Can you perhaps estimate how much that performance cost is? Say, how long does it take to do a single package update with only the CentOS repositories configured vs. all of the above configured? I'm just interested in a data point to calibrate my expectations of how yum behaves with lots of repositories. You're one of the few and the brave with that number of repositories …
Take notice that only 16 are enabled; ~24 are disabled by default and used only if I do not find what I am looking for.
I can tell that there are already yum performance problems scaling to that number because you (like any rational person would) are choosing to manually intervene and enable/disable repositories as needed.
Performance is not much of an issue, since the contributing factor is the number of packages inside those repositories. The biggest of the third-party repos are repoforge and repoforge-dag.
You are correct that the scaling depends on the number of packages, not the number of repositories.
However, the solution to a distributed-lookup scaling problem *does* depend on the number of places that have to be searched, as well as on the cost of a failed lookup. If you have to look in a large number of repositories to ensure that some package does NOT exist anywhere, well, there are ways to do that efficiently.
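One standard structure for making such negative lookups cheap is a per-repository Bloom filter: a small bitmap that answers "definitely not here" without touching the repository metadata at all. A toy sketch (nothing like this exists in yum; the sizes and hash counts are arbitrary):

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter: constant-time 'definitely absent' answers,
    with a small false-positive rate but never a false negative."""
    def __init__(self, nbits=8192, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, item):
        # Derive nhashes bit positions from salted SHA-1 digests.
        for salt in range(self.nhashes):
            digest = hashlib.sha1(("%d:%s" % (salt, item)).encode()).hexdigest()
            yield int(digest, 16) % self.nbits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False -> item was definitely never added; True -> probably added.
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

A depsolver could consult each repository's filter first and skip the expensive metadata search whenever the answer is "definitely absent".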
And none of the right solutions to the increasing cost of a failed lookup are implemented in yum afaik.
I was hoping to get an estimate of how bad the scaling problem actually is from an objective wall clock time seat-of-the-pants measurement.
Meanwhile, I'm happy that you've found a workable solution for your purposes. I'm rather more interested in what happens when there are hundreds of repositories and tens of thousands of packages that MUST be searched.
I suspect that yum will melt into a puddle if/when faced with depsolving on that scale. Not that anyone needs depsolving on the scale of hundreds of repos and tens of thousands of packages in the "real world", but that isn't a proper justification for not considering the cost of a failed lookup carefully, which (from what you are telling me) you are already seeing, and dealing with by enabling/disabling repositories and inserting a high-priority repository that also acts as a de facto cache and "working set" for the most useful packages.
… again no fault intended: I am seriously interested in the objective number for "engineering" and development purposes, not in criticizing.
<snip>
> Prefer answers from the same repository.
> A "nearness" rather than a "priority" metric starts to scale better. E.g.
> with a "priority" metric, adding a few more repositories likely forces
> an adjustment in *all* the priorities. There's some chance (I haven't
> looked) that a "nearness" metric would be more localized and that
> a "first found" search on a simple repository order might be
> sufficient to mostly get the right answer without the additional artifact
> of attaching a "priority" score to every package.
This is why I chose to create plnet-downloaded. Versions of useful packages are copied and frozen at stable releases, then updated in bulk, under control. It might be easier to just repack them and create a separate repository.
Presumably this is the high-priority repository (and hence searched first) that is acting as a de facto cache, thereby avoiding the failed-lookup scaling issues I've just alluded to.
73 de Jeff
Jeff Johnson wrote:
You are correct that the scaling depends on the number of packages, not the number of repositories.
However, the solution to a distributed-lookup scaling problem *does* depend on the number of places that have to be searched, as well as on the cost of a failed lookup. If you have to look in a large number of repositories to ensure that some package does NOT exist anywhere, well, there are ways to do that efficiently.
And none of the right solutions to the increasing cost of a failed lookup are implemented in yum afaik.
I was hoping to get an estimate of how bad the scaling problem actually is from an objective wall clock time seat-of-the-pants measurement.
Meanwhile, I'm happy that you've found a workable solution for your purposes. I'm rather more interested in what happens when there are hundreds of repositories and tens of thousands of packages that MUST be searched.
I suspect that yum will melt into a puddle if/when faced with depsolving on that scale. Not that anyone needs depsolving on the scale of hundreds of repos and tens of thousands of packages in the "real world", but that isn't a proper justification for not considering the cost of a failed lookup carefully, which (from what you are telling me) you are already seeing, and dealing with by enabling/disabling repositories and inserting a high-priority repository that also acts as a de facto cache and "working set" for the most useful packages.
Here are two speed tests. The repos are on a 100 Mbps LAN. The first is with all repositories enabled and (left over from CentOS 5 and the old-style RPMforge) ATrpms having higher priority than RepoForge. The second is with only the safe RepoForge and the other default (in my system) repositories enabled:
[root@mama ~]# yum clean all
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Cleaning up Everything
Cleaning up list of fastest mirrors
[root@mama ~]# time yum install vlc --enablerepo=* --showduplicates
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Loading mirror speeds from cached hostfile
2233 packages excluded due to repository priority protections
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package vlc.x86_64 0:1.1.10-71.el6 set to be updated
--> Processing Dependency: bitstream-vera-serif-fonts for package: vlc-1.1.10-71.el6.x86_64
<snip>
--> Finished Dependency Resolution
Error: Package: vlc-1.1.10-71.el6.x86_64 (plc-atrpms-stable)
       Requires: libmodplug.so.0()(64bit)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
real    1m9.365s
user    0m36.585s
sys     0m2.830s
[root@mama ~]# time yum install vlc --showduplicates
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Loading mirror speeds from cached hostfile
952 packages excluded due to repository priority protections
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package vlc.x86_64 0:1.1.11-1.el6.rf set to be updated
--> Processing Dependency: libavformat.so.52(LIBAVFORMAT_52)(64bit) for package: vlc-1.1.11-1.el6.rf.x86_64
<snip>
--> Finished Dependency Resolution
Error: Package: vlc-1.1.11-1.el6.rf.x86_64 (plc-repoforge)
       Requires: libmodplug.so.0()(64bit)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
real    0m30.869s
user    0m25.619s
sys     0m1.261s
The difference is double, but the time difference is only ~30 sec on a low-budget PC (1 GB RAM, AMD Sempron 64-bit, 40 GB IDE HDD).
And the first run had to download the yum repo files; the second ran from cache.
If you have some specific stress test, I would be happy to run it.
Oh, yeah, yum reads and processes the xml metadata files, not the actual package files, so searches are fast because of it.
Here are the repolist data (1 = all repos, 2 = default, 3 = default with no rpmforge and epel):
[root@mama ~]# yum clean all
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Cleaning up Everything
Cleaning up list of fastest mirrors
[root@mama ~]# time yum repolist --enablerepo=*
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Loading mirror speeds from cached hostfile
2233 packages excluded due to repository priority protections
<snip>
repo id                                  status
playonlinux                              2+6
plc-adobe-linux                          0
plc-atrpms-stable                        1,636+382
plc-atrpms-testing                       43+67
plc-c6-testing                           0
plc-centosplus                           42
plc-elrepo                               92
plc-elrepo-extras                        0
plc-elrepo-fasttrack                     0
plc-elrepo-kernel                        0
plc-elrepo-testing                       31
plc-epel                                 6,080
plc-extras                               0
plc-fasttrack                            0
plc-kb-el6-ext                           0
plc-kb-el6-ext-test                      0
plc-kb-el6-misc                          0
plc-kb-el6-misc-test                     0
plc-os                                   6,019
plc-releases                             0
plc-remi                                 114+223
plc-remi-test                            2+76
plc-repoforge                            2,835+1,106
plc-repoforge-buildtools                 0
plc-repoforge-dag                        0
plc-repoforge-extras                     25+357
plc-rpmfusion-free-updates               0
plc-rpmfusion-free-updates-testing       70
plc-rpmfusion-nonfree-updates            0
plc-rpmfusion-nonfree-updates-testing    70
plc-sernet-samba                         0
plc-updates                              1,042
plc-virtualbox                           10
plc-virtualmin-universal                 134
plnet-archive                            0
plnet-compiled                           28
plnet-downloaded                         89+16
plnet-releases                           0
plnet-replace                            0
plnet-test                               0
repolist: 18,364
real    0m29.675s
user    0m15.167s
sys     0m1.633s
[root@mama ~]# yum clean all
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Cleaning up Everything
Cleaning up list of fastest mirrors
[root@mama ~]# time yum repolist
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Determining fastest mirrors
<snip>
952 packages excluded due to repository priority protections
repo id                      status
playonlinux                  2+6
plc-adobe-linux              0
plc-centosplus               42
plc-elrepo                   92
plc-elrepo-kernel            0
plc-epel                     6,080
plc-extras                   0
plc-os                       6,019
plc-releases                 0
plc-repoforge                3,011+930
plc-updates                  1,042
plc-virtualbox               10
plc-virtualmin-universal     134
plnet-compiled               28
plnet-downloaded             89+16
plnet-releases               0
repolist: 16,549
real    0m28.276s
user    0m14.475s
sys     0m1.658s
[root@mama ~]# yum clean all
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Cleaning up Everything
Cleaning up list of fastest mirrors
[root@mama ~]# time yum repolist --disablerepo=*repoforge* --disablerepo=*epel*
Loaded plugins: fastestmirror, priorities, refresh-packagekit
Determining fastest mirrors
<snip>
17 packages excluded due to repository priority protections
repo id                      status
playonlinux                  2+6
plc-adobe-linux              0
plc-centosplus               42
plc-elrepo                   92
plc-elrepo-kernel            0
plc-extras                   0
plc-os                       6,019
plc-releases                 0
plc-updates                  1,042
plc-virtualbox               10
plc-virtualmin-universal     134
plnet-compiled               28
plnet-downloaded             94+11
plnet-releases               0
repolist: 7,463
real 0m19.810s user 0m8.663s sys 0m1.227s
On Jul 24, 2011, at 4:35 PM, Ljubomir Ljubojevic wrote:
Jeff Johnson wrote:
You are correct that the scaling depends on the number of packages not the number of repositories.
However, the solution to a distributed lookup scaling problem *does* depend on the number of places that have to be searched, as well as the cost of a failed lookup. If you have to look in a large number of repositories to ensure that some package does NOT exist anywhere, well, there are ways to do that efficiently.
And none of the right solutions to the increasing cost of a failed lookup are implemented in yum afaik.
I was hoping to get an estimate of how bad the scaling problem actually is from an objective wall clock time seat-of-the-pants measurement.
Meanwhile I'm happy that you've found a workable solution for your purposes. I'm rather more interested in what happens when there are hundreds of repositories and tens of thousands of packages that MUST be searched.
I suspect that yum will melt into a puddle if/when faced with depsolving on that scale. Not that anyone needs depsolving on the scale of hundreds of repos and tens of thousands of packages in the "real world", but that isn't a proper justification for not carefully considering the cost of a failed lookup, which (from what you are telling me) you are already seeing, and dealing with by enabling/disabling repositories and inserting a high-priority repository that also acts as a de facto cache and "working set" for the most useful packages.
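The asymmetry between a successful and a failed lookup can be seen with a toy model (the repository contents and counts below are made up purely for illustration, not measured from yum):

```python
# Toy model of the failed-lookup cost: a package that exists anywhere
# is found after a few probes, but proving a package does NOT exist
# requires probing every repository.

repos = [{"pkg%d_%d" % (r, i) for i in range(1000)} for r in range(100)]

def probes_to_find(name):
    """Number of repositories consulted before the search could stop."""
    for count, repo in enumerate(repos, start=1):
        if name in repo:
            return count          # hit: stop early
    return len(repos)             # miss: every repo was searched

print(probes_to_find("pkg0_5"))       # 1   (found in the first repo)
print(probes_to_find("no-such-pkg"))  # 100 (a miss touches all repos)
```

Putting a high-priority "working set" repository first is exactly a way of making the common hits land on the first probe; it does nothing for misses.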
Thank you! In general those numbers are better than I would have guessed from yum.
<snip>
If you have some specific stress test I would be happy to run it.
If I can think of something, I'll pass it along.
Oh, yeah, yum reads and processes xml files, not the actual package files, so searches are fast because of it.
Here's something that might help you:
Using xml is a significant performance hit: see recent patches to yum/createrepo to use sqlite instead of xml … lemme find the check-in claim … here is the claim http://lists.baseurl.org/pipermail/rpm-metadata/2011-July/001353.html and quoting
Tested locally on repodata of 9000 pkgs.
Goes from 1.8-> 2GB of memory in use with the old createrepo code to 325MB of memory in use - same operation - performance-wise it is not considerably different. More testing will bear that out, though.
So -- if I believe those numbers -- there's *lots* of room for improvement in yum by ripping out xml and replacing it with a sqlite database. Note that createrepo != yum, but some of the usage cases are similar. The general problem in yum (and smart and apt) is the high cost of the cache load, and the amount of xml that must be parsed/read in order to be cached. Adding a sqlite backing store which can just be used, not loaded, is a win.
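To illustrate the "used, not loaded" point, here is a minimal sketch with an in-memory sqlite db; the table and column names are simplified stand-ins, not the real createrepo schema:

```python
import sqlite3

# Simplified stand-in for a repodata sqlite db (the real createrepo
# schema is much richer; names here are illustrative only).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE packages (name TEXT, version TEXT)")
db.execute("CREATE INDEX idx_name ON packages (name)")
db.executemany("INSERT INTO packages VALUES (?, ?)",
               [("bash", "4.1"), ("coreutils", "8.4"), ("yum", "3.2")])

# An indexed point query: the db file can be paged in on demand, so
# answering it does not require parsing the whole metadata first.
row = db.execute("SELECT version FROM packages WHERE name = ?",
                 ("bash",)).fetchone()
print(row[0])  # 4.1
```

Contrast that with xml, where the entire document has to be parsed into memory before the first lookup can be answered.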
Note that the other problem I alluded to, avoiding the cost of a failed search across a distributed store, is very well researched and modeled (and unimplemented in yum). But most depsolving just needs to find what package is needed and using priority is a reasonable way to improve that search (if you can choose the priorities sanely, which is hard).
The usual approach is to devise a cheap way to detect and avoid a failing search. This is often done with Bloom filters, but there are other equivalent ways to avoid the cost of failure.
Wikipedia isn't too bad an introduction to Bloom filters if interested. The hard part is choosing the parameters correctly for an "expected" population. If you miss that estimate (or choose the parameters incorrectly) then Bloom filters will just make matters worse.
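For the curious, a toy Bloom filter along the lines described above; the sizing formulas are the standard ones from the literature, and this is a sketch, not production code:

```python
import hashlib
import math

class BloomFilter:
    """Toy Bloom filter sized from an expected population n and a
    target false-positive rate p (the parameter choice Jeff warns
    about: miss the estimate and the filter hurts more than helps)."""

    def __init__(self, n, p):
        # Standard sizing: m bits, k hash functions.
        self.m = max(1, int(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, int(round(self.m / n * math.log(2))))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(("%d:%s" % (i, item)).encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent": the repo can be skipped.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# One filter per repository lets a depsolver skip repositories that
# definitely do not carry a package, avoiding most failed lookups.
repo = BloomFilter(n=10000, p=0.01)
repo.add("bash")
repo.add("coreutils")
print(repo.might_contain("bash"))        # True
print(repo.might_contain("no-such-pkg")) # False (with ~1% FP rate)
```

The filter never gives a false "absent", which is what makes it safe as a pre-check before the expensive repository search.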
<snip>
off to study and think a bit … thanks!
73 de Jeff
On Jul 24, 2011, at 5:23 PM, Jeff Johnson wrote:
So -- if I believe those numbers -- there's *lots* of room for improvement in yum by ripping out xml and replacing it with a sqlite database. Note that createrepo != yum, but some of the usage cases are similar. The general problem in yum (and smart and apt) is the high cost of the cache load, and the amount of xml that must be parsed/read in order to be cached. Adding a sqlite backing store which can just be used, not loaded, is a win.
And just for completeness: zypper with *.solv files avoids many of the problems with "caching" seen in yum/apt/smart, but introduces some other problems (binary format maintenance is really painfully hard, and similarly merging hash buckets and memoization are overly complex, but these are different issues mentioned solely in passing).
Jeff Johnson wrote:
On Jul 24, 2011, at 4:35 PM, Ljubomir Ljubojevic wrote:
Oh, yeah, yum reads and processes xml files, not the actual package files, so searches are fast because of it.
Here's something that might help you:
Using xml is a significant performance hit: see recent patches to yum/createrepo to use sqlite instead of xml … lemme find the check-in claim … here is the claim http://lists.baseurl.org/pipermail/rpm-metadata/2011-July/001353.html and quoting
Tested locally on repodata of 9000 pkgs.
Goes from 1.8-> 2GB of memory in use with the old createrepo code to 325MB of memory in use - same operation - performance-wise it is not considerably different. More testing will bear that out, though.
So -- if I believe those numbers -- there's *lots* of room for improvement in yum by ripping out xml and replacing it with a sqlite database. Note that createrepo != yum, but some of the usage cases are similar. The general problem in yum (and smart and apt) is the high cost of the cache load, and the amount of xml that must be parsed/read in order to be cached. Adding a sqlite backing store which can just be used, not loaded, is a win.
You have mistaken createrepo for yum repomd data. Createrepo is for creating an actual repository (I use mrepo).
Yum data (repomd, repoview) is a different story. Every repository stores its data in xml files packed with tar. They are unpacked in memory, and the xml data is parsed and put into an internal database (and cache). It is very much possible that yum internally (for its cache) uses an sqlite database; I haven't had the need to research it. Using "yum -C <command>" will use the yum cache rather than download the repomd data again.
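Conceptually, the unpack-and-parse step looks something like this sketch (the xml fragment and tag names below are invented stand-ins, far smaller than real primary metadata):

```python
import gzip
import xml.etree.ElementTree as ET

# Stand-in for a repository's compressed metadata blob; real primary
# metadata uses its own namespace and schema and is much larger.
xml_blob = gzip.compress(b"""<metadata>
  <package><name>bash</name><version ver="4.1"/></package>
  <package><name>yum</name><version ver="3.2"/></package>
</metadata>""")

# Conceptually what happens on a cache miss: unpack in memory, parse
# the xml, and build an internal name -> version lookup table.
root = ET.fromstring(gzip.decompress(xml_blob))
cache = {p.findtext("name"): p.find("version").get("ver")
         for p in root.iter("package")}
print(cache["bash"])  # 4.1
```

The point of Jeff's sqlite argument upthread is that this whole decompress-and-parse pass is the expensive part that a queryable backing store avoids.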
On Jul 24, 2011, at 5:37 PM, Ljubomir Ljubojevic wrote:
Jeff Johnson wrote:
On Jul 24, 2011, at 4:35 PM, Ljubomir Ljubojevic wrote:
Oh, yeah, yum reads and processes xml files, not the actual package files, so searches are fast because of it.
Here's something that might help you:
Using xml is a significant performance hit: see recent patches to yum/createrepo to use sqlite instead of xml … lemme find the check-in claim … here is the claim http://lists.baseurl.org/pipermail/rpm-metadata/2011-July/001353.html and quoting
Tested locally on repodata of 9000 pkgs.
Goes from 1.8-> 2GB of memory in use with the old createrepo code to 325MB of memory in use - same operation - performance-wise it is not considerably different. More testing will bear that out, though.
So -- if I believe those numbers -- there's *lots* of room for improvement in yum by ripping out xml and replacing it with a sqlite database. Note that createrepo != yum, but some of the usage cases are similar. The general problem in yum (and smart and apt) is the high cost of the cache load, and the amount of xml that must be parsed/read in order to be cached. Adding a sqlite backing store which can just be used, not loaded, is a win.
You have mistaken createrepo for yum repomd data. Createrepo is for creating an actual repository (I use mrepo).
I haven't (if you read what I said carefully). Meanwhile mrepo is nicely done, worth using if you have to babysit tonnes of package metadata. I like what Dag implements, sane and simple and useful.
Yum data (repomd, repoview) is a different story. Every repository stores its data in xml files packed with tar. They are unpacked in memory, and the xml data is parsed and put into an internal database (and cache). It is very much possible that yum internally (for its cache) uses an sqlite database; I haven't had the need to research it. Using "yum -C <command>" will use the yum cache rather than download the repomd data again.
Please note that I'm speaking way way generally and from memory. What you gave me was a data point about how well yum performs, and yum is better than I would have guessed with 10+ repositories underneath it.
Anything else you read is pure crack smoke from me thinking out loud. I don't even agree with myself often ;-)
73 de Jeff
--
Ljubomir Ljubojevic
(Love is in the Air)
PL Computers
Serbia, Europe

Google is the Mother, Google is the Father, and traceroute is your trusty Spiderman...
StarOS, Mikrotik and CentOS/RHEL/Linux consultant

_______________________________________________
CentOS-devel mailing list
CentOS-devel@centos.org
http://lists.centos.org/mailman/listinfo/centos-devel
On Sun, 2011-07-24 at 23:37 +0200, Ljubomir Ljubojevic wrote:
You have mistaken createrepo for yum repomd data. Createrepo is for creating an actual repository (I use mrepo).
Yum data (repomd, repoview) is a different story. Every repository stores its data in xml files packed with tar. They are unpacked in memory, and the xml data is parsed and put into an internal database (and cache). It is very much possible that yum internally (for its cache) uses an sqlite database; I haven't had the need to research it. Using "yum -C <command>" will use the yum cache rather than download the repomd data again.
Ljubomir, if you want to pop by #fedora-devel on freenode tomorrow, the mock, yum and createrepo maintainers and developers (clark, james, me, tim, nils, panu and others) tend to all be in there often.
The long and short of it is: yum plugins don't work especially well inside mock, and they are tricky to get in there.
You could also pop by #yum on freenode, you'll find a lot of similarly good folks around who can help.
finally, just to dispel any confusion: yum uses the sqlite dbs first if they are available. createrepo generates both the xml and the sqlite dbs; by default repomd.xml is just used as the index to the repo.
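To make the index role concrete, a sketch of picking the sqlite db out of repomd.xml; the namespace below is the one real repodata uses, but the checksums are omitted and the hrefs are made up:

```python
import xml.etree.ElementTree as ET

# A trimmed repomd.xml: each <data> entry points at one metadata file.
REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="primary_db">
    <location href="repodata/primary.sqlite.bz2"/>
  </data>
</repomd>"""

NS = {"repo": "http://linux.duke.edu/metadata/repo"}
root = ET.fromstring(REPOMD)

# Mirror seth's point: prefer the sqlite db when the index lists one,
# and fall back to the xml otherwise.
hrefs = {d.get("type"): d.find("repo:location", NS).get("href")
         for d in root.findall("repo:data", NS)}
chosen = hrefs.get("primary_db", hrefs["primary"])
print(chosen)  # repodata/primary.sqlite.bz2
```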
good luck with your building process. -sv
seth vidal wrote:
Ljubomir, If you wanna pop by #fedora-devel on freenode tomorrow the mock, yum and createrepo(clark, james, me, tim, nils, panu and others) maintainers and developers tend to all be in there often.
the long and short is - yum plugins don't work inside mock overly well and they are tricky to get in there.
Is there a short explanation of how you sort repository priority? Something like "the repo on top has first priority"? This is the only real problem I see now. I have added priority=x to the mock cfg and will experiment, but a definitive "this is how you should do it" would be nice, since I have not been able to find this info so far. Once the repos are set I am going to test compilation.
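For reference, the shape of the cfg fragment I mean (mock configs are python fragments; the repo names, urls and priority values below are placeholders, and whether the priorities plugin actually honors them inside mock is exactly what I want to confirm):

```python
# Sketch of the relevant part of a mock cfg. Everything below is an
# example, not a tested recipe: repo ids, baseurls and priority
# values are placeholders.
config_opts = {}
config_opts['yum.conf'] = """
[main]
cachedir=/var/cache/yum
keepcache=1
plugins=1

[plc-os]
name=plc-os
baseurl=http://example.com/centos/6/os/x86_64/
priority=1

[plc-epel]
name=plc-epel
baseurl=http://example.com/epel/6/x86_64/
priority=10
"""

# With the priorities plugin, a lower number means higher priority,
# so plc-os would win when both repos carry the same package.
print('priority=1' in config_opts['yum.conf'])  # True
```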
You could also pop by #yum on freenode, you'll find a lot of similarly good folks around who can help.
I will try to find free time for IRC. Thanks.