hi, what's the current status of deltarpms and presto for centos? i've just rebuilt the deltarpm, presto-utils and yum-presto packages from fedora for centos. is there any plan to add them to centos extras? i'm interested not just in using but also in generating deltarpm-enabled repositories. imho it can save a lot of bandwidth for everybody (not to mention if presto ends up using google's courgette algorithm too). thanks in advance.
On Sat, 2009-11-14 at 16:08 +0100, Farkas Levente wrote:
hi, what's the current status of deltarpms and presto for centos? i've just rebuilt the deltarpm, presto-utils and yum-presto packages from fedora for centos. is there any plan to add them to centos extras? i'm interested not just in using but also in generating deltarpm-enabled repositories. imho it can save a lot of bandwidth for everybody (not to mention if presto ends up using google's courgette algorithm too). thanks in advance.
FWIW, deltarpm is already in EPEL and I don't mind putting yum-presto there as well (though both would probably be better served in CentOS extras if we want CentOS to use them).
There are currently test CentOS 5.4 repositories with deltarpms at http://lesloueizeh.com/centos5 (though they get updated manually, which means not too frequently).
As for presto and courgette, I'll respond to that at: https://bugzilla.redhat.com/show_bug.cgi?id=512515
Jonathan
On 11/14/2009 05:59 PM, Jonathan Dieter wrote:
FWIW, deltarpm is already in EPEL and I don't mind putting yum-presto there as well (though both would probably be better served in CentOS extras if we want CentOS to use them).
There are currently test CentOS 5.4 repositories with deltarpms at http://lesloueizeh.com/centos5 (though they get updated manually, which means not too frequently).
ok, let me make things more specific:
deltarpm in epel is 3.4-8.el5.1 while in fedora it's 3.5-0.4.20090913git. are there any significant/relevant changes? when do you plan the 3.5 release?
it'd be useful to add yum-presto to epel too. at the same time, which yum version is required by yum-presto? rhel/centos-5 includes yum-3.2.22, but i read somewhere (and i can't find where now) that at least yum-3.2.23 is required. so what's the correct version?
also a presto-utils would be useful in epel. rebuilding from fedora presto-utils-0.3.4-3 gives this error:
--------------------------------------
byte-compiling /var/tmp/presto-utils-0.3.4-3.el5-root-lfarkas/usr/lib/python2.4/site-packages/presto-utils/deltarpmd.py to deltarpmd.pyc
  File "/usr/lib/python2.4/site-packages/presto-utils/deltarpmd.py", line 98
    class Builder():
                  ^
SyntaxError: invalid syntax
--------------------------------------
while running:
--------------------------------------
# createdeltarpms . drpms
/usr/bin/python: module presto-utils.gendeltarpms not found
--------------------------------------
gives the above error on centos-5.4
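A note on the byte-compile error above: empty parentheses after a class name (class Builder():) only became legal syntax in Python 2.5, and CentOS 5 ships Python 2.4, so this is probably an interpreter-version issue rather than a packaging one. The sketch below shows a spelling that older interpreters also parse. The class body is a made-up placeholder, not the real presto-utils code, and this addresses only the SyntaxError; the "module ... not found" message may have a separate cause.

```python
# Placeholder class: only the form of the class statement matters here,
# the attributes and methods are invented for illustration.
class Builder(object):  # a bare "class Builder:" also parses on Python 2.4
    def __init__(self, name="example"):
        self.name = name

    def describe(self):
        return "builder for %s" % self.name


builder = Builder()
print(builder.describe())
```

Patching the one class statement in a rebuilt package would be the smallest change; whether the rest of the module runs on Python 2.4 is untested here.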
it'd be useful to have at least a minimal manual on how we can use these tools and how we can generate deltarpms.
As for presto and courgette, I'll respond to that at: https://bugzilla.redhat.com/show_bug.cgi?id=512515
ok, i'll respond there too.
On Nov 15, 2009, at 1:30 PM, Farkas Levente wrote:
As for presto and courgette, I'll respond to that at: https://bugzilla.redhat.com/show_bug.cgi?id=512515
ok, i'll respond there too.
FWIW, there are deep fundamental design issues wrto Courgette that have nothing to do with whether Google is choosing Courgette for Chromium's peculiar updates.
For starters:
1) RFC 3229 at http://www.rfc-editor.org/rfc/rfc3229.txt
This is what subversion uses (afaik) instead of xdelta. While I privately like xdelta _A LOT_ and I think that Josh MacDonald's master's thesis and xdelta[123] are the cat's pajamas, xdelta code is quite obscure and hard to justify deploying generally. YMMV, everyone's does, but (objectively) subversion chose vdelta (a la RFC 3229) rather than xdelta because the xdelta code is insanely difficult and uncommented.
2) disassembling code to remove pointer entropy (as in Courgette) may be a win for executables, but is not generally useful for presto (or packaging).
There are other issues, but it's entirely unclear whether Courgette is The Right Answer for presto and deltarpms atm.
Disclaimer: YMMV, everyone's does.
73 de Jeff
On Sun, 2009-11-15 at 13:44 -0500, Jeff Johnson wrote:
FWIW, there are deep fundamental design issues wrto Courgette that have nothing to do with whether Google is choosing Courgette for Chromium's peculiar updates.
For starters:
1) RFC 3229 at http://www.rfc-editor.org/rfc/rfc3229.txt
This is what subversion uses (afaik) instead of xdelta. While I privately like xdelta _A LOT_ and I think that Josh MacDonald's master's thesis and xdelta[123] are the cat's pajamas, xdelta code is quite obscure and hard to justify deploying generally. YMMV, everyone's does, but (objectively) subversion chose vdelta (a la RFC 3229) rather than xdelta because the xdelta code is insanely difficult and uncommented.
FWIW, I think the delta algorithm is one of the smaller problems deltarpm needs to deal with right now.
2) disassembling code to remove pointer entropy (as in Courgette) may be a win for executables, but is not generally useful for presto (or packaging).
To clarify for others following this thread, most of the files in an rpm are data, *not* executables.
Deltarpm currently has two big problems that keep it from having hugely efficient deltas even when two rpms have barely changed.
1) Any colored binaries that aren't in a multilib directory (i.e. /usr/bin/*) are never delta'd at all. This was because we didn't want to lose the complete delta because some 32-bit package on a 64-bit machine was missing some file in /usr/bin. We may want to rethink this now, as 64-bit installs tend to have fewer 32-bit packages than when this decision was made.
2) A small change in an uncompressed file will result in a huge change after it's been compressed. Many of the larger packages have at least some compressed files, and those files are essentially not delta'd at all.
In my mind, at least, solving these two problems will have a far bigger effect in reducing deltarpm size than adopting Courgette.
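Problem 2 above can be illustrated with a small experiment. The sketch below uses zlib as a stand-in compressor (an assumption; the compressed files inside real packages may use other formats) and a made-up text payload. It flips one bit in the uncompressed data and reports what fraction of byte positions differ before and after compression. It says nothing about how deltarpm itself handles such files.

```python
import zlib


def differing_fraction(a, b):
    """Fraction of byte positions that differ, measured against the longer input."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - float(same) / longest


def make_payload(lines=2000):
    """Build a repetitive, text-like payload (invented sample data)."""
    return b"".join(b"setting_%d = value_%d\n" % (i, i * 7 % 13) for i in range(lines))


original = make_payload()
mutable = bytearray(original)
mutable[10] ^= 0x01  # flip a single bit in a single byte
changed = bytes(mutable)

raw_fraction = differing_fraction(original, changed)
packed_fraction = differing_fraction(zlib.compress(original), zlib.compress(changed))

print("uncompressed bytes differing: %.4f%%" % (100 * raw_fraction))
print("compressed bytes differing:   %.4f%%" % (100 * packed_fraction))
```

How far the two percentages diverge depends on the data and the compressor, so treat the output as an observation about one sample rather than a general figure.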
Jonathan
On Nov 15, 2009, at 2:45 PM, Jonathan Dieter wrote:
FWIW, I think the delta algorithm is one of the smaller problems deltarpm needs to deal with right now.
But perhaps the current patent infringement actions underway wrto Courgette _IS_ an issue, particularly for Fedorable (and other risk-averse FLOSS distros):
http://www.h-online.com/open/news/item/Patent-action-over-Google-s-Courgette...
But you seem not to be able to find any code relevant to implementing Courgette in presto (per yr bugzilla entries), so it likely Simply Doesn't Matter.
To clarify for others following this thread, most of the files in an rpm are data, *not* executables.
Deltarpm currently has two big problems that keep it from having hugely efficient deltas even when two rpms have barely changed.
1) Any colored binaries that aren't in a multilib directory (i.e. /usr/bin/*) are never delta'd at all. This was because we didn't want to lose the complete delta because some 32-bit package on a 64-bit machine was missing some file in /usr/bin. We may want to rethink this now, as 64-bit installs tend to have fewer 32-bit packages than when this decision was made.
2) A small change in an uncompressed file will result in a huge change after it's been compressed. Many of the larger packages have at least some compressed files, and those files are essentially not delta'd at all.
Heh, colored binaries not in a multilib directory, and doubly compressed files, are the least of the problems with presto deltafication imho.
In my mind, at least, solving these two problems will have a far bigger effect in reducing deltarpm size than adopting Courgette.
Have fun!
73 de Jeff
On Sun, 2009-11-15 at 19:30 +0100, Farkas Levente wrote:
ok, let me make things more specific:
deltarpm in epel is 3.4-8.el5.1 while in fedora it's 3.5-0.4.20090913git. are there any significant/relevant changes? when do you plan the 3.5 release?
I'm currently building it and will push it to testing ASAP. The main difference is that it supports xz-compressed RPMS (which shouldn't matter for CentOS) and it has a python API (which will be useful).
it'd be useful to add yum-presto to epel too. at the same time, which yum version is required by yum-presto? rhel/centos-5 includes yum-3.2.22, but i read somewhere (and i can't find where now) that at least yum-3.2.23 is required. so what's the correct version?
I think presto should work with any modern version of yum, including 3.2.22. In fact, I have at various times used it on my CentOS 5 boxes. I'll see about branching it for EPEL.
also a presto-utils would be useful in epel. rebuilding from fedora presto-utils-0.3.4-3 gives this error:
<snip>
it'd be useful to have at least a minimal manual on how we can use these tools and how we can generate deltarpms.
Fedora is using createrepo, not presto-utils to generate their deltarpms. Presto-utils should be deprecated (or at least, mainly used for pruning out old drpms, etc).
Jonathan
On 11/15/2009 08:28 PM, Jonathan Dieter wrote:
Fedora is using createrepo, not presto-utils to generate their deltarpms. Presto-utils should be deprecated (or at least, mainly used for pruning out old drpms, etc).
but rhel/centos has a very old createrepo, so we'd have to update it, and it's against centos policy to update packages that are in the upstream distro ;-(
On 11/15/2009 10:23 PM, Farkas Levente wrote:
but rhel/centos has a very old createrepo, so we'd have to update it, and it's against centos policy to update packages that are in the upstream distro ;-(
and i found that createrepo needs yum-3.2.23 for delta support, so that's another problem :-(
On 11/15/2009 08:28 PM, Jonathan Dieter wrote:
On Sun, 2009-11-15 at 19:30 +0100, Farkas Levente wrote:
ok, let me make things more specific:
deltarpm in epel is 3.4-8.el5.1 while in fedora it's 3.5-0.4.20090913git. are there any significant/relevant changes? when do you plan the 3.5 release?
I'm currently building it and will push it to testing ASAP. The main difference is that it supports xz-compressed RPMS (which shouldn't matter for CentOS) and it has a python API (which will be useful).
it'd be useful to add yum-presto to epel too. at the same time, which yum version is required by yum-presto? rhel/centos-5 includes yum-3.2.22, but i read somewhere (and i can't find where now) that at least yum-3.2.23 is required. so what's the correct version?
I think presto should work with any modern version of yum, including 3.2.22. In fact, I have at various times used it on my CentOS 5 boxes. I'll see about branching it for EPEL.
also a presto-utils would be useful in epel. rebuilding from fedora presto-utils-0.3.4-3 gives this error:
<snip>
it'd be useful to have at least a minimal manual on how we can use these tools and how we can generate deltarpms.
Fedora is using createrepo, not presto-utils to generate their deltarpms. Presto-utils should be deprecated (or at least, mainly used for pruning out old drpms, etc).
FYI: i'm now able to build and run delta updates on my systems, but there were a few problems:
- i use createrepo-0.9.8-2.el5 from http://infrastructure.fedoraproject.org/ for generating the deltarpms and the delta repo, and fedora's yum-presto.
- unfortunately this version of createrepo is not able to generate sha1 checksums for the repo (at least, even if i set it on the command line i get a python stack trace), but rhel/centos-5 doesn't have sha256 in python by default, so you have to install python-hashlib and make createrepo depend on it. imho it's a createrepo bug.
- you also have to remove the versioned yum requirement from createrepo, since 5.4's yum is enough.
- but as it generates sha256 checksums, i also had to add a python-hashlib requirement to yum-presto (otherwise it can't verify the checksum on the client side).
ps. another strange problem: for one of our internal rpms the deltarpm rebuild failed:
------------------------------------------
<delta rebuild> 93% [=========================================================================- ] 521 kB/s | 180 MB 00:25 ETA
//var/cache/yum/xxx-beta-4.2.1-5082/deltas/xxx-4.2.0-4763.el5_4.2.1-5082.el5.i386.drpm: md5 mismatch of result
<delta rebuild> | 193 MB 05:11
Error rebuilding rpm from xxx-4.2.0-4763.el5_4.2.1-5082.el5.i386.drpm! Will download full package.
Presto reduced the update size by 74% (from 193 M to 51 M).
------------------------------------------
but it's good to know that the fallback to the normal rpm works :-)
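On the checksum point in this message: hashlib has been part of the Python standard library since 2.5, which is why Python 2.4 on rhel/centos-5 needs the separate python-hashlib package for sha256. The sketch below shows only the generic checksum step, written for a current Python. The function name and the sample payload are hypothetical, and this is not createrepo's or yum-presto's actual code.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="sha256", chunk_size=65536):
    """Hash a binary file-like object in chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


# Invented stand-in for a package file; a real caller would open the rpm in "rb" mode.
sample = io.BytesIO(b"example package payload")
print("sha1  :", stream_checksum(sample, "sha1"))
sample.seek(0)
print("sha256:", stream_checksum(sample, "sha256"))
```

A client can only verify what it can compute, so the checksum type chosen when the repository is generated has to be one every client's Python can produce.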
On Wed, 18 Nov 2009, Farkas Levente wrote:
FYI: i'm now able to build and run delta updates on my systems, but there were a few problems:
- i use createrepo-0.9.8-2.el5 from http://infrastructure.fedoraproject.org/ for generating the deltarpms and the delta repo, and fedora's yum-presto.
- unfortunately this version of createrepo is not able to generate sha1 checksums for the repo (at least, even if i set it on the command line i get a python stack trace), but rhel/centos-5 doesn't have sha256 in python by default, so you have to install python-hashlib and make createrepo depend on it. imho it's a createrepo bug.
createrepo -s sha /path/
works just fine for me on 0.9.8-2
If you have a bug, file it.
-sv
Am 14.11.09 16:08, schrieb Farkas Levente:
hi, what's the current status of deltarpms and presto for centos? i've just rebuilt the deltarpm, presto-utils and yum-presto packages from fedora for centos. is there any plan to add them to centos extras? i'm interested not just in using but also in generating deltarpm-enabled repositories. imho it can save a lot of bandwidth for everybody (not to mention if presto ends up using google's courgette algorithm too).
Do you have any metrics?
Like: Updates for 5.x are x GB without presto but only y GB with presto? How much additional space will be required on the mirrors? Other things which might be needed to make a decision?
Ralph
On Mon, 2009-11-16 at 11:06 +0100, Ralph Angenendt wrote:
Do you have any metrics?
Like: Updates for 5.x are x GB without presto but only y GB with presto? How much additional space will be required on the mirrors? Other things which might be needed to make a decision?
In my presto-enabled CentOS 5.4 i386 mirror, the deltarpms take up 91MB (compared to 1.2GB for the actual RPMS).
Total savings tends to be anywhere from 60%-80%, though YMMV. Large packages like openoffice tend to delta well, while packages with lots of compressed files tend to delta poorly.
In a rather extreme example, openoffice.org-core-2.3.0-6.11.el5_4.1 is 88MB. The deltarpm from 2.3.0-6.11.el5 is 917K. In a more normal example, kernel-PAE-2.6.18-164.6.1.el5 is 16M, while the deltarpm from 2.6.18-164.2.1.el5 is 2.5MB.
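For anyone who wants the figures above as percentages, here is a small worked calculation. It assumes 1 MB = 1024 K and 1 GB = 1024 MB, which the message doesn't specify, so the results shift slightly under decimal units.

```python
def percent_saved(full_size, delta_size):
    """Percentage of download avoided by fetching the delta instead of the full rpm."""
    return 100.0 * (1.0 - float(delta_size) / float(full_size))


KB_PER_MB = 1024.0  # assumption: binary units
MB_PER_GB = 1024.0  # assumption: binary units

# (package, full rpm size in MB, deltarpm size in MB), taken from the message above
examples = [
    ("openoffice.org-core", 88.0, 917.0 / KB_PER_MB),
    ("kernel-PAE", 16.0, 2.5),
]

for name, full_mb, delta_mb in examples:
    print("%-20s saves %.1f%%" % (name, percent_saved(full_mb, delta_mb)))

# Extra mirror space: 91MB of deltarpms next to 1.2GB of regular RPMS.
mirror_overhead = 100.0 * 91.0 / (1.2 * MB_PER_GB)
print("extra mirror space: %.1f%% of the rpm tree" % mirror_overhead)
```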
Jonathan
Am 16.11.09 11:26, schrieb Jonathan Dieter:
In my presto-enabled CentOS 5.4 i386 mirror, the deltarpms take up 91MB (compared to 1.2GB for the actual RPMS).
Hmmm. 5.3 would be a bit more interesting, as it had a "complete round" of updates, but ...
So would you say that ~10% seems like a workable rule of thumb?
Another question: Can those repos be mixed or do I have to have an "updates/" and an "updates.presto/" directory?
Cheers,
Ralph
On Mon, 2009-11-16 at 12:52 +0100, Ralph Angenendt wrote:
Hmmm. 5.3 would be a bit more interesting, as it had a "complete round" of updates, but ...
Unfortunately, I cleared out all of the 5.3 deltarpms when I updated to 5.4. I did do deltarpms from 5.3+updates (or was it just 5.3, I'm afraid I don't remember) to 5.4, and it came to 151MB of deltarpms.
So would you say that ~10% seems like a workable rule of thumb?
In Fedora, it's much larger (closer to 25% - 30%), but I think that has to do with the fact that in my mirrors, I throw away any deltarpms that save less than 50%, while Fedora seems to keep all generated deltarpms. Also, Fedora goes through far more updates in a cycle (and the updates tend to have bigger changes) than CentOS.
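The pruning rule described above (discard any deltarpm that saves less than 50%) could be sketched like this. The function name, the threshold parameter and the candidate list are hypothetical; this is not how createrepo or presto-utils implements it.

```python
def worth_keeping(full_size, delta_size, min_saving=0.5):
    """Return True if the delta saves at least min_saving of the full rpm size."""
    if full_size <= 0:
        return False
    saving = 1.0 - float(delta_size) / float(full_size)
    return saving >= min_saving


# Invented candidates: (name, full rpm size, deltarpm size) in the same unit.
candidates = [
    ("big-win", 16.0, 2.5),
    ("marginal", 16.0, 9.0),
]

kept = [name for name, full, delta in candidates if worth_keeping(full, delta)]
print("kept:", kept)
```

A stricter threshold shrinks the space used on mirrors at the cost of more full downloads, which may be part of why the Fedora and CentOS percentages above differ.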
I just wish I had kept the 5.3 deltarpms so I could give you a better idea.
Another question: Can those repos be mixed or do I have to have an "updates/" and an "updates.presto/" directory?
Normally, the deltarpms are dumped into "updates/drpms/", while the regular rpms stay in "updates/". One of the most important requirements for yum-presto is that it falls back on regular rpms if the preferred deltarpm doesn't exist, so it's built on top of the regular yum procedure rather than replacing it.
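The layout and fallback described above can be sketched as a simple decision. Everything here is illustrative: the function, the file names and the rebuild_ok callback are hypothetical, and yum-presto's real logic lives inside yum's normal download path rather than in a standalone function like this.

```python
def plan_download(rpm_name, drpm_name, published, rebuild_ok):
    """Pick the deltarpm when it is published and rebuilds cleanly, else the full rpm."""
    delta_path = "updates/drpms/" + drpm_name
    full_path = "updates/" + rpm_name
    if delta_path in published and rebuild_ok(delta_path):
        return delta_path
    return full_path


# Invented repository contents for illustration.
published = {
    "updates/foo-1.1-1.i386.rpm",
    "updates/drpms/foo-1.0-1_1.1-1.i386.drpm",
    "updates/bar-2.0-1.i386.rpm",
}

print(plan_download("foo-1.1-1.i386.rpm", "foo-1.0-1_1.1-1.i386.drpm",
                    published, lambda path: True))
print(plan_download("bar-2.0-1.i386.rpm", "bar-1.9-1_2.0-1.i386.drpm",
                    published, lambda path: True))
```

Because the full rpm is always the fallback, a repository with missing or broken deltas still behaves like a regular repository for every client.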
Hope that clarifies things.
Jonathan