[CentOS] [OT] Concept: RepoDELTA (simple createrepo hack)

Fri Sep 9 15:20:31 UTC 2005
Bryan J. Smith <b.j.smith at ieee.org>

I'm not on the YUM lists, so I'll post this here in the hope
it will make someone happy.  I'll forward the concept to Seth
for consideration (if he doesn't see it here).  This is a
simple "createrepo" hack that would require corresponding
support in the "yum" binary as well.  I'm not sure this would
work, but it's just an idea I figure I might as well throw
out.

PROBLEM

Users want to be able to access the "state" of a YUM
repository at any arbitrary point in time or by a known tag.

REQUIREMENTS

In order for this to work, there are 3 requirements:

1.  Append-only Repository
No package deletions:  packages are only added to the YUM
repository, never removed (at least not without careful
consideration -- i.e., not recommended).

2.  Multiple Repodata Instances/Tracking
For those who actually know how YUM works at the
HTTP-accessed repository, it is the "repodata" directory that
holds all meta-data/package list information.  In order for
any resolution to occur on any arbitrary date/tag, there must
be a way to control/delta/host multiple Repodata Instances
and track them.

3.  Resolution
There must be a new set of mechanisms for resolving what
Repodata Instance to use, given an option in the YUM client. 
Either a radical change must occur in the way YUM operates
(i.e., repositories are accessed via HTTP), or the client
must take on this additional burden.

SOLUTION

1.  Doesn't change regardless of solution -- unless, of
course, the server wants the additional overhead of binary
deltas between RPMs.  No thanks, we'll just keep all RPM
package releases.

2/3.  An evolutionary approach is used, no fancy version
control system, just a flat directory system all still
accessed via HTTP.  This is how it works.

New Subdirectory:  ./repodelta

A.  Tree organization
Repodelta is a tree of subdirectories whose names are the
string of absolute POSIX seconds assuming UTC -- i.e., the
same output as "date -u +%s", e.g., ./repodelta/1126278362

B.  Tree index
Since HTTP does not lend itself to file management (at least
not without something like WebDAV on both the server and
client), there will be an index file of the tree.  This
should be a simple file, something like
"./repodelta/releases.txt" or similar.  It should be a flat
text file, just the "date -u +%s" format on each line,
although an MD5 or other checksum could be attached to the
end or a separate file for verification of its integrity.

C.  New createrepo option:  --delta
The createrepo should have a new option called "--delta". 
Instead of generating new meta-data files in the "./repodata"
subdirectory, it creates a new "./repodelta/`date -u +%s`"
subdirectory with new meta-data files, then symlinks
./repodata/ to it.  It also re-reads the "./repodelta/"
directory, looks for subdirectories ( [ -d ] "test" in most
scripting languages) with valid meta-data indices (this
would be more subjective/involved), and regenerates the
"./repodelta/releases.txt" file with a date index to those
releases.  Again, we do this since standard HTTP does not
define file operations such as reading a directory list.
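
A rough sketch of the directory/symlink half of "--delta", in
Python since that is what createrepo is written in.  This is
not createrepo code:  create_snapshot() is a made-up name, the
metadata generation itself is left as a comment, and it
assumes ./repodata is either absent or already a symlink.

```python
import os
import time


def create_snapshot(repo_root, now=None):
    """Create ./repodelta/<epoch> and repoint ./repodata at it."""
    # Same value as `date -u +%s`; `now` can be passed in for testing.
    stamp = str(int(time.time() if now is None else now))
    snap_dir = os.path.join(repo_root, "repodelta", stamp)
    os.makedirs(snap_dir)  # raises if this snapshot already exists

    # A real createrepo would write its metadata files into snap_dir here.

    link = os.path.join(repo_root, "repodata")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # A relative target keeps the tree relocatable on the server.
    os.symlink(os.path.join("repodelta", stamp), tmp_link)
    # Renaming over the old symlink swaps it in one step on POSIX.
    os.replace(tmp_link, link)
    return snap_dir
```

Building the new symlink under a temporary name and renaming
it means a legacy client never sees a moment where ./repodata
is missing.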

D.  Tags:  Symlinks
Creating tags is as simple as symlinking a new directory
against one of the `date -u +%s` format directories.  The
createrepo "--delta" option will look for symlinks under
"./repodelta", and add each tag to
"./repodelta/releases.txt" after the de-referenced directory
name (whitespace delimited).  E.g., given the directory
listing:  
  ./repodelta/1126278362
  ./repodelta/4.1 -> 1126278362
  ./repodelta/current -> 1126278362
The line in ./repodelta/releases.txt would be: 
  1126278362 4.1 current
It probably wouldn't hurt for a reserved "HEAD" tag to be
automagically generated whenever createrepo --delta is run
for the new directory.
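
Regenerating the index with tags could look something like the
following.  Again just a sketch:  build_index() is a
hypothetical name, it treats any all-digit directory as a
snapshot without checking that the meta-data inside is valid,
and the newest-first ordering and sorted tags are my own
choices.

```python
import os


def build_index(repodelta_dir):
    """Return releases.txt lines: '<epoch> <tag> ...', newest first."""
    snapshots = {}  # epoch directory name -> list of tags
    links = []      # (tag name, de-referenced directory name)
    for name in os.listdir(repodelta_dir):
        path = os.path.join(repodelta_dir, name)
        if os.path.islink(path):
            links.append((name, os.path.basename(os.path.realpath(path))))
        elif os.path.isdir(path) and name.isdigit():
            snapshots[name] = []
    for tag, target in links:
        if target in snapshots:  # ignore dangling or foreign symlinks
            snapshots[target].append(tag)
    return [
        " ".join([epoch] + sorted(snapshots[epoch]))
        for epoch in sorted(snapshots, key=int, reverse=True)
    ]
```

Writing the result is then just a matter of joining the lines
with newlines into ./repodelta/releases.txt.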

E.  YUM client options 
New YUM clients would have to be written with new options and
resolution logic -- date and tag options.  Resolution is
straightforward for tags, although a decision whether to
default to HEAD or give an error if a tag doesn't exist would
have to be considered.  Date resolution is almost as
straightforward -- given a date/time (in a variety of
formats -- absolute seconds, traditional format, reverse
offset from current, etc...), the YUM client will look for
the closest date that is "no later" than the one given. 
Kinda like the Price-is-Right, the closest without going over
(the given date/time).
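
The client-side lookup might be sketched like this, assuming
the client has already fetched releases.txt over HTTP and
converted the user's date to absolute seconds.  The function
names are hypothetical and none of this is existing yum code.

```python
import bisect


def parse_index(text):
    """Parse releases.txt into (sorted epoch list, tag -> epoch mapping)."""
    epochs, tags = [], {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip blank lines and a bare "HEAD" entry
        epoch = int(fields[0])
        epochs.append(epoch)
        for tag in fields[1:]:
            tags[tag] = epoch
    return sorted(epochs), tags


def resolve_date(epochs, when):
    """Return the latest snapshot epoch that is <= when, or None."""
    pos = bisect.bisect_right(epochs, when)
    return epochs[pos - 1] if pos else None
```

Tag resolution is a plain dictionary lookup (tags.get("4.1")),
and a None result from either path is where the client would
have to decide between falling back to HEAD and giving an
error.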

F.  Backward compatibility, both client and server
Because the ./repodata/ is just a symlink, legacy YUM clients
work without modification.  And in the case where someone
runs a non-repodelta version of createrepo, only the last
repo is overwritten.
NOTE:  An option to avoid this might be to have ./repodata
not be a symlink to a subdirectory in ./repodelta/, but a
copy of the appropriate subdirectory.  Another option
would be to leave the symlink, but it always points to a
reserved ./repodelta/HEAD/ directory that is not a symlink. 
That way you always know the ./repodelta/`date -u +%s` is
always true to that date, and the only consideration is if
./repodelta/HEAD/ is not the same as the latest
./repodelta/`date -u +%s`.  In fact, that might be most
ideal, HEAD is _never_ a tag/symlink, but its own, real
subdirectory.

EXAMPLE:  

An example directory structure might be:  

  ./repodata
  ./repodelta/1125372135
  ./repodelta/1126278362
  ./repodelta/4.0 -> 1125372135
  ./repodelta/os -> 1125372135
  ./repodelta/4.1 -> 1126278362 
  ./repodelta/HEAD

Where ./repodelta/releases.txt contains:  
  HEAD
  1126278362 4.1
  1125372135 4.0 os

HEAD may or may not be the same as 1126278362.  Again,
referencing "F" above, this is to prevent 1126278362 from
being changed in case someone runs "createrepo" without the
"--delta" option.  If "--delta" was used when 1126278362 was
created, its contents were also copied into HEAD, and if no
one has run createrepo since, the two will match.

Again, this "hack" is probably very easy to implement on the
server side.  In fact, it could be run daily or on another,
regular period.  If new files have been added (Q: by date? or
by comparing to the previous meta-data?), it will create a
new ./repodelta/`date -u +%s` subdirectory and perform the other
operations.  If not, there is no sense in creating a second,
exact copy from the previous run.
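
One possible answer to the "by date or by comparing" question,
purely as a sketch:  compare the set of RPM paths now against
the set seen on the previous run.  How the previous set is
stored between runs is left open here, and both function names
are made up.

```python
import os


def package_set(repo_root):
    """Return the set of *.rpm paths under repo_root, relative to it."""
    found = set()
    for dirpath, _dirs, files in os.walk(repo_root):
        for name in files:
            if name.endswith(".rpm"):
                full = os.path.join(dirpath, name)
                found.add(os.path.relpath(full, repo_root))
    return found


def needs_snapshot(current, previous):
    """True when packages were added since the previous run."""
    # With an append-only repository only additions matter.
    return bool(current - previous)
```

Comparing names rather than file dates avoids false positives
when a mirror sync touches timestamps without adding packages.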


-- 
Bryan J. Smith                | Sent from Yahoo Mail
mailto:b.j.smith at ieee.org     |  (please excuse any
http://thebs413.blogspot.com/ |   missing headers)