[CentOS-devel] CentOS SIGs and lookaside cache

Mon Feb 21 15:51:02 UTC 2022
Pierre-Yves Chibon <pingou at pingoured.fr>

Good Morning Everyone,

There are currently two lookaside caches in use around the CentOS project:
* One used by CentOS-Stream: https://sources.stream.centos.org/sources
  It is not browsable, but it uses the structure:
  `baseurl/pkgname/tarball/hashtype/hash/tarball`. Example:
* One used by CentOS-Linux, CentOS-Stream 8 and the SIGs:
  https://git.centos.org/sources/
  This one is browsable and, as you can see, uses the structure:
  `baseurl/pkgname/branch/hash`. Example:
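
To make the two layouts concrete, here is a minimal Python sketch that builds a
download URL for each of them. Only the base URLs and the path structures come
from the description above; the function names and the sample package, branch,
hash type and hash values are hypothetical.

```python
# Base URLs as given above; everything else in this sketch is illustrative.
STREAM_BASE = "https://sources.stream.centos.org/sources"
GIT_BASE = "https://git.centos.org/sources"


def stream_source_url(pkgname, tarball, hashtype, digest, baseurl=STREAM_BASE):
    """Build baseurl/pkgname/tarball/hashtype/hash/tarball."""
    return "/".join(
        [baseurl.rstrip("/"), pkgname, tarball, hashtype, digest, tarball]
    )


def git_source_url(pkgname, branch, digest, baseurl=GIT_BASE):
    """Build baseurl/pkgname/branch/hash."""
    return "/".join([baseurl.rstrip("/"), pkgname, branch, digest])


if __name__ == "__main__":
    # Hypothetical values, not real entries in either cache.
    print(stream_source_url("pkg1", "pkg1-1.0.tar.gz", "sha512", "abc123"))
    print(git_source_url("pkg1", "c8", "abc123"))
```

Note that the second form needs the branch name to locate a source, while the
first one only needs information about the tarball itself.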

The rest of this email focuses on this second one. SIGs upload to it using the
route: https://git.centos.org/sources/upload.cgi

An email last week [1] proposed an idea for how SIGs that wish to do so could
leverage the centos namespace in gitlab.

One of the benefits of using gitlab would be increased flexibility for SIGs. A
clear example of this would be the ability to drop the branch structure
currently imposed on the git repositories. That structure is imposed because the
git repositories are shared between CentOS-Linux, CentOS-Stream and (potentially
multiple) SIGs, so that structure ensures groups are not stepping on each
other's toes. By moving the SIGs out of these shared repositories, imposing that
structure is no longer needed.

However, since the lookaside cache relies on the branch name, lifting that
structure would break the lookaside cache.

I have already brought this idea to a few folks to see if the idea was sane. The
consensus that emerged is:
* Introduce a new upload endpoint next to the existing one, something like:
* That new endpoint would upload the given sources using the same structure as
  the one used for CentOS-Stream, while ensuring that the person uploading is a
  member of at least one SIG.

The idea of using `sig_upload.cgi` instead of just replacing `upload.cgi` rests
on the assumption that we want to preserve the current structure used for
CentOS-Linux and CentOS-Stream. This makes it easier to find which sources are
used where and does not impact the process Red Hat uses to push its releases.

Since the structures used by the two upload scripts are different, they will not
conflict with each other. What we will end up seeing is something like:

├── pkg1
│   ├── c7
│   │   ├── hash1
│   │   └── hash2
│   ├── c8
│   │   ├── hash3
│   │   └── hash4
│   ├── tarball1
│   │    └── sha name
│   │         └── hash5
│   │              └── tarball1
│   └── tarball2
│       └── sha name
│            └── hash6
│                 └── tarball2
└── pkg2
    ├── c8
    │   ├── hash7
    │   └── hash8
    ├── c8s
    │   ├── hash9
    │   └── hash10
    ├── tarball3
    │    └── sha name
    │         └── hash11
    │              └── tarball3
    └── tarball4
        └── sha name
             └── hash12
                  └── tarball4
and so on
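
As a rough illustration of why the two layouts can share one tree, here is a
small Python sketch that tells them apart by path depth alone. The function
name and the returned labels are hypothetical; the paths are taken from the
tree above, with `sha512` standing in for "sha name" as an assumed hash type.

```python
def classify_source_path(relative_path):
    """Guess which layout a path (relative to the cache root) belongs to."""
    parts = [p for p in relative_path.split("/") if p]
    if len(parts) == 3:
        # pkgname/branch/hash
        return "branch"
    if len(parts) == 5 and parts[1] == parts[4]:
        # pkgname/tarball/hashtype/hash/tarball
        return "tarball"
    return "unknown"


if __name__ == "__main__":
    print(classify_source_path("pkg1/c7/hash1"))
    print(classify_source_path("pkg1/tarball1/sha512/hash5/tarball1"))
```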

On CBS, the script that downloads the sources will then need to be adjusted to
try the old structure first before trying the new one. This may slow the builds
down a little, but in most cases by no more than a single extra http request.
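
The fallback described above could look roughly like the following Python
sketch. This is not the actual CBS script: the function names, the parameters
and the choice to treat an HTTP 404 as "not found in this layout" are all
assumptions made for the example.

```python
import urllib.error
import urllib.request


def candidate_urls(baseurl, pkgname, branch, tarball, hashtype, digest):
    """Return the URLs to try, old (branch-based) structure first."""
    base = baseurl.rstrip("/")
    return [
        f"{base}/{pkgname}/{branch}/{digest}",
        f"{base}/{pkgname}/{tarball}/{hashtype}/{digest}/{tarball}",
    ]


def http_fetch(url):
    """Return the response body, or None if the server answers 404."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: a missing source yields a 404
            return None
        raise


def download_source(urls, fetch=http_fetch):
    """Try each URL in order and return (url, data) for the first hit."""
    for url in urls:
        data = fetch(url)
        if data is not None:
            return url, data
    raise FileNotFoundError("source not found in either structure")
```

With this ordering, sources stored under the old structure cost one request,
and sources stored under the new structure cost one extra request.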

With this email I'm calling for feedback: do you like the idea?

I'm happy to work on making it happen if there is consensus on this :)

Looking forward to your thoughts,

[1] https://lists.centos.org/pipermail/centos-devel/2022-February/120216.html