[CentOS-devel] CentOS SIGs and lookaside cache

Mon Feb 21 15:58:58 UTC 2022
Neal Gompa <ngompa13 at gmail.com>

On Mon, Feb 21, 2022 at 10:51 AM Pierre-Yves Chibon <pingou at pingoured.fr> wrote:
> Good Morning Everyone,
> There are currently two lookaside caches in use around the CentOS project:
> * One used by CentOS-Stream: https://sources.stream.centos.org/sources
>   It is not browsable, but it uses the structure:
>   `baseurl/pkgname/tarball/hashtype/hash/tarball`. Example:
>   https://sources.stream.centos.org/sources/rpms/kernel/linux-5.14.0-62.el9.tar.xz/sha512/f7aeac0fe5bf594933cd35b7ecc94ea8ddcbfedc04fa769c4da298e7bf105df116375d44711d944c748c85f61f96f6149be34c76eb37f28aa1f16359a9122abf/linux-5.14.0-62.el9.tar.xz
> * One used by CentOS-Linux, CentOS-Stream 8 and the SIGs:
>   https://git.centos.org/sources/
>   This one is browsable and, as you can see, uses the structure:
>   `baseurl/pkgname/branch/hash`. Example:
>   https://git.centos.org/sources/kernel/c8s/0c4e10577cfd4b4f8e3d83c0406da8ab05eb775f
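As a rough illustration, the two layouts quoted above can be written as small URL builders. This is only a sketch based on the patterns and examples in the message; the function names are made up here and are not part of any CentOS tooling. Note that the Stream example URL contains an `rpms/` segment that the stated pattern does not spell out, so it is taken as a parameter.

```python
STREAM_BASE = "https://sources.stream.centos.org/sources"
LEGACY_BASE = "https://git.centos.org/sources"


def stream_url(pkgname, tarball, hashtype, digest,
               namespace="rpms", base=STREAM_BASE):
    """Stream layout: base/namespace/pkgname/tarball/hashtype/hash/tarball."""
    return "/".join([base, namespace, pkgname, tarball, hashtype, digest, tarball])


def legacy_url(pkgname, branch, digest, base=LEGACY_BASE):
    """git.centos.org layout: base/pkgname/branch/hash."""
    return "/".join([base, pkgname, branch, digest])
```

For instance, `legacy_url("kernel", "c8s", "0c4e10577cfd4b4f8e3d83c0406da8ab05eb775f")` reproduces the second example URL above.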
> The rest of this email focuses on this second one. SIGs upload to it using the
> route: https://git.centos.org/sources/upload.cgi
> Last week, an email [1] proposed an idea for how SIGs that wish to could use
> the centos namespace in GitLab.
> One of the benefits of using GitLab would be increased flexibility for SIGs, and
> a clear example of this would be the ability to drop the branch structures
> currently imposed on the git repositories. That structure is imposed because the
> git repositories are shared between CentOS-Linux, CentOS-Stream and (potentially
> multiple) SIGs, so that structure ensures groups are not stepping on each
> other's toes. By moving the SIGs out of these shared repositories, imposing that
> structure is no longer needed.
> However, since the lookaside cache relies on branch name, lifting that structure
> would break the lookaside cache.
> I have already brought this idea to a few folks to see if the idea was sane. The
> consensus that emerged is:
> * Introduce a new upload endpoint next to the existing one, something like:
>   https://git.centos.org/sources/sig_upload.cgi
> * That new endpoint would upload the given sources using the same structure as
>   the one used for CentOS-Stream, while ensuring that the person uploading is
>   a member of at least one SIG.
> The reason for adding `sig_upload.cgi` instead of just replacing `upload.cgi` is
> the assumption that we want to preserve the current structure used for
> CentOS-Linux and CentOS-Stream, which makes it easier to find which sources are
> used where and does not impact the process Red Hat uses to push its releases.
> Since the structures used by the two upload scripts are different, they will not
> conflict.
> What we will end up seeing is something like:
> sources
> ├── pkg1
> │   ├── c7
> │   │   ├── hash1
> │   │   └── hash2
> │   ├── c8
> │   │   ├── hash3
> │   │   └── hash4
> │   ├── tarball1
> │   │    └── sha name
> │   │         └── hash5
> │   │              └── tarball1
> │   └── tarball2
> │       └── sha name
> │            └── hash6
> │                 └── tarball2
> └── pkg2
>     ├── c8
>     │   ├── hash7
>     │   └── hash8
>     ├── c8s
>     │   ├── hash9
>     │   └── hash10
>     ├── tarball3
>     │    └── sha name
>     │         └── hash11
>     │              └── tarball3
>     └── tarball4
>         └── sha name
>              └── hash12
>                   └── tarball4
> and so on
> On CBS, the script that downloads the sources will then need to be adjusted to
> try the old structure first before trying the new one. This may slow the builds
> down a little, but in most cases by at most a single HTTP request.
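The lookup order described above (old structure first, then the new one) could be sketched as follows. This is not the actual CBS script; `fetch_source` and the `http_get` callback are hypothetical names, and the two digests are passed separately because the message does not say whether both layouts use the same hash type.

```python
def fetch_source(pkgname, branch, tarball, legacy_digest, hashtype, digest,
                 http_get, base="https://git.centos.org/sources"):
    """Return (url, data) from the first layout that has the file.

    http_get is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns the body as bytes, or None on a 404.
    """
    candidates = [
        # Old layout first: base/pkgname/branch/hash
        f"{base}/{pkgname}/{branch}/{legacy_digest}",
        # New layout second: base/pkgname/tarball/hashtype/hash/tarball
        f"{base}/{pkgname}/{tarball}/{hashtype}/{digest}/{tarball}",
    ]
    for url in candidates:
        data = http_get(url)
        if data is not None:
            return url, data
    raise FileNotFoundError(f"{tarball} not found in either layout")
```

A source that lives in the old layout is found with one request; a source that exists only in the new layout costs one extra request, which matches the estimate in the quoted paragraph.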
> In this email I'm calling for feedback: do you like the idea?
> I'm happy to work on making it happen if there is consensus on this :)
> Looking forward to your thoughts,
> Pierre

The git server and git structure are orthogonal to the lookaside
problem. Fundamentally, the issue was that the same upload endpoint is
used for both Red Hat compliance and SIG work. We already have
authentication/authorization on branches at the Pagure level, so we
just lacked a way to handle this for the lookaside upload. By
splitting the endpoint, it should be possible to solve that since you
can deny access to the Red Hat endpoint to everyone.
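
To make the split concrete, here is a minimal sketch of the access check each endpoint could apply. The endpoint labels and the group names (`redhat-release`, the `sig-` prefix) are made up for illustration; how the real upload CGI would look up group membership is not covered in this thread.

```python
def may_upload(endpoint, groups):
    """Decide whether a user with the given group memberships may upload.

    endpoint: "redhat" or "sig" (hypothetical labels for the two upload CGIs)
    groups:   iterable of group names the authenticated user belongs to
    """
    groups = set(groups)
    if endpoint == "redhat":
        # Only the release automation account's group is allowed here
        # (hypothetical group name).
        return "redhat-release" in groups
    if endpoint == "sig":
        # Any member of at least one SIG may upload
        # (assumes SIG groups share a "sig-" prefix).
        return any(name.startswith("sig-") for name in groups)
    # Unknown endpoints are denied by default.
    return False
```

With this shape, a SIG member is accepted on the SIG endpoint and rejected on the Red Hat one, without the check needing to know anything about git branches.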

However, I'd make a small suggestion: instead of changing the endpoint
URL for SIGs, change the endpoint URL for Red Hat. RCM uses that
endpoint through automation (I assume), so changing the endpoint for
the one service is considerably simpler than dealing with everyone's
own scripts to adjust for SIGs.

As an example, I've written automation to deal with Hyperscale work
because doing it by hand is a lot of grunt work. While I can probably
tweak my stuff easily enough, I don't know if *everyone* can.

And again, the lookaside thing is completely orthogonal to the git
structure. I should be able to use it just fine from git.centos.org in
the current branched package structure.

真実はいつも一つ!/ Always, there's only one truth!