*TL;DR: CentOS CI is going hardwareless, and if you want your project to
keep using it, we need your opt-in by August 2022. There is a Dojo
Summer 2022 <https://wiki.centos.org/Events/Dojo/Summer2022> session
happening on Thursday, June 17th, that will explain further technical
details.*
Hello everyone,
As many of you know, since the beginning of this year we have been
reevaluating the future of CentOS CI, as the hardware currently used for it
is out of warranty. CentOS CI was built on community donations of hardware,
which our team has maintained on a best-effort basis. With no warranty, when
a physical machine dies we have no means to replace it. In addition, the
upcoming data center changes require hardware to be in warranty for
supportability, so our current hardware will not be moved.
We decided to take this opportunity to modernize our current
infrastructure by moving it to a hybrid cloud environment. Duffy CI will
become the main tool from now on, so that we can support the CI workflow
and best practices on cloud. For this reason, the current hardware infra
will soon no longer be available. However, to keep providing resources and
supporting CI best practices for projects, our team is adapting Duffy CI so
that we can maintain most of the characteristics of our current,
physical-based offering.
At the technical level, what does that mean for you, CI tenants?
- A new Duffy API service will replace the existing one. It will initially
  run in compatibility/legacy mode with the previous version, but you will
  need to adapt your workflow to the new API (more details below).
- We will transition to AWS EC2 instances for the aarch64 and x86_64
  architectures by default, with a (limited) option to request “metal”
  instances for projects requiring virtualization for their tests (like
  KVM/vagrant/etc).
- We will keep a (very small) Power9 infra “on-premise” for the ppc64le
  tests (AWS does not support ppc64le), available through a dedicated VPN
  tunnel.
- The existing OpenShift cluster will also be decommissioned and replaced
  by a new one hosted in AWS (so without an option to run the kubevirt
  operator or VMs). You will have to migrate from one to the other.
With that being said, tenants can start preparing for the changes now. The
final deadline is the end of December 2022, at which point the Duffy API
legacy mode will be removed. You are required to opt in if you and/or your
team want to use Duffy CI. Projects will only be migrated if they reply to
this email confirming that they wish to proceed. Note that not opting in
means that your API key will not be migrated, so all your requests for
temporary/ephemeral nodes will be rejected by the new Duffy API.
The final decommission deadline for the current hardware infrastructure is
December 12th, 2022, and the new Duffy CI will go live in August 2022, so
please complete your migration process by the end of CY22. Reminders of
deadlines and of the opt-in requirement will be sent monthly, but your
confirmation of opt-in is required by August 2022. As December approaches,
reminders will become more frequent so that we can ensure effective
communication throughout the process.
Here are the steps in which we will migrate CI Infra:
Phase 1 - Deploy Duffy V3 (August 2022)
- Deploy in legacy/compatibility mode, so existing tenants (those that
  opted in!) can still request Duffy nodes the same way (for example with
  'python-cicoclient'): no change on the tenant side, and exactly the same
  hardware for tests (transparent migration)
- New Duffy API endpoint becomes available, and tenants can start adapting
  their workflows to point to the new API (new ‘duffy-cli’ tool coming,
  with documentation)
- Bare metal and VM options will already be available through the new
  API (x86_64, aarch64, ppc64le)
Phase 2 - Hybrid Cloud (October 2022)
- Legacy/compatibility API endpoint will hand over EC2 instances instead
  of local seamicro nodes (VMs vs bare metal)
- Bare metal options will be available through the new API only
- Legacy seamicro and aarch64/ThunderX hardware are decommissioned
- Only remaining "on-premise" option is ppc64le (local cloud)
Phase 3 - Decommission (December 2022)
- Legacy/compatibility API deprecated and requests (even for EC2
  instances) will no longer be accepted
- All tenants that opted in will be using only EC2 for aarch64/x86_64 and
  on-premise cloud for ppc64le
OpenShift new deployment planning and timeline
To be defined (deadline for planning and timeline: end of June 2022)
Do not hesitate to reach out if you have any questions. Note that there
will be a dedicated session about the future of CentOS CI infra at the next
CentOS Dojo, happening on June 17th (check Dojo Summer 2022
<https://wiki.centos.org/Events/Dojo/Summer2022>). That session will be
recorded and made available on YouTube afterwards, but if you have any
questions, feel free to join the CentOS Dojo and reach out to us!
Best regards,
--
Camila Granella
Associate Manager, Software Engineering
Red Hat <https://www.redhat.com/>
@Red Hat <https://twitter.com/redhat> Red Hat
<https://www.linkedin.com/company/red-hat> Red Hat
<https://www.facebook.com/RedHatInc>
<https://www.redhat.com/>
As announced some months ago, we are moving all CI infra components to
new infra (mostly AWS).
Regarding the Duffy API service itself, phase 1 was completed earlier this
month (August) and phase 2 will be announced for October.
Another service we need to migrate is https://artifacts.ci.centos.org,
which we will migrate next week.
Migration is scheduled for Monday September 5th, 6:00 am UTC.
You can convert to local time with $(date -d '2022-09-05 06:00 UTC')
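For those who prefer to do the conversion from a script, here is a minimal Python sketch of the same idea. The function name `to_local` and its `utc_offset_hours` parameter are illustrative assumptions, not part of any CentOS tooling; only the migration timestamp comes from this announcement.

```python
from datetime import datetime, timedelta, timezone

# Migration window as announced: Monday September 5th 2022, 06:00 UTC.
MIGRATION_UTC = datetime(2022, 9, 5, 6, 0, tzinfo=timezone.utc)


def to_local(moment, utc_offset_hours=None):
    """Convert an aware datetime to another zone.

    With no offset given, the system's local time zone is used;
    otherwise a fixed UTC offset (in hours) is applied.
    """
    if utc_offset_hours is None:
        return moment.astimezone()
    return moment.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Local time according to the machine running the script.
print(to_local(MIGRATION_UTC))
# Example for a zone at UTC+2 (hypothetical tenant location).
print(to_local(MIGRATION_UTC, 2))  # 2022-09-05 08:00:00+02:00
```

The fixed-offset variant is only there to keep the example self-contained; for named zones with DST rules, the standard `zoneinfo` module would be the more natural choice.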
The expected "downtime" should be really small, as it will just involve
pointing the DNS CNAME record to the new host and, possibly (see below), a
last rsync between hosts.
Note that CI tenants using that service in the past were relying on direct
rsync access (tcp/873, available only internally in the dedicated CI VLAN)
with an rsync secret.
Because the service is now publicly available on AWS, we decided to
disable plain rsync and allow only rsync over ssh (or sftp/scp if you
prefer), reusing your existing project keypair.
More details available at
https://sigs.centos.org/guide/ci/#artifacts-storage
Don't forget to update your scripts before next Monday, or you will not be
able to push to the new storage server!
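As a rough illustration of what such a script update could look like, the sketch below builds an rsync-over-ssh command line in Python. The hostname is the one from this announcement; the login name, the remote directory layout, the key path and the helper name `build_push_command` are all assumptions made for the example, so please check the guide linked above for the real values for your project.

```python
import shlex

ARTIFACTS_HOST = "artifacts.ci.centos.org"  # hostname from the announcement


def build_push_command(local_dir, project, key_path, remote_dir=None):
    """Build an rsync-over-ssh argument list for pushing local_dir.

    ASSUMPTION: the remote login and the default target directory are
    both the project name; this is hypothetical, see the CI guide.
    """
    if remote_dir is None:
        remote_dir = f"{project}/"
    # rsync's -e option selects the remote shell; here ssh with the project key.
    ssh_cmd = f"ssh -i {shlex.quote(key_path)}"
    return [
        "rsync", "-av",
        "-e", ssh_cmd,
        local_dir.rstrip("/") + "/",  # trailing slash: copy directory contents
        f"{project}@{ARTIFACTS_HOST}:{remote_dir}",
    ]


# Hypothetical project name and key path, for illustration only.
cmd = build_push_command("./results", "myproject", "/home/ci/.ssh/id_rsa")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in place of the old plain-rsync (tcp/873) invocation.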
PS: while we consider all data "ephemeral" (and so liable to be discarded,
as there is no backup at all for that temporary hosting solution), we can
still migrate your existing data from the old server to the new one. For
that, please "opt in" in the existing ticket so that we can keep track of
the projects we need to migrate. See ticket
https://pagure.io/centos-infra/issue/906
Thanks for your understanding and patience.
on behalf of the Infra team,
--
Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 17F3B7A1 | twitter: @arrfab
Hi,
We are currently suffering from flapping network connectivity to the
main DC where the majority of the CentOS Infra is hosted.
After some internal discussion, we confirmed that the upstream link
provider is aware of the issue and is looking for a fix (but no ETA).
Impacted services:
- centos ci
- git.centos.org
- cbs.centos.org
- mirror.centos.org (downstream consumer and having issues pulling content)
- mirror.stream.centos.org (downstream consumer and having issues
pulling content)
- buildlogs.centos.org (same reason)
We'll post an update once this is finally resolved.
--
Fabian Arrotin
Hello everyone,
This is a friendly reminder of the current and upcoming status of CentOS CI
changes (check [1]).
Projects that opted in to continue on CentOS CI have been migrated, and
the new Duffy API is available. With that, *phase 0 has been completed*.
Regarding *phase 1*, we are still working on a permanent fix for the DB
concurrency issues [2]. As for our new OpenShift deployment, we have a
staging environment up and running, and it should be available at the
beginning of September 2022.
In October 2022 we begin *phase 2*, during which we will work through the
following items (these were also previously communicated in [1]):
- legacy/compatibility API endpoint will hand over EC2 instances instead
of local seamicro nodes (VMs vs bare metal)
- bare-metal options will be available through the new API only
- legacy seamicro and aarch64/ThunderX hardware are decommissioned
- only remaining "on-premises" option is ppc64le (local cloud)
Feel free to reach out if you have any questions or concerns.
The final deadline for decommissioning the old infrastructure (*phase 3*)
is *December 2022*. We will be communicating further until then, and
meanwhile, reach out to any of us in case you have any questions.
Regards,
[1] [ci-users] Changes on CentOS CI and next steps:
https://lists.centos.org/pipermail/ci-users/2022-June/004547.html
[2] DB Concurrency issues: https://github.com/CentOS/duffy/issues/523
--
Camila Granella
Associate Manager, Software Engineering
Red Hat <https://www.redhat.com/>
Hi,
We switched to the new Duffy API last Monday. While we saw machines being
requested (from the previous seamicro pool, but now also VMs from AWS/EC2)
and returned, tenants informed us from time to time of transient errors.
Based on some troubleshooting, it seems that the Duffy API was answering
requests arriving within the same second with the same nodes: either
handing the same nodes over to different tenants, or handing them to the
same tenant under different session IDs (but with the same hostname).
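To illustrate the general class of problem (and only that), here is a toy Python sketch showing why picking a free node and binding it to a session has to happen as one atomic step. The `NodePool` class and all names in it are hypothetical; this is not Duffy's code, and Duffy's real fix lives in its database layer rather than in an in-memory lock.

```python
import threading


class NodePool:
    """Toy in-memory pool; hypothetical, not Duffy's implementation."""

    def __init__(self, hostnames):
        self._free = list(hostnames)
        self._lock = threading.Lock()
        self._sessions = {}  # session id -> (tenant, hostname)
        self._next_id = 1

    def allocate(self, tenant):
        """Atomically take one free node and bind it to a new session."""
        # Without the lock, two requests arriving together could both
        # read the same free node before either one removes it.
        with self._lock:
            if not self._free:
                return None
            hostname = self._free.pop()
            session_id = self._next_id
            self._next_id += 1
            self._sessions[session_id] = (tenant, hostname)
            return session_id, hostname


pool = NodePool([f"node{i}" for i in range(8)])
results = []


def worker(tenant):
    results.append(pool.allocate(tenant))


threads = [threading.Thread(target=worker, args=(f"tenant{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

hostnames = [hostname for _, hostname in results]
print(len(set(hostnames)) == len(hostnames))  # True: no node handed out twice
```

In a database-backed service the same guarantee would typically come from transactions or row-level locking instead of a process-local lock.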
Nils (the Duffy code author) is busy today working on a fix, and we'll let
you know when we are able to roll it out.
PS: this means that we will also have to stop the Duffy API and run some
DB clean-up operations to restart from a clean state. As a result, Duffy
will consider deployed nodes as unused and will start reinstalling them.
We'll let you know when we proceed with that hotfix push.
--
Fabian Arrotin