[Ci-users] apps.ci.centos.org registry service degradation

Thu Sep 13 20:03:51 UTC 2018
Brian Stinson <brian at bstinson.com>

On Sep 13 12:28, Brian Stinson wrote:
> 
> Hi Folks,
> 
> I'm seeing a few issues with the internal registry on apps.ci.centos.org
> 
> 
> I'm going to work over the next couple of hours to get some things moved
> around and redeployed. You may notice some trouble pushing/pulling
> containers in the meantime. 
> 
> I'll give updates as they happen.
> 
> Cheers!
> 
> --Brian
> _______________________________________________
> Ci-users mailing list
> Ci-users at centos.org
> https://lists.centos.org/mailman/listinfo/ci-users

Hi All,

We should be back now. Here's what happened:

- The registry service was hosted on one of our infra nodes, backed by
  local storage

- As part of our regular cleanup cronjobs, we *usually* try to prune any
  orphaned image layers from the registry (layers that are still in
  use, or still referenced by a tag, are kept)

- The pruning process got stuck and left the registry process in
  uninterruptible sleep.
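
For anyone curious what "orphaned" means here, the rule can be sketched
roughly as below. This is only an illustration of the idea, not the
actual pruning job: the function name, the data shapes, and the example
digests and tag names are all made up for this sketch.

```python
def find_orphaned_layers(all_layers, tags, in_use=()):
    """Return the layers that are safe to prune.

    all_layers: iterable of layer digests stored in the registry
    tags:       dict mapping a tag name to the layer digests it references
    in_use:     digests currently in use (hypothetical input, e.g. layers
                of running containers)
    """
    # Anything referenced by a tag or currently in use is kept.
    keep = set(in_use)
    for layers in tags.values():
        keep.update(layers)
    # Whatever is left has no remaining reference, so it is orphaned.
    return set(all_layers) - keep


# Hypothetical example data
all_layers = {"sha256:aaa", "sha256:bbb", "sha256:ccc", "sha256:ddd"}
tags = {"myapp:latest": ["sha256:aaa", "sha256:bbb"]}
in_use = ["sha256:ccc"]

orphaned = find_orphaned_layers(all_layers, tags, in_use)
print(sorted(orphaned))
```

Here only sha256:ddd would be pruned, since the other three layers are
either tagged or in use.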

To fix this we:

- Converted the registry to use shared storage so it can be
  scheduled/rescheduled across any of the infra nodes

- Restarted the infra node that was having trouble

- Ran a manual pruning

- Restored the registry service

If you notice any trouble because of this, please let me know here or in
#centos-devel on freenode.

Thanks all!

--
Brian Stinson
CentOS CI Infrastructure Team