[Ci-users] RDO jobs in cico

Brian Stinson brian at bstinson.com
Fri Aug 12 19:34:06 UTC 2016


On Aug 08 12:59, David Moreau Simard wrote:
> I'm having a hard time understanding what "X machines every 10 minutes" means.
> 
> Our jobs are long-lived and tend to be launched simultaneously in a pipeline.
> For example, we have this pipeline where 8 jobs will launch at once
> but then 6 of those will complete in ~45 minutes and then the other
> two take around ~90 minutes.
> 
> Is the quota represented as an amount of nodes per 10 minutes or an
> absolute cap on the concurrent nodes ?
> I think a cap on the concurrent nodes would make more sense ?
> 
> i.e., a tenant would not be able to request more than 30 nodes. If he
> requests a node and the tenant already has 30 active nodes, the
> request is refused with an error similar to the one we bump into when
> duffy is out of inventory.
> 
> 
> David Moreau Simard
> Senior Software Engineer | Openstack RDO
> 
> dmsimard = [irc, github, twitter]
> 
> 
> On Mon, Aug 8, 2016 at 11:24 AM, Karanbir Singh <mail-lists at karan.org> wrote:
> > hi David,
> >
> > so is 2 machines every 10 min a good rate to start from ?
> >
> > regards,
> >
> > On 08/08/16 16:08, David Moreau Simard wrote:
> >> Using the new slaves has a couple objectives:
> >> 1) Test the new OpenStack cloud
> >> 2) Increase redundancy (given the instability of our existing slave
> >> the past few weeks)
> >> 3) Increase concurrency/capacity
> >>
> >> We had 16 threads on a single slave before (down from 24) and that
> >> single slave was struggling to cope when all those 16 threads were
> >> actually busy.
> >> The new slaves have 8 threads each and we lowered the amount of
> >> threads on the original slave back to 10 so it isn't loaded (and isn't
> >> crashing) as much.
> >>
> >> So we're now at 34 threads total and I can indeed tell from our
> >> consumption logging that the usage has increased and peaks higher than
> >> before.
> >> We'll scale down the threads to 24 total, can you tell us if you see
> >> improvements ?
> >>
> >> We're also waiting for the feature in Duffy that'll enable us to track
> >> which node is associated with which job so we can hunt jobs that are
> >> potentially not being very good citizens.
> >>
> >>
> >> David Moreau Simard
> >> Senior Software Engineer | Openstack RDO
> >>
> >> dmsimard = [irc, github, twitter]
> >>
> >>
> >> On Mon, Aug 8, 2016 at 9:30 AM, Karanbir Singh <mail-lists at karan.org> wrote:
> >>> hi guys,
> >>>
> >>> with an increase in the number of slaves, we've noticed that the rdo
> >>> jobs are deploying machines at a much higher velocity than before - as a
> >>> result the ready pool is consistently hitting the low water mark.
> >>>
> >>> Rather than do an overall quota limit, I'm looking at limiting the
> >>> number of duffy deploys per 10 min cycle, but rather than propose
> >>> something I'd like to see what folks think is a reasonable number to
> >>> start from ?
> >>>
> >>> regards,
> >>>
> >>> --
> >>> Karanbir Singh
> >>> +44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
> >>> GnuPG Key : http://www.karan.org/publickey.asc
> >>> _______________________________________________
> >>> Ci-users mailing list
> >>> Ci-users at centos.org
> >>> https://lists.centos.org/mailman/listinfo/ci-users
> >>
> >
> >
> > --
> > Karanbir Singh
> > +44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
> > GnuPG Key : http://www.karan.org/publickey.asc

Did we land on an acceptable rate for this? We've exhausted the ready
pool completely a few times today.

One thing that might help is to work on the backoff in cicoclient.
@David, do your jobs sit in a holding pattern if the API returns 'out of
nodes'? If so, we should peg the scheduled retry interval to our
estimate of how long it takes the workers to provision a few machines
for the ready pool.
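
To make the backoff idea concrete, here is a rough sketch of what I
have in mind. It is not cicoclient code: OutOfNodes, request_node and
refill_estimate_s are hypothetical names, and the 600 second default is
only a placeholder for whatever refill estimate we settle on. The
jitter is there so jobs launched together do not all retry at the same
moment.

```python
import random
import time


class OutOfNodes(Exception):
    """Hypothetical error standing in for an 'out of nodes' API response."""


def request_with_backoff(request_node, refill_estimate_s=600.0,
                         max_attempts=6, jitter=0.2, sleep=time.sleep):
    """Call request_node() until it succeeds or attempts run out.

    The wait between attempts is pegged to refill_estimate_s (an assumed
    estimate of how long the workers need to refill the ready pool), with
    random jitter so jobs launched together do not all retry at once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request_node()
        except OutOfNodes:
            if attempt == max_attempts:
                # Give up and let the job fail with the original error.
                raise
            # Spread retries over +/- jitter around the refill estimate.
            delay = refill_estimate_s * random.uniform(1 - jitter, 1 + jitter)
            sleep(delay)
```

A job would wrap its node request in request_with_backoff(...) and only
fail once max_attempts requests have all come back empty.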

--Brian 


