[Ci-users] RDO jobs in cico

Mon Aug 8 17:08:50 UTC 2016
Karanbir Singh <mail-lists at karan.org>

On 08/08/16 17:59, David Moreau Simard wrote:
> I'm having a hard time understanding what "X machines every 10 minutes" means.
> Our jobs are long-lived and tend to be launched simultaneously in a pipeline.
> For example, we have this pipeline where 8 jobs will launch at once
> but then 6 of those will complete in ~45 minutes and then the other
> two take around ~90 minutes.
> Is the quota represented as an amount of nodes per 10 minutes or an
> absolute cap on the concurrent nodes ?

That would be the number of machines that can be allocated per 10
minutes. Once allocated, they can run until they hit the reap limit
( ideally keep it under 6 hrs ); if there are jobs that need more than
1 machine, we'd need to tweak it further.
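
To make "X machines every 10 minutes" concrete, here is a minimal sketch
of an allocation-rate quota. It is not duffy's code: the class and method
names are hypothetical, and it assumes a rolling 10-minute window (the
real accounting could just as well use fixed 10-minute buckets).

```python
from collections import deque


class WindowQuota:
    """Hypothetical sketch: allow at most `limit` allocations in any
    rolling `window` seconds. Not the actual duffy implementation."""

    def __init__(self, limit, window=600):
        self.limit = limit          # machines allowed per window
        self.window = window        # window length in seconds (10 min)
        self.stamps = deque()       # timestamps of recent allocations

    def try_allocate(self, now):
        # Forget allocations that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        # Refuse if the window is already full.
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True
```

Note that this only limits how fast machines are handed out. A machine
allocated in one window keeps running into later windows, so the number
of machines in use at once can be higher than the per-window figure.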

> I think a cap on the concurrent nodes would make more sense ?

We do have that as well, and tweak it up as needed, e.g. RDO is limited
to 100 physical nodes at any one point; most projects start at 10
( again, we're not trying to get in the way, just protecting machine
stock against runaway scripts etc ).
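
A cap on concurrent nodes could be sketched roughly as below. This is
illustrative only: `ConcurrentCap`, `request` and `release` are made-up
names, and the error raised is a stand-in for whatever the service
actually returns when a tenant is at its limit.

```python
class ConcurrentCap:
    """Hypothetical sketch of a per-tenant cap on nodes held at once."""

    def __init__(self, cap):
        self.cap = cap              # e.g. 10 for a new project
        self.active = set()         # node ids currently held by the tenant

    def request(self, node_id):
        # Refuse the request outright when the tenant is at its cap.
        if len(self.active) >= self.cap:
            raise RuntimeError("tenant is at its concurrent node cap")
        self.active.add(node_id)

    def release(self, node_id):
        # Returning (or reaping) a node frees a slot under the cap.
        self.active.discard(node_id)
```

With a cap of 10, an eleventh request fails until one of the ten held
nodes is released or reaped.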

> i.e, a tenant would not be able to request more than 30 nodes. If he
> requests a node and the tenant already has 30 active nodes, the
> request is refused with an error similar to the one we bump into when
> duffy is out of inventory.

The rate at which we allocate machines is the rate at which we need to
install and provision machines from the unused machine stock. The
bootup + anaconda run + reboot and contextualisation all take time, so
we need to tune the flow to get the best allocation rate.

Karanbir Singh
+44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
GnuPG Key : http://www.karan.org/publickey.asc