On 08/08/16 17:59, David Moreau Simard wrote:
> I'm having a hard time understanding what "X machines every 10 minutes" means.
>
> Our jobs are long-lived and tend to be launched simultaneously in a pipeline.
> For example, we have this pipeline where 8 jobs will launch at once
> but then 6 of those will complete in ~45 minutes and then the other
> two take around ~90 minutes.
>
> Is the quota represented as an amount of nodes per 10 minutes or an
> absolute cap on the concurrent nodes ?

That would be the number of machines that can be allocated per 10 minutes. Once allocated, they can run through to the reap limits (ideally keep it under 6 hrs); if there are jobs that need more than 1 machine, we'd need to tweak it further.

> I think a cap on the concurrent nodes would make more sense ?

We do have that as well, e.g. RDO is limited to 100 physical nodes at any one point; most projects start at 10 and we tweak up as needed (again, we're not trying to get in the way, just protecting machine stock against runaway scripts etc).

> i.e, a tenant would not be able to request more than 30 nodes. If he
> requests a node and the tenant already has 30 active nodes, the
> request is refused with an error similar to the one we bump into when
> duffy is out of inventory.

The rate at which we allocate machines is the rate at which we need to install and provision machines from the unused machine stock. The bootup + anaconda run + reboot and contextualisation takes time, so we need to tweak the flow in order to best optimise the allocation rate.

--
Karanbir Singh
+44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
GnuPG Key : http://www.karan.org/publickey.asc
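
To make the difference between the two limits discussed above concrete, here is a minimal sketch of a per-tenant gate that applies both an allocation rate (nodes per 10 minutes) and a cap on concurrent nodes. It is only an illustration, not how duffy is implemented: the `TenantQuota` class, its method names, the refusal messages and the numbers in the usage note are all assumptions.

```python
from collections import deque

WINDOW_SECONDS = 600  # the "per 10 minutes" window from the thread


class TenantQuota:
    """Hypothetical per-tenant gate: an allocation rate plus a concurrent cap."""

    def __init__(self, per_window, max_concurrent):
        self.per_window = per_window          # allocations allowed per window
        self.max_concurrent = max_concurrent  # nodes a tenant may hold at once
        self.recent = deque()                 # timestamps of recent allocations
        self.active = 0                       # nodes currently held

    def request(self, now):
        """Return (granted, reason) for one node request at time `now` (seconds)."""
        # Forget allocations that have aged out of the 10-minute window.
        while self.recent and now - self.recent[0] >= WINDOW_SECONDS:
            self.recent.popleft()
        if self.active >= self.max_concurrent:
            return False, "concurrent cap reached"
        if len(self.recent) >= self.per_window:
            return False, "allocation rate exceeded"
        self.recent.append(now)
        self.active += 1
        return True, "ok"

    def release(self):
        """Called when a node is handed back or reaped."""
        if self.active > 0:
            self.active -= 1
```

With an assumed rate of 5 per window and a cap of 10, a pipeline that asks for 8 nodes at the same instant would get 5 and be refused 3 on rate grounds, even though it is well under the concurrent cap; the remaining 3 could be granted once the window has rolled over.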