[Ci-users] RDO jobs in cico

Mon Aug 15 08:49:10 UTC 2016
Karanbir Singh <mail-lists at karan.org>

On 14/08/16 17:17, Brian Stinson wrote:
> On Aug 14 09:07, Fabian Arrotin wrote:
>> On 12/08/16 21:34, Brian Stinson wrote:
>> <snip>
>>>
>>> Did we land on an acceptable rate for this? We've exhausted the ready
>>> pool completely a few times today.
>>>
>>> One thing that might help is to work on the backoff in cicoclient.
>>> @David do your jobs sit in a holding pattern if the API returns 'out of
>>> nodes'?  If so, we should peg the scheduled retry interval to our
>>> estimate of how long it takes the workers to fill out some machines. 
>>>
>>> --Brian 
> 
>> Another (possible) option is also to either :
>> - increase the number of workers on the CI infra sides
> 
> Did this temporarily yesterday.

We should really not need to do this - it just increases the number of
wasted nodes and the quantity of hardware that won't get used.
In an ideal world we want to be at a point where hardware is deployed just
in time to get consumed, i.e. near zero ready nodes in the pool.
Increasing the pool size just masks the real problem. So it's ok to
do on a temporary basis, for a day or two, to work around a genuine spike,
but we should not let this go past the 20 number as a regular thing.


> 
>> - have the workers job deploy in parallel multiple nodes instead of just
>> one (actually the ansible job can deploy multiple nodes already in
>> parallel, but duffy limits the call to only one specific node per job)
> 
> This is in the works, but will take some time to work it into duffy. 

The limits we have are more or less enforced from the hardware side, are
they not? The API call rate into the firmware starts taking quite a hit
once you go past a certain (fairly low, per chassis?) number. The only
way around this would be to remove the ansible abstraction and just call
the API directly from the python side.

However, I don't think this is really a problem we have at the moment.
Folks who need a very high density of instances per minute can fall back
to just using the cloud infra, or use the jenkins queue management to
serialise better.
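
On the client side, the backoff idea raised earlier in the thread could
look roughly like the sketch below. This is only an illustration: the
OutOfNodes exception, the request_node callable and the 300 second refill
estimate are made-up names and numbers, not the actual cicoclient or duffy
API. The point is just that a job waits about as long as the workers need
to refill the pool before asking again, rather than hammering the API.

```python
import random
import time


class OutOfNodes(Exception):
    """Hypothetical error: the API reported that the ready pool is empty."""


def get_node_with_backoff(request_node, refill_estimate=300.0, max_attempts=6,
                          sleep=time.sleep, jitter=random.random):
    """Call request_node() until it succeeds or the attempts run out.

    refill_estimate is an assumed figure, in seconds, for how long the
    workers take to put fresh machines into the ready pool.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request_node()
        except OutOfNodes:
            if attempt == max_attempts:
                # Give up and let the caller (e.g. the CI job) fail.
                raise
            # Wait roughly one refill period, plus up to 20% jitter so that
            # many waiting jobs do not all retry at the same instant.
            sleep(refill_estimate * (1.0 + 0.2 * jitter()))
```

A job would pass in whatever function actually asks for a node and
translate the real "out of nodes" response into the exception above.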

Regards,

-- 
Karanbir Singh
+44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
GnuPG Key : http://www.karan.org/publickey.asc