On 23/08/2022 16:41, Camila Granella wrote:
Hi all,
Earlier today the infra team attempted to bump the amount of metal machines available for provisioning on Duffy. However, the AWS API returned that currently there is no capacity to provision metal machines in the Availability Zone we are currently in (us-east-1a). For this reason, we will need to default to the use of EC2.
I had a look at the number of deployed c5n.metal instances for c8s, and it reached 11 nodes. That also means Duffy is now trying to keep 5 nodes in Ready state (it was bumped from 1 to 5 through a git commit/push earlier today).
It seems we're reaching the limit of physical c5n.metal machines available in us-east-1 (we use 3 availability zones there, through three subnets in a dedicated Duffy VPC).
Worth knowing: Duffy catches the ansible error, so it knows provisioning is failing and simply retries every 60 seconds to provision machines of that instance type. But looking at the logs, we are clearly asking for much more than AWS can offer. That's also normal: AWS is about EC2 virtual machines, not (costly) bare-metal options. Also worth knowing: we added the metal option to let people transition their workflows, but it will clearly be limited (by AWS availability, not even by us in this case).
For the time being, you can just put all your jobs in a queue and retry getting a node through the Duffy API whenever Duffy itself has managed to get some into ready state. At any point, one can see the pool status:
    duffy client show-pool metal-ec2-c5n-centos-8s-x86_64
    {
      "action": "get",
      "pool": {
        "name": "metal-ec2-c5n-centos-8s-x86_64",
        "fill_level": 5,
        "levels": {
          "provisioning": 0,
          "ready": 0,
          "contextualizing": 0,
          "deployed": 10,
          "deprovisioning": 0
        }
      }
    }
In this case, it shows 10 metal nodes deployed to tenants, with Duffy unable to provision more (the "provisioning" count will show a number, then go back to zero if provisioning fails).
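
For anyone who wants to script the "queue and retry" approach, here is a minimal Python sketch, not an official Duffy helper. It assumes the `duffy client show-pool` output has the JSON shape shown above. The function names (`fetch_pool_status`, `ready_nodes`, `wait_for_ready`) and the default of 30 attempts are made up for illustration; the 60-second interval simply mirrors Duffy's own retry interval.

```python
import json
import subprocess
import time


def fetch_pool_status(pool: str) -> str:
    """Run `duffy client show-pool <pool>` and return its raw output.

    Assumes the duffy client is installed and already configured.
    """
    result = subprocess.run(
        ["duffy", "client", "show-pool", pool],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def ready_nodes(show_pool_output: str) -> int:
    """Return the 'ready' count from show-pool JSON output."""
    data = json.loads(show_pool_output)
    return data["pool"]["levels"]["ready"]


def wait_for_ready(fetch_status, interval=60, max_attempts=30, sleep=time.sleep):
    """Poll until at least one node is ready.

    fetch_status is any callable returning show-pool JSON text
    (hypothetical hook, e.g. a wrapper around fetch_pool_status).
    Returns True once a node is ready, False after max_attempts polls.
    """
    for attempt in range(max_attempts):
        if ready_nodes(fetch_status()) > 0:
            return True
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    return False
```

A job could call `wait_for_ready(lambda: fetch_pool_status("metal-ec2-c5n-centos-8s-x86_64"))` and only request a node once it returns True. Note that a ready node can still be taken by another tenant between the check and the request, so the request itself should also be retried.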