On 20/10/2025 17:51, Pierre-Yves Chibon wrote:
Good Morning!
Recently I have been seeing quite a few failures on the CI jobs of the Automotive SIG. Digging into it a little, it looks like duffy is failing with:

    { "error": { "detail": "can't reserve nodes: quantity=1 pool='metal-ec2-c5n-centos-9s-x86_64'" } }

and

    { "error": { "detail": "can't reserve nodes: quantity=1 pool='virt-ec2-c6g-centos-9s-aarch64'" } }
So I checked the pools:

    $ duffy client list-pools
    {
      "action": "get",
      "pools": [
        { "name": "virt-ec2-t2-centos-9s-x86_64", "fill_level": 8 },
        { "name": "virt-ec2-c6g-centos-9s-aarch64", "fill_level": 5 },
        { "name": "metal-ec2-c5n-centos-9s-x86_64", "fill_level": 6 },
        { "name": "virt-ec2-t2-centos-10s-x86_64", "fill_level": 8 },
        { "name": "virt-ec2-c6g-centos-10s-aarch64", "fill_level": 2 },
        { "name": "metal-ec2-c5n-centos-10s-x86_64", "fill_level": 3 }
      ]
    }
Which seems fine from here. Do we have monitoring of how busy the nodes in the pools are?

Also: could duffy client request-session get a --retry option, so that it keeps retrying until a node becomes available? (Maybe combined with a --timeout?) (Happy to work with Claude on the patches if that's deemed helpful.)
Thanks in advance :)
Pierre
Welcome to us-east-1 outage day ... https://health.aws.amazon.com/health/status
For the --retry option, you could open a PR against the duffy code itself, and Nils will probably review it if he has time ;-)
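
In the meantime, the retrying could be done on the client side with a small wrapper. The following is only a rough sketch, not anything that exists in duffy today: the function names, the default timeout and interval, and the pool=...,quantity=... argument form are assumptions to check against your duffy client version. It treats any output containing an "error" key, like the messages quoted above, as "no node available yet", since I am not sure the client exits non-zero in that case.

```python
import json
import subprocess
import time


def is_reserve_failure(output):
    """True if the output looks like the {"error": ...} payload quoted above."""
    try:
        payload = json.loads(output)
    except json.JSONDecodeError:
        return True  # empty or non-JSON output is treated as a failure
    return not isinstance(payload, dict) or "error" in payload


def request_with_retry(run, timeout=600.0, interval=30.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call run() until it returns a non-error payload or the timeout passes.

    run is any callable returning the client's stdout as a string; sleep and
    clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout
    while True:
        output = run()
        if not is_reserve_failure(output):
            return json.loads(output)
        if clock() + interval > deadline:
            raise TimeoutError("no node became available within the timeout")
        sleep(interval)


def duffy_request_session(pool, quantity=1):
    """Hypothetical helper: run the duffy client once and return its stdout.

    Assumption: request-session accepts a 'pool=<name>,quantity=<n>' argument.
    """
    result = subprocess.run(
        ["duffy", "client", "request-session",
         f"pool={pool},quantity={quantity}"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Example call (commented out so that importing this file does not run duffy):
# session = request_with_retry(
#     lambda: duffy_request_session("virt-ec2-c6g-centos-9s-aarch64"),
#     timeout=1800, interval=60)
```

A fixed interval keeps the sketch short; on a day like this one, a longer interval or a backoff would be kinder to the API.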