Good Morning!
I have recently been seeing quite a few failures on the CI jobs of the
Automotive SIG. Digging into it a little, it looks like duffy fails with:
{
  "error": {
    "detail": "can't reserve nodes: quantity=1 pool='metal-ec2-c5n-centos-9s-x86_64'"
  }
}
and
{
  "error": {
    "detail": "can't reserve nodes: quantity=1 pool='virt-ec2-c6g-centos-9s-aarch64'"
  }
}
So I checked the pools:
$ duffy client list-pools
{
  "action": "get",
  "pools": [
    {
      "name": "virt-ec2-t2-centos-9s-x86_64",
      "fill_level": 8
    },
    {
      "name": "virt-ec2-c6g-centos-9s-aarch64",
      "fill_level": 5
    },
    {
      "name": "metal-ec2-c5n-centos-9s-x86_64",
      "fill_level": 6
    },
    {
      "name": "virt-ec2-t2-centos-10s-x86_64",
      "fill_level": 8
    },
    {
      "name": "virt-ec2-c6g-centos-10s-aarch64",
      "fill_level": 2
    },
    {
      "name": "metal-ec2-c5n-centos-10s-x86_64",
      "fill_level": 3
    }
  ]
}
That looks fine from here. Do we have any monitoring of how busy the nodes in
the pools are?
Also: could duffy client request-session get a --retry option, so that it keeps
retrying until a node becomes available? (Maybe combined with a --timeout?)
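
To make the request concrete, here is a rough sketch of the behaviour I have in
mind, written as a Python wrapper around the command rather than as a patch to
duffy itself. It assumes a failed reservation is reported as a JSON object with
a top-level "error" key, as in the output above. The name request_with_retry
and its parameters are made up for illustration; they are not part of duffy.

```python
import json
import subprocess
import time


def request_with_retry(cmd, timeout=600.0, interval=30.0,
                       run=None, sleep=time.sleep, clock=time.monotonic):
    """Run `cmd` until its JSON output is an object without an "error" key.

    Hypothetical helper, not part of duffy. `timeout` and `interval` are in
    seconds. Returns the parsed JSON reply on success and raises TimeoutError
    when another attempt would not fit before the deadline.
    """
    if run is None:
        # Default runner: execute the command and return its standard output.
        def run(command):
            return subprocess.run(command, capture_output=True, text=True).stdout

    deadline = clock() + timeout
    while True:
        output = run(cmd)
        try:
            reply = json.loads(output)
        except json.JSONDecodeError:
            # Treat empty or non-JSON output as a failed attempt.
            reply = None

        # Assumption: success is any JSON object lacking an "error" key.
        if isinstance(reply, dict) and "error" not in reply:
            return reply

        if clock() + interval > deadline:
            raise TimeoutError(f"no node available after {timeout} seconds")
        sleep(interval)
```

A CI job could then call it with the usual command line, for example
request_with_retry(["duffy", "client", "request-session",
"pool=virt-ec2-c6g-centos-9s-aarch64,quantity=1"], timeout=1800). The
pool=...,quantity=... argument form is how I understand the client to be
invoked today; adjust it if your jobs call it differently.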
(Happy to work with Claude on the patches if that's deemed helpful.)
Thanks in advance :)
Pierre