Hi,
We switched last Monday to the new Duffy API, and while we saw machines being requested (from the previous Seamicro pool, but now also VMs from AWS/EC2) and returned correctly, tenants occasionally reported transient errors.
Based on some troubleshooting, it seems the Duffy API was, within the same second, answering either different tenants with the same nodes (so nodes being handed over to multiple tenants) or the same tenant with different session IDs (but the same hostname).
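Purely as an illustration (Duffy's actual data model and code are not shown here, so the table and function names below are hypothetical), this kind of double hand-out typically happens when "is this node free?" and "assign it to this tenant" are two separate steps, so two near-simultaneous requests can both pass the check. A single guarded UPDATE makes the claim atomic; a minimal sketch with Python's sqlite3:

```python
import sqlite3

# Hypothetical minimal schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (hostname TEXT PRIMARY KEY, state TEXT, tenant TEXT)")
conn.execute("INSERT INTO nodes VALUES ('node1', 'ready', NULL)")
conn.commit()

def claim_node(conn, hostname, tenant):
    """Atomically claim a node: the UPDATE only matches while the node is
    still 'ready', so of two near-simultaneous requests only one can win."""
    cur = conn.execute(
        "UPDATE nodes SET state = 'deployed', tenant = ? "
        "WHERE hostname = ? AND state = 'ready'",
        (tenant, hostname),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the first caller

first = claim_node(conn, "node1", "tenant-a")
second = claim_node(conn, "node1", "tenant-b")  # loses the race
print(first, second)  # True False
```

The same idea applies with any backend: make the check-and-assign a single atomic statement (or a transaction with row locking) instead of a read followed by a write.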
Nils (the Duffy code author) is busy looking at a fix today, and we'll let you know when we're able to roll it out.
PS: that means we'll also have to stop the Duffy API and run some DB clean-up operations to restart from a clean state. Duffy will then consider the currently deployed nodes as unused and will start reinstalling them. We'll let you know when we proceed with that hotfix push.
On 03/08/2022 16:23, Fabian Arrotin wrote:
Follow-up: Nils released a newer minor version (https://pypi.org/project/duffy/3.3.2/) that should contain the needed fix.
I just rolled it out and restarted the service (via Ansible), and all existing deployments/sessions were cleared from the DB (to restart from a clean state).
Hope that fixes the issue some of you were seeing, and sorry for the trouble.