Reaping at 12 hrs instead of 2 days


Karanbir Singh

22 Jan 2016

11:26 a.m.

hi,

In the coming days, we are going to start moving towards reaping orphaned machines every 12 hrs. At the moment, machines are reclaimed on the second date change (i.e. if you provisioned at 00:01 on any day, you get 48 hrs before the machine is reclaimed, or if you provisioned at 23:59, you get 24 hrs). Going forward, this will reduce to 12 hrs.
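As a rough illustration of the arithmetic above (a sketch, not the actual reaper code), the two policies can be compared for the 00:01 and 23:59 cases. The function names are hypothetical, and the sketch assumes the new policy is a fixed 12-hour lifetime counted from provisioning time:

```python
from datetime import datetime, timedelta


def reclaim_at_second_date_change(provisioned):
    """Current policy as described: reclaim at the second midnight after provisioning."""
    # Midnight at the start of the provisioning day.
    day_start = provisioned.replace(hour=0, minute=0, second=0, microsecond=0)
    # The second date change after provisioning is two midnights later.
    return day_start + timedelta(days=2)


def reclaim_after_timeout(provisioned, hours=12):
    """Proposed policy (assumed): reclaim a fixed number of hours after provisioning."""
    return provisioned + timedelta(hours=hours)


for stamp in (datetime(2016, 1, 22, 0, 1), datetime(2016, 1, 22, 23, 59)):
    old_lifetime = reclaim_at_second_date_change(stamp) - stamp
    new_lifetime = reclaim_after_timeout(stamp) - stamp
    print(stamp.time(), "old lifetime:", old_lifetime, "new lifetime:", new_lifetime)
```

Under the date-change rule the lifetime depends on the time of day (just under 48 hrs versus just over 24 hrs), whereas a fixed timeout gives every machine the same 12 hrs.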

Looking through the last month's worth of reclaimed nodes, it looks like the longest-running roles typically finish in under 2 hrs or thereabouts, so 12 hrs should still be plenty of headroom. Machines that are not returned within 3 hrs or so are the ones we almost exclusively end up reclaiming on the timeout.

Therefore, this 12 hr timeout should not impact anyone / any jobs. Let me know if that's not the case.

Regards

-- Karanbir Singh +44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh GnuPG Key : http://www.karan.org/publickey.asc


James

22 Jan

5:02 p.m.

I agree that this is a good idea, but in addition it would be useful to know which users/jobs are not reaping themselves so that they can fix their build scripts!

David Moreau Simard

5:39 p.m.

+1, I'd like to know if jobs are leaking.

I have something built into my jobs to be able to identify which job a node belongs to but this might not be the case for every job.

We could *probably* (just thinking out loud here) add a small hook to node requests in cicoclient that adds an identifier (e.g., `echo $BUILD_URL > /root/.jenkins`)
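A minimal sketch of what such a hook might compute, assuming Jenkins exports `BUILD_URL` to the job environment. The helper names are hypothetical and not part of cicoclient, and how the command gets executed on the node (for example over ssh) is left out:

```python
import os
import shlex


def marker_command(build_url, marker_path="/root/.jenkins"):
    """Build the shell command that records which build owns a node.

    marker_path follows the example in this thread; shlex.quote guards
    against spaces or shell metacharacters in either value.
    """
    return "echo {} > {}".format(shlex.quote(build_url), shlex.quote(marker_path))


def marker_command_from_env(environ=None):
    """Return the marker command for the current job, or None outside Jenkins."""
    environ = os.environ if environ is None else environ
    build_url = environ.get("BUILD_URL")
    if not build_url:
        # No BUILD_URL means there is nothing meaningful to record.
        return None
    return marker_command(build_url)


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    print(marker_command_from_env({"BUILD_URL": "https://jenkins.example.org/job/demo/42/"}))
```

With a marker like this on each node, whoever reclaims a leaked machine could read the file to see which job requested it.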

David Moreau Simard Senior Software Engineer | Openstack RDO

dmsimard = [irc, github, twitter]

James

5:44 p.m.

Did you look at `env` to see if this contains what you want?

David Moreau Simard

5:47 p.m.

Yup, here's a list of environment variables that are available by default: https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project

David Moreau Simard Senior Software Engineer | Openstack RDO

dmsimard = [irc, github, twitter]

Karanbir Singh

5:49 p.m.

On 22/01/16 17:02, James wrote:

> I agree that this is a good idea, but in addition it would be useful to know which users/jobs are not reaping themselves so that they can fix their build scripts!

We do try to communicate back with the projects; if you haven't had Brian reach out yet, your jobs are OK :)

-- Karanbir Singh +44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh GnuPG Key : http://www.karan.org/publickey.asc


ci-users@lists.centos.org
