Hello guys,
it seems that data is not being cleaned from the nodes correctly, and some of our tests are failing because of it. E.g. in https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd... the folder with the logs already exists. This job was run on devtools-ci-slave04.
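To illustrate the failure mode, this is roughly the kind of guard that trips; a minimal sketch only, with an illustrative path rather than our actual job script:

    # Sketch (illustrative path): on a freshly cleaned node this
    # directory must not exist yet; on a recycled node it already does.
    LOG_DIR=/root/logs
    if [ -d "$LOG_DIR" ]; then
        echo "ERROR: $LOG_DIR already exists -- node was not cleaned" >&2
        exit 1
    fi
    mkdir "$LOG_DIR"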
Thank you for your time. Have a nice day!
Katka Foniok
On Mon, 20 Jan, 2020, 5:18 pm Katerina Foniok, kkanova@redhat.com wrote:
[...]
Hi, I am out for ~an hour and will see to fixing this as soon as I get home.
Hello,
Can confirm this issue as well. Also, some systemd jobs die with:
bash: dnf: command not found
on CentOS 8. Not sure if it's related.
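For what it's worth, a cheap guard at the top of the job can tell a mislabeled node apart from a broken image; a sketch, assuming the node ships /etc/os-release:

    # Sketch: confirm the node really is the release we asked for before
    # using dnf ("dnf: command not found" on a supposed CentOS 8 node may
    # simply mean the node is running an older release that only has yum).
    . /etc/os-release
    if [ "${VERSION_ID%%.*}" -lt 8 ]; then
        echo "ERROR: expected CentOS 8, got $PRETTY_NAME" >&2
        exit 1
    fi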
On 1/20/20 12:47 PM, Katerina Foniok wrote:
[...]
So the issue is another one of the reinstallation problems where a node rolls back to its previous version. In +Katerina Foniok's case the node contained old data, and in +František Šumšal's case it was not actually a C8 node but a C7 one marked wrongly. The solution is to confirm twice (after installing the OS) in Duffy, and the way we think this would be very easy to implement is to check the number of ssh keys in authorized_keys. If a node in the ready state contains more than one, that means it still has old data, and instead of ready we put it in some other state that I can take a look at later on. This would be a quick fix, but till then I will drain the pufty chassis (this is the one with problems) and reset it (this has fixed the problem in the past so far).
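For the record, the node-side part of that check could be as small as the sketch below; the path and the action taken are illustrative, since Duffy's real state handling lives elsewhere:

    # Sketch of the proposed sanity check: a node that was really
    # reinstalled should carry exactly one provisioning key; more than
    # one means the reinstall rolled back and old data survived.
    KEY_COUNT=$(grep -c . /root/.ssh/authorized_keys)  # non-empty lines
    if [ "$KEY_COUNT" -gt 1 ]; then
        echo "stale node: $KEY_COUNT keys in authorized_keys" >&2
        exit 1  # park the node for inspection instead of marking it ready
    fi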
On Tue, Jan 21, 2020 at 12:00 AM František Šumšal frantisek@sumsal.cz wrote:
[...]
On 1/20/20 10:12 PM, Vipul Siddharth wrote:
[...]
Thank you very much for your analysis and prompt solution!
Hello, we are still facing this problem on devtools-ci-slave04. Our job is failing due to conflicts while installing our project's dependencies.
Thank you for taking care of that. Have a nice day!
Katka
On Mon, Jan 20, 2020 at 10:28 PM František Šumšal frantisek@sumsal.cz wrote:
[...]
On Thu, Jan 23, 2020 at 12:23 PM Katerina Foniok kkanova@redhat.com wrote:
[...]
Yes; since I have been travelling for DevConf, I couldn't keep a watch on marking the affected nodes as admin-down.
@Katerina Foniok, when you get time, could you please send me the IDs of the jobs failing because of this issue? It would help me track down the nodes (and also confirm whether the issue affects other chassis as well).
--
Vipul Siddharth
He/His/Him
Fedora | CentOS CI Infrastructure Team
Red Hat
w: vipul.dev
Hello.
Do you want specific job builds as well or only the job IDs?
On Thu, Jan 23, 2020 at 9:14 AM Vipul Siddharth vipul@redhat.com wrote:
[...]
On Fri, 24 Jan, 2020, 10:50 am Tibor Dancs, tdancs@redhat.com wrote:
[...]
Anything would do. The former would save me some time, but it's not a problem at all.
Hello. I'm sending you the list of the jobs we've seen the failures on, along with specific build IDs.
Jobs:
https://ci.centos.org/job/devtools-rh-che-periodic-prod-2/
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-1a/
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-1b/
https://ci.centos.org/job/devtools-rh-che-periodic-prod-preview-2a/
Builds that failed with various dirty-node-related errors:

cluster production-2:
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2/1861...

cluster production-2a:
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-2aProd...

cluster production-1a:

cluster production-1b:
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-1b/194...
https://ci.centos.org/view/Devtools/job/devtools-rh-che-periodic-prod-1b/192...

cluster prod-preview-2a:
https://ci.centos.org/job/devtools-rh-che-periodic-prod-preview-2a/2147/cons...
https://ci.centos.org/job/devtools-rh-che-periodic-prod-preview-2a/2142/cons...
These are just examples; I hope they will be useful for you.
On Fri, Jan 24, 2020 at 10:55 AM Vipul Siddharth vipul@redhat.com wrote:
[...]
CI-users mailing list CI-users@centos.org https://lists.centos.org/mailman/listinfo/ci-users