Hi, folks,
The other admin updated torque without testing it on one machine, and we had Issues. The first I knew was when a user reported qstat returning
    socket_connect_unix failed: 15137
    socket_connect_unix failed: 15137
    socket_connect_unix failed: 15137
    qstat: cannot connect to server (null) (errno=15137) could not connect to trqauthd
Attempting to restart the pbs_server did the same. Working with my manager, we found:
a) torque had been updated from 2.x to 4.2.10, which is huge.
b) Apparently, it no longer uses munged. Instead, it uses trqauthd, and that wasn't in the updated packages.
c) We could not downgrade!!!
d) My manager updated from testing and installed; then, after running trqauthd and restarting pbs_server, it appears to be working again.
Should I be filing a bug report?
mark
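For anyone who wants to catch this failure automatically, here is a minimal Python sketch that classifies captured qstat error text. It only matches the strings quoted in the report above. The helper name diagnose_qstat and the wording of the hints are made up for illustration and are not part of torque.

```python
import re

# Matches the repeated line from the report; the numeric code is captured
# rather than hard-coded, since other codes may well exist.
SOCKET_ERR = re.compile(r"socket_connect_unix failed: (\d+)")


def diagnose_qstat(output: str) -> str:
    """Return a short hint for captured qstat error text (hypothetical helper)."""
    if "could not connect to trqauthd" in output:
        codes = sorted(set(SOCKET_ERR.findall(output)))
        detail = f" (socket error {', '.join(codes)})" if codes else ""
        return (
            "trqauthd unreachable" + detail
            + ": check that trqauthd is installed and running"
        )
    if "cannot connect to server" in output:
        return "pbs_server unreachable: check that pbs_server is running"
    return "no known failure pattern found"


if __name__ == "__main__":
    # The error text as quoted in the original message.
    sample = (
        "socket_connect_unix failed: 15137\n"
        "socket_connect_unix failed: 15137\n"
        "socket_connect_unix failed: 15137\n"
        "qstat: cannot connect to server (null) (errno=15137) "
        "could not connect to trqauthd\n"
    )
    print(diagnose_qstat(sample))
```

The trqauthd check comes first because the reported message contains both phrases, and the missing daemon was the actual cause here.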
On 05/27/2015 09:07 AM, m.roth@5-cent.us wrote:
<snip>
You do not mention which version of CentOS you are using, but for CentOS-7 ..
The only torque I see is in epel-testing (which is their unstable branch) .. I would think that is the list for this discussion. Or did it come from somewhere else?
Not that I mind it being discussed here too .. but you might get better results there.
Thanks, Johnny Hughes
Johnny Hughes wrote:
On 05/27/2015 09:07 AM, m.roth@5-cent.us wrote:
<snip>
You do not mention which version of CentOS you are using, but for CentOS-7 ..
Sorry, it's 6.6.
The only torque I see is in epel-testing (which is their unstable branch) .. I would think that is the list for this discussion. Or did it come from somewhere else?
Not that I mind it being discussed here too .. but you might get better results there.
Thanks, Johnny. I *just* posted an apology, that I realized it was an EPEL issue.... Talk about an "upgrade disaster"! I think the other admin - he's been here less than a year, is coming to understand why I'm paranoid about some updates, and why we roll out some things stepwise, testing it first.... I see he updated firefox & t-bird; I'm guessing that the most current fixes the updates that broke language, etc, a week or two ago.
mark
On Wed, May 27, 2015 9:46 am, m.roth@5-cent.us wrote:
<snip>
Thanks, Johnny. I *just* posted an apology, that I realized it was an EPEL issue.... Talk about an "upgrade disaster"! I think the other admin - he's been here less than a year, is coming to understand why I'm paranoid about some updates, and why we roll out some things stepwise, testing it first.... I see he updated firefox
<rant> Speaking of which... the last update ("release" or "upgrade", I should probably say, since I mean the latest version on FreeBSD and on MS Windows) has nasty changes: it started "blocking downloads that potentially contain virus or spyware". It blocked my download of pdfforge.org's PDF Creator (which I have used for ages). This was the last straw in my more than four-year search for a replacement for Firefox. So, I'm switching away from it. Leaving aside google chrome (I have reservations about it that I'd rather not go into, so chrome is out of consideration), I'm switching to vivaldi on Windows and Linux and to midori on FreeBSD (and maybe on Linux too, instead of vivaldi)... </rant>
Valeri
& t-bird; I'm guessing that the most current fixes the updates that broke language, etc, a week or two ago.
++++++++++++++++++++++++++++++++++++++++ Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247 ++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev wrote:
On Wed, May 27, 2015 9:46 am, m.roth@5-cent.us wrote:
Johnny Hughes wrote:
On 05/27/2015 09:07 AM, m.roth@5-cent.us wrote:
<snip>
Thanks, Johnny. I *just* posted an apology, that I realized it was an EPEL issue.... Talk about an "upgrade disaster"! I think the other admin - he's been here less than a year, is coming to understand why I'm paranoid about some updates, and why we roll out some things stepwise, testing it first.... I see he updated firefox
<rant> Speaking of which... the last update ("release" or "upgrade" I probably should say, as it is latest version on FreeBSD and on MS Windows I will mean here) has nasty changes: it started "blocking downloads that potentially contain virus or spyware". It blocked download of
<snip>
Speaking of firefox, does anyone know how to make that idiocy of "Hi! It looks like you haven't run firefox in a while! Want to restart to a New Look?!!!" go away? It shows up every time I log in, and doesn't seem to have any "no, don't bother me, ever" option.
mark
On Wed, May 27, 2015 10:46 am, m.roth@5-cent.us wrote:
Valeri Galtsev wrote:
On Wed, May 27, 2015 9:46 am, m.roth@5-cent.us wrote:
Johnny Hughes wrote:
On 05/27/2015 09:07 AM, m.roth@5-cent.us wrote:
<snip>
Thanks, Johnny. I *just* posted an apology, that I realized it was an EPEL issue.... Talk about an "upgrade disaster"! I think the other admin - he's been here less than a year, is coming to understand why I'm paranoid about some updates, and why we roll out some things stepwise, testing it first.... I see he updated firefox
<rant> Speaking of which... the last update ("release" or "upgrade" I probably should say, as it is latest version on FreeBSD and on MS Windows I will mean here) has nasty changes: it started "blocking downloads that potentially contain virus or spyware". It blocked download of
<snip> Speaking of firefox, does anyone know how to make that idiocy of "Hi! It looks like you haven't run firefox in a while! Want to restart to a New Look?!!!" that shows up every time I log in, and doesn't seem to have any "no, don't bother me, ever" option?
My way of doing that (which took me about 5 years or so, and it _is_ painful) was: switch away from Firefox. I happen to know in person the guy (he was a student here) who went to the Mozilla Foundation as a production director. He didn't impress me as far as his attitude to software code is concerned. So I wasn't much surprised when, shortly after he became production director, a noticeable change happened at the Mozilla Foundation. And as things don't promise to change for the better, I finally (5 or 6 years was a long way to this decision!) gave up, at least on their browser.
Valeri
m.roth@5-cent.us wrote:
<snip>
Sorry, realized after I posted that it's a package from epel. Which, of course, is part of fedora. Ah, how I love fedora...NOT.
mark
Mark, You might really want to compile torque from source (into an RPM if you'd like) and redistribute that. Every version is a little wonky and those of us that use(d) it often will poke around until we find a version / patch-set that makes us happy and stick with that for a bit. It's not an exact science and newer / higher versions are not always better.
As for the downgrade comment: Perhaps you can't, but, Torque, when it's down, doesn't really hold any state besides the configuration (queues and such), so you should be able to extract that, completely uninstall torque, and reinstall whatever version you want. If 2.x works for you, grab the latest from source, build it, reinstall and throw the config back in.
Hope this helps a little. -Zach (I don't read often, so I might go AWOL)
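A minimal Python sketch of the "extract the config, reinstall, throw the config back in" idea, assuming the usual torque approach of dumping the server settings with qmgr -c "print server" and replaying the dump on qmgr's standard input. The function names and the run parameter are hypothetical, added only so the sketch can be tried without a live server. As far as I know qmgr has to reach a running pbs_server, so the dump would need to be taken before anything is uninstalled.

```python
import subprocess
from pathlib import Path


def dump_config(dest: Path, run=subprocess.run) -> Path:
    """Save the output of `qmgr -c "print server"` to dest (hypothetical helper)."""
    result = run(
        ["qmgr", "-c", "print server"],
        capture_output=True,
        text=True,
        check=True,  # raise if qmgr exits non-zero
    )
    dest.write_text(result.stdout)
    return dest


def restore_config(src: Path, run=subprocess.run) -> None:
    """Feed a saved dump back to qmgr on standard input (hypothetical helper)."""
    run(["qmgr"], input=src.read_text(), text=True, check=True)
```

The uninstall and reinstall steps in between are left out on purpose, since they depend on whether the packages come from EPEL, a local mirror, or a source build.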
On Wed, May 27, 2015 at 10:07 AM, m.roth@5-cent.us wrote:
<snip>
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Wed, May 27, 2015 10:55 am, Zachary Giles wrote:
Mark, You might really want to compile torque from source (into an RPM if you'd like) and redistribute that. Every version is a little wonky and those of us that use(d) it often will poke around until we find a version / patch-set that makes us happy and stick with that for a bit. It's not an exact science and newer / higher versions are not always better.
My experience exactly. We used version 2 for quite a while. Never managed to upgrade to version 3 (tried a few times, but didn't invest much effort). Then we went directly to version 4. Starting trqauthd was the most notable difference. We never use rpms, we just compile torque on master and compute nodes. Compilation is always so straightforward, and never failed, so we didn't bother to package it...
Valeri
As for the downgrade comment: Perhaps you can't, but, Torque, when it's down, doesn't really hold any state besides the configuration (queues and such), so you should be able to extract that, completely uninstall torque, and reinstall whatever version you want. If 2.x works for you, grab the latest from source, build it, reinstall and throw the config back in.
Hope this helps a little. -Zach (I don't read often, so I might go AWOL)
Valeri Galtsev wrote:
On Wed, May 27, 2015 10:55 am, Zachary Giles wrote:
Mark, You might really want to compile torque from source (into an RPM if you'd like) and redistribute that. Every version is a little wonky and those of us that use(d) it often will poke around until we find a version / patch-set that makes us happy and stick with that for a bit. It's not an exact science and newer / higher versions are not always better.
My experience exactly. We used version 2 for quite a while. Never managed to upgrade to version 3 (tried a few times, but didn't invest much effort). Then we went directly to version 4. Starting trqauthd was the most notable difference. We never use rpms, we just compile torque on master and compute nodes. Compilation is always so straightforward, and never failed, so we didn't bother to package it...
No. Not going to compile unless there's *no* other way. We've got...five? six? clusters or systems using torque. We've also got over 170 workstations and servers, most of which are getting up there, and there's me, the other admin, and our manager, who's at another Institute most of the time. Frequently, I feel like a one-armed paperhanger. We almost *never* install straight from source like that; instead we build our own rpm. And doing that can range from one package where the folks knew what they were doing, which took a couple of hours, to the horror of bioperl, which, on and off, took something over a month. *shudder* I haven't had to update that, happily.
What disturbs me most is going up *two* releases, and all *within* one version of the o/s. Upstream released an update to the umich package that jumped a full release or two, and several of our senior researchers were dead in the water, till we figured out what happened.
That ain't my idea of "enterprise".
As for the downgrade comment: Perhaps you can't, but, Torque, when it's down, doesn't really hold any state besides the configuration (queues and such), so you should be able to extract that, completely uninstall torque, and reinstall whatever version you want. If 2.x works for you, grab the latest from source, build it, reinstall and throw the config back in.
My manager managed to downgrade - we've got a local mirror, and there's backups.
Hope this helps a little. -Zach (I don't read often, so I might go AWOL)
Thanks.
mark