[CentOS] serious problem with torque

Wed May 27 16:12:14 UTC 2015
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Wed, May 27, 2015 10:55 am, Zachary Giles wrote:
> Mark, You might really want to compile torque from source (into an RPM
> if you'd like) and redistribute that. Every version is a little wonky
> and those of us that use(d) it often will poke around until we find a
> version / patch-set that makes us happy and stick with that for a bit.
> It's not an exact science and newer / higher versions are not always
> better.

My experience exactly. We used version 2 for quite a while. We never managed
to upgrade to version 3 (we tried a few times, but didn't invest much
effort). Then we went directly to version 4. Having to start trqauthd was
the most notable difference. We never use RPMs; we just compile torque on
the master and compute nodes. Compilation has always been straightforward
and has never failed, so we didn't bother to package it...
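
For the config round-trip Zach describes further down (extract the queue
setup, reinstall, throw the config back in), a minimal Python sketch might
look like the following. It assumes qmgr is on the PATH and that pbs_server
is running both when you dump and when you restore. The function names and
the file path are hypothetical, chosen only for illustration, and directives
saved from 2.x may need hand editing before a 4.x server accepts them, so
treat this as a sketch rather than a tested procedure.

```python
import subprocess


def dump_server_config(path, run=subprocess.run):
    """Save the output of `qmgr -c 'print server'` to `path`.

    `run` is injectable so the function can be exercised without a
    live pbs_server; by default it is subprocess.run.
    """
    result = run(
        ["qmgr", "-c", "print server"],
        capture_output=True,
        text=True,
        check=True,
    )
    with open(path, "w") as fh:
        fh.write(result.stdout)
    return result.stdout


def restore_server_config(path, run=subprocess.run):
    """Feed previously saved qmgr directives back to qmgr on stdin.

    Equivalent in spirit to `qmgr < path` from a shell.
    """
    with open(path) as fh:
        directives = fh.read()
    run(["qmgr"], input=directives, text=True, check=True)
    return directives
```

Usage would be dump_server_config("/root/torque-server.qmgr") before
uninstalling, then restore_server_config on the same file once the new
pbs_server (and, on version 4, trqauthd) is up. The path here is only an
example.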

Valeri

>
> As for the downgrade comment: Perhaps you can't, but, Torque, when
> it's down, doesn't really hold any state besides the configuration
> (queues and such), so you should be able to extract that, completely
> uninstall torque, and reinstall whatever version you want. If 2.x
> works for you, grab the latest from source, build it, reinstall and
> throw the config back in.
>
> Hope this helps a little.
> -Zach
> (I don't read often, so I might go AWOL)
>
> On Wed, May 27, 2015 at 10:07 AM,  <m.roth at 5-cent.us> wrote:
>> Hi, folks,
>>
>>    The other admin updated torque without testing it on one machine, and
>> we had Issues. The first I knew was when a user reported qstat returning
>> socket_connect_unix failed: 15137
>> socket_connect_unix failed: 15137
>> socket_connect_unix failed: 15137
>> qstat: cannot connect to server (null) (errno=15137) could not connect to trqauthd
>>
>> Attempting to restart the pbs_server did the same. Working with my
>> manager, we found:
>>   a) torque had been updated from 2.x to 4.2.10, which is huge.
>>   b) Apparently, it no longer uses munged. Instead, it uses trqauthd,
>>         and that wasn't in the updated packages.
>>   c) We could not downgrade!!!
>>   d) My manager updated from testing, and installed, and then running
>>         trqauthd, and restarting pbs_server, it appears to be working again.
>>
>> Should I be filing a bug report?
>>
>>        mark
>>
>> _______________________________________________
>> CentOS mailing list
>> CentOS at centos.org
>> http://lists.centos.org/mailman/listinfo/centos
>
>
>
> --
> Zach Giles
> zgiles at gmail.com
>


++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++