[CentOS] Re: pam max locked memory issue after updating to 5.2 and rebooting
Rob Lines
rlinesseagate at gmail.com
Fri Aug 8 12:10:26 UTC 2008
It has been a few days, so I am sending this again in case someone has
seen this problem or has a suggestion about where to look and why these
settings might not be taking effect with 5.2 when they did with 5.1.
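
One thing that can be checked mechanically is whether the memlock entry
quoted below is still what limits.conf contains on the rebooted nodes.
The following is only a rough sketch: the function name memlock_entries
and the SAMPLE text are made up for illustration, and it assumes the
usual "<domain> <type> <item> <value>" layout of limits.conf. It parses
a string rather than reading the real file.

```python
def memlock_entries(text):
    """Return (domain, type, value) for each memlock line in limits.conf-style text."""
    entries = []
    for raw in text.splitlines():
        # Drop trailing comments; blank and comment-only lines become empty.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        # Expected layout: <domain> <type> <item> <value>
        if len(fields) == 4 and fields[2] == "memlock":
            entries.append((fields[0], fields[1], fields[3]))
    return entries


# Hypothetical sample text, not read from any node.
SAMPLE = """\
# sample limits.conf content
* soft memlock unlimited
* hard nofile 4096
"""

if __name__ == "__main__":
    print(memlock_entries(SAMPLE))
```

To use it against a node, the contents of /etc/security/limits.conf
would be passed in as the text argument.
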
On Mon, Aug 4, 2008 at 2:00 PM, Rob Lines <rlinesseagate at gmail.com> wrote:
> We were previously running 5.1 x86_64 and recently updated to 5.2
> using yum. Under 5.1 we were having problems when running jobs using
> torque and the solution had been to add the following items to the
> files noted
>
> "* soft memlock unlimited" in /etc/security/limits.conf
> "session required pam_limits.so" in /etc/pam.d/{rsh,sshd}
>
> This changed the max locked memory setting in ulimit as follows:
>
> Before the change
> rsh nodeX ulimit -a
> gave us
> max locked memory (kbytes, -l) 32
>
> After the change
> rsh nodeX ulimit -a
> max locked memory (kbytes, -l) 16505400
>
> The nodes have 16 GB of memory.
>
> Now, after the 5.2 update, those files are unchanged. Most of the
> nodes have not been rebooted yet because of long-running processes,
> but a few have been restarted, and now that jobs are starting to be
> placed on them we are back to a max locked memory of 32 KB rather
> than 16 GB.
>
> The error we are receiving on those jobs is:
>
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(306).......: Initialization failed
> MPID_Init(113)..............: channel initialization failed
> MPIDI_CH3_Init(167).........:
> MPIDI_CH3I_RDMA_init(138)...:
> rdma_setup_startup_ring(333): cannot create cq
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(306).......: Initialization failed
> MPID_Init(113)..............: channel initialization failed
> MPIDI_CH3_Init(167).........:
> MPIDI_CH3I_RDMA_init(138)...:
> rdma_setup_startup_ring(333): cannot create cq
> rank 45 in job 1 nodeX_35175 caused collective abort of all ranks
> exit status of rank 45: return code 1
> rank 44 in job 1 nodeX_35175 caused collective abort of all ranks
> exit status of rank 44: return code 1
>
>
> The full output of:
>
> rsh nodeX ulimit -a
>
> connect to address x.x.x.x port 544: Connection refused
> Trying krb4 rsh...
> connect to address x.x.x.x port 544: Connection refused
> trying normal rsh (/usr/bin/rsh)
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 135168
> max locked memory (kbytes, -l) 32
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 10240
> cpu time (seconds, -t) unlimited
> max user processes (-u) 135168
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
>
> Any ideas, suggestions or items I could roll back would be
> appreciated. I looked through the list of packages that were updated
> and the only one that I could see that was related was pam. ssh and
> rsh were not updated.
>
> Thank you,
> Rob
>
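
For anyone comparing numbers: the 32 kbytes that ulimit -l reports and
the 32768 bytes in the libibverbs warning are the same value. Below is
a small sketch for printing the limit that a process actually ends up
with, using Python's standard resource module. The helper names
describe and memlock_limits are mine, and running this under the same
launch path as the jobs (rsh, sshd, or torque) is an assumption about
how it would be used, not something I have confirmed fixes anything.

```python
import resource


def describe(limit):
    """Render an rlimit value given in bytes, using the kbytes unit of `ulimit -l`."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} kbytes ({limit} bytes)"


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK of the current process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("soft:", describe(soft))
    print("hard:", describe(hard))
```

Printing both the soft and the hard value shows whether the hard limit
is what is holding the soft limit down.
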