[CentOS] pam max locked memory issue after updating to 5.2 and rebooting
Rob Lines
rlinesseagate at gmail.com
Mon Aug 4 18:00:28 UTC 2008
We were previously running 5.1 x86_64 and recently updated to 5.2
using yum. Under 5.1 we had problems running jobs through torque, and
the fix had been to add the following lines to the files noted:
"* soft memlock unlimited" in /etc/security/limits.conf
"session required pam_limits.so" in /etc/pam.d/{rsh,sshd}
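Since the fix lives in limits.conf, one quick sanity check across nodes is to confirm which memlock entries each node's file actually contains. A limits.conf line has the form "<domain> <type> <item> <value>" (see limits.conf(5)). The helper below, memlock_entries, is a hypothetical sketch of that check: it only splits whitespace-separated fields and skips comments, and it does not reproduce pam_limits' full matching rules.

```python
from typing import List, Tuple


def memlock_entries(text: str) -> List[Tuple[str, str, str]]:
    """Return (domain, type, value) for each memlock line in limits.conf-style text.

    Hypothetical helper: a field splitter only, not pam_limits' own parser.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) != 4:
            continue  # not a complete "<domain> <type> <item> <value>" line
        domain, ltype, item, value = fields
        if item == "memlock":
            entries.append((domain, ltype, value))
    return entries


# Example: the line added to /etc/security/limits.conf above.
sample = "# local settings\n* soft memlock unlimited\n"
print(memlock_entries(sample))  # [('*', 'soft', 'unlimited')]
```

Running this against each node's /etc/security/limits.conf (for example via a loop over rsh) would show whether the files really are identical, as assumed in the post.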
This changed the max locked memory value reported by ulimit as follows.
Before the change,
rsh nodeX ulimit -a
gave us
max locked memory (kbytes, -l) 32
After the change,
rsh nodeX ulimit -a
gave us
max locked memory (kbytes, -l) 16505400
The nodes have 16 GB of memory.
After the 5.2 update those files are unchanged. We haven't rebooted most
of the nodes yet because of long-running processes, but a few nodes have
been restarted, and now that jobs are being scheduled on them, max locked
memory is back to 32 kB rather than 16 GB.
The error we receive on those jobs is:
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(306).......: Initialization failed
MPID_Init(113)..............: channel initialization failed
MPIDI_CH3_Init(167).........:
MPIDI_CH3I_RDMA_init(138)...:
rdma_setup_startup_ring(333): cannot create cq
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(306).......: Initialization failed
MPID_Init(113)..............: channel initialization failed
MPIDI_CH3_Init(167).........:
MPIDI_CH3I_RDMA_init(138)...:
rdma_setup_startup_ring(333): cannot create cq
rank 45 in job 1 nodeX_35175 caused collective abort of all ranks
exit status of rank 45: return code 1
rank 44 in job 1 nodeX_35175 caused collective abort of all ranks
exit status of rank 44: return code 1
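The units in the two outputs differ: bash's ulimit -l reports the limit in 1024-byte units, while libibverbs prints RLIMIT_MEMLOCK in bytes. A short worked conversion (the function name ulimit_l_to_bytes is just illustrative) shows that the 32 from ulimit is exactly the 32768 bytes in the warning, and that the post-fix value is roughly the nodes' physical memory:

```python
KIB = 1024  # bash's "ulimit -l" reports 1024-byte units


def ulimit_l_to_bytes(kbytes: int) -> int:
    """Convert a "ulimit -l" value to bytes, the unit libibverbs reports."""
    return kbytes * KIB


print(ulimit_l_to_bytes(32))                          # 32768, as in the warning
print(round(ulimit_l_to_bytes(16505400) / 2**30, 2))  # 15.74 (GiB)
```

So the rebooted nodes are simply back at the 32 KiB default limit, not at some other misconfigured value.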
The full output of the following command:
rsh nodeX ulimit -a
connect to address x.x.x.x port 544: Connection refused
Trying krb4 rsh...
connect to address x.x.x.x port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 135168
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 135168
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
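Because rsh, sshd and torque each start processes through their own session setup, the limit an interactive rsh shell shows may not be the one an MPI rank inherits. One way to check from inside a job is to query the limit directly. This is a minimal sketch using Python's standard resource module; it assumes Python 3.6+ is available on the compute nodes (the stock CentOS 5 python is older), and the 32 KiB threshold is simply the default value seen in the output above.

```python
import resource


def describe_memlock() -> str:
    """Report the soft and hard RLIMIT_MEMLOCK of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(v: int) -> str:
        return "unlimited" if v == resource.RLIM_INFINITY else f"{v} bytes"

    return f"RLIMIT_MEMLOCK soft={fmt(soft)} hard={fmt(hard)}"


def memlock_at_least(min_bytes: int) -> bool:
    """True if the soft limit is unlimited or at least min_bytes."""
    soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY or soft >= min_bytes


if __name__ == "__main__":
    print(describe_memlock())
    # Assumed threshold: a soft limit of 32 KiB or less matches the failing nodes.
    if not memlock_at_least(32 * 1024 + 1):
        print("memlock soft limit is at the 32 KiB default or lower")
```

Submitting this as a torque job on a rebooted node and on a not-yet-rebooted node would show whether the difference appears in the job's own environment.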
Any ideas, suggestions, or packages I could roll back would be
appreciated. I looked through the list of updated packages, and the
only related one I could see was pam; ssh and rsh were not updated.
Thank you,
Rob