[CentOS] Strange happening with new users and keyed access on diskless cluster

Joseph Norris jnorris at ucmerced.edu
Tue Nov 23 19:22:21 UTC 2010

Hello to all,

I have been battling this situation now for 3 days and still have not 
found a resolution.  I appeal to any and all for help.

Here are the facts as far as I can tell.

1)   I moved a 66-node Rocks-based cluster to a diskless cluster using 
the latest version of CentOS with all updates in place.
2)   Users are added with their home directory mounted across the nodes 
of the cluster, so a user's home directory sits on /export/home with a 
symlink from home on the head node.
3)   ssh-keygen is used to create a public/private key pair in .ssh so 
that the user has passwordless access to all nodes, allowing SGE jobs 
to run across nodes.

( if I have left any information out above - please let me know )
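
For reference, item 3 can be sketched roughly as follows.  This is not 
the actual keybuilder script, only a minimal Python sketch under stated 
assumptions: an RSA key with an empty passphrase, the public key 
appended to the same user's authorized_keys (which works because the 
home directory is shared across the nodes), and hypothetical function 
names keygen_command, authorize_key and build_keys.

```python
import os
import subprocess
from pathlib import Path


def keygen_command(key_path):
    """Build an ssh-keygen call for an RSA key with an empty passphrase."""
    return ["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f", str(key_path)]


def authorize_key(ssh_dir, public_key):
    """Append public_key to authorized_keys and tighten the modes."""
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)           # directory: owner only
    auth = ssh_dir / "authorized_keys"
    with open(auth, "a") as fh:
        fh.write(public_key.strip() + "\n")
    os.chmod(auth, 0o600)              # file: owner read/write only
    return auth


def build_keys(home):
    """Generate a key pair under home/.ssh and authorize it for that user."""
    ssh_dir = Path(home) / ".ssh"
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    key_path = ssh_dir / "id_rsa"
    if not key_path.exists():
        # Assumes ssh-keygen is on PATH; writes id_rsa and id_rsa.pub.
        subprocess.run(keygen_command(key_path), check=True)
    return authorize_key(ssh_dir, (ssh_dir / "id_rsa.pub").read_text())
```

Run as the new user, build_keys(os.path.expanduser("~")) would leave 
.ssh at mode 700 and authorized_keys at mode 600.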

So here is the problem:

All of my current user base has had no issues whatsoever with the 
current arrangement, and up to about a month ago I could create a user, 
use the script that I have to build the keys, and run an expect script 
out to each node answering yes to add the node to known_hosts, with 
permissions correctly set on the user's .ssh directory and files.

Now the problem:

I build a new user and run my script, and when the user runs ssh c33 
( c33 is the name of a node ) they get a password challenge.  I dink 
with the sshd_config for the nodes and no matter what I do I keep 
getting the password challenge or a permission error on publickey.

I have run ssh -vvv c33 and get "sending packet" but no response from 
openssh, and then it falls through to the next authentication method 
with no result.

I have compared permissions against users that have no issue and, as 
far as I can tell, there is no issue.
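
The permission comparison can be made mechanical.  The sketch below is 
only an approximation of the ownership and mode checks that sshd applies 
when StrictModes is enabled; it assumes the default authorized_keys 
location, and the function name strictmodes_problems is hypothetical.  
It flags the home directory, .ssh, or authorized_keys when any of them 
is missing, owned by someone other than the user or root, or writable 
by group or others.

```python
import os
import stat
from pathlib import Path


def strictmodes_problems(home, uid=None):
    """List ownership/mode problems under home that could block pubkey auth."""
    home = Path(home)
    if uid is None:
        uid = os.getuid()
    problems = []
    for path in (home, home / ".ssh", home / ".ssh" / "authorized_keys"):
        try:
            st = os.stat(path)  # follows symlinks, so a linked home resolves
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        if st.st_uid not in (uid, 0):
            problems.append(f"{path}: owned by uid {st.st_uid}")
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            mode = stat.S_IMODE(st.st_mode)
            problems.append(f"{path}: group- or world-writable ({mode:o})")
    return problems
```

Running this as a working user and as a failing user, on both the head 
node and a compute node such as c33, would show whether the two really 
look the same from where sshd sits.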

Now here is the real strange thing:

I take a user that has already been on the system with no issues and 
delete the .ssh directory.  Then I re-run my keybuilder bash script, 
rebuilding the keys and setting the known_hosts, and I get seamless ssh 
to all nodes.  With the new users from the last month, I do the same 
thing and get the issues above.

I am totally confused.  Has something changed with an update?  Do I need 
to do something different with the build of a new user that I did not 
have to do before? Do I have to do something in particular with my 
sshd_config file?

Please give me any and all observations - I really need to resolve this.


Joseph Norris
Applications Developer/Server Admin

