[CentOS] Commands failing silently?

Thu Mar 27 15:53:46 UTC 2008
Ross S. W. Walker <rwalker at medallion.com>

Dan Bongert wrote:
> Dan Bongert wrote:
> > Filipe Brandenburger wrote:
> >> Hi,
> >>
> >> On Tue, Mar 25, 2008 at 2:21 PM, Dan Bongert <dbongert at wisc.edu> wrote:
> >>>  thoth(3) /tmp> ls
> >>>
> >>>  thoth(4) /tmp> echo $?
> >>>  141
> >>
> >> 141 is SIGPIPE. If the process is killed by a signal, the return code
> >> will be 128+signal number. 141-128=13, and kill -l says: 13) SIGPIPE.
> >>
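To spell out the arithmetic above, here is a small sketch. It assumes a POSIX-style shell whose built-in kill -l accepts a signal number; decode_status is just a name made up for this example. Note that a status above 128 can also come from a program that simply exited with that value, so treat the result as a hint rather than proof.

```shell
# decode_status is a hypothetical helper, not a standard command.
# It applies the "128 + signal number" convention to an exit status.
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # kill -l NUMBER prints the signal name without the SIG prefix
        echo "killed by signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exited with status $status"
    fi
}

decode_status 141
```

On Linux, where SIGPIPE is signal 13, the last line should report signal 13 (SIGPIPE).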
> >> SIGPIPE means that something that ls is writing to is being closed.
> >> That's really strange, and I couldn't find why.
> >>
> >> I still think strace would be the best way to trace it. Please try:
> >>
> >> # rm -f /tmp/ls-strace.txt; strace -o /tmp/ls-strace.txt -tt -s 1024 -f ls --color=tty
> >>
> >> Repeat it until ls doesn't print anything. Then less your
> >> /tmp/ls-strace.txt file, you'll probably have something like +++
> >> killed by SIGPIPE +++ as the last line of it. Then try to figure out
> >> what happened before it got the SIGPIPE. Probably a "write" to
> >> something, try to figure out to which file descriptor. If you can't do
> >> it, try to post the last few lines of the file here.
> > 
> > I tried it, but as I said before, strace somehow interferes with what's 
> > going on. I wasn't able to get a program to fail via strace.
> > 
> >> Also, can you post the output of this command?
> >> # ls -la /proc/$$/fd/
> > 
> > thoth(265) /tmp> ls -la /proc/$$/fd/
> > 
> > thoth(266) /tmp> ls -la /proc/$$/fd/
> > total 5
> > dr-x------  2 dbongert dbongert  0 Mar 27 10:17 .
> > dr-xr-xr-x  3 dbongert dbongert  0 Mar 27 10:03 ..
> > lrwx------  1 dbongert dbongert 64 Mar 27 10:17 0 -> /dev/pts/0
> > lrwx------  1 dbongert dbongert 64 Mar 27 10:17 1 -> /dev/pts/0
> > lrwx------  1 dbongert dbongert 64 Mar 27 10:17 2 -> /dev/pts/0
> > lrwx------  1 dbongert dbongert 64 Mar 27 10:17 255 -> /dev/pts/0
> > lrwx------  1 dbongert dbongert 64 Mar 27 10:17 3 -> socket:[4425494]
> > 
> 
> Ok, here I am replying to myself. On a lark, I tried to strace a different 
> program, since I couldn't get strace + ls to fail. Here's the end of the 
> output from 'strace w':
> 
> connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
> poll([{fd=4, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT|POLLHUP}], 1, 5000) = 1
> writev(4, [{"\2\0\0\0\1\0\0\0\2\0\0\0", 12}, {"0\0", 2}], 2) = -1 EPIPE (Broken pipe)
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> +++ killed by SIGPIPE +++
> 
> Looks like an nscd problem, and disabling it seems to fix the problem.

Good stuff. Actually, the nscd problem may itself be a symptom of an
nsswitch problem. Check /etc/nsswitch.conf to make sure it doesn't
list a name service that isn't actually working.
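
A rough sketch of how one might check this, assuming the usual /etc/nsswitch.conf location and a getent binary on the PATH. The function name active_sources is only illustrative. A successful enumeration does not prove every listed source is healthy (local files alone can satisfy it), so a hang or a long delay is the more telling sign.

```shell
# active_sources is a hypothetical helper: it prints the database lines
# from an nsswitch.conf-style file with comments and blank lines removed.
# $1 is the file to read (defaults to the system file).
active_sources() {
    conf=${1:-/etc/nsswitch.conf}
    sed -e 's/[[:space:]]*#.*//' -e '/^[[:space:]]*$/d' "$conf"
}

# Show which sources each database is configured to consult.
if [ -r /etc/nsswitch.conf ]; then
    active_sources
fi

# Run a lookup through a few common databases and report the outcome.
if command -v getent >/dev/null 2>&1; then
    for db in passwd group hosts; do
        if getent "$db" >/dev/null 2>&1; then
            echo "$db: lookup returned"
        else
            echo "$db: lookup failed"
        fi
    done
fi
```

If a database lists a source such as ldap or nis that you don't actually run, removing it from that line is the first thing to try.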

-Ross
