On Tue, 2008-03-25 at 13:21 -0500, Dan Bongert wrote:
> William L. Maltby wrote:
> > On Mon, 2008-03-24 at 16:19 -0500, Dan Bongert wrote:
> >> mouss wrote:
> >>> Dan Bongert wrote:
> >>>> Hello all:
> >>>>
> >>>> <snip>
> >
> >
> >> Though 'ls' was just an example -- just about any program will fail. The 'w'
> >> command will fail too:
> >>
> >> <snip>
> >
> > Hmmm... Sure it's failing? Maybe just the output is going somewhere
> > else? After the command runs, what does "echo $?" show? Does it even
> > work? Echo is a bash internal command, so I would expect it to never
> > fail.
>
> Ok, it's definitely getting an error from somewhere:
>
> thoth(3) /tmp> ls
>
> thoth(4) /tmp> echo $?
> 141
>
> Although:
>
> thoth(31) ~> top

"~>" ? Got me on that one.

>
> thoth(32) ~> echo $?
> 0

Ditto. Although I should mention that unless you "man bash" and find the
magic incantation I can't remember that gets return codes from a
pipeline (if that's what "~>" is supposed to be), the return from the
last command in the pipeline is what's returned (there's a rough sketch
of this near the end of this note). If echo is from bash, as I expected,
it should not fail and should return a 0 code regardless of what
happened ahead of it.

Your best tack is simplicity: one command, no pipes, just redirect the
output with "&>" like so

    cat <your file> &>/tmp/test.out

Then you can see if the output file has greater than zero length, use
vim on it (if that works), etc. (Another small sketch of that check is
below.)

> <snip possibility of serial connection>
>
> I'm usually sshing into the machine, but I've also experienced the problem
> on the console.

Ssh via e'net or serial? On the console, is the failure as reliable or
less frequent?

> > If you are on a normal console, try running the commands similar to
> > this (trying to determine if *something* else is receiving output or
> > not)
> >
> > <your command> &> /dev/tty
> >
> > if this works reliably, maybe that's a starting point.
>
> Nope, that fails intermittently as well.

I would surmise that means that basic kernel operations are good and
there is some common library routine involved.

> > There's a couple kernel guys who frequent this list. Maybe one of them
> > will have a clue as to what could go wrong. Corrupted libraries and
> > whatnot.
> >
> > You might try that rpm -V command earlier against all packages (add an
> > "a" IIRC). Maybe some library accessed by the coreutils, but which is
> > not itself part of coreutils, is corrupt.
>
> Hmm....when I do a 'rpm -Va', I get lots of "at least one of file's
> dependencies has changed since prelinking" errors. Even if I run prelink
> manually, and then do a 'rpm -Va' immediately afterwards.

Well, I'd "man rpm" (no, I don't hate you, but I don't do rpm stuff
enough to remember it all, and *I* am not going to "man rpm" unless I
suddenly become quite masochistic :-), select some promising-looking
options and run it again, redirecting the output to a file you can
examine (you may have to get it to a machine that works reliably - the
"man nc" someone mentioned in another thread looks like a useful tool
for that; sketch below as well).

You want to get the diagnostic output from rpm and see what files it
complains about. The ones tagged with a "c" are config files and will
often show up there. If your system hasn't been compromised, it's safe
to ignore those. Examine all the ones that were unexpectedly tagged and
see if there is a pattern.

If your HDs are "smart", maybe a "smartctl -l <more params>" will
identify some sectors gone bad in a critical area of your HD.
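For example, something like this would be a starting point (just a
sketch: the device name /dev/sda is an assumption, and the exact options
worth using are in "man smartctl"):

    smartctl -H /dev/sda          # overall health self-assessment
    smartctl -l error /dev/sda    # the drive's own error log
    smartctl -a /dev/sda          # everything, incl. reallocated sector counts

Entries in the error log, or a climbing reallocated-sector count, would
point at the disk itself rather than at corrupted libraries.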
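On the pipeline return code business above: I believe the incantation I
couldn't remember is bash's PIPESTATUS array, but take this as a sketch
and check "man bash":

    false | true
    echo $?                    # prints 0 -- only the last command's code
    false | true
    echo "${PIPESTATUS[@]}"    # prints "1 0" -- one code per pipeline member

PIPESTATUS only describes the most recent pipeline, which is why it gets
re-run before reading the array. As an aside, codes above 128 usually
mean the command died on a signal (128 plus the signal number), so the
141 above would be SIGPIPE (13) if that arithmetic applies here.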
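And going back to the one-command-no-pipes test, here's the sort of
check I meant, using /etc/hosts purely as a stand-in for <your file>:

    cat /etc/hosts &> /tmp/test.out
    echo $?                  # did cat itself return an error?
    wc -c /tmp/test.out      # greater than zero bytes?
    ls -l /tmp/test.out      # same question, different tool

If the file comes out with the expected contents even when nothing shows
on the screen, the trouble is somewhere on the terminal/output side
rather than in the commands themselves.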
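Lastly, the rpm verify plus nc idea, again only a rough sketch - the
port number and host name are made up, and nc option syntax differs
between netcat versions, so check "man nc":

    rpm -Va &> /tmp/rpm-verify.out       # capture everything rpm complains about

    # on the machine that behaves, start a listener:
    nc -l -p 9999 > rpm-verify.out

    # on the flaky machine, push the file across:
    nc good-machine 9999 < /tmp/rpm-verify.out

Then look through rpm-verify.out for entries that are not config files
(no "c" in the attribute column) and see whether the flagged files
cluster around one package or one part of the filesystem.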
I don't have a clue why, right after prelink is run, rpm would claim the
files had been changed, unless it's a matter of the rpm database not
having been updated yet. I don't know how it all works together. Maybe
the rpm update runs at night or something?

WHERE ARE THE KNOWLEDGEABLE FOLKS WHEN NEEDED? It's the blind leading
the blind ATM. 8-O

HTH
-- 
Bill