On Tue, 2008-03-25 at 13:21 -0500, Dan Bongert wrote:
William L. Maltby wrote:
On Mon, 2008-03-24 at 16:19 -0500, Dan Bongert wrote:
mouss wrote:
Dan Bongert wrote:
Hello all:
<snip>
Though 'ls' was just an example -- just about any program will fail. The 'w' command will fail too:
<snip>
Hmmm... Sure it's failing? Maybe just the output is going somewhere else? After the command runs, what does "echo $?" show? Does it even work? Echo is a bash internal command, so I would expect it to never fail.
Ok, it's definitely getting an error from somewhere:
thoth(3) /tmp> ls
thoth(4) /tmp> echo $?
141
Although:
thoth(31) ~> top
"~>" ? Got me on that one.
thoth(32) ~> echo $?
0
Ditto. Although I should mention that unless you "man bash" and find the magic incantation I can't remember that gets return codes from a pipeline (if that's what "~>" is supposed to be), the return from the last command in the pipeline is what's returned. If echo is from bash, as I expected, it should not fail and should return a 0 code regardless of what happened ahead of it.
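For reference, the bash feature that reports per-command return codes from a pipeline is the PIPESTATUS array. By shell convention, a status above 128 means the process was killed by signal (status - 128), so the 141 shown earlier is 128 + 13, and signal 13 is SIGPIPE on Linux. The following is only a minimal sketch: the function name describe_status and the yes/head pipeline are illustrative and not taken from the thread.

```shell
#!/usr/bin/env bash
# Decode an exit status. By shell convention, values above 128 mean
# "terminated by signal (status - 128)".
# describe_status is a hypothetical helper name used for illustration.
describe_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        # In bash, 'kill -l N' also accepts an exit status and prints the signal name.
        echo "killed by signal $((status - 128)) (SIG$(kill -l "$status"))"
    else
        echo "exited with status $status"
    fi
}

# 'yes' writes forever; 'head' exits after one line and closes the pipe,
# so 'yes' normally ends up with a write error or SIGPIPE.
yes | head -n 1 >/dev/null
statuses=("${PIPESTATUS[@]}")   # copy at once; the next command overwrites PIPESTATUS

echo "yes:  $(describe_status "${statuses[0]}")"
echo "head: $(describe_status "${statuses[1]}")"
```

Running "describe_status $?" right after a failing command gives the same decoding for a single command with no pipeline involved.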
Your best tack is simplicity: one command, no pipes, just redirect output with "&>" like so
cat <your file> &>/tmp/test.out
Then you can see if the output file has greater than zero length, use vim on it (if that works), etc.
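The check described above could be scripted roughly as follows, so that the exit status and the captured byte count are recorded together on every attempt. This is a sketch under assumptions: run_and_capture is a made-up helper name, and "ls /" merely stands in for whichever command is failing.

```shell
#!/usr/bin/env bash
# run_and_capture FILE CMD [ARGS...]
# Run CMD with stdout and stderr redirected to FILE (the same effect as
# bash's "&>"), then report the exit status and the number of bytes captured.
# The helper name is hypothetical, chosen for this illustration.
run_and_capture() {
    local out=$1; shift
    "$@" >"$out" 2>&1
    local status=$?
    local bytes
    bytes=$(wc -c <"$out")
    # $((bytes)) strips any padding some wc implementations add.
    echo "status=$status bytes=$((bytes))"
    return "$status"
}

# Example: substitute the command that fails intermittently.
run_and_capture /tmp/test.out ls /
```

Repeating the call in a loop and looking for runs that report zero bytes or a non-zero status would show whether the failure is in producing the output or in displaying it.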
<snip possibility of serial connection>
I'm usually sshing into the machine, but I've also experienced the problem on the console.
Ssh via e'net or serial? On the console, is the failure as reliable or less frequent?
If you are on a normal console, try running the commands similar to this (to determine whether *something* else is receiving the output or not):
<your command> &> /dev/tty
if this works reliably, maybe that's a starting point.
Nope, that fails intermittently as well.
I would surmise that means that basic kernel operations are good and there is some common library routine involved.
There's a couple kernel guys who frequent this list. Maybe one of them will have a clue as to what could go wrong. Corrupted libraries and whatnot.
You might try that rpm -V command from earlier against all packages (add an "a", IIRC). Maybe some library accessed by the coreutils, but which is not itself part of coreutils, is corrupt.
Hmm....when I do a 'rpm -Va', I get lots of "at least one of file's dependencies has changed since prelinking" errors. Even if I run prelink manually, and then do a 'rpm -Va' immediately afterwards.
Well, I'd "man rpm" (no, I don't hate you, but I don't do rpm stuff enough to remember it all and *I* am not going to "man rpm" unless I suddenly become quite masochistic :-), select some promising looking options and run it again, redirecting output to a file you can examine (possibly have to get it to a machine that works reliably - "man nc" someone mentioned in another thread looks like a useful tool).
You want to get the diagnostic output from rpm and see what files it complains about. The ones tagged with a "c" are config files and will often show up there. If your system hasn't been compromised, it's safe to ignore these.
Examine all the ones that were unexpectedly tagged and see if there is a pattern.
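One way to separate the config-file entries from the rest is a small filter over the verify output. This is a sketch that assumes the usual "rpm -V" line layout (a flags field, an optional one-letter marker such as "c", then the path) and paths without spaces. The sample lines are invented for illustration and are not output from the machine in question.

```shell
#!/usr/bin/env bash
# Read "rpm -V"-style lines on stdin and print only those NOT marked as
# config files (a lone "c" field between the flags and the path).
# non_config_changes is a hypothetical helper name.
non_config_changes() {
    awk '!(NF >= 3 && $2 == "c")'
}

# Invented sample input, for illustration only.
sample='S.5....T  c /etc/ssh/sshd_config
S.5....T    /bin/ls
..5....T  c /etc/hosts
missing     /usr/lib/libexample.so'

printf '%s\n' "$sample" | non_config_changes

# On a real system the filter would be fed from rpm itself, for example:
#   rpm -Va 2>&1 | non_config_changes > /tmp/rpm-verify.txt
```

Whatever remains after filtering is the list to examine for a pattern, such as many files from one package or one directory.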
If your HDs are "smart", maybe a "smartctl -l <more params>" will identify some sectors gone bad in a critical area of your HD.
I don't have a clue why rpm would claim the files had been changed right after prelink is run, unless the rpm database has not yet been updated. I don't know how it all works together. Maybe the rpm update runs at night or something?
WHERE'S THE KNOWLEDGEABLE FOLKS WHEN NEEDED? It's the blind leading the blind ATM. 8-O
HTH