[CentOS] RAID level and killing a job

Tue Jan 13 18:53:27 UTC 2009
William L. Maltby <CentOS4Bill at triad.rr.com>

On Tue, 2009-01-13 at 10:04 -0800, nate wrote:
> <snip>

> > Second question - A newly installed server consisting of CentOS 5.2,
> > straight off the DVD, I invoke a command by hand, realize I want to
> > kill it soon after (logged in as root).  I issue
> > ps auwx|grep name_of_command, get the PID, and issue kill -9 PID.
> > ps auwx|grep name_of_command shows it is still running.
> >
> > The command is NOT part of any scheduled job.    Why won't the process die?
> 
> Is the process state "D" or "Z"? Either of these states frequently
> means an unkillable process. A "Z" (zombie) usually can't be killed
> directly, though it can sometimes be cleared by other means. If the
> process is in "D", it is stuck waiting (most often for I/O) and you
> have to wait for it to complete, or reboot. Sometimes going to single
> user mode and back again works as well, and sometimes killing other
> processes that the stuck one depends on frees it up so it can die.

It's been a long time, so please forgive any FUD here.

IIRC, zombies are processes that have ended but have not yet been
"cleaned up" (reaped). This happens when the parent still exists but has
not collected the child's exit status, e.g. because it is "sleeping" (for
whatever reason: it may be waiting on another event, waiting for I/O that
never completes, ...) or because it simply never calls wait(). A zombie
can not be killed with kill -9 because it is already dead; only its entry
in the process table remains.

IIRC, when the parent has died, the PPID you'll see is "1": the child is
handed over to init, which normally reaps it as soon as it ends, so such
zombies should not linger. If one does anyway, I can't recall any way to
eliminate it without a re-boot. I can't recall if I ever tried a telinit
to see if run level changes would get rid of it. I suspect not.
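
A quick way to see why kill -9 does nothing to a zombie is to make one
on purpose. The sketch below is Python and Linux-only (it reads
/proc/<pid>/stat); proc_state() and zombie_demo() are names I made up
for illustration, not anything that ships with CentOS.

```python
import os
import signal


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state is the first field after the ")" that closes the command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo():
    pid = os.fork()
    if pid == 0:
        os._exit(0)                    # child: end at once

    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so it sits in the process table as a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = proc_state(pid)

    os.kill(pid, signal.SIGKILL)       # accepted, but there is nothing left to kill
    after = proc_state(pid)

    os.waitpid(pid, 0)                 # the parent finally reaps the child
    gone = not os.path.exists(f"/proc/{pid}")
    return before, after, gone
```

If I have this right, zombie_demo() should report "Z" both before and
after the kill, and the entry should only disappear once the parent
calls waitpid().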

If the parent exists and signals are not disabled or otherwise handled,
killing the parent may make the zombie go away: the orphaned zombie is
handed to init, which normally cleans it up. This is most common, IIRC,
when the parent is awaiting an event notification.
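
To work out which parent to go after, something like the following
lists every zombie together with its PPID. Again only a sketch, assuming
a Linux /proc; list_zombies() is a hypothetical helper, not a standard
tool.

```python
import os


def list_zombies():
    """Return (pid, ppid, name) for every process in state Z (Linux /proc only)."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue                   # not a process directory
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue                   # process vanished while we were scanning
        # Format: pid (name) state ppid ...; the name may contain spaces or ")".
        head, tail = data.rsplit(")", 1)
        name = head.split("(", 1)[1]
        fields = tail.split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            found.append((int(entry), ppid, name))
    return found
```

A PPID other than 1 in that list is the process that has not collected
its child yet, and so the one to signal.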

If a "clean" termination is desired, the parent must support some signal
processing, e.g. SIGHUP, SIGUSR1, ... (man 7 signal). If it does, then
things like removal of temporary files and telling the children to "STOP
THAT" can be done.
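
As a rough sketch of that kind of signal processing in a parent (Python;
cleanup() and install_handlers() are made-up names, and using SIGTERM as
the "STOP THAT" message to the children is my assumption):

```python
import os
import signal
import sys


def cleanup(children, temp_files):
    """Tell each child to stop, reap it, then remove the temporary files."""
    for pid in children:
        try:
            os.kill(pid, signal.SIGTERM)   # the "STOP THAT" message
            os.waitpid(pid, 0)             # reap it so no zombie is left behind
        except (ProcessLookupError, ChildProcessError):
            pass                           # already gone or already reaped
    for path in temp_files:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass                           # nothing to remove


def install_handlers(children, temp_files):
    """Run cleanup() and exit when SIGHUP, SIGUSR1 or SIGTERM arrives."""
    def handler(signum, frame):
        cleanup(children, temp_files)
        sys.exit(0)

    for sig in (signal.SIGHUP, signal.SIGUSR1, signal.SIGTERM):
        signal.signal(sig, handler)
```

Note that waitpid() here blocks until the child really ends, so a child
that ignores SIGTERM would hang the cleanup; a real program would want a
timeout. None of this helps against kill -9, which can not be caught.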

That's all I can recall without some actual work.

> 
> If the process is zombied you can try to find the parent process (if
> there is one) with ps -efx, and kill that. Sometimes that causes the
> child to go away as well, though it doesn't always work.

> nate
> <snip sig stuff>

HTH
-- 
Bill