[CentOS] another bizarre thing...

Tue Aug 6 15:49:29 UTC 2019

Fred Smith wrote:
> 
> Hi all!
> 
> I'm stuck on something really bizarre that is happening to a product
> I "own" at work. It's a C program, built on CentOS, runs on CentOS or
> RHEL, has been in circulation since the early 00's, is in use at
> hundreds of sites.
> 
> Recently, at multiple customer sites, it has started just going away.
> No core file (yes, ulimit is configured), nothing in any of its
> (several) log files. It's just gone.
> 
> Running it under strace until it dies reveals that every thread has
> been given a SIGKILL.
> 
> How does one figure out who delivered a SIGKILL? For other, non-fatal,
> signals it is possible to glean the PID of the sending process in a
> signal handler, but obviously you can't do that for SIGKILL because
> the app doesn't survive the signal.
> 
> I'm grasping at straws here, and am open to almost any kind of
> suggestion that can be followed up (as compared to "beats me" which
> is where I am now).
> 
> I'm even wondering if systemd has something to do with it.
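
For reference, the sender-PID technique mentioned in the quoted message
(for signals that can be caught) can be sketched roughly as below. This is
only an illustration, not code from the product under discussion: the names
last_sender, on_signal and install_sender_logger are made up here. It
relies on sigaction(2) with SA_SIGINFO, which passes a siginfo_t whose
si_pid field holds the sender's PID when the signal came from kill(2) or
sigqueue(3). As the quoted message says, this cannot help with SIGKILL,
which cannot be caught.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* PID of the most recent sender. Stored as sig_atomic_t so the handler
 * only performs an async-signal-safe store; this assumes a PID fits in
 * sig_atomic_t, which holds on Linux where both are int. */
static volatile sig_atomic_t last_sender = 0;

/* Three-argument handler form, selected by SA_SIGINFO. */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    /* si_pid is meaningful for signals sent by kill(2) or sigqueue(3). */
    last_sender = (sig_atomic_t)info->si_pid;
}

/* Install the handler for one signal. Returns 0 on success, -1 on error.
 * For SIGKILL and SIGSTOP, sigaction() fails, since they cannot be caught. */
int install_sender_logger(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

A program would call install_sender_logger(SIGTERM) (or similar) early in
main() and write last_sender to its log from normal code, outside the
handler.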

I had an issue a few years ago where 'something' was killing processes -
I found it by writing a simple LD_PRELOAD hack that intercepted kill(2)
and logged what it was doing via syslog before doing the actual kill -
and used /etc/ld.so.preload to get it loaded by every process ...
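
A minimal sketch of that kind of interposer follows. It is not the
original code, just an illustration under some assumptions: glibc on
Linux, and a made-up "killlog" tag and file name. It defines its own
kill() with the same signature as kill(2), logs the caller and target via
syslog(3), then calls the real function found with dlsym(RTLD_NEXT, ...).

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Same signature as kill(2). When this object is preloaded, the dynamic
 * linker resolves calls to kill() here before it looks in libc. */
int kill(pid_t pid, int sig)
{
    static int (*real_kill)(pid_t, int);

    if (real_kill == NULL) {
        /* Find the next definition of kill in search order (normally libc). */
        real_kill = (int (*)(pid_t, int))dlsym(RTLD_NEXT, "kill");
    }

    /* Log who is sending what to whom before the signal goes out. */
    syslog(LOG_USER | LOG_NOTICE,
           "killlog: pid %ld (uid %ld) sends signal %d to pid %ld",
           (long)getpid(), (long)getuid(), sig, (long)pid);

    if (real_kill == NULL) {
        /* Could not locate the real kill; report failure to the caller. */
        errno = ENOSYS;
        return -1;
    }
    return real_kill(pid, sig);
}
```

Something like "gcc -shared -fPIC -o libkilllog.so killlog.c -ldl" should
build it (the -ldl is only needed on older glibc), and the full path to
the resulting .so then goes on a line in /etc/ld.so.preload. Note that
this only sees dynamically linked programs that call kill() through libc.
It will not see tgkill(2), direct syscalls, statically linked binaries, or
a SIGKILL that originates in the kernel (for example from the OOM killer).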

James Pearson