[CentOS] CentOS 5.5 Java Process Death

Mon Feb 14 10:54:05 UTC 2011
Martin Hewitt <martin.hewitt at gmail.com>

Hi Mark,

Over the weekend I've been testing the environment under various
circumstances, and it seems that the kill issue is not confined to one
app - it's afflicting all jars I've packaged with Eclipse.

I added as many try...catch blocks as I could and got no useful
output. It occurred to me, though, that the Eclipse loader adds
another layer of code between my application and the kernel.
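
One possible explanation for the silent try...catch blocks: a signal
such as the SIGHUP in the strace output quoted below never surfaces as
a Java exception. HotSpot-based JVMs do run shutdown hooks when they
receive SIGHUP, SIGINT or SIGTERM (unless started with -Xrs), so a hook
that logs a timestamp is a cheap way to see when the JVM is being told
to exit. A minimal sketch; the class name ShutdownLogger and the log
message are made up for illustration:

```java
import java.util.Date;

public class ShutdownLogger {
    // Registers a hook that the JVM runs during an orderly shutdown.
    // That includes shutdown triggered by SIGHUP, SIGINT or SIGTERM,
    // unless the JVM was started with -Xrs.
    static Thread install() {
        Thread hook = new Thread(new Runnable() {
            public void run() {
                System.err.println(new Date() + " JVM shutting down");
                System.err.flush();
            }
        }, "shutdown-logger");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        install();
        // The real application's work would go here. A normal exit
        // from main also runs the hook, so the message is printed.
        System.out.println("application started");
    }
}
```

The hook cannot say who sent the signal, but a timestamp in the log can
be lined up against logins, cron jobs and terminal sessions ending.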

Because Eclipse uses a jar-in-jar loader to bundle the classpath
libraries, I'm going to experiment today with a different jar
packager, and with running the application without jar packaging
at all.
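
To tell the packaging variants apart at run time, it may help to print
which class loader actually loaded the application. A small sketch,
assuming nothing about the real code; WhichLoader is a made-up name:

```java
public class WhichLoader {
    // Describes the class loader that loaded the given class. The Java
    // API represents the bootstrap loader as null.
    static String describe(Class<?> c) {
        ClassLoader cl = c.getClassLoader();
        String loader = (cl == null) ? "bootstrap" : cl.getClass().getName();
        return c.getName() + " loaded by " + loader;
    }

    public static void main(String[] args) {
        System.out.println(describe(WhichLoader.class));
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
    }
}
```

Run from a plain classpath this should name the JVM's own application
class loader; run from an Eclipse-exported jar, a different loader
class name would suggest the jar-in-jar loader is in play.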

Martin

On 11 February 2011 14:13,  <m.roth at 5-cent.us> wrote:
> Martin Hewitt wrote:
>> Hi Mark,
>>
>> I've exhausted the Java avenues for debugging this issue, but, since
>> my last email, the process I pointed strace at has been killed, but
>> I'm afraid the rather raw format of the strace file is lost on me.
>> The last six lines of the output file are:
>>
>> clone(child_stack=0x4202a250,
>>
> At a guess, looks like it's creating a child process.
> <snip>
>> futex(0x4202a9d0, FUTEX_WAIT, 23241, NULL) = -1 EINTR (Interrupted system call)
>> --- SIGHUP (Hangup) @ 0 (0) ---
>> futex(0x2ab0b620a000, FUTEX_WAKE_PRIVATE, 1) = 1
>> rt_sigreturn(0x2ab0b620a000)            = -1 EINTR (Interrupted system call)
>> futex(0x4202a9d0, FUTEX_WAIT, 23241, NULL <unfinished ... exit status 129>
>>
>> The SIGHUP is new information, and appears to be what's causing the
>> java app to exit. Surely Java should be aware of the Interrupted
>> system call?
>>
>> There are no other signals in the output file, and the only EINTRs are
>> in the passage above.
>>
> Does the exit status of 129 say anything other than SIGHUP?
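
On the exit status question: by shell convention a process that ends
because of signal N is reported with status 128 + N, so 129 corresponds
to signal 1, which is SIGHUP on Linux. It appears to carry no
information beyond the SIGHUP itself. A tiny sketch of the arithmetic
(class and method names are illustrative only):

```java
public class ExitStatus {
    // Returns the signal number encoded in an exit status of the form
    // 128 + N, or -1 if the status does not look like a signal exit.
    static int signalFromExitStatus(int status) {
        return (status > 128 && status < 256) ? status - 128 : -1;
    }

    public static void main(String[] args) {
        int status = 129; // value reported at the end of the strace output
        System.out.println("exit status " + status
                + " -> signal " + signalFromExitStatus(status));
        // prints "exit status 129 -> signal 1"
    }
}
```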
>
>> Looks like I need to delve back into Java...
>>
> Yeah. I think you need to try what I was suggesting: start wrapping
> function calls in try/catch, and work your way down. When you find the one
> it fails in, go into that function, er, method, and wrap the calls in
> there (and/or even put a writeln in a few choice spots) until you find the
> exact function the SIGHUP (or whatever) is happening in.
>
>         mark "why, yes, I *was* a developer longer than I've been an admin"
>
>> Martin
>>
>> On 10 February 2011 19:37,  <m.roth at 5-cent.us> wrote:
>>> Hey, Martin,
>>>
>>> Martin Hewitt wrote:
>>>>
>>>> Thanks, I didn't know about the strace command, so that's useful.
>>>> Fortunately, this is on a dedicated server, so there's a fair amount
>>>> of free disk.
>>> <snip>
>>> If you can do the code changes (and the try/catch is *supposed* to be in
>>> there, according to java style), work your way down, y'know...
>>>
>>> main
>>>
>>> ...
>>> try {
>>> First actual call to do the job
>>> } catch
>>>   writeln error;
>>>
>>> And if it fails there, then you know; otherwise, go to the next main
>>> call, sorry, "invocation of a method"....
>>>
>>> Then again, this time in each of the main function calls under that, and
>>> step down until you find the function it's dying in. That'll give you a
>>> much better handle on what's happening.
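
The step-down approach described above could look roughly like this in
Java. The step names fetchFeeds and writeArticles are hypothetical
stand-ins, not the real application's methods, and the second one
throws on purpose to show the report. Note that this only reports Java
exceptions and errors; a signal delivered to the JVM is not a Throwable
and will not land in a catch block.

```java
public class StepDown {
    // Hypothetical stand-ins for the application's top-level steps.
    static void fetchFeeds() { }
    static void writeArticles() {
        throw new IllegalStateException("database unavailable");
    }

    // Runs one step and reports any Throwable it raises.
    // Returns true if the step completed, false if it threw.
    static boolean runStep(String name, Runnable step) {
        try {
            step.run();
            return true;
        } catch (Throwable t) {
            System.err.println("step '" + name + "' failed: " + t);
            t.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) {
        runStep("fetchFeeds", new Runnable() {
            public void run() { fetchFeeds(); }
        });
        runStep("writeArticles", new Runnable() {
            public void run() { writeArticles(); }
        });
    }
}
```

Once a step is known to fail, the same wrapper can be moved one level
down, around the calls inside that step.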
>>>
>>>> Thanks for the help.
>>>>
>>> Good luck.
>>>
>>>        mark
>>>> Martin
>>>>
>>>> On 10 February 2011 18:58,  <m.roth at 5-cent.us> wrote:
>>>>> Martin Hewitt wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I'm running CentOS 5.5 Final, Java version "1.6.0_17" OpenJDK Runtime
>>>>>> Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64) OpenJDK
>>>>>> 64-Bit Server VM (build 14.0-b16, mixed mode) installed via Yum.
>>>>>>
>>>>>> We have a java application, packaged as a jar, running on our servers
>>>>>> which, periodically, crawls RSS feeds and writes the articles to a
>>>>>> database.
>>>>>>
>>>>>> Randomly, and seemingly without cause, these processes will die, not
>>>>>> through the application exiting, or due to my killing it, but due to
>>>>>> something that seems to kill without leaving a trace.
>>>>> <snip>
>>>>> The hard (but correct) way would be to put try {} catch in the code,
>>>>> and work your way down. Trying to debug it using a debugger might be
>>>>> real problematical, if you can't repeatably provoke it. I *suppose*
>>>>> you could attach strace to it, and dump the o/p into a file (on a
>>>>> filesystem with a *lot* of disk space)....
>>>>>
>>>>>        mark
>>>>>
>>>>> _______________________________________________
>>>>> CentOS mailing list
>>>>> CentOS at centos.org
>>>>> http://lists.centos.org/mailman/listinfo/centos
>>>>>