Hi Mark,

I've exhausted the Java avenues for debugging this issue. Since my last
email, the process I pointed strace at has been killed, but I'm afraid
the rather raw format of the strace file is lost on me.

The last six lines of the output file are:

clone(child_stack=0x4202a250, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x4202a9d0, tls=0x4202a940, child_tidptr=0x4202a9d0) = 23241
futex(0x4202a9d0, FUTEX_WAIT, 23241, NULL) = -1 EINTR (Interrupted system call)
--- SIGHUP (Hangup) @ 0 (0) ---
futex(0x2ab0b620a000, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn(0x2ab0b620a000) = -1 EINTR (Interrupted system call)
futex(0x4202a9d0, FUTEX_WAIT, 23241, NULL <unfinished ... exit status 129>

The SIGHUP is new information, and appears to be what's causing the Java
app to exit. Surely Java should be aware of the interrupted system call?

There are no other signals in the output file, and the only EINTRs are
in the passage above.

Looks like I need to delve back into Java...

Martin

On 10 February 2011 19:37, <m.roth at 5-cent.us> wrote:
> Hey, Martin,
>
> Martin Hewitt wrote:
>>
>> Thanks, I didn't know about the strace command, so that's useful.
>> Fortunately, this is on a dedicated server, so there's a fair amount
>> of free disk.
> <snip>
> If you can do the code changes (and the try/catch is *supposed* to be in
> there, according to java style), work your way down, y'know...
>
> main
>    ...
>    try {
>       First actual call to do the job
>    } catch
>       writeln error;
>
> And if it fails there, then you know; otherwise, go to the next main call,
> sorry, "invocation of a method"....
>
> Then again, this time in each of the main function calls under that, and
> step down until you find the function it's dying in. That'll give you a
> much better handle on what's happening.
>
>> Thanks for the help.
>>
> Good luck.
>
>         mark
>> Martin
>>
>> On 10 February 2011 18:58, <m.roth at 5-cent.us> wrote:
>>> Martin Hewitt wrote:
>>>> Hi all,
>>>>
>>>> I'm running CentOS 5.5 Final, Java version "1.6.0_17" OpenJDK Runtime
>>>> Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64) OpenJDK 64-Bit
>>>> Server VM (build 14.0-b16, mixed mode) installed via Yum.
>>>>
>>>> We have a java application, packaged as a jar, running on our servers
>>>> which, periodically, crawls RSS feeds and writes the articles to a
>>>> database.
>>>>
>>>> Randomly, and seemingly without cause, these processes will die, not
>>>> through the application exiting, or due to my killing it, but due to
>>>> something that seems to kill without leaving a trace.
>>> <snip>
>>> The hard (but correct) way would be to put try {} catch in the code, and
>>> work your way down. Trying to debug it using a debugger might be real
>>> problematical, if you can't repeatably provoke it. I *suppose* you could
>>> attach strace to it, and dump the o/p into a file (on a filesystem with a
>>> *lot* of disk space)....
>>>
>>>         mark
>>>
>>> _______________________________________________
>>> CentOS mailing list
>>> CentOS at centos.org
>>> http://lists.centos.org/mailman/listinfo/centos
>>>
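
A minimal sketch for following up on the SIGHUP finding in the strace output; it is not the crawler's actual code. It relies on the common convention that an exit status above 128 means "128 + signal number", so 129 corresponds to signal 1, SIGHUP. The class name ExitDiagnostics and both method names are hypothetical. The shutdown hook is registered with Runtime.addShutdownHook, and the sketch assumes the JVM performs an orderly shutdown when the signal arrives, in which case the hook should get a chance to log a timestamp before the process exits.

```java
import java.util.Date;

// Hypothetical helper class; the name and methods are illustrative only.
class ExitDiagnostics {

    // Decode a shell-style exit status. By convention, values above 128
    // mean the process ended because of signal (status - 128).
    static String describeExitStatus(int status) {
        if (status <= 128) {
            return "exited with status " + status;
        }
        int signal = status - 128;
        switch (signal) {
            case 1:  return "terminated by SIGHUP";
            case 2:  return "terminated by SIGINT";
            case 9:  return "terminated by SIGKILL";
            case 15: return "terminated by SIGTERM";
            default: return "terminated by signal " + signal;
        }
    }

    // Register a hook that writes a timestamped line to stderr when the
    // JVM begins an orderly shutdown. Returns the hook thread so a caller
    // could remove it again with Runtime.removeShutdownHook.
    static Thread installShutdownLogger() {
        Thread hook = new Thread(new Runnable() {
            public void run() {
                System.err.println(new Date() + " JVM shutting down");
            }
        }, "shutdown-logger");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        installShutdownLogger();
        System.out.println(describeExitStatus(129));
        // The real crawler work would start here (assumption: not shown).
    }
}
```

If the hook's line shows up in the log just before the process disappears, that would suggest the JVM is being asked to shut down from outside rather than crashing on its own. It would not say who sent the signal.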