The root cause turned out to be high heap usage: the Linux OOM killer killed the process. It took some time to figure out, but it was very interesting. We knew high heap usage was a problem, but we had no clue why the process disappeared when the actual heap usage was well within the limit. syslog helped us figure this out.
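syslog is where the kernel's OOM-killer messages end up. Below is a minimal sketch for finding them. It assumes the kernel logs a line containing `Killed process <pid> (<name>)`, but the exact wording varies between kernel versions. It also assumes the log lives at /var/log/syslog, while RHEL-family systems typically use /var/log/messages. `find_oom_kills` is a hypothetical helper name, not an existing tool.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message format: the kernel line that reports the actual kill,
# e.g. "... kernel: Killed process <pid> (<name>) total-vm:...".
# Wording differs across kernel versions, so adjust the pattern if needed.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for each OOM-killer kill found in the log lines."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Usage, assuming a Debian/Ubuntu-style log path: `with open("/var/log/syslog", errors="replace") as log: print(find_oom_kills(log))`. The same kernel messages can usually also be seen with `dmesg`.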

About the Linux OOM Killer
"It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails"
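The kernel exposes each process's current OOM "badness" score in /proc/<pid>/oom_score, and processes with higher scores are the likelier victims. The following is a minimal, Linux-only sketch that lists the top candidates. The function name `top_oom_candidates` is hypothetical, and `proc_root` is a parameter only so the sketch can be pointed at a directory other than /proc.

```python
import os
from typing import List, Tuple


def top_oom_candidates(proc_root: str = "/proc", n: int = 5) -> List[Tuple[int, int, str]]:
    """Return up to n (oom_score, pid, name) tuples, highest score first."""
    candidates = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile or file unreadable; skip it
        candidates.append((score, int(entry), name))
    candidates.sort(reverse=True)  # highest oom_score first
    return candidates[:n]
```

On a node running a large JVM heap, calling `top_oom_candidates()` shows whether that process is first in line when memory runs out.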

8-node cluster running in AWS. Any pointers on where I should start looking?
There is no kill -9 in the shell history.

You should start by looking at the instructions on how to upgrade to at least the latest release in the 1.1 line... :D


