The root cause turned out to be a high heap size: the Linux OOM Killer (http://linux-mm.org/OOM_Killer) killed the process. It took some time to figure out, but it was very interesting. We knew a high heap is a problem, but we had no clue why the process disappeared while the actual heap usage was well within the limit. syslog is what helped us figure this out.
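
For anyone hitting the same thing, here is a minimal sketch of how the syslog check could be scripted. It assumes the kernel logs a "Killed process <pid> (<name>)" line when the OOM Killer acts; the exact wording varies across kernel versions, and the log location depends on the distro (often /var/log/syslog or /var/log/messages). The names find_oom_kills and scan_file are hypothetical helpers, not part of any tool.

```python
import re

# Assumed wording of the kernel's OOM kill message. It differs between
# kernel versions, so adjust the pattern to match what your logs contain.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a (pid, process_name, raw_line) tuple for each OOM kill line."""
    hits = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            pid = int(match.group(1))
            name = match.group(2)
            hits.append((pid, name, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle)
```

Calling scan_file on the syslog path (reading it usually needs root) returns one tuple per kill, so a process that vanished without a kill -9 in anyone's history can be matched by pid or name.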

About the Linux OOM Killer:
"It is the job of the linux 'oom killer' to sacrifice one or more processes in order to free up memory for the system when all else fails"


On Thu, Jan 2, 2014 at 10:38 AM, Robert Coli <rcoli@eventbrite.com> wrote:
On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma <narendra.sharma@gmail.com> wrote:

8 node cluster running in aws. Any pointers where I should start looking?
No kill -9 in history.

You should start looking at instructions as to how to upgrade to at least the top of the 1.1 line... :D

=Rob



--
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/