mesos-user mailing list archives

From Benjamin Mahler <benjamin.mah...@gmail.com>
Subject Re: OOM not always detected by Mesos Slave
Date Tue, 02 Sep 2014 18:43:41 GMT
It looks like you're using the JVM. Can you set your JVM flags to limit its
memory consumption? That would favor the JVM throwing an OutOfMemoryError
over the cgroup being OOM-killed.
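
As a rough sketch of that sizing: one option is to derive the maximum heap
(-Xmx) from the cgroup limit and leave headroom for non-heap memory. The
function name and the 0.75 fraction are illustrative assumptions, not Mesos or
JVM defaults; the 917504kB input is the limit reported in the kernel log
quoted below.

```python
def jvm_heap_flag(cgroup_limit_kb: int, heap_fraction: float = 0.75) -> str:
    """Return an -Xmx flag sized below the cgroup memory limit.

    heap_fraction is an assumed value. The right headroom depends on how much
    non-heap memory (thread stacks, direct buffers, class metadata) the JVM
    actually uses, so treat it as a starting point to tune.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    # Convert the scaled limit from kB to whole megabytes, rounding down.
    heap_mb = int(cgroup_limit_kb * heap_fraction) // 1024
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    # 917504 kB is the cgroup limit from the quoted kernel log.
    print(jvm_heap_flag(917504))
```

Capping the heap alone does not bound total JVM memory, so the process can
still hit the cgroup limit if non-heap usage grows beyond the headroom.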


On Thu, Aug 28, 2014 at 5:51 AM, Whitney Sorenson <wsorenson@hubspot.com>
wrote:

> Recently, I've seen at least one case where a process inside of a task
> inside of a cgroup exceeded memory limits and the process was killed
> directly. The executor recognized the process was killed and sent a
> TASK_FAILED. However, it seems far more common to see the executor process
> itself destroyed and the mesos slave (I'm making some assumptions here
> about how it all works) sends a TASK_FAILED which includes information
> about the memory usage.
>
> Is there something we can do to make this behavior more consistent?
>
> Alternatively, can we provide some functionality to hook into so we don't
> need to duplicate the work of the mesos slave in order to provide the same
> information in the TASK_FAILED message? I think users would like to know
> definitively that the task OOM'd, whereas in the case where the underlying
> task is killed it may take a lot of digging to find the underlying cause if
> you aren't looking for it.
>
> -Whitney
>
> Here are relevant lines from messages in case something else is amiss:
>
> Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067321] Task in
> /mesos/2dda5398-6aa6-49bb-8904-37548eae837e killed as a result of limit of
> /mesos/2dda5398-6aa6-49bb-8904-37548eae837e
> Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067334] memory: usage
> 917420kB, limit 917504kB, failcnt 106672
> Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.066947] java7 invoked
> oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
>
>
>
>
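
For the "digging" described in the quoted message, a minimal sketch of
scanning kernel log text for a cgroup OOM kill could look like the following.
It is not how the Mesos slave detects OOMs; the function name and returned
keys are hypothetical, and the patterns assume the exact message wording shown
in the log lines above, which can differ between kernel versions.

```python
import re

# Patterns matching the kernel messages quoted in this thread.
KILLED_RE = re.compile(r"Task in (\S+) killed as a result of limit of (\S+)")
USAGE_RE = re.compile(r"memory: usage (\d+)kB, limit (\d+)kB, failcnt (\d+)")


def find_cgroup_oom(log_text: str):
    """Return details of the first cgroup OOM kill in log_text, or None."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    killed = KILLED_RE.search(flat)
    if killed is None:
        return None
    result = {"task_cgroup": killed.group(1), "limit_cgroup": killed.group(2)}
    usage = USAGE_RE.search(flat)
    if usage is not None:
        result["usage_kb"] = int(usage.group(1))
        result["limit_kb"] = int(usage.group(2))
        result["failcnt"] = int(usage.group(3))
    return result


if __name__ == "__main__":
    sample = """
    kernel: [2604343.067321] Task in
    /mesos/2dda5398-6aa6-49bb-8904-37548eae837e killed as a result of limit of
    /mesos/2dda5398-6aa6-49bb-8904-37548eae837e
    kernel: [2604343.067334] memory: usage
    917420kB, limit 917504kB, failcnt 106672
    """
    print(find_cgroup_oom(sample))
```

An executor could attach a result like this to its TASK_FAILED message, which
would let users see that the task OOM'd without searching the logs by hand.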
