mesos-dev mailing list archives

From Vinod Kone <vinodk...@gmail.com>
Subject Re: Dealing with "run away" task processes after executor terminates
Date Wed, 04 Jun 2014 00:44:49 GMT
+Jie,Ian

Not sure if you've talked to Ian Downes and/or Jie Yu about this, but
they were discussing the same issue (offline) today.

Just to be sure: if you are using cgroups, the Mesos slave will clean up the
container (and all its processes) when an executor exits. There is
definitely a race here, though: Mesos might release the resources to the
framework before the container is destroyed. We'll try to fix that really
soon. I'll let Jie/Ian chime in regarding fixes/tickets.
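To make the ordering concrete, here is a minimal sketch (not Mesos's actual implementation) of cleaning up a container's cgroup before its resources are handed back. It assumes a cgroup v1 hierarchy in which the container's processes are listed in a cgroup.procs file under a directory such as /sys/fs/cgroup/freezer/mesos/<container-id>; that path and the helper names read_cgroup_pids and kill_cgroup are hypothetical. The idea is that resources would only be reported as free after kill_cgroup returns True.

```python
import os
import signal
import time
from typing import Callable, List


def read_cgroup_pids(cgroup_dir: str) -> List[int]:
    """Return the PIDs listed in a cgroup's cgroup.procs file."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def kill_cgroup(cgroup_dir: str,
                kill: Callable[[int, int], None] = os.kill,
                attempts: int = 10,
                delay: float = 0.1) -> bool:
    """SIGKILL every process in the cgroup until cgroup.procs is empty.

    Returns True once the cgroup is empty, or False if processes remain
    after `attempts` rounds. Hypothetical helper: a caller would only
    report the container's resources as released after a True result.
    """
    for _ in range(attempts):
        pids = read_cgroup_pids(cgroup_dir)
        if not pids:
            return True
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # exited between reading cgroup.procs and the kill
        time.sleep(delay)  # give the kernel time to reap killed processes
    return not read_cgroup_pids(cgroup_dir)
```

This loop does not freeze the cgroup, so a process that forks quickly can race with the kills; freezing the cgroup with the freezer subsystem before sending signals avoids that and is the more robust design.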


On Tue, Jun 3, 2014 at 4:25 PM, Sharma Podila <spodila@netflix.com> wrote:

> When a framework executor terminates, Mesos sends TASK_LOST status updates
> for tasks that were running. However, if a task has processes that do not
> terminate when the executor dies, we have a problem: Mesos considers the
> slave resources assigned to those tasks released, whereas the task
> processes are still running and holding those resources.
>
> While it is good practice for task processes to exit when their
> executor dies, I am not sure that can be guaranteed. I am wondering how
> others are dealing with such "illegal" processes, that is, processes that
> once belonged to Mesos-run tasks but no longer do.
>
> Conceivably, a per-slave reaper/GC process could periodically scan the
> slave's process tree to ensure all processes are 'legal'. Even assuming
> such a reaper exists on the slave (which could be tricky in a
> multi-framework environment) and can kill illegal processes safely, there
> is still a time window until the reaper completes its next cleanup pass.
> In the meantime, new tasks can land and fail trying to use a resource that
> Mesos assumed was free. This is especially problematic for ports, less so
> for CPU and memory.
>
> Would love to hear thoughts on how you are handling this scenario.
>
>
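A reaper like the one described above could start by only identifying illegal processes, without killing anything. This sketch assumes that task processes inherit the executor's MESOS_FRAMEWORK_ID and MESOS_EXECUTOR_ID environment variables, and that the set of executors still live on the slave is obtained separately (not shown). Both are assumptions: a task that clears its environment would escape detection. The helper names are hypothetical, and reading /proc/<pid>/environ for other users' processes generally requires root.

```python
import os
from typing import Dict, Optional, Set, Tuple

# Assumption: processes launched for a task inherit the executor's
# environment, including MESOS_FRAMEWORK_ID and MESOS_EXECUTOR_ID.
Tag = Tuple[str, str]


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE contents of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip the empty trailing entry and malformed ones
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def task_tag(env: Dict[str, str]) -> Optional[Tag]:
    """Return (framework ID, executor ID) if the process looks task-owned."""
    fw = env.get("MESOS_FRAMEWORK_ID")
    ex = env.get("MESOS_EXECUTOR_ID")
    return (fw, ex) if fw and ex else None


def find_illegal(environs: Dict[int, bytes], live: Set[Tag]) -> Set[int]:
    """PIDs tagged with an executor that is no longer live on this slave."""
    illegal = set()
    for pid, raw in environs.items():
        tag = task_tag(parse_environ(raw))
        if tag is not None and tag not in live:
            illegal.add(pid)
    return illegal


def read_environs(proc_root: str = "/proc") -> Dict[int, bytes]:
    """Snapshot /proc/<pid>/environ for every readable process (Linux)."""
    result = {}
    for name in os.listdir(proc_root):
        if name.isdigit():
            try:
                with open(os.path.join(proc_root, name, "environ"), "rb") as f:
                    result[int(name)] = f.read()
            except OSError:
                pass  # exited meanwhile, or not readable without privileges
    return result
```

Keeping detection separate from killing lets an operator log or review candidates before acting, which addresses the risk of killing the wrong process; it does nothing about the window between scans.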
