mesos-user mailing list archives

From Benjamin Mahler <bmah...@apache.org>
Subject Re: Tasks may be explicitly dropped by agent in Mesos 1.5
Date Fri, 02 Mar 2018 00:22:48 GMT
Put another way, we currently don't guarantee in-order task delivery to the
executor. Due to the changes for MESOS-1720, one special case of task
re-ordering now leads to the re-ordered task being dropped (rather than
delivered out-of-order as before). Technically, this is strictly better.

However, we'd like to start guaranteeing in-order task delivery.
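
For illustration only, here is a generic way to enforce in-order delivery, written in Python. It is a sketch of plain sequence-number buffering and is not a description of how Mesos implements or plans to implement the guarantee. The `Resequencer` class and the per-executor sequence numbers are hypothetical names introduced for the example.

```python
class Resequencer:
    """Holds back messages until every earlier sequence number has arrived.

    Hypothetical helper for illustration; not part of Mesos.
    """

    def __init__(self):
        self.next_seq = 0      # next sequence number allowed through
        self.pending = {}      # seq -> task id, waiting for predecessors
        self.delivered = []    # task ids in the order they were released

    def receive(self, seq, task_id):
        self.pending[seq] = task_id
        # Release the longest run of consecutive messages now available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


r = Resequencer()
r.receive(1, "task-2")   # arrives early, so it is buffered
r.receive(0, "task-1")   # unblocks both messages
print(r.delivered)
```

The trade-off in a scheme like this is that a lost or very late message stalls everything behind it, so a real system would also need a timeout or failure path.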

On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu <mzhu@mesosphere.com> wrote:

> Hi all:
>
> TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
> if all three conditions are met:
> (1) Several `LAUNCH_TASK` or `LAUNCH_GROUP` calls
>  use the same executor.
> (2) The executor currently does not exist on the agent.
> (3) Due to some race conditions, these tasks reach the agent in a
> different order from their original launch order.
>
> In this case, tasks that reach the agent before the first task in the
> original order will be explicitly dropped by the agent (`TASK_DROPPED`
> or `TASK_LOST` will be sent).
>
> This bug will be fixed in 1.5.1. It is tracked in
> https://issues.apache.org/jira/browse/MESOS-8624
>
> ----
>
> In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
> ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
> calls to a new executor. The master would specify that the first call is
> the one to launch a new executor through the `launch_executor` field in
> `RunTaskMessage`/`RunTaskGroupMessage`, and the second one should
> use the existing executor launched by the first one.
>
> On the agent side, running a task/task group goes through a series of
> continuations. One is a `collect()` on the futures that unschedule
> frameworks from being GC'ed:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
> Another is a `collect()` on task authorization:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
> Since each `collect()` call runs on its own actor, the futures returned
> by the `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may
> complete out of order, even if the futures that the two `collect()`
> calls wait on are satisfied in order (which is true in these two cases).
>
> As a result, under some race conditions (probably under heavy load),
> tasks that rely on a previous task to launch the executor may get
> processed before the task that is supposed to launch the executor,
> resulting in those tasks being explicitly dropped by the agent.
>
> -Meng
>
>
>
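
The failure mode described in the quoted message can be sketched as a small simulation. This is a simplified Python model, not the agent's C++ code: the `RunTask` and `Agent` classes are hypothetical, the status strings are reduced to two values, and only the `launch_executor` flag is taken from the message above.

```python
from dataclasses import dataclass, field


@dataclass
class RunTask:
    task_id: str
    executor_id: str
    launch_executor: bool  # set by the master on the first call only


@dataclass
class Agent:
    executors: set = field(default_factory=set)
    statuses: dict = field(default_factory=dict)

    def run(self, msg):
        if msg.executor_id in self.executors:
            # Executor already exists: the task can use it.
            self.statuses[msg.task_id] = "TASK_RUNNING"
        elif msg.launch_executor:
            # This task is the one allowed to create the executor.
            self.executors.add(msg.executor_id)
            self.statuses[msg.task_id] = "TASK_RUNNING"
        else:
            # Expected an existing executor but found none: drop the task.
            self.statuses[msg.task_id] = "TASK_DROPPED"


def deliver(messages):
    """Feed messages to a fresh agent in the given arrival order."""
    agent = Agent()
    for msg in messages:
        agent.run(msg)
    return agent.statuses


first = RunTask("task-1", "exec-A", launch_executor=True)
second = RunTask("task-2", "exec-A", launch_executor=False)

print(deliver([first, second]))   # original launch order
print(deliver([second, first]))   # re-ordered on the agent
```

In the re-ordered run, `task-2` arrives while `exec-A` does not yet exist and is dropped, while `task-1` still launches the executor afterwards.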
