mesos-user mailing list archives

From Sharma Podila <spod...@netflix.com>
Subject Re: Task serialization per machine?
Date Tue, 01 Jul 2014 17:03:58 GMT
Hi Asim,

I am using (developing) a Java executor. I see a similar strategy in the
Mesos-Hadoop executor.

https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosExecutor.java

After the executor successfully launches the task (asynchronously), it
usually sends a TaskState.TASK_RUNNING status message to the driver right
away. It can then return from the launchTask method, but the executor
process must not exit; it has to remain running for at least the duration of
the task. Upon completion of the task, the executor must notify Mesos of
its completion. Mesos will report a task-lost status (TASK_LOST) if the
executor exits prematurely.
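
As a rough illustration of the pattern above, here is a minimal sketch in
plain Java. It is not Mesos code and is not taken from the Mesos-Hadoop
executor: StatusSink is a hypothetical stand-in for the driver's status
update call, and AsyncLaunchSketch, launchTask(String, Runnable) and
awaitAll are names made up for the example. The only thing it assumes is
that the callback hands the work to another thread, returns, and that a
terminal state is reported when the work ends.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncLaunchSketch {
    /** Hypothetical stand-in for sending a status update through the driver. */
    public interface StatusSink {
        void update(String taskId, String state);
    }

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final StatusSink sink;

    public AsyncLaunchSketch(StatusSink sink) {
        this.sink = sink;
    }

    /** Plays the role of the launchTask callback: hands off the work and returns at once. */
    public void launchTask(String taskId, Runnable work) {
        pool.execute(() -> {
            sink.update(taskId, "TASK_RUNNING");
            try {
                work.run();
                sink.update(taskId, "TASK_FINISHED");
            } catch (RuntimeException e) {
                sink.update(taskId, "TASK_FAILED");
            }
        });
    }

    /** Keeps the process alive until all submitted tasks end or the timeout expires. */
    public boolean awaitAll(long timeoutMs) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Simulated task body; a real task would do actual work here. */
    public static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = new CopyOnWriteArrayList<>();
        AsyncLaunchSketch executor =
                new AsyncLaunchSketch((id, state) -> log.add(id + ":" + state));
        // Both calls return immediately, so the two tasks overlap in time.
        executor.launchTask("t1", () -> pause(200));
        executor.launchTask("t2", () -> pause(200));
        executor.awaitAll(5000);
        log.forEach(System.out::println);
    }
}
```

Because launchTask only enqueues the work, a second call is not held up by
the first task, and awaitAll plays the part of "the executor process must
stay alive until the task is done".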

My explanation is from understanding Mesos as a user and framework
developer. Someone from the Mesos dev team may have a better way to explain
this.
I suspect framework callbacks, at least at the executor, aren't invoked
concurrently. I haven't looked into the details of why/how/etc.





On Tue, Jul 1, 2014 at 7:48 AM, Asim <linkasim@gmail.com> wrote:

> Thanks for your response!
>
> Yes, the executor (launchTask) only gets one task, which it executes
> synchronously until it finishes. Since launchTask is a callback, my intuition
> is that the scheduler should launch these tasks in parallel (even within a
> single machine) after calculating the resources required. I can create a
> new thread in the launchTask() callback and return immediately, but that will
> cause a lost slave since the scheduler assumes it is finished but there is
> a zombie thread still around. Hence, I am not completely sure creating new
> threads will solve this issue.
>
> I am using the C++ framework. Is there an example of how this is
> accomplished in current frameworks? I looked at Spark and it does not seem
> to be doing anything special for its callbacks to ensure that multiple
> tasks on a single machine execute in parallel.
>
> Thanks,
> Asim
>
>
>
>
>
>
>
> On Mon, Jun 30, 2014 at 4:48 PM, Sharma Podila <spodila@netflix.com>
> wrote:
>
>> A likely scenario is that your executor is running the task synchronously
>> inside the callback to launchTask(). If you make it instead run the task
>> asynchronously (e.g., in a separate thread), that should resolve it.
>>
>>
>> On Mon, Jun 30, 2014 at 12:48 PM, Asim <linkasim@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to launch multiple tasks on multiple machines (t >> m) that can
>>> run simultaneously. Currently, I find that every machine processes the
>>> tasks in a serial fashion one after another.
>>>
>>> I have written a framework with a scheduler and an executor. The
>>> scheduler launches a task list on a bunch of machines (that show up as
>>> offers). When I send a task list to run
>>> with driver->launchTasks(offers[i].id(), tasks[i]) I find that every
>>> machine picks up one task at a time (and then goes to the next). This
>>> happens even though the offer can accommodate more than one task from this
>>> task list easily.
>>>
>>> Is there something that I am missing?
>>>
>>> Thanks,
>>> Asim
>>>
>>>
>>
>
