mesos-user mailing list archives

From Benjamin Mahler <benjamin.mah...@gmail.com>
Subject Re: High latency when scheduling and executing many tiny tasks.
Date Fri, 17 Jul 2015 22:39:08 GMT
I've filed a ticket to immediately re-offer recovered resources from
terminal tasks / executors:

https://issues.apache.org/jira/browse/MESOS-3078

On Fri, Jul 17, 2015 at 2:24 PM, Philip Weaver <philip.weaver@gmail.com>
wrote:

> Your advice worked and made a huge difference. With
> allocation_interval=50ms, the 1000 tasks now execute in 21s instead of
> 120s. Thanks.
>
> On Fri, Jul 17, 2015 at 2:20 PM, Philip Weaver <philip.weaver@gmail.com>
> wrote:
>
>> Ok, thanks!
>>
>> On Fri, Jul 17, 2015 at 2:18 PM, Alexander Gallego <agallego@concord.io>
>> wrote:
>>
>>> I use a similar pattern.
>>>
>>> I have my own scheduler as you have. I deploy my own executor which
>>> downloads a tar from some storage and effectively ` execvp ( ... ) ` a
>>> proc. It monitors the child proc and reports status of child pid exit
>>> status.
>>>
>>> Check out the Marathon code if you are writing in scala. It is an
>>> excellent example for both scheduler and executor templates.
>>>
>>> -ag
>>>
>>> On Fri, Jul 17, 2015 at 5:06 PM, Philip Weaver <philip.weaver@gmail.com>
>>> wrote:
>>>
>>>> Awesome, I suspected that was the case, but hadn't discovered the
>>>> --allocation_interval flag, so I will use that.
>>>>
>>>> I installed from the mesosphere RPMs and didn't change any flags from
>>>> there. I will try to find some logs that provide some insight into the
>>>> execution times.
>>>>
>>>> I am using a command task. I haven't looked into executors yet; I had a
>>>> hard time finding some examples in my language (Scala).
>>>>
>>>> On Fri, Jul 17, 2015 at 2:00 PM, Benjamin Mahler <
>>>> benjamin.mahler@gmail.com> wrote:
>>>>
>>>>> One other thing, do you use an executor to run many tasks? Or are you
>>>>> using a command task?
>>>>>
>>>>> On Fri, Jul 17, 2015 at 1:54 PM, Benjamin Mahler <
>>>>> benjamin.mahler@gmail.com> wrote:
>>>>>
>>>>>> Currently, recovered resources are not immediately re-offered as you
>>>>>> noticed, and the default allocation interval is 1 second. I'd recommend
>>>>>> lowering that (e.g. --allocation_interval=50ms), that should improve the
>>>>>> second bullet you listed. Although, in your case it would be better to
>>>>>> immediately re-offer recovered resources (feel free to file a ticket for
>>>>>> supporting that).
>>>>>>
>>>>>> For the first bullet, mind providing some more information? E.g.
>>>>>> master flags, slave flags, scheduler logs, master logs, slave logs,
>>>>>> executor logs? We would need to trace through a task launch to see where
>>>>>> the latency is being introduced.
>>>>>>
>>>>>> On Fri, Jul 17, 2015 at 12:26 PM, Philip Weaver <
>>>>>> philip.weaver@gmail.com> wrote:
>>>>>>
>>>>>>> I'm trying to understand the behavior of mesos, and if what I am
>>>>>>> observing is typical or if I'm doing something wrong, and what options I
>>>>>>> have for improving the performance of how offers are made and how tasks are
>>>>>>> executed for my particular use case.
>>>>>>>
>>>>>>> I have written a Scheduler that has a queue of very small tasks (for
>>>>>>> testing, they are "echo hello world", but in production many of them won't
>>>>>>> be much more expensive than that). Each task is configured to use 1 cpu
>>>>>>> resource. When resourceOffers is called, I launch as many tasks as I can in
>>>>>>> the given offers; that is, one call to driver.launchTasks for each offer,
>>>>>>> with a list of tasks that has one task for each cpu in that offer.
>>>>>>>
>>>>>>> On a cluster of 3 nodes and 4 cores each (12 total cores), it takes
>>>>>>> 120s to execute 1000 tasks out of the queue. We are evaluating mesos because
>>>>>>> we want to use it to replace our current homegrown cluster controller,
>>>>>>> which can execute 1000 tasks in way less than 120s.
>>>>>>>
>>>>>>> I am seeing two things that concern me:
>>>>>>>
>>>>>>>    - The time between driver.launchTasks and receiving a callback
>>>>>>>    to statusUpdate when the task completes is typically 200-500ms, and
>>>>>>>    sometimes even as high as 1000-2000ms.
>>>>>>>    - The time between when a task completes and when I get an offer
>>>>>>>    for the newly freed resource is another 500ms or so.
>>>>>>>
>>>>>>> These latencies explain why I can only execute tasks at a rate of
>>>>>>> about 8/s.
>>>>>>>
>>>>>>> It looks like my offers always include all 4 cores on each machine,
>>>>>>> which would indicate that mesos doesn't like to send an offer as soon as a
>>>>>>> single resource is available, and prefers to delay and send an offer with
>>>>>>> more resources in it. Is this true?
>>>>>>>
>>>>>>> Thanks in advance for any advice you can offer!
>>>>>>>
>>>>>>> - Philip
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
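
The figures quoted in this thread can be cross-checked with a little arithmetic. The sketch below is only a rough steady-state estimate, not anything taken from Mesos itself: it assumes each of the 12 cores acts as one "slot" running one task at a time, and the helper names (throughput, cycle_per_slot) are hypothetical. It derives the task rate from the reported wall-clock times and the implied time each slot spends per task, including launch latency and the wait for the next offer.

```python
# Back-of-the-envelope check of the numbers reported in the thread.
# Assumption (not from Mesos): every core is an independent slot that
# runs one task at a time, so throughput = slots / per-slot cycle time.

TOTAL_TASKS = 1000   # tasks in the queue, as reported
SLOTS = 3 * 4        # 3 nodes x 4 cores, one task per cpu


def throughput(tasks: int, seconds: float) -> float:
    """Average tasks completed per second over the whole run."""
    return tasks / seconds


def cycle_per_slot(slots: int, tasks_per_second: float) -> float:
    """Implied seconds a single slot spends per task (run + all waiting)."""
    return slots / tasks_per_second


if __name__ == "__main__":
    runs = (
        ("default allocation interval (1 second)", 120.0),
        ("--allocation_interval=50ms", 21.0),
    )
    for label, elapsed in runs:
        rate = throughput(TOTAL_TASKS, elapsed)
        cycle = cycle_per_slot(SLOTS, rate)
        print(f"{label}: {rate:.1f} tasks/s, {cycle:.2f} s per slot per task")
```

Under these assumptions, the 120s run works out to roughly 8 tasks/s, consistent with the rate reported in the original message, and the implied per-slot cycle is on the order of the launch and re-offer latencies described there.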
