mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-3157) only perform batch resource allocations
Date Sat, 05 Sep 2015 00:14:45 GMT

    [ https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731647#comment-14731647 ]

Benjamin Mahler commented on MESOS-3157:
----------------------------------------

{quote}
 The problem with this is knowing when to trigger the allocation pass. You want to trigger it
once you have more than a few slaveIDs ready, but before the batch allocation kicks in. You
also want to wait as long as possible so that you can batch as many as possible. This seems
tricky; I can't think of a way to know that no more addSlave or updateSlave events are going
to come.
{quote}

This should not be complicated: it is just a matter of doing a deferred allocation (via {{defer}}),
as I mentioned in (2) above. Deferring ensures that the allocation occurs after all currently enqueued
events. Any subsequent deferred allocations then have no "work" to do, since the set of slaves
that require allocation has already been cleared (as I mentioned in (2)). We could track
the outstanding allocation explicitly, but we already have to deal with the batch allocation
deferral, so I'm not sure there's any value in that.
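
To make the coalescing behaviour concrete, here is a minimal sketch of the idea. It does not use libprocess or the real allocator: {{EventQueue}} and {{SketchAllocator}} are hypothetical stand-ins, and enqueueing a closure at the back of the queue is only an assumed approximation of what {{defer}} achieves for an actor's event queue.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an actor's FIFO event queue (not libprocess).
class EventQueue {
public:
  void enqueue(std::function<void()> event) {
    events.push_back(std::move(event));
  }

  // Run events in order until the queue is empty. Events may enqueue
  // further events while running; those are appended at the back.
  void run() {
    while (!events.empty()) {
      std::function<void()> event = std::move(events.front());
      events.pop_front();
      event();
    }
  }

private:
  std::deque<std::function<void()>> events;
};

// Hypothetical allocator that only records which slaves need allocation
// and leaves the actual work to a deferred allocation pass.
class SketchAllocator {
public:
  explicit SketchAllocator(EventQueue& q) : queue(q) {}

  // Event handler: remember the slave, then enqueue an allocation behind
  // every event that is already waiting in the queue.
  void addSlave(const std::string& slaveId) {
    pending.insert(slaveId);
    queue.enqueue([this]() { allocate(); });
  }

  std::vector<std::set<std::string>> passes;  // slaves covered by each real pass
  std::size_t noops = 0;                      // deferred passes with nothing to do

private:
  void allocate() {
    if (pending.empty()) {
      ++noops;  // an earlier deferred pass already handled everything
      return;
    }
    passes.push_back(pending);  // one pass covers the whole accumulated set
    pending.clear();
  }

  EventQueue& queue;
  std::set<std::string> pending;
};
```

In this sketch, if three {{addSlave}} events are already queued when the first one runs, the first deferred allocation executes after all three and covers all three slaves in a single pass; the two deferred allocations that follow find an empty set and return immediately.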

> only perform batch resource allocations
> ---------------------------------------
>
>                 Key: MESOS-3157
>                 URL: https://issues.apache.org/jira/browse/MESOS-3157
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James Peach
>            Assignee: James Peach
>
> Our deployment environments have a lot of churn, with many short-lived frameworks that
> often revive offers. Running the allocator takes a long time (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the allocator
> process to get very long, and the allocator effectively becomes unresponsive (e.g. a revive
> offers message takes too long to come to the head of the queue).
> We have been running a patch to remove all the event-triggered allocations and only allocate
> from the batch task {{HierarchicalAllocatorProcess::batch}}. This works great and really improves
> responsiveness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
