mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-6904) Perform batching of allocations to reduce allocator queue backlogging.
Date Tue, 17 Jan 2017 20:03:27 GMT

     [ https://issues.apache.org/jira/browse/MESOS-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-6904:
-----------------------------------
    Description: 
Per MESOS-3157:

{quote}
Our deployment environments have a lot of churn, with many short-lived frameworks that often
revive offers. Running the allocator takes a long time (from seconds up to minutes).

In this situation, event-triggered allocation causes the event queue in the allocator process
to get very long, and the allocator effectively becomes unresponsive (e.g. a revive offers
message takes too long to come to the head of the queue).
{quote}

To remedy the above scenario, it is proposed to perform batching of the enqueued allocation
operations so that a single allocation operation can satisfy N enqueued allocations. This
should reduce the potential for backlogging in the allocator. See the discussion [here|https://issues.apache.org/jira/browse/MESOS-3157?focusedCommentId=14728377&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14728377]
in MESOS-3157.
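
The batching idea above can be sketched roughly as follows. This is a minimal Python model, not Mesos code: the class `BatchingAllocator`, its methods, and the agent IDs are all hypothetical names chosen for illustration. It assumes a single event queue and shows how N allocation requests can collapse into one queued allocation run by tracking a set of candidates and enqueuing work only when no allocation is already pending.

```python
from collections import deque


class BatchingAllocator:
    """Toy model of an allocator with a single FIFO event queue (hypothetical)."""

    def __init__(self):
        self.queue = deque()             # pending events, processed in order
        self.candidates = set()          # agents waiting for an allocation
        self.allocation_pending = False  # True while a run sits in the queue
        self.runs = []                   # batch served by each allocation run

    def request_allocation(self, agent_id):
        """Record a candidate; enqueue a run only if none is already queued."""
        self.candidates.add(agent_id)
        if not self.allocation_pending:
            self.allocation_pending = True
            self.queue.append(self._allocate)

    def _allocate(self):
        # A single run serves every candidate accumulated so far,
        # then clears the tracked set so later requests start a new batch.
        batch = sorted(self.candidates)
        self.candidates.clear()
        self.allocation_pending = False
        self.runs.append(batch)

    def drain(self):
        """Process queued events until the queue is empty."""
        while self.queue:
            self.queue.popleft()()


allocator = BatchingAllocator()
for agent in ["agent-1", "agent-2", "agent-1", "agent-3"]:
    allocator.request_allocation(agent)

queued = len(allocator.queue)  # number of allocation events actually enqueued
allocator.drain()
```

In this sketch the four requests leave only one allocation event in the queue, and that single run covers all three distinct agents. Other events, such as a revive offers message, would therefore wait behind at most one allocation run instead of one per request.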

  was:
"Our deployment environments have a lot of churn, with many short-live frameworks that often
revive offers. Running the allocator takes a long time (from seconds up to minutes).
In this situation, event-triggered allocation causes the event queue in the allocator process
to get very long, and the allocator effectively becomes unresponsive (eg. a revive offers
message takes too long to come to the head of the queue)." - MESOS-3157 

To remedy the above scenario, it is proposed to track allocation candidates and only dispatch
allocation work if there is no pending allocation in the allocator queue. When an enqueued
allocation is processed, the tracked set of candidates is cleared. 

The current behavior triggers allocation work on cluster events (e.g. `addSlave()`, `addFramework()`,
etc.) as well as in the periodic batch allocation that runs at a defined time interval.


This ticket tracks the new direction the work has taken since discussion in MESOS-3157 where
a previous solution by [~jamespeach] introduced batched allocation only (which we currently
run) as well as an approach to reduce redundancy of work in the queue. 

        Summary: Perform batching of allocations to reduce allocator queue backlogging.  (was:
Track resource allocation candidates and batch allocation work)

> Perform batching of allocations to reduce allocator queue backlogging.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-6904
>                 URL: https://issues.apache.org/jira/browse/MESOS-6904
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: Jacob Janco
>            Assignee: Jacob Janco
>              Labels: allocator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
