mesos-user mailing list archives

From Kevin Sweeney <kevi...@apache.org>
Subject Re: Upcoming change to the Scheduler API
Date Fri, 13 Feb 2015 17:35:01 GMT
Regarding the backwards-compatibility concern, would it make sense to add a
TaskStatusID field to the existing TaskStatus message instead of changing
the Scheduler signature?
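
To make the idea concrete, here is a rough sketch, not the actual Mesos API. It assumes the update's UUID travels as a field on the status message, so the 'statusUpdate' callback keeps its current shape, and that the driver exposes an 'acknowledgeStatusUpdate' call as proposed below. Every type here (TaskStatus, Driver, BatchingScheduler) is a simplified, hypothetical stand-in, and the in-memory list only stands in for durable storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins for illustration; these are NOT the real Mesos types.
final class ExplicitAckSketch {

    /** Simplified status message that carries its own update UUID (assumed field). */
    static final class TaskStatus {
        final String taskId;
        final String state;
        final UUID uuid;

        TaskStatus(String taskId, String state, UUID uuid) {
            this.taskId = taskId;
            this.state = state;
            this.uuid = uuid;
        }
    }

    /** Minimal driver surface: only the explicit acknowledgement call. */
    interface Driver {
        void acknowledgeStatusUpdate(TaskStatus status);
    }

    /** Buffers updates, persists them in one batch, then acknowledges each one. */
    static final class BatchingScheduler {
        private final List<TaskStatus> pending = new ArrayList<>();
        // Stands in for durable storage (assumption: a real scheduler would use a log or database).
        private final List<TaskStatus> store = new ArrayList<>();

        // The callback signature is unchanged: the UUID rides inside the status.
        synchronized void statusUpdate(Driver driver, TaskStatus status) {
            pending.add(status); // return immediately; nothing is acknowledged yet
        }

        // Can be called from any thread, outside the driver callback.
        synchronized int flush(Driver driver) {
            List<TaskStatus> batch = new ArrayList<>(pending);
            pending.clear();
            store.addAll(batch); // one storage operation for N status updates
            for (TaskStatus status : batch) {
                driver.acknowledgeStatusUpdate(status); // acknowledge only after persisting
            }
            return batch.size();
        }
    }

    public static void main(String[] args) {
        List<UUID> acknowledged = new ArrayList<>();
        Driver driver = status -> acknowledged.add(status.uuid);

        BatchingScheduler scheduler = new BatchingScheduler();
        for (int i = 0; i < 3; i++) {
            scheduler.statusUpdate(
                driver, new TaskStatus("task-" + i, "TASK_RUNNING", UUID.randomUUID()));
        }

        int flushed = scheduler.flush(driver);
        System.out.println("persisted and acknowledged " + flushed + " updates");
    }
}
```

With something along these lines, schedulers that stay on implicit acknowledgements would compile unchanged, and those opting in would read the UUID off the status they already receive.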

On Friday, February 13, 2015, Benjamin Mahler <benjamin.mahler@gmail.com>
wrote:

> Hi all,
>
> As part of https://issues.apache.org/jira/browse/MESOS-2347, there is a
> scalability concern with the reconciliation API. Performing an implicit
> reconciliation results in a status update being sent for each task in the
> cluster. For large clusters in the tens of thousands of slaves, this can
> begin to approach hundreds of thousands of status updates.
>
> With the current design of the driver, status updates must be persisted
> before the scheduler returns from the 'statusUpdate' callback, as the
> driver sends an acknowledgement implicitly once the call completes. This
> design forces the scheduler to synchronously process individual status
> updates.
>
> To remedy the issue, we're looking to introduce the ability to optionally
> specify whether the implicit acknowledgements are provided (during
> construction of the scheduler driver). If disabled, then the scheduler must
> send acknowledgements through a new 'acknowledgeStatusUpdate' call on the
> driver. Having explicit acknowledgements allows schedulers to process them
> asynchronously outside of the driver thread, and allows them to process
> updates in batch (e.g. 1:N storage operation:status updates).
>
> As part of the change, the underlying UUID of the status update needs to be
> exposed to the scheduler, which requires an update to the signature of
> 'statusUpdate'. What this means is that when schedulers include the new
> headers/JAR/egg, they need to adjust their code to accept the new uuid
> argument, regardless of whether implicit acknowledgements are desired (to
> my knowledge, there is no way to expose the uuid without requiring
> schedulers to update their code, because of Java's interface semantics).
>
> I'd like to get this change landed for 0.22.0 to make reconciliation usable
> for large clusters. The patches are up on MESOS-2347. I've outlined the
> compatibility details and upgrade steps in
> https://reviews.apache.org/r/30978/
>
> Please share any high level feedback or concerns!
>
> Ben
>


-- 
Sent from Gmail Mobile
