Subject: Re: Review Request 35702: Added /reserve HTTP endpoint to the master.
From: "Michael Park"
To: "Adam B", "Benjamin Hindman", "Jie Yu", "Joris Van Remoortere", "Vinod Kone", "Ben Mahler"
Cc: "mesos", "Michael Park", "Alexander Rukletsov", "Mesos ReviewBot"
Date: Fri, 26 Jun 2015 22:56:38 -0000

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35702/
-----------------------------------------------------------

(Updated June 26, 2015, 10:56 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Jie Yu, Joris Van Remoortere, and Vinod Kone.


Changes
-------

Ready for review.


Summary (updated)
-----------------

Added /reserve HTTP endpoint to the master.


Bugs: MESOS-2600
    https://issues.apache.org/jira/browse/MESOS-2600


Repository: mesos


Description
-------

This involved a lot more challenges than I anticipated. I've captured the various approaches, along with their limitations and deal-breakers, here: [Master Endpoint Implementation Challenges](https://docs.google.com/document/d/1cwVz4aKiCYP9Y4MOwHYZkyaiuEv7fArCye-vPvB2lAI/edit#)

Key points:

* This is a stop-gap solution until we shift the offer creation/management logic from the master to the allocator.
* `updateAvailable` and `updateSlave` are kept separate because
  (1) `updateAvailable` is allowed to fail, whereas `updateSlave` must not.
  (2) `updateAvailable` returns a `Future`, whereas `updateSlave` does not.
  (3) `updateAvailable` must never leave the allocator in an over-allocated state, whereas `updateSlave` can, and does.
* The algorithm:
    * Initially, the master pessimistically assumes that what seems like "available" resources will be gone. This is due to the race between the allocator scheduling an `allocate` call to itself and the master's `allocator->updateAvailable` invocation. As such, we first try to satisfy the request only with the offered resources.
    * We greedily rescind one offer at a time until we've rescinded sufficiently many offers. IMPORTANT: We perform `recoverResources(..., Filters())` rather than `recoverResources(..., None())` so that we can almost always win the race against `allocate`. In the case that we lose, no disaster occurs. We simply fail to satisfy the request.
    * If we still don't have enough resources after rescinding all offers, be optimistic and forward the request to the allocator, since there may be available resources to satisfy the request.
    * If the allocator returns a failure, report the error to the user with `PreconditionFailed`. This could be changed to `Forbidden`, or maybe `Conflict`, as well. We'll pick one eventually.

This approach is clearly not ideal, since we would prefer to rescind as few offers as possible. The challenges of implementing the ideal solution in the current state are described in the document above.

TODO(mpark): Add more comments and test cases.
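
For readers who want the shape of the algorithm without opening the diff, here is a minimal sketch of the steps above. It is not the code in this patch. It assumes a toy model in which resources are a single scalar (CPUs) on one slave, and `Offer`, `ToyAllocator`, `Result`, and `reserve` are hypothetical names made up for illustration. `ToyAllocator::recoverResources` and `ToyAllocator::updateAvailable` only stand in for the real allocator calls (no `Future`, no `Filters`), and the race against `allocate` is not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model: one scalar resource (CPUs) on a single slave.
struct Offer {
  std::string id;
  double cpus;
};

// Hypothetical stand-in for the allocator; not the Mesos allocator API.
struct ToyAllocator {
  double available = 0.0;  // Neither offered nor reserved.
  double reserved = 0.0;

  // Stand-in for `recoverResources(..., Filters())`: a rescinded offer's
  // resources become available again.
  void recoverResources(double cpus) { available += cpus; }

  // Stand-in for `updateAvailable`: it may fail, and it never leaves the
  // allocator over-allocated.
  bool updateAvailable(double cpus) {
    if (cpus > available) {
      return false;
    }
    available -= cpus;
    reserved += cpus;
    return true;
  }
};

enum class Status { OK, PreconditionFailed };

struct Result {
  Status status;
  std::vector<std::string> rescinded;  // IDs of the offers we rescinded.
};

// Tries to reserve `required` CPUs, rescinding outstanding offers as needed.
Result reserve(ToyAllocator& allocator,
               std::vector<Offer>& offers,
               double required) {
  Result result{Status::PreconditionFailed, {}};
  double recovered = 0.0;

  // Pessimistic phase: count only on offered resources, and greedily
  // rescind one offer at a time until enough has been recovered.
  while (recovered < required && !offers.empty()) {
    Offer offer = offers.back();
    offers.pop_back();
    allocator.recoverResources(offer.cpus);
    recovered += offer.cpus;
    result.rescinded.push_back(offer.id);
  }

  // Optimistic phase: even if the rescinded offers were not enough, forward
  // the request, since the allocator may hold other available resources.
  if (allocator.updateAvailable(required)) {
    result.status = Status::OK;
  }
  return result;
}
```

In this sketch a request for 4 CPUs against outstanding offers of 2 and 3 CPUs rescinds both offers and succeeds, while a request for 2 CPUs against a single 1-CPU offer and an otherwise empty allocator rescinds that offer and still ends in `PreconditionFailed`. The second case is the cost of the greedy approach noted above: an offer can be rescinded for a request that fails anyway.
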
Diffs
-----

  src/Makefile.am a064d17a6b62e6e3c8e190135bcc8cbbb0051ed5
  src/master/http.cpp 350383362311cfbc830965e1155a8515f0dfb332
  src/master/master.hpp af83d3e82d2c161b3cc4583e78a8cbbd2f9a4064
  src/master/master.cpp 0782b543b451921d2240958c7ef612a9e30972df
  src/master/validation.hpp 469d6f56c3de28a34177124aae81ce24cb4ad160
  src/master/validation.cpp 9d128aa1b349b018b8e4a1916434d848761ca051
  src/tests/reserve_tests.cpp PRE-CREATION

Diff: https://reviews.apache.org/r/35702/diff/


Testing
-------

Added `src/tests/reserve_tests.cpp`.


Thanks,

Michael Park