mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 45961: Support sharing of resources through reference counting of resources.
Date Mon, 11 Jul 2016 18:17:38 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141305
-----------------------------------------------------------



Partial review. Let's discuss first.


src/master/master.cpp (lines 3495 - 3498)
<https://reviews.apache.org/r/45961/#comment206863>

    `task.has_executor()` doesn't always lead to additional resources being used/charged against
the framework. See `Master::addTask()`. It will suck a lot if we have to duplicate that logic
here though so we might have to consider doing the pending task refactor first...



src/master/master.cpp (lines 3597 - 3598)
<https://reviews.apache.org/r/45961/#comment206853>

    



src/master/master.cpp (lines 3907 - 3910)
<https://reviews.apache.org/r/45961/#comment206887>

    Need to check whether this executor actually consumes resources.



src/master/allocator/mesos/hierarchical.cpp (line 1271)
<https://reviews.apache.org/r/45961/#comment206973>

    A comment on what it is used for?
    
    ```
    Due to the two stages in the allocation algorithm and the nature of shared resources being
re-offerable even if already allocated, the same shared resources can appear in two distinct
offers in one allocation cycle. This would be bad because the allocator API contract shouldn't
depend on its implementation details. For now we make sure a shared resource is only allocated
once in one offer cycle. To achieve this we use `offeredSharedResources` to keep track of
shared resources already allocated in the current cycle.
    ```



src/master/allocator/mesos/hierarchical.cpp (lines 1331 - 1340)
<https://reviews.apache.org/r/45961/#comment206974>

    Given this is the 1st stage of the allocate cycle there's no chance a shared resource
could be in `offeredSharedResources` already?



src/master/allocator/mesos/hierarchical.cpp (lines 1386 - 1400)
<https://reviews.apache.org/r/45961/#comment206992>

    IIUC the comment still means "this is done to make sure shared resources are counted at
most once in the framework's allocation".
    
    The last revision had:
    
    ```
    slaves[slaveId].allocated -= resources.shared();
    slaves[slaveId].allocated += resources;
    ```
    
    How is this longer version different?



src/master/allocator/mesos/hierarchical.cpp (lines 1499 - 1513)
<https://reviews.apache.org/r/45961/#comment206995>

    Is this equivalent?
    
    ```
    Resources available = slaves[slaveId].total.nonShared() - slaves[slaveId].allocated).nonShared();
    available += slaves[slaveId].total.shared() - offeredSharedResources[slaveId];
    ```



src/master/allocator/mesos/hierarchical.cpp (lines 1570 - 1571)
<https://reviews.apache.org/r/45961/#comment206997>

    Explain the `nonShared()` here.
    
    `remainingClusterResources` doesn't exclude shared resources.



src/master/allocator/mesos/hierarchical.cpp (lines 1595 - 1605)
<https://reviews.apache.org/r/45961/#comment206999>

    Ditto to my comment on the same code above.



src/master/allocator/sorter/drf/sorter.cpp (lines 163 - 170)
<https://reviews.apache.org/r/45961/#comment207005>

    So `DRFSorter::allocated` makes sure there's no duplicate shared resources in `allocations`
so the caller can call it without pruning out the redundant copies. However the `unallocated`
counterpart doesn't do the same? Ideally usage of the pair of API should be symmetric.



src/master/allocator/sorter/drf/sorter.cpp (line 194)
<https://reviews.apache.org/r/45961/#comment207000>

    No longer relevant?



src/master/master.hpp (line 274)
<https://reviews.apache.org/r/45961/#comment206784>

    Add some comment to help explain why we need this method:
    
    ```
      // Return a subset of `resources` offered to `frameworkId` that can 
      // be recovered. Right now we filter out shared resources that are
      // still used by `frameworkId`.
      // NOTE: A framework can reuse a shared resource to launch multiple
      // tasks and the allocator only recovers a shared resource allocated
      // to the framework if it's not used by any task of the framework.
    ```



src/master/master.hpp (line 278)
<https://reviews.apache.org/r/45961/#comment207068>

    s/updated/result/.



src/master/master.hpp (lines 280 - 286)
<https://reviews.apache.org/r/45961/#comment207067>

    If `updated` has multple copies of a shared resource, the `-=` here can't remove it.
    
    We have to add it back. The result would have a single copy of the shared resource that
are unused (by this framework; I think this is what we want).
    
    ```
    Resources recoverable(const Resources& resources)
    {
      Resources recoverable = resources.nonShared();
      foreach (const Resource& resource, resources.shared()) {
        if (!usedResources[frameworkId].contains(resource)) {
          recoverable += resource;
        }
      }
    
      return recoverable;
    }
    ```



src/master/master.hpp (line 343)
<https://reviews.apache.org/r/45961/#comment206780>

    Comment.
    
    ```
    // Resources consumed by pending tasks.
    ```



src/master/master.cpp (line 3409)
<https://reviews.apache.org/r/45961/#comment206787>

    Shouldn't this go through `recoverable()` as well?
    
    It'll be helpful if add some comments on the `recoverResources` to spell out what exactly
is expected.



src/master/master.cpp (line 3488)
<https://reviews.apache.org/r/45961/#comment207073>

    Add to this comment that if the task isn't going to be launched, we don't need to account
for its resources in `pendingResources` either.



src/master/master.cpp (line 3495)
<https://reviews.apache.org/r/45961/#comment207075>

    Checking `task.has_executor()` is insufficient as a condition to add the executor's resources
because they may not consume additional resources.



src/master/master.cpp (line 3497)
<https://reviews.apache.org/r/45961/#comment206794>

    Two space indentation.



src/master/master.cpp (lines 3597 - 3598)
<https://reviews.apache.org/r/45961/#comment207076>

    At least we need a comment on why it's safe to recover all `offeredResources` if `slave
== nullptr`: if slave is nullptr, no `usedResources` is on it so all resources should be recoverable.
    
    But honestly this looks odd and we need to invoke this conditional operator in multiple
places. Adding the same comment to all places this is called is also very redundant.
    
    Perhaps `recoverable` can be moved to Master and hide this detail about `slave == nullptr`?
Perhaps we should just wrap around `allocator->recoverResources` in Master to insert this
logic.



src/master/master.cpp (lines 3639 - 3640)
<https://reviews.apache.org/r/45961/#comment207079>

    Ditto.



src/master/master.cpp (line 3650)
<https://reviews.apache.org/r/45961/#comment207081>

    No need for the underscore and need comments on the variable.



src/master/master.cpp (line 3907)
<https://reviews.apache.org/r/45961/#comment207082>

    Ditto to my comment above, we don't know if the task's executor actually consumes resources
by just checking `task.has_executor()`.



src/master/master.cpp (line 3966)
<https://reviews.apache.org/r/45961/#comment207083>

    Not "original" if it goes through offer operations. How about just "We add back offered
shared resource even if they are consumed by other tasks in the same accept call so multiple
tasks using the same shared resource can be launched with the same offer".
    
    We still need to comment at where the variable is first introduced.



src/master/master.cpp (lines 4042 - 4044)
<https://reviews.apache.org/r/45961/#comment207084>

    Instead of commenting here we should comment on the method because it's called from multiple
callsites.



src/master/master.cpp (lines 4143 - 4144)
<https://reviews.apache.org/r/45961/#comment207085>

    Ditto to comments above about the same code.



src/master/master.cpp (lines 4281 - 4285)
<https://reviews.apache.org/r/45961/#comment207092>

    So `taskId` in `pendingTasks` actually doesn't guarantee `slave != nullptr` because `removeSlave`
doesn't clean it up.



src/master/master.cpp (lines 4287 - 4297)
<https://reviews.apache.org/r/45961/#comment207093>

    Ditto about `task.has_executor()`.



src/master/validation.hpp (lines 157 - 158)
<https://reviews.apache.org/r/45961/#comment207090>

    Don't we always have the two additional arugments, even thought they could be empty (but
not None)?


- Jiang Yan Xu


On July 7, 2016, 8:44 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated July 7, 2016, 8:44 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, and Jiang
Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp c1e00039606164599e25ff5f76245e4d35ec3112

>   src/master/allocator/sorter/drf/sorter.hpp e29feebd70277c79f7c3f6fb233e7a36501cf220

>   src/master/allocator/sorter/drf/sorter.cpp 7df4dd641b21ea0705368861bf4679fed1ef078d

>   src/master/http.cpp bff1fd53462bfec19e4a025c49a00e2855faf4f3 
>   src/master/master.hpp 60efd22280c62801b080365986fe9773737ca563 
>   src/master/master.cpp 79e3d78ba45060bc2f2532fdc3d105d1cc888d0f 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp 0a921347586808863e615ca3dcc23fae92b629f5 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message