mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jiang Yan Xu <...@jxu.me>
Subject Re: Review Request 45961: Support sharing of resources through reference counting of resources.
Date Tue, 26 Jul 2016 16:06:29 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review143040
-----------------------------------------------------------



Haven't commented on tests. Will look at them together with other tests.

---

Chatted with BenM and Anindya. I think we are able to simplify the code and minimize the specially
handling of shared resources by separating the responsibilities between master, allocator
and sorter this way:

- Invariants: All usage of resources (shared or not) should have directly correponding allocations
in the allocator and sorter.
- Allocator:
    - The only thing special about shared resources in `allocate()` is that it's *available*
once created regardless of past allocations (i.e., `total - allocated`), this adheres to the
definition of shared resources and is easily explanable. When multiple instances of the same
shared resources are allocated, all instances/copies are maintained in `slave.allocated` and
other data structures.
    - We just need to tweak `updateAllocation` for the master's special allocation.
- Sorter: Shared resources only needs a little special handling when creating `createStrippedScalarQuantity`,
multiple instances can appear in a framework's allocation but they don't affect sorting.
- Master: The only special handlings are that 
    - Task validation allows tasks that consume shared resources not in `_offeredResources`
as long as they are in the original `offeredResources` or they are created in the same ACCEPT
call. 
    - Tasks may use more shared resources than being allocated, this leads to a special allocation
for additional instances of these shared resources.

These semantics combined should result in simpler and smaller changes than maintaining other
additional variables which are adjusted in special ways due to shared resources.


src/common/resources.cpp (line 1224)
<https://reviews.apache.org/r/45961/#comment209250>

    Add comment.
    
    // Because we currently only allow persistent volumes to be shared, the originally resource
must be non-shared.



src/common/resources.cpp (line 1266)
<https://reviews.apache.org/r/45961/#comment209251>

    Add a comment:
    
    // Because we currently only allow persistent volumes to be shared, we return the resource
to the non-shared state after destroy of the volume.



src/master/allocator/mesos/hierarchical.hpp (line 327)
<https://reviews.apache.org/r/45961/#comment209379>

    s/each copy/the number of copies/



src/master/allocator/mesos/hierarchical.hpp (line 328)
<https://reviews.apache.org/r/45961/#comment209382>

    s/allocated/allocated to/
    s/recovered/recovered from/



src/master/allocator/mesos/hierarchical.cpp (lines 604 - 607)
<https://reviews.apache.org/r/45961/#comment209390>

    Now we have to modify this method to process the LAUNCH call and adjust the API docs accordingly.



src/master/allocator/mesos/hierarchical.cpp (line 1429)
<https://reviews.apache.org/r/45961/#comment209386>

    s/=/-/



src/master/allocator/sorter/drf/sorter.cpp (line 164)
<https://reviews.apache.org/r/45961/#comment209389>

    s/no more/no existing/



src/master/allocator/sorter/drf/sorter.cpp (lines 165 - 166)
<https://reviews.apache.org/r/45961/#comment209388>

    For this you want to exclude shared resources already allocated to the client but `resources`
can have more instances of the same shared resources than `allocations[name].resources[slaveId].shared()`.



src/master/master.hpp (line 324)
<https://reviews.apache.org/r/45961/#comment209066>

    s/shared resources/each shared resource/ (because later in this sentence you are referencing
"this shared resource"



src/master/master.hpp (line 325)
<https://reviews.apache.org/r/45961/#comment209065>

    Say "each copy corresponds to a running task using it" instead?



src/master/master.hpp (line 331)
<https://reviews.apache.org/r/45961/#comment208840>

    s/assigned/requested/ (assigned is a bit ambiguous, at this point resources are still
not given to the task yet).



src/master/master.hpp (line 340)
<https://reviews.apache.org/r/45961/#comment208745>

    s/accept/ACCEPT/



src/master/master.hpp (line 341)
<https://reviews.apache.org/r/45961/#comment208746>

    s/removing it/removing them/



src/master/master.hpp (line 344)
<https://reviews.apache.org/r/45961/#comment209069>

    With `hashmap<FrameworkID, Resources> pendingResources` we are preventing DESTROY
from affecting pending tasks of the same framework but letting DESTROY to go through under
pending tasks of other frameworks (of the same role).
    
    If we use `pendingResources` it should be across frameworks right?
    
    But because we already need to add
    
    ```
    multihashmap<FrameworkID, TaskID> pendingTasks;
    ```
    
    for /r/47082/,
    
    we can just do that instead in this review, the number of tasks pending for each slave
should be small enough to loop through quickly.
    
    Overall this should be more straightfoward than maintaining `pendingResources`.



src/master/master.hpp (lines 817 - 822)
<https://reviews.apache.org/r/45961/#comment209084>

    As commented below, I realized that we may not need this. We shouldn't calculate total
task resources as it's done in `Master::addTask()`.



src/master/master.hpp (lines 879 - 887)
<https://reviews.apache.org/r/45961/#comment209229>

    We may not need this either if we can directly recover all resources.



src/master/master.hpp (line 2483)
<https://reviews.apache.org/r/45961/#comment209085>

    s/shared resources/each shared resource/ (because later in this sentence you are referencing
"this shared resource"



src/master/master.hpp (line 2484)
<https://reviews.apache.org/r/45961/#comment209086>

    Say "each copy corresponds to a running task using it" instead?



src/master/master.hpp (lines 2501 - 2502)
<https://reviews.apache.org/r/45961/#comment208845>

    Prbably can't store them here as we don't aggregate identity-based resources across agents.
    
    Additionally it looks like we don't need them per comments below.



src/master/master.cpp (line 3352)
<https://reviews.apache.org/r/45961/#comment209254>

    Leave this method unchanged now that we no longer use `totalTaskResources()`.



src/master/master.cpp (lines 3514 - 3517)
<https://reviews.apache.org/r/45961/#comment209255>

    Move over /r/47082/'s `slave->pendingTasks.put()` logic.



src/master/master.cpp (lines 3664 - 3665)
<https://reviews.apache.org/r/45961/#comment209301>

    Shouldn't need to keep track of used shared resources.



src/master/master.cpp (line 3672)
<https://reviews.apache.org/r/45961/#comment208867>

    s/offer/offers/



src/master/master.cpp (line 3674)
<https://reviews.apache.org/r/45961/#comment209137>

    s/cycle/call/
    
    s/a shared resource/the same shared resources/



src/master/master.cpp (lines 3674 - 3675)
<https://reviews.apache.org/r/45961/#comment209307>

    "Note that an offer has a single copy of a shared resource." - This is true but here since
multiple offers are combined, there can be multiple instances of the same offered resources
appearing in `offeredResources` and `offeredSharedResources`.



src/master/master.cpp (lines 3676 - 3678)
<https://reviews.apache.org/r/45961/#comment209139>

    It's not just persistent volume creation right? If the framework chooses to destroy a
volume and then use it (by mistake), the mechanism is the same. Here what matters is that
`offeredSharedResources` is updated by CREATE/DESTROY offer operations. We can just sayd that
    
    ```
    Since only persistent volumes can be shared, `offeredSharedResources` can be updated by
CREATE/DESTROY but not by RESERVE/UNRESERVE. Also, unlike `_offeredResources`, shared resources
are not removed from `offeredSharedResources` when tasks are launched.
    ```



src/master/master.cpp (lines 3841 - 3842)
<https://reviews.apache.org/r/45961/#comment209159>

    This should be safe for now but since we have delegated the logic to apply changes to
the resources to `Resources::apply()`, here it's most consistent to use `offeredSharedResources
= _offeredResources.shared()`.
    
    Also, 2 spaces.



src/master/master.cpp (lines 3906 - 3907)
<https://reviews.apache.org/r/45961/#comment209160>

    2 spaces.
    
    Ditto: `offeredSharedResources = _offeredResources.shared()`.



src/master/master.cpp (line 3936)
<https://reviews.apache.org/r/45961/#comment209131>

    `slave->pendingResources[framework->id()]`: We shouldn't be only looking at the
same frameworks for resources that are in use.



src/master/master.cpp (line 3937)
<https://reviews.apache.org/r/45961/#comment209130>

    Using `totalTaskResources` here is problematic: while this task is pending, the master
could be informed that the executor has exited so the method `totalTaskResources()` would
not include the executor's resources, therefore leaving some resources in `pendingResources`.
    
    Taking a step back it seems we actually **always** care about executor's (**intention**
to use these) resources.
    
    Consider the following example:
    
    -> DESTROY arrives at `accept()`, gets sent to the authorizer. 
    -> Task 1 arrives at `accept()` and it requests some shared resources in the executor
but the executor is already running so the executor's resources are not added to pendingResources.
    -> Task 1 enters the authroizer.
    -> Master receives ExitedExecutorMessage for the executor.
    -> DESTROY arrives at `_accept()` and sees the executor is gone and destroys the volume.
    -> Task 1 enters `_accept()` and finds that the volume is detroyed. 
    
    To prevent the DESTROY to go through we do need to always check the pending tasks' executor
resources. As discussed, let's just iterate the list of pending tasks and look at their task
resources and executor resources.



src/master/master.cpp (line 3992)
<https://reviews.apache.org/r/45961/#comment209309>

    s/accept/ACCEPT/



src/master/master.cpp (lines 4025 - 4027)
<https://reviews.apache.org/r/45961/#comment209228>

    As discussed here I think we can do somethig like (just a sketch):
    
    ```
                Offer::Operation operation;
                operation.set_type(Offer::Operation::LAUNCH);
    
                // To allow tasks to use more instances of shared resources
                // than being offered, we need to allocate more instances 
                // of the shared resources when they are requested by the
                // task but no longer in `_offeredResources`. For now we
                // use `allocator->updateAllocation()` to pass these
                // additional resources as part of the LAUNCH operation.
                // TODO: This solution is temporary as the
                // allocator should be the entity that accepts and manages offers
                // after MESOS-4553 is implemented and then the logic that
                // allocates additional shared resources when accepting
                // the LAUNCH call should be moved to the allocator.
                Resources additionalShared;
                foreach (const Resource& resource, taskResources.shared()) {
                  if (!_offeredResources.contains(resource)) {                
                    additionalShared += resource;
                  }
                }
    
                task_.mutable_resources()->CopyFrom(additionalShared);
                operation.mutable_launch()->add_task_infos()->CopyFrom(task_);
                
                allocator->updateAllocation(framework->id(), slave->id, {operation});
    ```
    
    We can do this after sending RunTaskMessage so we can modify freely modify `task_`.
    Note that an ACCEPT call can have mutliple LAUNCH calls but `allocator->updateAllocation()`
is invoked per Call.



src/master/master.cpp (line 4070)
<https://reviews.apache.org/r/45961/#comment209180>

    So, here besides recovering unused resources, here we can add the logic to "allocate additional
instances of shared resources in the master".
    
    ```
    
    allocator->updateAllocation(framework->id(), slave->id, launches);
    ```



src/master/master.cpp (lines 7326 - 7331)
<https://reviews.apache.org/r/45961/#comment209018>

    So we first decided to have a wrapper method before it became apparent that we needed
to recover all resources (with one exception) to the allocator. Under the new circumstance
it seems that this wrapper would be unnecessary.



src/master/validation.hpp (line 158)
<https://reviews.apache.org/r/45961/#comment209378>

    This will not take pendingTasks on the agent as the last argument.


- Jiang Yan Xu


On July 19, 2016, 3:52 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated July 19, 2016, 3:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, and Jiang
Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp b1bd2784aefdebf91892638b40b215373b998574 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp b72ba16277a3210e4d309b616d185a10e2029a66

>   src/master/allocator/mesos/hierarchical.cpp 7d4064535a20b93950f5a95eef1ad3f0d37d305b

>   src/master/allocator/sorter/drf/sorter.hpp bc6bfb2d5d3b32d55be055a0514861b4e7d889bb

>   src/master/allocator/sorter/drf/sorter.cpp ac85b327fc33d34246788e6a8c8bf5a486c61434

>   src/master/http.cpp 0293e4956afb5968f84b516e3d4b5f6f858ef7a3 
>   src/master/master.hpp bade8af69f567341667b9907368207189d0dab0e 
>   src/master/master.cpp 370fd8712062dc75bb81824cb99ccc7920acbf78 
>   src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
>   src/v1/resources.cpp 6d4ec75fbb7edab013563142aaf13d5302cc50af 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message