mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Klues <klue...@gmail.com>
Subject Re: Review Request 50841: Added GPU scheduling logic to docker containerizer process.
Date Thu, 25 Aug 2016 00:32:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50841/#review146726
-----------------------------------------------------------




src/slave/containerizer/docker.hpp (lines 505 - 507)
<https://reviews.apache.org/r/50841/#comment213320>

    I would just call this variable `gpus`
    Also the comment should read:
    ```
    // The number of GPUs allocated to the container.
    ```



src/slave/containerizer/docker.cpp (line 653)
<https://reviews.apache.org/r/50841/#comment213322>

    I would probably call this variable `count`. When I saw the name `requestedNvidiaGpu`
I thought it was a specific GPU id being passed in, not a count.
    
    I would also call the function `allocateNvidiaGpus()` since you can allocate more than
one with this function.



src/slave/containerizer/docker.cpp (lines 662 - 664)
<https://reviews.apache.org/r/50841/#comment213326>

    With type `size_t` you can never have a negative value., so just checking `== 0` should
be enough here.



src/slave/containerizer/docker.cpp (lines 666 - 668)
<https://reviews.apache.org/r/50841/#comment213330>

    I would do this as the first check in this function.  If we don't have an allocator set,
then we really shouldn't even be calling this function regardless of anything else that is
going on.
    
    Also, the string should read:
    ```
    "The `allocateNvidiaGpu` function was called without an `NvidiaGpuAllocator` set".
    ```



src/slave/containerizer/docker.cpp (line 681)
<https://reviews.apache.org/r/50841/#comment213343>

    I would rename  this function `deallocateNvidiaGpus()` (i.e. with an `s` on the end).



src/slave/containerizer/docker.cpp (line 684)
<https://reviews.apache.org/r/50841/#comment213340>

    I would move this down just before its first use.



src/slave/containerizer/docker.cpp (lines 690 - 692)
<https://reviews.apache.org/r/50841/#comment213335>

    Again, I would move this up to the top of the function, with a similar Failure string
as before.



src/slave/containerizer/docker.cpp (lines 694 - 696)
<https://reviews.apache.org/r/50841/#comment213350>

    Why do you need this level of indirection? Why not just pass `containers_[containerId]->gpuAllocated`
directly to `nvidiaGpuAllocator->deallocate()`?



src/slave/containerizer/docker.cpp (lines 698 - 710)
<https://reviews.apache.org/r/50841/#comment213348>

    Why don't you just return from the `deallocate()` call with a `.then()`? I.e.
    
    ```
      return nvidiaGpuAllocator->deallocate(deallocated)
        .then(defer(self(), [=](const Nothing& nothing) {
          containers_[containerId]->gpuAllocated.clear();
          return Nothing();
        }));
    ```
    
    If any failures happen in the deallocation, they should get propagated through.



src/slave/containerizer/docker.cpp (lines 1376 - 1379)
<https://reviews.apache.org/r/50841/#comment213366>

    I would probably remove the intermediate variable above called `requested` and instead
save a temporary `Future` to the `allocateNvidiaGpu()` call. I would also move all of the
GPU logic together instead of separating it out.
    
    Something like:
    
    ```
    
    Future<Nothing> allocateGpus = Nothing();
    
    const Option<TaskInfo>& taskInfo = container->task;
    if (taskInfo.isNone()) {
      return Failure("No task information found");
    }
    
    if (taskInfo->resources.gpus().isSome()) {
      // Make sure that the `gpus` resource is not fractional.
      // We rely on scalar resources to only have 3 digits of precision.
      if (static_cast<long long>(gpus.get() * 1000.0) % 1000 != 0) {
        return Failure("The 'gpus' resource must be an unsigned integer");
      }
    
      allocateGpus = allocateNvidiaGpu(gpus.get(), containerId);
    }
    
    ...
    ...
    ...
    
    return allocateGpus
      .then(...)
      .then(...);
    ```



src/slave/containerizer/docker.cpp (line 1555)
<https://reviews.apache.org/r/50841/#comment213368>

    I wouldn't just blindly call this function here. It should be wrapped in some logic that
makes sure it's OK to call it (i.e. checks to make sure that we have the nvidia->allocator
component passed in).
    
    Again, you could have some logic above which saves a temporary `Future` that is set to
`Nothing()` by default and is the result of calling `deallocateNvidiaGpu()` otherwise.



src/slave/containerizer/docker.cpp (line 2035)
<https://reviews.apache.org/r/50841/#comment213372>

    Same here, use the temporary mentioned above to do the deallocation or not.


- Kevin Klues


On Aug. 22, 2016, 10:11 a.m., Yubo Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50841/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2016, 10:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Guangya Liu, Kevin Klues, and Rajat Phull.
> 
> 
> Bugs: MESOS-5795
>     https://issues.apache.org/jira/browse/MESOS-5795
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added control logic to allocate/deallocate GPUs to GPU-related task
> when the task is started/terminated.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.hpp f2a06065cf99fed934c2c1ffc47461ec8a97f50d 
>   src/slave/containerizer/docker.cpp 5c1ee8e467d1c54c60b67dc5275ef71e1bb90723 
> 
> Diff: https://reviews.apache.org/r/50841/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Yubo Li
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message