mesos-dev mailing list archives

From Elizabeth Lingg <elizabeth_li...@apple.com>
Subject Re: [GPU] [Allocation] "Scarce" Resource Allocation
Date Tue, 21 Jun 2016 20:40:05 GMT
Thanks, looking forward to discussion and review of your document. The main use case I see
here is that some of our frameworks will want to request GPU resources, and we want to
make sure that those frameworks are able to successfully launch tasks on agents with those
resources. We want to be certain that frameworks which do not require GPUs cannot consume
all of the other resources on those agents (i.e. cpu, disk, memory), which would leave the
GPU resources unallocatable, so the frameworks that require them would never receive them.
As Ben Mahler mentioned, "(2) Because we do not have revocation yet, if a framework decides
to consume the non-GPU resources on a GPU machine, it will prevent the GPU workloads from
running!" This will occur for us in clusters with higher utilization as well as a mix of
workload types. Smart task placement then becomes more relevant (i.e. we want to be able to
schedule against scarce resources successfully, and we may have considerations like not
scheduling too many I/O-bound workloads on a single host, or more stringent requirements
for scheduling persistent tasks).
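The stranding effect described above can be sketched in a few lines of Python. This is illustrative only, not Mesos code; the agent's resource figures and the `accept` helper are made up:

```python
# A GPU agent's non-GPU resources can be consumed by a framework that
# never uses GPUs, stranding the GPUs on that agent.
agent = {"cpus": 8.0, "mem": 16384.0, "gpus": 4.0}

def accept(offer, needs_gpu):
    """A framework takes everything it can use; a non-GPU framework
    takes everything except the GPUs."""
    return {k: v for k, v in offer.items() if needs_gpu or k != "gpus"}

taken = accept(agent, needs_gpu=False)
remaining = {k: agent[k] - taken.get(k, 0.0) for k in agent}
# remaining is {"cpus": 0.0, "mem": 0.0, "gpus": 4.0}: a GPU framework
# offered this agent now gets 4 GPUs but no cpus/mem, so it cannot
# launch any task, and without revocation nothing frees the cpus/mem.
```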

 Elizabeth Lingg



> On Jun 20, 2016, at 7:24 PM, Guangya Liu <gyliu513@gmail.com> wrote:
> 
> Had some discussion with Ben M, for the following two solutions:
> 
> 1) Ben M: Create sub-pools of resources based on machine profile and
> perform fair sharing / quota within each pool plus a framework
> capability GPU_AWARE
> to enable allocator filter out scarce resources for some frameworks.
> 2) Guangya: Adding new sorters for non scarce resources plus a framework
> capability GPU_AWARE to enable allocator filter out scarce resources for
> some frameworks.
> 
> Both solutions amount to the same thing, and there is no real difference
> between them: creating sub-pools of resources requires introducing a
> different sorter for each sub-pool, so I will merge the two solutions
> into one.
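A minimal sketch of what solution 1) could look like, assuming a hypothetical GPU_AWARE capability check and an operator-supplied scarce-resource list. The names (`SCARCE`, `pool_of`, `eligible`) are illustrative, not the Mesos allocator API:

```python
# Partition agents into sub-pools by resource profile, and only offer
# agents in the scarce pool to frameworks with the GPU_AWARE capability.
SCARCE = {"gpus"}  # hypothetically supplied by the operator

def pool_of(agent_resources):
    """An agent holding any scarce resource goes into the scarce pool."""
    return "scarce" if SCARCE & agent_resources.keys() else "regular"

def eligible(framework_capabilities, pool):
    # Frameworks without GPU_AWARE never see scarce-pool agents, so they
    # cannot consume the cpu/mem alongside the GPUs.
    return pool == "regular" or "GPU_AWARE" in framework_capabilities

agents = {"a1": {"cpus": 8, "mem": 16384},
          "a2": {"cpus": 8, "mem": 16384, "gpus": 4}}
caps = set()  # a framework without the GPU_AWARE capability

offers = [a for a, res in sorted(agents.items())
          if eligible(caps, pool_of(res))]
# offers == ["a1"]: the GPU agent is withheld from this framework.
```

Fair sharing / quota would then run independently inside each sub-pool, with its own sorter.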
> 
> Also had some discussion with Ben about AlexR's solution of implementing
> "requestResource"; this API should be treated as an improvement over doing
> resource allocation pessimistically (e.g. we offer/decline the GPUs to 1000
> frameworks before offering them to the GPU framework that wants them). The
> "requestResource" call provides *more information* to Mesos; namely, it
> gives us awareness of demand.
> 
> Even though in some cases we can use "requestResource" to acquire all of
> the scarce resources, and once those scarce resources are in use the wDRF
> sorter will sort the non-scarce resources as normal, the problem is that we
> cannot guarantee that the framework which calls "requestResource" will
> always consume all of the scarce resources before they are allocated to
> other frameworks.
> 
> I'm planning to draft a document based on solution 1) "Create sub-pools"
> as the long-term solution; any comments are welcome!
> 
> Thanks,
> 
> Guangya
> 
> On Sat, Jun 18, 2016 at 11:58 AM, Guangya Liu <gyliu513@gmail.com> wrote:
> 
>> Thanks Du Fan. So you mean that we should have some clear rules, in a
>> document or somewhere else, to guide cluster admins on which resources
>> should be classified as scarce, right?
>> 
>> On Sat, Jun 18, 2016 at 2:38 AM, Du, Fan <fan.du@intel.com> wrote:
>> 
>>> 
>>> 
>>> On 2016/6/17 7:57, Guangya Liu wrote:
>>> 
>>>> @Fan Du,
>>>> 
>>>> Currently, I think that the scarce resources should be defined by cluster
>>>> admin, s/he can specify those scarce resources via a flag when master
>>>> start
>>>> up.
>>>> 
>>> 
>>> This is not what I mean.
>>> IMO, it's not the cluster admin's call to decide which resources should
>>> be marked as scarce. They can carry out the operation, but they should be
>>> advised based on a clear rule: to what extent is the resource scarce
>>> compared with other resources, and how will it affect wDRF by causing
>>> starvation for frameworks that hold scarce resources? That's my point.
>>> 
>>> To the best of my knowledge, a quantitative study of how wDRF behaves in
>>> scenarios with one or multiple scarce resources would first help to
>>> verify the proposed approach and to guide users of this functionality.
>>> 
>>> 
>>>> Regarding the proposal of generic scarce resources, do you have any
>>>> thoughts on this? I can see that giving framework developers the option
>>>> of defining scarce resources may bring trouble to Mesos; it is better to
>>>> let Mesos define the scarce resources rather than framework developers.
>>>> 
>>> 
>> 

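The quantitative wDRF behavior Fan Du asks about above can be illustrated with a toy dominant-share calculation. This is a sketch of the plain DRF rule with made-up cluster totals, not Mesos's sorter implementation:

```python
# Because the cluster-wide GPU total is tiny, holding even one GPU
# inflates a framework's dominant share, and DRF then favors everyone
# else: the GPU holder starves for the cpu/mem it needs.
totals = {"cpus": 1000.0, "mem": 4_000_000.0, "gpus": 4.0}

def dominant_share(alloc):
    """DRF: a framework's share is its largest per-resource fraction."""
    return max(alloc.get(r, 0.0) / totals[r] for r in totals)

gpu_fw = {"cpus": 1.0, "mem": 4096.0, "gpus": 1.0}      # holds 1 of 4 GPUs
cpu_fw = {"cpus": 100.0, "mem": 400_000.0}              # holds 10% of cpus

ds_gpu = dominant_share(gpu_fw)   # 1/4 = 0.25, dominated by the scarce gpus
ds_cpu = dominant_share(cpu_fw)   # 100/1000 = 0.1

# DRF allocates to the framework with the lowest dominant share, so the
# CPU-only framework keeps winning until it holds 250 cpus, while the
# GPU framework waits despite using almost none of the cluster.
assert ds_gpu > ds_cpu
```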
