mesos-dev mailing list archives

From Guangya Liu <gyliu...@gmail.com>
Subject Re: [GPU] [Allocation] "Scarce" Resource Allocation
Date Thu, 16 Jun 2016 12:33:27 GMT
Thanks Joris — sorry, I forgot the case where scarce resources are also
requested via quota.

On second thought, it is not only quota: reserved resources and revocable
resources can also be scarce, so we may need to handle all of those cases.

I think that in the future, the allocator should allocate resources like this:
1) Allocate resources for quota.
2) Allocate reserved resources.
3) Allocate revocable resources. After the "revocable by default" project, I
think we will only have reserved resources and revocable resources.

So, based on the above analysis, we need three steps to allocate all
resources; but once scarce resources are introduced, each of those three
resource kinds must be split in two: one scarce and the other non-scarce.

Then there should be six sorters:
1) quota non-scarce sorter
2) non-scarce reserved sorter
3) non-scarce revocable sorter
4) quota scarce sorter
5) scarce reserved sorter
6) scarce revocable sorter

Since not many hosts have scarce resources, the last three sorters (for
scarce resources) should not impact performance much. Comments?
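To make the six-sorter split concrete, here is a rough Python sketch of just the partitioning step (illustrative only: `SCARCE`, `split_scarce`, and `build_sorters` are hypothetical names, not Mesos internals, and the real allocator tracks far more state than this):

```python
# Hypothetical sketch: split each resource class (quota, reserved,
# revocable) into a non-scarce and a scarce sorter pool. Not Mesos code.

SCARCE = {"gpus"}  # operator-configured names of scarce resources

def split_scarce(resources):
    """Split a {name: amount} dict into (non_scarce, scarce) parts."""
    non_scarce = {n: a for n, a in resources.items() if n not in SCARCE}
    scarce = {n: a for n, a in resources.items() if n in SCARCE}
    return non_scarce, scarce

def build_sorters(agents):
    """agents maps agent id -> {"class": "quota"|"reserved"|"revocable",
    "resources": {name: amount}}; returns the six sorter pools."""
    sorters = {(cls, kind): {}
               for cls in ("quota", "reserved", "revocable")
               for kind in ("non_scarce", "scarce")}
    for agent_id, info in agents.items():
        non_scarce, scarce = split_scarce(info["resources"])
        if non_scarce:
            sorters[(info["class"], "non_scarce")][agent_id] = non_scarce
        if scarce:
            sorters[(info["class"], "scarce")][agent_id] = scarce
    return sorters
```

An agent with (gpus:1,cpus:4) would contribute its cpus to the non-scarce pool of its class and its gpus to the scarce pool, which is the two-way split described above.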

Thanks,

Guangya

On Thu, Jun 16, 2016 at 7:30 PM, Joris Van Remoortere <joris@mesosphere.io>
wrote:

> With this 4th sorter approach, how does quota work for scarce resources?
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Thu, Jun 16, 2016 at 11:26 AM, Guangya Liu <gyliu513@gmail.com> wrote:
>
> > Hi Ben,
> >
> > The pre-condition for the four-stage allocation is that we put
> > different resources into different sorters:
> >
> > 1) roleSorter includes only non-scarce resources.
> > 2) quotaRoleSorter includes only non-revocable & non-scarce resources.
> > 3) revocableSorter includes only revocable & non-scarce resources. This
> > will be handled in MESOS-4923
> > <https://issues.apache.org/jira/browse/MESOS-4923>
> > 4) scarceSorter includes only scarce resources.
> >
> > Take your case above:
> > 999 agents with (cpus:4,mem:1024,disk:1024)
> > 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> >
> > The four sorters would be:
> > 1) roleSorter includes 1000 agents with (cpus:4,mem:1024,disk:1024)
> > 2) quotaRoleSorter includes 1000 agents with (cpus:4,mem:1024,disk:1024)
> > 3) revocableSorter includes nothing, as I have no revocable resources here.
> > 4) scarceSorter includes 1 agent with (gpus:1)
> >
> > When allocating resources, even if a role gets the agent with GPU
> > resources, its share will only be counted by the scarceSorter, so it
> > will not impact the other sorters.
> >
> > The above solution is actually a kind of enhancement to "exclude scarce
> > resources", since the scarce resources still obey the DRF algorithm.
> >
> > This solution can also be seen as logically dividing the whole resource
> > pool into a scarce pool and a non-scarce pool: 1), 2) and 3) handle
> > non-scarce resources while 4) focuses on scarce resources.
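As a rough illustration of that per-pool accounting (a Python sketch, not Mesos code; `dominant_share` is a hypothetical helper and the pool totals are taken from the 1000-agent example above):

```python
# Illustrative sketch: a role's dominant share is computed per pool, so
# holding the only GPU inflates only the scarce pool's share.

def dominant_share(allocation, pool_total):
    """Max fraction of any resource in this pool held by the role."""
    return max((allocation.get(name, 0) / total
                for name, total in pool_total.items() if total > 0),
               default=0.0)

# Pool totals from the example: 1000 agents, one of which has a GPU.
non_scarce_total = {"cpus": 4000, "mem": 1024000, "disk": 1024000}
scarce_total = {"gpus": 1}

# A role holding the entire GPU agent:
role_alloc = {"gpus": 1, "cpus": 4, "mem": 1024, "disk": 1024}

non_scarce_share = dominant_share(role_alloc, non_scarce_total)  # 0.001
scarce_share = dominant_share(role_alloc, scarce_total)          # 1.0
```

The role is saturated in the scarce pool but still holds a tiny share in the non-scarce pool, so it keeps receiving non-GPU offers.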
> >
> > Thanks,
> >
> > Guangya
> >
> > On Thu, Jun 16, 2016 at 2:10 AM, Benjamin Mahler <bmahler@apache.org>
> > wrote:
> >
> > > Hm.. can you expand on how adding another allocation stage for only
> > > scarce resources would behave well? It seems to have a number of
> > > problems when I think through it.
> > >
> > > On Sat, Jun 11, 2016 at 7:59 AM, Guangya Liu <gyliu513@gmail.com>
> wrote:
> > >
> > >> Hi Ben,
> > >>
> > >> For the long-term goal, instead of creating sub-pools, what about
> > >> adding a new sorter to handle *scarce* resources? The current logic
> > >> in the allocator is divided into two stages: allocation for quota,
> > >> and allocation for non-quota resources.
> > >>
> > >> I think that the future logic in the allocator would be divided into
> > >> four stages:
> > >> 1) allocation for quota
> > >> 2) allocation for reserved resources
> > >> 3) allocation for revocable resources
> > >> 4) allocation for scarce resources
> > >>
> > >> Thanks,
> > >>
> > >> Guangya
> > >>
> > >> On Sat, Jun 11, 2016 at 10:50 AM, Benjamin Mahler <bmahler@apache.org>
> > >> wrote:
> > >>
> > >>> I wanted to start a discussion about the allocation of "scarce"
> > >>> resources. "Scarce" in this context means resources that are not
> > >>> present on every machine. GPUs are the first example of a scarce
> > >>> resource that we support as a known resource type.
> > >>>
> > >>> Consider the behavior when there are the following agents in a
> > >>> cluster:
> > >>>
> > >>> 999 agents with (cpus:4,mem:1024,disk:1024)
> > >>> 1 agent with (gpus:1,cpus:4,mem:1024,disk:1024)
> > >>>
> > >>> Here there are 1000 machines but only 1 has GPUs. We call GPUs a
> > >>> "scarce" resource here because they are only present on a small
> > >>> percentage of the machines.
> > >>>
> > >>> We end up with some problematic behavior here with our current
> > >>> allocation model:
> > >>>
> > >>>     (1) If a role wishes to use both GPU and non-GPU resources for
> > >>> tasks, consuming 1 GPU will lead DRF to consider the role to have a
> > >>> 100% share of the cluster, since it consumes 100% of the GPUs in the
> > >>> cluster. This framework will then not receive any other offers.
> > >>>
> > >>>     (2) Because we do not have revocation yet, if a framework decides
> > >>> to consume the non-GPU resources on a GPU machine, it will prevent
> > >>> the GPU workloads from running!
> > >>>
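Problem (1) is easy to reproduce numerically. A Python sketch of a single global DRF dominant-share computation over the 1000-agent example (illustrative only; `dominant_share` is a hypothetical helper, not the Mesos sorter):

```python
# Sketch of the problematic behavior: with one global DRF sorter, the
# dominant share is taken over every resource type, so holding 1 GPU
# out of 1 yields a 100% share even though the role uses ~0.1% of
# everything else in the cluster.

def dominant_share(allocation, cluster_total):
    """Max fraction of any cluster resource held by the role."""
    return max((allocation.get(name, 0) / total
                for name, total in cluster_total.items() if total > 0),
               default=0.0)

# 999 agents with (cpus:4,mem:1024,disk:1024) plus 1 GPU agent:
cluster_total = {"cpus": 4000, "mem": 1024000, "disk": 1024000, "gpus": 1}

# A role that consumed the single GPU agent:
role_alloc = {"gpus": 1, "cpus": 4, "mem": 1024, "disk": 1024}

share = dominant_share(role_alloc, cluster_total)
print(share)  # 1.0: DRF treats the role as owning the whole cluster
```

With a share of 1.0, DRF ranks this role behind every other role, so it stops receiving offers.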
> > >>> --------
> > >>>
> > >>> I filed an epic [1] to track this. The plan for the short-term is to
> > >>> introduce two mechanisms to mitigate these issues:
> > >>>
> > >>>     -Introduce a resource fairness exclusion list. This allows the
> > >>> shares of resources like "gpus" to be excluded from the dominant
> > >>> share.
> > >>>
> > >>>     -Introduce a GPU_AWARE framework capability. This indicates that
> > >>> the scheduler is aware of GPUs and will schedule tasks accordingly.
> > >>> Old schedulers will not have the capability and will not receive any
> > >>> offers for GPU machines. If a scheduler has the capability, we'll
> > >>> advise that they avoid placing their additional non-GPU workloads on
> > >>> the GPU machines.
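The exclusion-list mechanism amounts to skipping the listed resource names when computing the dominant share. A minimal Python sketch under that assumption (illustrative only; `EXCLUDED` and `dominant_share` are hypothetical names, not the Mesos implementation):

```python
# Sketch of the proposed fairness exclusion list: resource names in the
# list are skipped when computing the dominant share, so holding the
# only GPU no longer yields a 100% share.

EXCLUDED = {"gpus"}  # operator-configured exclusion list

def dominant_share(allocation, cluster_total, excluded=EXCLUDED):
    """Max fraction of any non-excluded cluster resource held by the role."""
    return max((allocation.get(name, 0) / total
                for name, total in cluster_total.items()
                if total > 0 and name not in excluded),
               default=0.0)

cluster_total = {"cpus": 4000, "mem": 1024000, "disk": 1024000, "gpus": 1}
role_alloc = {"gpus": 1, "cpus": 4, "mem": 1024, "disk": 1024}

share = dominant_share(role_alloc, cluster_total)
print(share)  # 0.001: the GPU no longer dominates the share
```

The trade-off, as noted, is that excluded resources are no longer fair-shared at all, which is what motivates the longer-term designs below.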
> > >>>
> > >>> --------
> > >>>
> > >>> Longer term, we'll want a more robust way to manage scarce resources.
> > >>> The first thought we had was to have sub-pools of resources based on
> > >>> machine profile and perform fair sharing / quota within each pool.
> > >>> This addresses (1) cleanly, and for (2) the operator needs to
> > >>> explicitly disallow non-GPU frameworks from participating in the GPU
> > >>> pool.
> > >>>
> > >>> Unfortunately, by excluding non-GPU frameworks from the GPU pool we
> > >>> may have a lower level of utilization. In the even longer term, as we
> > >>> add revocation it will be possible to allow a scheduler desiring GPUs
> > >>> to revoke the resources allocated to the non-GPU workloads running on
> > >>> the GPU machines. There are a number of things we need to put in
> > >>> place to support revocation ([2], [3], [4], etc), so I'm glossing
> > >>> over the details here.
> > >>>
> > >>> If anyone has any thoughts or insight in this area, please share!
> > >>>
> > >>> Ben
> > >>>
> > >>> [1] https://issues.apache.org/jira/browse/MESOS-5377
> > >>> [2] https://issues.apache.org/jira/browse/MESOS-5524
> > >>> [3] https://issues.apache.org/jira/browse/MESOS-5527
> > >>> [4] https://issues.apache.org/jira/browse/MESOS-4392
> > >>>
> > >>
> > >>
> > >
> >
>
