mesos-issues mailing list archives

From "Kevin Klues (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-7375) provide additional insight for framework developers re: GPU_RESOURCES capability
Date Tue, 25 Apr 2017 21:24:04 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983645#comment-15983645 ]

Kevin Klues edited comment on MESOS-7375 at 4/25/17 9:23 PM:
-------------------------------------------------------------

The flag you are thinking of is {{--allocator_fairness_excluded_resource_names}} (i.e. you
can set it as {{--allocator_fairness_excluded_resource_names=gpus}}).

Regarding the motivation for the GPU_RESOURCES capability, here is an excerpt from an email
I sent out recently:
{noformat}
Ideally, Marathon (and any other frameworks, the SDK included) should do some sort of
preferential scheduling when they opt in to use GPUs. That is, they should *prefer* to run
GPU jobs on GPU machines and non-GPU jobs on non-GPU machines (falling back to running them
on GPU machines only if that is all that is available).

Additionally, we need a way for an operator to indicate whether or not GPUs are a scarce
resource in their cluster. We have a flag in Mesos that allows us to set this
(`--allocator_fairness_excluded_resource_names=gpus`), but we don't yet have a way of setting
it through DC/OS. If we don't set this flag, we run the risk of Mesos's DRF algorithm very
rarely sending out offers from GPU machines once the first GPU job has been launched on them.

As a concrete example, imagine you have a machine with only 1 GPU and you launch a task that
consumes it: from DRF's perspective, that node now has 100% usage of one of its resources.
Even if the machine has 2 GPUs and only one gets consumed, DRF still considers 50% of one of
its resources consumed. Out of fairness, DRF will then avoid sending offers from that node
until some other resource on *all* other nodes approaches 50% utilization as well (which may
take a while if you are allocating CPUs, memory, and disk in small increments).

Right now we don't set `--allocator_fairness_excluded_resource_names=gpus` in DC/OS (but maybe
we should?). Is it the case that most DC/OS users only install GPUs on a small number of nodes
in their cluster? If so, we should consider GPUs a scarce resource and set this flag by default.
If not, then GPUs aren't actually a scarce resource, we shouldn't be setting this flag, and
DRF will perform as expected without it.
{noformat}
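
The concrete example in the excerpt boils down to simple dominant-share arithmetic. Below is a
rough sketch of it in Python; the cluster totals and per-framework allocations are made-up
numbers, and this illustrates the idea rather than the allocator's actual code:
{noformat}
# Hypothetical illustration of DRF's dominant-share arithmetic (not Mesos source).
# Cluster totals: GPUs are scarce relative to CPUs and memory.
cluster_total = {"cpus": 400.0, "mem": 1_600_000.0, "gpus": 2.0}

def dominant_share(allocated, total):
    """DRF ranks frameworks by their largest fractional use of any one resource."""
    return max(allocated.get(name, 0.0) / amount for name, amount in total.items())

# A GPU framework that launched a single task using 1 GPU, 4 CPUs, and 16 GB.
gpu_framework = {"cpus": 4.0, "mem": 16_384.0, "gpus": 1.0}

# A non-GPU framework consuming CPUs and memory in small increments.
other_framework = {"cpus": 40.0, "mem": 131_072.0}

print(dominant_share(gpu_framework, cluster_total))    # 0.5  (1 of 2 GPUs)
print(dominant_share(other_framework, cluster_total))  # 0.1  (40 of 400 CPUs)

# With shares of 0.5 vs. 0.1, the allocator keeps favoring the other framework
# until its dominant share also approaches 0.5, so the GPU framework may wait a
# long time for further offers. Excluding gpus from fairness
# (--allocator_fairness_excluded_resource_names=gpus) drops the gpus term from
# the max() above, bringing the GPU framework's share down to roughly 0.01.
{noformat}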


> provide additional insight for framework developers re: GPU_RESOURCES capability
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-7375
>                 URL: https://issues.apache.org/jira/browse/MESOS-7375
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James DeFelice
>              Labels: mesosphere
>
> On clusters where all nodes are equal and every node has a GPU, frameworks that **don't**
> opt in to the `GPU_RESOURCES` capability won't get any offers. This is surprising for operators.
> Even when a framework doesn't **need** GPU resources, it may make sense for a framework
> scheduler to provide a `--gpu-cluster-compat` (or similar) flag that results in the framework
> advertising the `GPU_RESOURCES` capability even though it does not intend to consume any GPUs.
> The effect is that said framework will then receive offers on clusters where all nodes have
> GPU resources.
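
For reference, here is a minimal, hypothetical sketch of what advertising the capability could
look like through the Mesos v1 scheduler HTTP API (a SUBSCRIBE call POSTed to
/api/v1/scheduler). The framework name and the opt-in boolean are illustrative only, not part
of any existing scheduler:
{noformat}
import json

def subscribe_call(advertise_gpus: bool) -> str:
    """Build the SUBSCRIBE body a scheduler POSTs to the master's /api/v1/scheduler."""
    framework_info = {
        "user": "root",
        "name": "example-framework",   # hypothetical framework name
        "capabilities": [],
    }
    if advertise_gpus:
        # With GPU_RESOURCES advertised, the allocator will also send this
        # framework offers from agents that have GPUs, even if the framework
        # never launches a GPU task.
        framework_info["capabilities"].append({"type": "GPU_RESOURCES"})
    return json.dumps({
        "type": "SUBSCRIBE",
        "subscribe": {"framework_info": framework_info},
    })

# A `--gpu-cluster-compat`-style flag could simply drive the boolean:
print(subscribe_call(advertise_gpus=True))
{noformat}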



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
