hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
Date Wed, 31 Jul 2013 23:33:50 GMT

    [ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725837#comment-13725837 ]

Sandy Ryza commented on YARN-972:
---------------------------------

I probably should have been clearer about what I think the goals of virtual cores are, from a
more zoomed-out perspective, before arguing about their specifics.

The problems we have been considering solving with virtual cores are:
1. "Many of the jobs on my cluster are computational simulations that use many threads per
task.  Many of the other jobs on my cluster are distcp's that are primarily I/O bound.  Many
of the jobs on my cluster are MapReduce that do something like apply a transformation to text,
which are single-threaded, but can saturate a core.  How can we schedule these to maximize
utilization and minimize harmful interference?" 

2. "I recently added machines with more or beefier CPUs to my cluster.  I would like to run
more concurrent tasks on these machines than on other machines."

3. "I recently added machines with more or beefier CPUs to my cluster.  I would like my jobs
to run at predictable speeds."

4. "CPUs vary widely in the world, but I would like to be able to take my job to another cluster
and have it run at a similar speed."

I think (1) is the main problem we should be trying to solve.  (2) is also important, and
much easier to think about when the new machines have a higher number of cores, but not substantially
more powerful cores.  Luckily, the trend is towards more cores per machine, not more powerful
cores.  I think we should not be trying to solve (3) and (4). There are too many variables,
the real-world utility is too small, and the goals are unrealistic. The features proposed
in YARN-796 are better approaches to handling this.

To these ends, here is how I think resource configurations should be used:

A task should request virtual cores equal to the number of cores it thinks it can saturate.
 A task that runs in a single thread, no matter how CPU-intensive it is, should request a
single virtual core.  A task that is inherently I/O-bound, like a distcp or simple grep, should
request less than a single virtual core.  A task that can take advantage of multiple threads
should request a number of cores equal to the number of threads it intends to take advantage
of.
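
To make that concrete, here is a rough sketch (not from any patch on this JIRA; the memory value
and thread count are made up) of how an application master might size a container request through
the Java client API, setting virtual cores to the number of threads the task expects to keep busy:

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class VcoreRequestSketch {
  // Hypothetical helper: build a request for a task that runs workerThreads
  // worker threads and expects to saturate that many cores.
  static ContainerRequest requestForThreads(int workerThreads, int memoryMb) {
    // Memory is requested as usual; virtual cores are set to the number of
    // threads the task can keep busy.  A single-threaded task would pass 1,
    // and under this proposal an I/O-bound task could ask for less than 1.
    Resource capability = Resource.newInstance(memoryMb, workerThreads);
    return new ContainerRequest(capability, null, null, Priority.newInstance(0));
  }
}
{code}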

NodeManagers should be configured with virtual cores equal to the number of physical cores
on the node.  If the speed of a single core varies widely within a cluster (maybe by a factor
of two or more), an administrator can consider configuring more virtual cores than physical
cores on the faster nodes, with the acknowledgement that task performance will still not be
predictable.
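
As a purely illustrative example, on a node with 16 physical cores this recommendation just
amounts to setting yarn.nodemanager.resource.cpu-vcores to 16 in yarn-site.xml.  The snippet
below shows the same idea programmatically, using the JVM's processor count as a rough stand-in
for the physical core count:

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeVcoreConfigSketch {
  public static void main(String[] args) {
    // Illustration only: in practice an admin sets
    // yarn.nodemanager.resource.cpu-vcores in yarn-site.xml.
    YarnConfiguration conf = new YarnConfiguration();

    // availableProcessors() counts logical processors, so on hyperthreaded
    // hardware this overstates the physical core count; it is just a stand-in.
    int cores = Runtime.getRuntime().availableProcessors();
    conf.setInt(YarnConfiguration.NM_VCORES, cores);

    System.out.println("Would configure " + cores + " virtual cores on this node");
  }
}
{code}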

Virtual cores should not be used as a proxy for other resources, such as disk I/O or network
I/O.  We should ultimately add disk I/O and possibly network I/O as first-class resources, but
in the meantime a config to limit the number of containers per node doesn't seem unreasonable.

As Arun points out, we can realize this vision equivalently by saying that one physical core
is always equal to 1000 virtual cores.  However, to me this seems like an unnecessary layer
of indirection for the user, and it obscures the fact that virtual cores are meant to model
parallelism before processing power.  If our only reason for considering this is performance,
we can and should handle this internally.  I am not obstinately opposed to going this route,
but if we do I think a name like "core thousandths" would be clearer.
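
For what it's worth, here is a minimal sketch of what that internal representation could look
like (the class and method names are hypothetical, not an existing API): fractional requests are
converted once at the edge, and everything the scheduler compares stays in integer arithmetic:

{code:java}
// Hypothetical helper, purely illustrative: user-facing (possibly fractional)
// vcore requests become integer "core thousandths" so scheduler arithmetic
// never touches floating point.
public final class CoreThousandths {
  private static final int SCALE = 1000;

  private CoreThousandths() {}

  // Round up so a task never gets less CPU than it asked for.
  public static int fromVirtualCores(double vcores) {
    return (int) Math.ceil(vcores * SCALE);
  }

  // All scheduler-side comparisons stay in plain integer arithmetic.
  public static boolean fits(int requestedThousandths, int availableThousandths) {
    return requestedThousandths <= availableThousandths;
  }

  public static void main(String[] args) {
    int distcpTask = fromVirtualCores(0.25);                 // 250: an I/O-bound task
    int availableOnNode = 4 * SCALE;                         // a node with 4 cores free
    System.out.println(fits(distcpTask, availableOnNode));   // true
  }
}
{code}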

Thoughts?
                
> Allow requests and scheduling for fractional virtual cores
> ----------------------------------------------------------
>
>                 Key: YARN-972
>                 URL: https://issues.apache.org/jira/browse/YARN-972
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, scheduler
>    Affects Versions: 2.0.5-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> As this idea sparked a fair amount of discussion on YARN-2, I'd like to go deeper into
the reasoning.
> Currently the virtual core abstraction hides two orthogonal goals.  The first is that
a cluster might have heterogeneous hardware and that the processing power of different makes
of cores can vary wildly.  The second is that different workloads (or combinations of workloads) can
require different levels of granularity.  E.g. one admin might want every task on their cluster
to use at least a core, while another might want applications to be able to request quarters
of cores.  The former would configure a single vcore per core.  The latter would configure
four vcores per core.
> I don't think that the abstraction is a good way of handling the second goal.  Having
a virtual core refer to different magnitudes of processing power on different clusters will
make the difficult problem of deciding how many cores to request for a job even more confusing.
> Can we not handle this with dynamic oversubscription?
> Dynamic oversubscription, i.e. adjusting the number of cores offered by a machine based
on measured CPU-consumption, should work as a complement to fine-granularity scheduling. 
Dynamic oversubscription is never going to be perfect, as the amount of CPU a process consumes
can vary widely over its lifetime.  A task that first loads a bunch of data over the network
and then performs complex computations on it will suffer if additional CPU-heavy tasks are
scheduled on the same node because its initial CPU-utilization was low.  To guard against
this, we will need to be conservative with how we dynamically oversubscribe.  If a user wants
to explicitly hint to the scheduler that their task will not use much CPU, the scheduler should
be able to take this into account.
> On YARN-2, there are concerns that including floating point arithmetic in the scheduler
will slow it down.  I question this assumption, and it is perhaps worth debating, but I think
we can sidestep the issue by multiplying CPU quantities inside the scheduler by a decently
sized number like 1000 and keeping the computations in integer arithmetic.
> The relevant APIs are marked as evolving, so there's no need for the change to delay
2.1.0-beta.

