hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1024) Define a virtual core unambigiously
Date Tue, 13 Aug 2013 23:04:48 GMT

    [ https://issues.apache.org/jira/browse/YARN-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738985#comment-13738985
] 

Sandy Ryza commented on YARN-1024:
----------------------------------

I've been thinking a lot about this, and wanted to propose a modified approach, inspired by
an offline discussion with Arun and his max-vcores idea (https://issues.apache.org/jira/browse/YARN-1024?focusedCommentId=13730074&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13730074).

First, my assumptions about how CPUs work:
* A CPU is essentially a bathtub full of processing power that can be doled out to threads,
with a limit per thread based on the power of each core within it.
* To give X processing power to a thread means that within a standard unit of time, roughly
some number of instructions proportional to X can be executed for that thread. 
* No more than a certain amount of processing power (the amount of processing power per core)
can be given to each thread.
* We can use CGroups to say that a task gets some fraction of the system's processing power.
* This means that if we have 5 cores with Y processing power each, we can give 5 threads Y
processing power each, or 6 threads 5Y/6 processing power each, but we can't give 4 threads
5Y/4 processing power each.
* It never makes sense to use CGroups assign a higher fraction of the system's processing
power than (numthreads the task can take advantage of / number of cores) to a task.
* Equivalently, if my CPU has X processing power per core, it never makes sense to assign
more than (numthreads the task can take advantage of) * X processing power to a task.

So as long as we account for that last constraint, we can essentially view processing power
as a fluid resource like memory.  With this in mind, we can:
1. Split virtual cores into cores and yarnComputeUnitsPerCore.  Requests can include both
and nodes can be configured with both.
2. Have a cluster-defined maxComputeUnitsPerCore, which would be the smallest yarnComputeUnitsPerCore
on any node.  We min all yarnComputeUnitsPerCore requests with this number when they hit the
RM.
3. Use YCUs, not cores, for scheduling.  I.e. the scheduler thinks of a node's CPU capacity
in terms of the number of YCUs it can handle and thinks of a resource's CPU request in terms
of its (normalized yarnComputeUnitsPerCore * # cores).  We use YCUs for DRF.
4. If we make YCUs small enough, no need for fractional anything.

This reduces to a number-of-cores-based approach if all containers are requested with yarnComputeUnitsPerCore=infinity,
and reduces to a YCU approach if maxComputeUnitsPerCore is set to infinity.  Predictability,
simplicity, and scheduling flexibility can be traded off per cluster without overloading the
same concept with multiple definitions.

This doesn't take into account heteregeneous hardware within a cluster, but I think (2) can
be tweaked to handle this by holding a value for each node  (can elaborate on how this would
work).  It also doesn't take into account pinning threads to CPUs, but I don't think it's
any less extensible for ultimately dealing with this than other proposals.

Sorry for the longwindedness.  Bobby, would this provide the flexibility you're looking for?
                
> Define a virtual core unambigiously
> -----------------------------------
>
>                 Key: YARN-1024
>                 URL: https://issues.apache.org/jira/browse/YARN-1024
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> We need to clearly define the meaning of a virtual core unambiguously so that it's easy
to migrate applications between clusters.
> For e.g. here is Amazon EC2 definition of ECU: http://aws.amazon.com/ec2/faqs/#What_is_an_EC2_Compute_Unit_and_why_did_you_introduce_it
> Essentially we need to clearly define a YARN Virtual Core (YVC).
> Equivalently, we can use ECU itself: *One EC2 Compute Unit provides the equivalent CPU
capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message