hadoop-yarn-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
Date Wed, 17 Oct 2012 15:42:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477965#comment-13477965 ]

Robert Joseph Evans commented on YARN-2:

Arun, I still disagree with the #cores being an int.  

What does requesting 1 CPU really mean and how is it different from requesting 1.8?  To me
1 CPU means that for this particular container I want to be guaranteed that it gets at least
1 full CPU core to itself for computation at any point in time it needs it, very similar to
what requesting 3000MB of memory does.  It is a bit more ambiguous because 1 CPU on box A
is not necessarily equivalent to 1 CPU on box B. But this JIRA already makes the assumption
that they are close enough to being equivalent.  It gives me as a user of the container a
chance to set a lower bound on the amount of resources that I am guaranteed to be able to
use.  In practice this probably means that the kernel will give at least X% of the available
CPU time to the processes running in that container, if those processes are runnable, where
X = 100 * (CPUs requested / total CPU cores on the box).
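
A minimal sketch of that arithmetic, in Python. The function name and parameters are hypothetical
and are not part of YARN or of the patch on this JIRA; it only restates the ratio above as a
percentage.

```python
def guaranteed_cpu_percent(cpus_requested: float, total_cores: int) -> float:
    """Lower bound on the share of a node's CPU time, as a percentage.

    Hypothetical helper: assumes every core on the box counts equally,
    which is the same simplification the JIRA already makes.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    if cpus_requested < 0:
        raise ValueError("cpus_requested must not be negative")
    return 100.0 * cpus_requested / total_cores


# A container asking for 1 CPU on an 8-core box is guaranteed
# at least 12.5% of the box's CPU time whenever it is runnable.
share = guaranteed_cpu_percent(1, 8)
```

The value is a floor, not a cap: a container may well get more when the box is otherwise idle.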

1.8 CPUs to me means a few things.  First, the person requesting this was either a machine
or was overly ambitious in trying to get an exact value.  Second, the container will probably
get 2 CPU cores, because just like with memory I would expect the scheduler to round it up
to the nearest multiple of a scheduling unit.  I proposed initially that quarter or even half
CPU marks are probably sufficient.  We can always round up and remove precision with a float.
It is very hard to go back the other way, though, and add precision to an int.  I am fine if,
for the first go-around, the CPU number is a float and the scheduling unit is 1 CPU.  I just
want the door left open so we can easily adjust things if we find a need to.
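
The rounding described above can be sketched as follows, in Python. The function name, its
parameters, and the use of decimal strings are assumptions made for illustration; this is not
the scheduler's actual code.

```python
from decimal import Decimal, ROUND_CEILING


def round_up_to_unit(cpus_requested: str, scheduling_unit: str = "1") -> Decimal:
    """Round a fractional CPU request up to a multiple of the scheduling unit.

    Hypothetical helper. Values are passed as decimal strings so that
    units such as "0.25" divide exactly, without binary float error.
    """
    request = Decimal(cpus_requested)
    unit = Decimal(scheduling_unit)
    if unit <= 0:
        raise ValueError("scheduling_unit must be positive")
    if request < 0:
        raise ValueError("cpus_requested must not be negative")
    # Number of whole scheduling units needed to cover the request.
    units_needed = (request / unit).to_integral_value(rounding=ROUND_CEILING)
    return units_needed * unit


whole_core = round_up_to_unit("1.8")           # unit of 1 CPU: rounds up to 2
quarter_core = round_up_to_unit("1.3", "0.25")  # quarter-CPU unit: rounds up to 1.5
```

With a fractional request type, moving from a 1 CPU unit to a quarter-CPU unit only changes the
second argument. With an int, the 1.3 would already have been lost before the scheduler saw it.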

Over-subscribing makes sense, but it also has a lot of pitfalls.  You have to take into account
that resource utilization is not constant.  A process can use very little of a resource and
then all of a sudden start to use lots of that resource.  Is the resource request a guarantee
of those resources, or is it just a best effort to provide those resources?  I see situations
where users would want both, and perhaps if we do support over-subscribing we need to support
something like nice on POSIX.
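
To make the pitfall concrete, here is a rough sketch in Python. Everything in it is hypothetical:
the function names, the oversubscription factor, and the assumption that CPU is shared in
proportion to the requests when every container becomes runnable at once. None of it comes from
YARN itself.

```python
from typing import Sequence


def can_admit(existing_requests: Sequence[float], new_request: float,
              total_cores: int, oversubscription_factor: float = 1.0) -> bool:
    """Admit a container if total requested CPUs fit in the (inflated) capacity.

    A factor of 1.0 treats requests as hard guarantees; a factor above 1.0
    over-subscribes the box and turns them into best-effort promises.
    """
    capacity = total_cores * oversubscription_factor
    return sum(existing_requests) + new_request <= capacity


def worst_case_fraction(requests: Sequence[float], total_cores: int) -> float:
    """Fraction of its request each container gets if all burst together.

    Assumes proportional sharing, which is a simplification.
    """
    demand = sum(requests)
    if demand <= total_cores:
        return 1.0
    return total_cores / demand


# On an 8-core box with a factor of 1.5, a fourth container is admitted...
admitted = can_admit([4, 4, 3], 1, total_cores=8, oversubscription_factor=1.5)
# ...but if all four spike at once, each gets only 8/12 of what it asked for.
fraction = worst_case_fraction([4, 4, 3, 1], total_cores=8)
```

A nice-like priority would amount to weighting that worst-case split instead of dividing it
evenly across requests.
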
> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>                 Key: YARN-2
>                 URL: https://issues.apache.org/jira/browse/YARN-2
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: capacityscheduler, scheduler
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 2.0.3-alpha
>         Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch,
MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch,
YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch
> With YARN being a general purpose system, it would be useful for several applications
(MPI et al) to specify not just memory but also CPU (cores) for their resource requirements.
Thus, it would be useful to the CapacityScheduler to account for both.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
