hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-972) Allow requests and scheduling for fractional virtual cores
Date Thu, 01 Aug 2013 14:07:51 GMT

    [ https://issues.apache.org/jira/browse/YARN-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726452#comment-13726452 ]

Jason Lowe commented on YARN-972:
---------------------------------

bq. What could work is for a YARN app to be able to say "IO intensive | CPU intensive | Net intensive" when requesting a node and have that used as a hint in the schedulers. So AW can deploy Giraph nodes that are CPU & Net hungry, and the scheduler will know that some IO-heavy work can also go there, but not other Net-heavy code.

I think the issue is to what degree something is IO/CPU/Net intensive. At some point the scheduler needs to decide whether to hold off scheduling something on a node or to go ahead and schedule it. If a node is running one task that is marked CPU-intensive, should the scheduler avoid placing any other CPU-intensive tasks on that node? The real answer is: it depends. Some CPU-intensive tasks will fully consume one logical core, others will consume multiple cores, and yet others might fully consume the node's CPU resources. If the scheduler plays it safe, cluster utilization is likely to be very poor in the average case. That's why I think in practice the scheduler needs at least some quantitative sense of the degree of utilization to make good decisions. Then it becomes a problem of properly representing that degree to the scheduler, which leads to the vcore debate.
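
To make that concrete, here is a minimal sketch (hypothetical names, not actual YARN scheduler code) contrasting what a boolean "CPU-intensive" flag and a quantitative vcore count let the scheduler decide:

{code:java}
// Hypothetical illustration -- not YARN code.
public class SchedulingHints {
    // With only a boolean flag, the scheduler must adopt one blanket policy:
    // a task needing 1 of 16 cores is treated the same as one needing all 16,
    // so "playing it safe" wastes most of the node.
    static boolean fitsWithBooleanHint(boolean nodeRunsCpuIntensiveTask,
                                       boolean requestIsCpuIntensive) {
        return !(nodeRunsCpuIntensiveTask && requestIsCpuIntensive);
    }

    // With a quantitative hint (vcores), the decision degrades gracefully:
    // the node fills up in proportion to what tasks actually claim.
    static boolean fitsWithVcores(int vcoresUsed, int vcoresTotal,
                                  int vcoresRequested) {
        return vcoresUsed + vcoresRequested <= vcoresTotal;
    }
}
{code}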

Without some kind of quantitative hint about resource utilization, I think the scheduler would have to rely on a feedback mechanism: it schedules tasks, then monitors the utilization reported back by the node to decide whether to gamble on scheduling another task there. Preemption or task migration (which in the worst case probably means killing the task and restarting it from scratch) may need to be employed if the reported utilization shows the scheduler made a bad decision that should be corrected. This could work reasonably well for tasks with fairly constant resource utilization throughout their lifetime, but it could turn ugly if tasks are "bursty" in their utilization and multiple tasks happen to surge simultaneously long after they've been started.
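
For illustration, a minimal sketch of such a feedback loop, assuming hypothetical names and thresholds rather than anything in YARN:

{code:java}
// Hypothetical feedback-driven oversubscription loop -- not YARN code.
// Assumes the node heartbeats back a CPU utilization figure in [0.0, 1.0].
public class FeedbackScheduler {
    private static final double OVERSUBSCRIBE_BELOW = 0.60; // gamble threshold (assumed)
    private static final double PREEMPT_ABOVE = 0.95;       // correction threshold (assumed)

    void onNodeHeartbeat(Node node, double reportedCpuUtilization) {
        if (reportedCpuUtilization < OVERSUBSCRIBE_BELOW) {
            // The node looks idle enough: gamble on one more task.
            scheduleNextTaskOn(node);
        } else if (reportedCpuUtilization > PREEMPT_ABOVE) {
            // Bad bet, or bursty tasks surged simultaneously: correct it,
            // in the worst case by killing a task and restarting it elsewhere.
            preemptOneTaskFrom(node);
        }
        // Between the thresholds: hold off and keep observing.
    }

    void scheduleNextTaskOn(Node node) { /* elided */ }
    void preemptOneTaskFrom(Node node) { /* elided */ }
    static class Node { }
}
{code}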
                
> Allow requests and scheduling for fractional virtual cores
> ----------------------------------------------------------
>
>                 Key: YARN-972
>                 URL: https://issues.apache.org/jira/browse/YARN-972
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, scheduler
>    Affects Versions: 2.0.5-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> As this idea sparked a fair amount of discussion on YARN-2, I'd like to go deeper into the reasoning.
> Currently the virtual core abstraction hides two orthogonal goals. The first is that a cluster might have heterogeneous hardware, where the processing power of different makes of cores can vary wildly. The second is that different workloads (or combinations of workloads) can require different levels of granularity. E.g. one admin might want every task on their cluster to use at least a full core, while another might want applications to be able to request quarters of cores. The former would configure a single vcore per core; the latter would configure four vcores per core.
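> For illustration only (a hypothetical helper, not a YARN API), the same physical node advertises different vcore capacities under the two policies, and a quarter-core request is only expressible under the finer one:

{code:java}
// Hypothetical arithmetic sketch -- vcoresPerCore is an admin policy choice.
public class VcoreGranularity {
    static int nodeCapacityInVcores(int physicalCores, int vcoresPerCore) {
        return physicalCores * vcoresPerCore;
    }

    public static void main(String[] args) {
        // Admin A: 1 vcore per core -- the smallest request is a whole core.
        System.out.println(nodeCapacityInVcores(16, 1)); // 16 vcores

        // Admin B: 4 vcores per core -- 1 vcore now means a quarter core,
        // so a quarter-core task requests exactly 1 vcore.
        System.out.println(nodeCapacityInVcores(16, 4)); // 64 vcores
    }
}
{code}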
> I don't think the abstraction is a good way of handling the second goal. Having a virtual core refer to different magnitudes of processing power on different clusters will make the already-difficult problem of deciding how many cores to request for a job even more confusing.
> Can we not handle this with dynamic oversubscription?
> Dynamic oversubscription, i.e. adjusting the number of cores offered by a machine based on measured CPU consumption, should work as a complement to fine-granularity scheduling. Dynamic oversubscription is never going to be perfect, as the amount of CPU a process consumes can vary widely over its lifetime. A task that first loads a bunch of data over the network and then performs complex computations on it will suffer if additional CPU-heavy tasks are scheduled on the same node because its initial CPU utilization was low. To guard against this, we will need to be conservative about how we dynamically oversubscribe. If a user wants to explicitly hint to the scheduler that their task will not use much CPU, the scheduler should be able to take this into account.
> On YARN-2, there are concerns that including floating point arithmetic in the scheduler will slow it down. I question this assumption, and it is perhaps worth debating, but I think we can sidestep the issue by multiplying CPU quantities inside the scheduler by a decently sized number like 1000 and keeping the computations in integers.
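> As a sketch of that fixed-point idea (hypothetical names; only the 1000x scale factor comes from the proposal above):

{code:java}
// Hypothetical fixed-point sketch: fractional vcores scaled by 1000
// ("millivcores") so the scheduler never touches floating point.
public class MilliVcores {
    static final int SCALE = 1000;

    // 0.25 cores -> 250 millivcores; all later math stays integral.
    static int toMilliVcores(double vcores) {
        return (int) Math.round(vcores * SCALE);
    }

    static boolean fits(int usedMilliVcores, int requestedMilliVcores,
                        int capacityMilliVcores) {
        // Pure integer comparison on the hot scheduling path.
        return usedMilliVcores + requestedMilliVcores <= capacityMilliVcores;
    }

    public static void main(String[] args) {
        int capacity = toMilliVcores(16.0);   // 16 cores -> 16000
        int used = toMilliVcores(15.75);      // 15750
        int request = toMilliVcores(0.25);    // 250
        System.out.println(fits(used, request, capacity)); // true
    }
}
{code}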
> The relevant APIs are marked as evolving, so there's no need for the change to delay 2.1.0-beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
