hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers
Date Wed, 06 Jul 2016 21:02:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365070#comment-15365070 ]

Jason Lowe commented on YARN-5215:
----------------------------------

Maybe I'm missing something, but every one of the proposed approaches has YARN assuming it can
leverage the unused resources on the node.  That's the whole point: we want YARN to use those
unused resources rather than just hard-partitioning the node between YARN and the other system.
Some of the approaches start with the assumption that the whole node belongs to YARN and
YARN will scale back usage of the node based on utilization feedback, while other approaches
start with YARN assuming it has a smaller portion of the node and can reach beyond it when
utilization is low.  It's the same scenario from two perspectives.

IIUC any of these approaches can react relatively quickly to the other workload's demands
by having the nodemanager take action directly (by preempting containers) when the periodically
monitored node utilization goes above some configured limit.  The original proposal in this
JIRA doesn't do that, which means it won't be very responsive to the other subsystem.  The
RM won't allocate any additional containers when the utilization gets high, but some of the
running containers would have to exit on their own before YARN's existing utilization decreases.
It sounds like the version Inigo has deployed in production does do some sort of preemption,
but it sounded like it was coming from the RM rather than the NM, which would give a slightly
slower response time than if the NM did it directly.
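
To make the NM-side reaction concrete, here is a minimal sketch of the decision the nodemanager
could make on each monitoring pass.  None of this is existing YARN code: the class, the
RunningContainer type, and the youngest-first victim order are all assumptions chosen for
illustration, and a real policy would probably also weigh priority and memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NodePreemptionSketch {

    /** Hypothetical view of a running container; not a YARN class. */
    public static final class RunningContainer {
        final String id;
        final double cpuFraction;  // share of the node's CPU this container uses, 0..1
        final long startTimeMs;    // when the container started

        public RunningContainer(String id, double cpuFraction, long startTimeMs) {
            this.id = id;
            this.cpuFraction = cpuFraction;
            this.startTimeMs = startTimeMs;
        }
    }

    /**
     * Returns the ids of containers to preempt so that the projected node
     * utilization (YARN plus external load) drops to or below the limit.
     * Assumption: the youngest containers are preempted first, since they
     * have usually done the least work.
     */
    public static List<String> selectVictims(double nodeUtilization, double limit,
                                             List<RunningContainer> running) {
        List<String> victims = new ArrayList<>();
        if (nodeUtilization <= limit) {
            return victims;  // node is under the limit, nothing to do
        }
        List<RunningContainer> youngestFirst = new ArrayList<>(running);
        youngestFirst.sort(
            Comparator.comparingLong((RunningContainer c) -> c.startTimeMs).reversed());
        double projected = nodeUtilization;
        for (RunningContainer c : youngestFirst) {
            if (projected <= limit) {
                break;  // enough has been freed
            }
            victims.add(c.id);
            projected -= c.cpuFraction;
        }
        return victims;
    }

    public static void main(String[] args) {
        List<RunningContainer> running = new ArrayList<>();
        running.add(new RunningContainer("c1", 0.30, 1000L));
        running.add(new RunningContainer("c2", 0.10, 2000L));
        running.add(new RunningContainer("c3", 0.20, 3000L));
        // Node at 95% utilization with an 80% limit.
        System.out.println(selectVictims(0.95, 0.80, running));
    }
}
```

With the node at 95% and an 80% limit, only the youngest container (c3) is selected, because
removing its 20% share already brings the projection under the limit.  Doing this check in the
NM avoids the extra round trip to the RM.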

If the latency demands of the other workload are so severe that it's impossible for YARN to
react quickly enough, then I don't see how YARN can leverage those resources when they are
unused.  We'd have to resort to some kind of hard-partitioning (either giving the nodemanager
fewer resources than the node actually has or using proxy containers in YARN on behalf of the
other workload to reserve the resources) and live with the underutilization of those resources
when the other workload is idle.

> Scheduling containers based on external load in the servers
> -----------------------------------------------------------
>
>                 Key: YARN-5215
>                 URL: https://issues.apache.org/jira/browse/YARN-5215
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Inigo Goiri
>         Attachments: YARN-5215.000.patch, YARN-5215.001.patch
>
>
> Currently YARN runs containers in the servers assuming that they own all the resources.
> The proposal is to use the utilization information in the node and the containers to estimate
> how much is consumed by external processes and schedule based on this estimation.
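
The estimation described in the issue summary can be sketched as simple arithmetic: external
usage is what the node reports as used minus what the YARN containers account for.  This is
only an illustration under assumed names (ExternalLoadEstimateSketch and its methods are
hypothetical, and memory in MB stands in for any resource); it is not the code from the
attached patches.

```java
public class ExternalLoadEstimateSketch {

    /**
     * Estimates memory used by processes outside YARN: total node usage
     * minus the sum of per-container usage, clamped at zero in case the
     * two measurements were sampled at slightly different times.
     */
    public static long estimateExternalMb(long nodeUsedMb, long[] containerUsedMb) {
        long yarnUsedMb = 0;
        for (long used : containerUsedMb) {
            yarnUsedMb += used;
        }
        return Math.max(0L, nodeUsedMb - yarnUsedMb);
    }

    /** Capacity the scheduler could treat as available to YARN. */
    public static long schedulableMb(long nodeTotalMb, long externalMb) {
        return Math.max(0L, nodeTotalMb - externalMb);
    }

    public static void main(String[] args) {
        long nodeTotalMb = 65536;                  // 64 GB node
        long nodeUsedMb = 40960;                   // 40 GB in use overall
        long[] containerUsedMb = {8192, 4096};     // 12 GB used by YARN containers

        long externalMb = estimateExternalMb(nodeUsedMb, containerUsedMb);
        System.out.println("external MB:    " + externalMb);
        System.out.println("schedulable MB: " + schedulableMb(nodeTotalMb, externalMb));
    }
}
```

In this example the external processes account for 28672 MB, leaving 36864 MB of the node for
YARN to schedule against.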



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

