hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues
Date Thu, 05 May 2016 21:15:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273104#comment-15273104 ]

Arun Suresh commented on YARN-2888:

Thanks for the review [~kkaranasos].

I agree with most of your comments and have addressed them in the latest patch. My replies to the remaining ones:

bq. Rename ContainerQueuingLimit* to NMQueuingLimit*?
Hmmm... I prefer to keep it as ContainerQueuingLimit. It is a struct that is part of
the NM heartbeat response, which already establishes the 'NM' aspect of it, and 'ContainerQueuing'
expresses more explicitly the fact that we are queuing containers.

bq. Why is it needed to change the return type of getContainerManager() to ContainerManager?
With this patch, we need to set the queuing limit etc. on the ContainerManager. One option
is to introduce the setter method into the Protocol, where I don't think it belongs,
since it is a property of the ContainerManager entity, not of the protocol. Another option is
to cast the return value to QueuingContainerManagerImpl, which does not seem clean
either. Given all this, and considering that we have multiple implementations of the ContainerManager,
I felt that changing the return type was the cleaner choice.
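
To make the trade-off concrete, here is a minimal sketch of the third option. All names in it ({{ProtocolSketch}}, {{ManagerSketch}}, {{QueuingManagerSketch}}, {{NodeContextSketch}}, {{updateQueuingLimit}}) are hypothetical stand-ins and not the actual Hadoop types or signatures, and the pruning policy (drop the most recently queued entries) is only an assumption for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ContainerManagerSketch {

  // Hypothetical stand-in for the wire protocol: only remotely callable operations.
  interface ProtocolSketch {
    void startContainer(String containerId);
  }

  // Hypothetical stand-in for the NM-side entity. Local settings such as the
  // queuing limit live here, so the protocol interface stays untouched.
  interface ManagerSketch extends ProtocolSketch {
    void updateQueuingLimit(int maxQueueLength);
  }

  // One of possibly several implementations of the manager interface.
  static final class QueuingManagerSketch implements ManagerSketch {
    private final Deque<String> queued = new ArrayDeque<>();

    @Override
    public void startContainer(String containerId) {
      queued.addLast(containerId);
    }

    @Override
    public void updateQueuingLimit(int maxQueueLength) {
      // Assumed policy: drop the most recently queued entries until the
      // queue fits within the (non-negative) limit.
      int limit = Math.max(0, maxQueueLength);
      while (queued.size() > limit) {
        queued.pollLast();
      }
    }

    int queueLength() {
      return queued.size();
    }
  }

  // The accessor returns the manager interface, so callers can set the limit
  // without casting to a concrete implementation.
  static final class NodeContextSketch {
    private final ManagerSketch manager;

    NodeContextSketch(ManagerSketch manager) {
      this.manager = manager;
    }

    ManagerSketch getContainerManager() {
      return manager;
    }
  }

  public static void main(String[] args) {
    QueuingManagerSketch impl = new QueuingManagerSketch();
    NodeContextSketch context = new NodeContextSketch(impl);
    for (int i = 0; i < 5; i++) {
      context.getContainerManager().startContainer("container_" + i);
    }
    context.getContainerManager().updateQueuingLimit(2);
    System.out.println(impl.queueLength());
  }
}
```

With this shape, any implementation of the manager interface can decide how to react to a new limit, and the caller handling the heartbeat response never needs to know which implementation it is talking to.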

bq. In pruneOpportunisticContainerQueue(), let's use the same logic/code as in the stopContainerInternal().
I feel this code path is a bit simpler, so I'd prefer to leave it as it is. But yes,
I have changed the variable names and the method name for better clarity.

In {{QueueLimitCalculator}}
* I've removed the median
* The calculations are now independent of the size of k
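
This comment does not say which statistic is used once the median is gone. Purely as an illustration, the sketch below derives a limit from the mean and standard deviation of whatever queue lengths it is given, so the result does not depend on a fixed k. The class name, method name, the {{sigma}} multiplier and the min/max clamp are all assumptions, not the code in the patch.

```java
import java.util.Arrays;
import java.util.List;

final class QueueLimitSketch {

  // Hypothetical helper: limit = mean + sigma * stdev over the observed queue
  // lengths, clamped to [minLimit, maxLimit]. Works for any number of samples.
  static int computeLimit(List<Integer> queueLengths, double sigma,
                          int minLimit, int maxLimit) {
    if (queueLengths.isEmpty()) {
      return minLimit; // assumed fallback when nothing has been reported yet
    }
    double mean = 0;
    for (int length : queueLengths) {
      mean += length;
    }
    mean /= queueLengths.size();

    double variance = 0;
    for (int length : queueLengths) {
      variance += (length - mean) * (length - mean);
    }
    variance /= queueLengths.size(); // population variance

    long raw = Math.round(mean + sigma * Math.sqrt(variance));
    return (int) Math.max(minLimit, Math.min(maxLimit, raw));
  }

  public static void main(String[] args) {
    // Mean is 5 and standard deviation is 2, so with sigma = 1 the limit is 7.
    List<Integer> lengths = Arrays.asList(2, 4, 4, 4, 5, 5, 7, 9);
    System.out.println(computeLimit(lengths, 1.0, 1, 10)); // prints 7
  }
}
```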

> Corrective mechanisms for rebalancing NM container queues
> ---------------------------------------------------------
>                 Key: YARN-2888
>                 URL: https://issues.apache.org/jira/browse/YARN-2888
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Arun Suresh
>         Attachments: YARN-2888-yarn-2877.001.patch, YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch,
> YARN-2888.004.patch, YARN-2888.005.patch
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of the scheduling
> decisions or due to having a stale image of the system) may lead to an imbalance in the waiting
> times of the NM container queues. This can in turn have an impact on job execution times and
> cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever needed) container
> requests from overloaded queues, adding them to less-loaded ones.

