hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
Date Thu, 26 Feb 2015 18:13:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338830#comment-14338830 ]

Karthik Kambatla commented on YARN-3231:

Thanks for reporting and working on this, [~l201514]. The approach looks generally good. A few
comments (some nits):
# Rename {{updateRunnabilityonRefreshQueues}} to {{updateRunnabilityOnReload}}? And, add a
javadoc for when it should be called and what it does.
# Add javadoc for the newly added private method, including the significance of the new integer param.
# Call the above method from AllocationReloadListener#onReload after all the other queue configs
are updated.
# The comment here no longer applies. Remove it? 
        // No more than one app per list will be able to be made runnable, so
        // we can stop looking after we've found that many
        if (noLongerPendingApps.size() >= maxRunnableApps) {
# Indentation:
# Newly added tests:
## If it is not too much trouble, can we move them to a new test class (TestAppRunnability?),
mostly because TestFairScheduler has so many tests already?
## Is it possible to reuse the code between these tests?
## Should we add tests for when the maxRunnableApps for a user or queue is decreased? If you
think this might need additional work in the logic as well, I am open to filing a follow-up
JIRA and addressing it there.
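
For readers following the thread, the reload behavior discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions: {{ToyQueue}} and all of its methods are hypothetical, it is not FairScheduler code, and it is not the attached patch. It shows the general idea of re-checking pending apps when the limit is raised, and it deliberately does nothing when the limit is lowered, since that case is left open above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: NOT FairScheduler code and NOT the YARN-3231 patch.
// All names here are hypothetical and exist purely for illustration.
class ToyQueue {
    private int maxRunningApps;
    private final List<String> runnable = new ArrayList<>();
    private final Deque<String> pending = new ArrayDeque<>();

    ToyQueue(int maxRunningApps) {
        this.maxRunningApps = maxRunningApps;
    }

    // An app becomes runnable only while the queue is under its limit;
    // otherwise it waits in FIFO order.
    void submit(String appId) {
        if (runnable.size() < maxRunningApps) {
            runnable.add(appId);
        } else {
            pending.addLast(appId);
        }
    }

    // Hypothetical reload hook: apply the new limit, then promote pending
    // apps until the new limit is reached. Without a step like this,
    // raising the limit alone leaves already-pending apps waiting.
    List<String> updateRunnabilityOnReload(int newMaxRunningApps) {
        maxRunningApps = newMaxRunningApps;
        List<String> promoted = new ArrayList<>();
        while (runnable.size() < maxRunningApps && !pending.isEmpty()) {
            String appId = pending.pollFirst();
            runnable.add(appId);
            promoted.add(appId);
        }
        // If the limit was lowered, this sketch leaves running apps alone.
        return promoted;
    }

    int runnableCount() {
        return runnable.size();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In this model, a queue created with a limit of 1 and given three apps holds one runnable and two pending; calling {{updateRunnabilityOnReload(2)}} promotes only the oldest pending app.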

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
> --------------------------------------------------------------------------------------
>                 Key: YARN-3231
>                 URL: https://issues.apache.org/jira/browse/YARN-3231
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Siqi Li
>            Assignee: Siqi Li
>            Priority: Critical
>         Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch
> When a queue is piling up with a lot of pending jobs due to the maxRunningApps limit,
we want to increase this property on the fly to make some of the pending jobs active. However,
once we increased the limit, none of the pending jobs were assigned any resources, and they stayed stuck.

This message was sent by Atlassian JIRA
