hadoop-yarn-issues mailing list archives

From "Yufei Gu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if set yarn.scheduler.minimum-allocation-mb to 0.
Date Wed, 09 Nov 2016 23:52:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652426#comment-15652426 ]

Yufei Gu commented on YARN-5774:
--------------------------------

Thanks [~templedf] for the review.
In CS, the minimum allocation also serves as the increment, and it cannot be zero because the CS sanity check guarantees that.
FIFO has no such sanity check, probably because nobody has needed one so far. We can add a sanity check for the FIFO scheduler
in this JIRA or in a follow-up JIRA. So CS, FIFO, and FS are each fine on their own.
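
A minimal sketch of what such a startup sanity check could look like, in Java. The class and method names ({{AllocationConfCheck}}, {{validateAllocationConf}}) and the exception message are hypothetical, not the actual CS or FIFO code; the sketch simply rejects a minimum allocation that is zero, negative, or larger than the maximum.

```java
/**
 * Hypothetical startup sanity check for the minimum/maximum memory
 * allocation settings. Names and messages are illustrative only.
 */
class AllocationConfCheck {

    /** Rejects a minimum that is zero or negative, or that exceeds the maximum. */
    static void validateAllocationConf(long minAllocMb, long maxAllocMb) {
        if (minAllocMb <= 0 || minAllocMb > maxAllocMb) {
            // Unchecked exception: a misconfigured scheduler fails fast at startup.
            throw new IllegalArgumentException(
                "Invalid memory allocation configuration: minimum=" + minAllocMb
                + " MB, maximum=" + maxAllocMb + " MB");
        }
    }

    public static void main(String[] args) {
        validateAllocationConf(1024, 8192); // valid: returns silently
        try {
            validateAllocationConf(0, 8192); // the setting from this JIRA
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running it prints the rejection message for the zero minimum and nothing for the valid pair.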
The tricky part is the code shared by all schedulers (or by the RM). People writing that common code might not even notice
that an increment setting exists, because FS has it while CS and FIFO do not. That is how this issue arose in the first place.
This patch makes {{normalize()}} throw a runtime exception if the increment is 0. The exception does not need to be caught
and handled; when it happens, it fails the RM. The main reason, following an offline discussion with [~kasha], is that
a 0 increment should be treated as an invalid configuration.
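
A simplified, memory-only sketch of the fail-fast behavior described above. It is not the code from the patch: {{NormalizeSketch}} and its parameters are hypothetical, and {{IllegalArgumentException}} stands in for whatever runtime exception the patch actually throws. Because the exception is unchecked, callers are not forced to catch it, so a zero increment surfaces as a failure rather than as a silently empty request.

```java
/**
 * Hypothetical, memory-only model of request normalization:
 * raise the request to the minimum, round up to a multiple of the
 * increment, and cap at the maximum. A non-positive increment fails fast.
 */
class NormalizeSketch {

    static long normalize(long requestMb, long minMb, long maxMb, long incrementMb) {
        if (incrementMb <= 0) {
            // RuntimeException subclass: no catch required, so it propagates
            // and fails the caller instead of producing a 0 MB request.
            throw new IllegalArgumentException("Increment allocation must be positive");
        }
        long atLeastMin = Math.max(requestMb, minMb);
        // Round up to the next multiple of the increment.
        long rounded = ((atLeastMin + incrementMb - 1) / incrementMb) * incrementMb;
        return Math.min(rounded, maxMb);
    }

    public static void main(String[] args) {
        System.out.println(normalize(1500, 1024, 8192, 512)); // prints 1536
        System.out.println(normalize(100, 1024, 8192, 512));  // prints 1024
        try {
            normalize(1500, 0, 8192, 0); // zero increment
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```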


> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if set yarn.scheduler.minimum-allocation-mb to 0.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5774
>                 URL: https://issues.apache.org/jira/browse/YARN-5774
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>              Labels: oct16-easy
>         Attachments: YARN-5774.001.patch, YARN-5774.002.patch, YARN-5774.003.patch, YARN-5774.004.patch
>
>
> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because there is no resource request for the AM.
> This happens when {{yarn.scheduler.minimum-allocation-mb}} is configured to zero.
> The problem is in code shared by the Capacity Scheduler and the Fair Scheduler. {{yarn.scheduler.increment-allocation-mb}}
> is a concept in FS but not in CS. Because CS has no increment setting, the common code in class RMAppManager passes
> {{yarn.scheduler.minimum-allocation-mb}} as the increment when it normalizes the resource requests.
> {code}
>      SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>           scheduler.getClusterResource(),
>           scheduler.getMinimumResourceCapability(),
>           scheduler.getMaximumResourceCapability(),
>           scheduler.getMinimumResourceCapability());  // <-- incrementResource should be passed here.
> {code}
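
To illustrate why passing the minimum as the increment can leave the AM with no usable request, here is a simplified, memory-only model in Java. The helper names are hypothetical. The key behavior, that rounding up with a zero step yields 0, is an assumption modeled on a divide-and-ceil helper that guards against division by zero by returning 0; it should be checked against the actual ResourceCalculator code. The 512 MB increment is an illustrative value.

```java
/**
 * Hypothetical, memory-only model of the normalization path above.
 * Assumption: ceiling division by zero returns 0, so rounding up
 * with a zero step collapses any request to 0 MB.
 */
class IncrementBugDemo {

    /** Ceiling division that returns 0 for a zero divisor (assumed behavior). */
    static long divideAndCeil(long a, long b) {
        if (b == 0) {
            return 0;
        }
        return (a + (b - 1)) / b;
    }

    /** Rounds a up to a multiple of step. */
    static long roundUp(long a, long step) {
        return divideAndCeil(a, step) * step;
    }

    /** Raise to the minimum, round up to the step, cap at the maximum. */
    static long normalize(long requestMb, long minMb, long maxMb, long stepMb) {
        return Math.min(roundUp(Math.max(requestMb, minMb), stepMb), maxMb);
    }

    public static void main(String[] args) {
        long amRequestMb = 1536;
        long minMb = 0;           // yarn.scheduler.minimum-allocation-mb = 0
        long maxMb = 8192;
        long fsIncrementMb = 512; // illustrative FS increment value

        // Buggy call: the minimum (0) is passed as the increment.
        System.out.println(normalize(amRequestMb, minMb, maxMb, minMb));         // prints 0
        // Intended call: the scheduler's increment is passed instead.
        System.out.println(normalize(amRequestMb, minMb, maxMb, fsIncrementMb)); // prints 1536
    }
}
```

Under this model the AM request becomes 0 MB, which matches the symptom of the job never leaving ACCEPTED; rejecting a zero increment up front turns that silent outcome into an explicit failure.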



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
