hadoop-yarn-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5774) MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if set yarn.scheduler.minimum-allocation-mb to 0.
Date Wed, 09 Nov 2016 20:30:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651964#comment-15651964 ]

Daniel Templeton commented on YARN-5774:
----------------------------------------

bq. So if users misconfigure the increment resource in fair scheduler, a detailed error message will show up.

That's great for fair scheduler, but what about CS and FIFO? I'm particularly worried about those because an increment of 0 was not previously treated as invalid. What's the intent of throwing the exception in {{normalize()}}? You throw an exception when you want to halt the execution flow and allow some out-of-sequence remedial code to run. In this case, no one is expecting to see this exception, so no one will catch it, and there's no action that needs to be taken in the CS and FIFO cases. The net result is that things will fail in indirect ways. Since a zero increment is not a failure for CS and FIFO, I don't think you should throw the exception. In the case of FS, as you point out, the only reason to hit the exception is misuse of the resource calculator.
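One way to act on this suggestion, sketched under assumptions: keep the shared normalization path free of new exceptions and instead reject a non-positive increment in Fair Scheduler code when its configuration is read, so only FS users see the detailed error. The class {{IncrementCheckSketch}}, its method, and the message text are hypothetical and are not taken from any of the attached patches; {{yarn.scheduler.increment-allocation-mb}} is the Fair Scheduler's increment property.

```java
/**
 * Hypothetical sketch: validate the Fair Scheduler increment once, at
 * configuration time, rather than throwing from the normalize() path that
 * CS and FIFO also use. Names and message are illustrative only.
 */
final class IncrementCheckSketch {

    /** Returns the increment unchanged if it is usable, otherwise fails fast. */
    static long validateFsIncrementMb(long incrementMb) {
        if (incrementMb <= 0) {
            // Name the offending setting so the misconfiguration is obvious at startup.
            throw new IllegalArgumentException(
                "Invalid yarn.scheduler.increment-allocation-mb: " + incrementMb
                + "; the Fair Scheduler increment must be a positive number of MB");
        }
        return incrementMb;
    }

    public static void main(String[] args) {
        // A sane value passes straight through.
        System.out.println(validateFsIncrementMb(512));
        // A zero increment is rejected with a descriptive message.
        try {
            validateFsIncrementMb(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check lives on the FS configuration path, CS and FIFO keep their previous behavior for a zero increment, which is the concern raised above.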

> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler if set yarn.scheduler.minimum-allocation-mb to 0.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5774
>                 URL: https://issues.apache.org/jira/browse/YARN-5774
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>              Labels: oct16-easy
>         Attachments: YARN-5774.001.patch, YARN-5774.002.patch, YARN-5774.003.patch, YARN-5774.004.patch
>
>
> MR Job stuck in ACCEPTED status without any progress in Fair Scheduler because there is no resource request for the AM. This happens when {{yarn.scheduler.minimum-allocation-mb}} is set to zero.
> The problem is in code shared by the Capacity Scheduler and the Fair Scheduler. {{scheduler.increment-allocation-mb}} is a concept in FS but not in CS, so when the common code in class RMAppManager normalizes the AM resource request, it passes {{yarn.scheduler.minimum-allocation-mb}} as the increment, because CS has no separate increment setting.
> {code}
>      SchedulerUtils.normalizeRequest(amReq, scheduler.getResourceCalculator(),
>           scheduler.getClusterResource(),
>           scheduler.getMinimumResourceCapability(),
>           scheduler.getMaximumResourceCapability(),
>           scheduler.getMinimumResourceCapability());  // incrementResource should be passed here.
> {code}
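To see why a zero minimum leaves the AM without a usable request, here is a minimal Java sketch. It assumes normalization raises the request to the minimum, rounds it up to a multiple of the increment, and caps it at the maximum, and it assumes a ceiling division that returns 0 for a zero divisor instead of throwing. {{NormalizeSketch}} and its methods are hypothetical stand-ins, not the Hadoop classes.

```java
/**
 * Hypothetical sketch of memory normalization (min clamp, round up to the
 * increment, max clamp). Illustrative only; not the actual Hadoop code.
 */
final class NormalizeSketch {

    // Ceiling division. Assumption: a zero divisor yields 0 rather than an exception.
    static long divideAndCeil(long a, long b) {
        if (b == 0) {
            return 0;
        }
        return (a + b - 1) / b;
    }

    // Round value up to the nearest multiple of step (0 if step is 0).
    static long roundUp(long value, long step) {
        return divideAndCeil(value, step) * step;
    }

    /** Raise to minMb, round up to a multiple of stepMb, then cap at maxMb. */
    static long normalize(long requestedMb, long minMb, long maxMb, long stepMb) {
        long atLeastMin = Math.max(requestedMb, minMb);
        long rounded = roundUp(atLeastMin, stepMb);
        return Math.min(rounded, maxMb);
    }

    public static void main(String[] args) {
        // Minimum of 1024 MB reused as the increment: 1536 MB rounds up to 2048 MB.
        System.out.println(normalize(1536, 1024, 8192, 1024));
        // Minimum set to 0 and reused as the increment: the AM request collapses to 0 MB.
        System.out.println(normalize(1536, 0, 8192, 0));
        // Passing a separate FS-style increment of 512 MB keeps the request at 1536 MB.
        System.out.println(normalize(1536, 0, 8192, 512));
    }
}
```

In this model the minimum doubling as the increment is harmless while it is positive, but a zero minimum zeroes out the AM request, which matches the symptom in the description; passing the scheduler's own increment avoids it.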



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


