hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
Date Mon, 30 Nov 2015 19:51:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032337#comment-15032337 ]

Wangda Tan commented on YARN-3946:

[~Naganarasimha], thanks for the update. Some comments:

1) RMAppImpl:
When the app goes to a final state (FINISHED, KILLED, etc.), should we simply set AMLaunchDiagnostics
to null?

2) SchedulerApplicationAttempt:
Why do we need two separate methods, updateDiagnosticsIfNotRunning and updateDiagnostics? They're a
little confusing to me. I think AM launch diagnostics should be updated only if the AM container
is not running. If that makes sense to you, I suggest renaming/merging them into updateAMContainerDiagnostics.

3) Do you think it would be better to rename AMState.PENDING to inactivated? I think "PENDING" could
mean "activated-but-not-activated" to end users (assuming users don't have enough background
knowledge about the scheduler).

4) Instead of setting AMLaunchDiagnostics to null when RMAppAttempt enters the SCHEDULED state,
do you think it would be better to do that in the RUNNING and FINAL_SAVING states? An unmanaged AM can
skip the SCHEDULED state.

5) It would also be very useful to update the AM launch diagnostics when RMAppAttempt goes
to the LAUNCHED state: sometimes the AM container is allocated and sent to the NM, but is not successfully
launched/registered to the RM. Currently we can't tell when this happens because YarnApplicationState doesn't have a
"launched" state.
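
A minimal sketch of how points 1, 2, 4 and 5 could fit together, offered only as an illustration. Everything in it is hypothetical: the class AmLaunchDiagnosticsSketch, the simplified AttemptState enum, the transitionTo method and the message text are stand-ins, not the real RMAppImpl/RMAppAttemptImpl/SchedulerApplicationAttempt code and not what the patch does. It assumes a single merged updateAMContainerDiagnostics entry point that is ignored once the AM is running, clears the diagnostics on RUNNING, FINAL_SAVING and the final states rather than on SCHEDULED, and records a message on LAUNCHED.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; none of these types are the real YARN classes. */
public class AmLaunchDiagnosticsSketch {

    /** Simplified stand-in for the attempt states mentioned in the review. */
    public enum AttemptState {
        NEW, SUBMITTED, SCHEDULED, ALLOCATED, LAUNCHED,
        RUNNING, FINAL_SAVING, FINISHED, FAILED, KILLED
    }

    /** States in which launch diagnostics are no longer meaningful (points 1 and 4). */
    private static final Set<AttemptState> CLEAR_ON = EnumSet.of(
            AttemptState.RUNNING, AttemptState.FINAL_SAVING,
            AttemptState.FINISHED, AttemptState.FAILED, AttemptState.KILLED);

    private AttemptState state = AttemptState.NEW;
    private String amLaunchDiagnostics; // null means "nothing to report"

    /** Single merged entry point (point 2): a no-op once the AM is running or done. */
    public synchronized void updateAMContainerDiagnostics(String message) {
        if (CLEAR_ON.contains(state)) {
            return;
        }
        amLaunchDiagnostics = message;
    }

    /** Applies a state change and adjusts the diagnostics accordingly. */
    public synchronized void transitionTo(AttemptState next) {
        state = next;
        if (CLEAR_ON.contains(next)) {
            // Clearing here, not on SCHEDULED, also covers attempts that skip SCHEDULED.
            amLaunchDiagnostics = null;
        } else if (next == AttemptState.LAUNCHED) {
            // Point 5: make the "sent to NM but not yet registered" window visible.
            amLaunchDiagnostics =
                    "AM container sent to NM, waiting for AM to register with RM";
        }
    }

    public synchronized String getAMLaunchDiagnostics() {
        return amLaunchDiagnostics;
    }

    public static void main(String[] args) {
        AmLaunchDiagnosticsSketch attempt = new AmLaunchDiagnosticsSketch();
        attempt.transitionTo(AttemptState.SUBMITTED);
        attempt.updateAMContainerDiagnostics("Queue AM resource limit exceeded");
        System.out.println("while waiting: " + attempt.getAMLaunchDiagnostics());

        // This path never passes through SCHEDULED.
        attempt.transitionTo(AttemptState.LAUNCHED);
        System.out.println("after launch:  " + attempt.getAMLaunchDiagnostics());

        attempt.transitionTo(AttemptState.RUNNING);
        attempt.updateAMContainerDiagnostics("late scheduler update");
        System.out.println("once running:  " + attempt.getAMLaunchDiagnostics());
    }
}
```

Keeping the "clear" rule in one set means the merged update method and the state transitions cannot disagree about when the diagnostics are still relevant.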

[~jianhe], could you take a look at this patch as well?

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
> --------------------------------------------------------------------------------
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, YARN-3946.v1.002.patch,
YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, YARN-3946.v1.004.patch
> Currently there is no direct way to get the exact reason as to why a submitted app is
still in ACCEPTED state. It should be possible to know through RM REST API as to what aspect
is not being met - say, queue limits being reached, or core/memory requirement not being
met, or AM limit being reached, etc.

This message was sent by Atlassian JIRA
