hadoop-yarn-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2359) Application is hung without timeout and retry after DNS/network is down.
Date Wed, 06 Aug 2014 02:04:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087117#comment-14087117 ]

Hadoop QA commented on YARN-2359:
---------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12660000/YARN-2359.002.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4526//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4526//console

This message is automatically generated.

> Application is hung without timeout and retry after DNS/network is down. 
> -------------------------------------------------------------------------
>
>                 Key: YARN-2359
>                 URL: https://issues.apache.org/jira/browse/YARN-2359
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: YARN-2359.000.patch, YARN-2359.001.patch, YARN-2359.002.patch
>
>
> An application hangs, with no timeout and no retry, after DNS or the network goes down.
> This happens when DNS/network goes down for the node hosting the AM container right after the container is allocated for the AM.
> The application attempt is in state RMAppAttemptState.SCHEDULED when it receives the RMAppAttemptEventType.CONTAINER_ALLOCATED event. Because an IllegalArgumentException (caused by the DNS error) is thrown, the attempt stays in state RMAppAttemptState.SCHEDULED. In the state machine, only two events are processed in this state:
> RMAppAttemptEventType.CONTAINER_ALLOCATED and RMAppAttemptEventType.KILL.
> The code does not handle the RMAppAttemptEventType.CONTAINER_FINISHED event, which is generated when the node and container time out. So even after the node is removed, the application remains hung in state RMAppAttemptState.SCHEDULED.
> The only way to move the application out of this state is to send an RMAppAttemptEventType.KILL event, which is generated only when you manually kill the application from the Job Client via forceKillApplication.
> To fix the issue, we should add an entry to the state machine table that handles the RMAppAttemptEventType.CONTAINER_FINISHED event in state RMAppAttemptState.SCHEDULED.
> Add the following code to the StateMachineFactory:
> {code}
> .addTransition(RMAppAttemptState.SCHEDULED,
>     RMAppAttemptState.FINAL_SAVING,
>     RMAppAttemptEventType.CONTAINER_FINISHED,
>     new FinalSavingTransition(
>         new AMContainerCrashedBeforeRunningTransition(),
>         RMAppAttemptState.FAILED))
> {code}
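
The effect of the missing table entry described above can be illustrated with a toy transition table. This is only a minimal sketch under stated assumptions, not Hadoop code: the class `ScheduledStateSketch`, its reduced `State` and `Event` enums, the `ALLOCATED` target state, and the rule that an unregistered event simply leaves the state unchanged are all hypothetical simplifications. It does not use the real StateMachineFactory API or reproduce the patch.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a transition table. Not the Hadoop implementation.
public class ScheduledStateSketch {
    // Reduced sets of states and events, named after those in the issue description.
    enum State { SCHEDULED, ALLOCATED, FINAL_SAVING }
    enum Event { CONTAINER_ALLOCATED, CONTAINER_FINISHED, KILL }

    private final Map<State, Map<Event, State>> table =
            new EnumMap<State, Map<Event, State>>(State.class);

    // Register one (from, event) -> to entry, mirroring the argument order in the issue.
    ScheduledStateSketch addTransition(State from, State to, Event on) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
        return this;
    }

    // Assumption: an event with no table entry is dropped and the state does not change.
    State handle(State current, Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        return (next == null) ? current : next;
    }

    // Table as described in the report: SCHEDULED reacts only to two events.
    static ScheduledStateSketch before() {
        return new ScheduledStateSketch()
                .addTransition(State.SCHEDULED, State.ALLOCATED, Event.CONTAINER_ALLOCATED)
                .addTransition(State.SCHEDULED, State.FINAL_SAVING, Event.KILL);
    }

    // Same table plus the proposed entry for CONTAINER_FINISHED in SCHEDULED.
    static ScheduledStateSketch after() {
        return before()
                .addTransition(State.SCHEDULED, State.FINAL_SAVING, Event.CONTAINER_FINISHED);
    }

    public static void main(String[] args) {
        System.out.println("before: "
                + before().handle(State.SCHEDULED, Event.CONTAINER_FINISHED));
        System.out.println("after:  "
                + after().handle(State.SCHEDULED, Event.CONTAINER_FINISHED));
    }
}
```

In this sketch the `before()` table leaves an attempt in `SCHEDULED` when `CONTAINER_FINISHED` arrives, which corresponds to the hang in the report, while the `after()` table moves it on to `FINAL_SAVING`.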



--
This message was sent by Atlassian JIRA
(v6.2#6252)
