hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4611) MR AM dies badly when Node is decomissioned
Date Thu, 30 Aug 2012 19:37:08 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445229#comment-13445229 ]

Hadoop QA commented on MAPREDUCE-4611:

+1 overall.  Here are the results of testing the latest attachment 
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 2 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2792//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2792//console

This message is automatically generated.
> MR AM dies badly when Node is decomissioned
> -------------------------------------------
>                 Key: MAPREDUCE-4611
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4611
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: MR-4611.txt
> The MR AM always assumes that it is being killed by the RM when it gets a kill signal
> and has not finished processing yet.  In reality the RM kill signal is only sent when the
> client cannot communicate directly with the AM, which probably means that the AM is in a bad
> state already.  The much more common case is that the node is marked as unhealthy or decommissioned.
> I propose that in the short term the AM will only clean up if:
>  # The process has been asked by the client to exit (kill)
>  # The job has finished cleanly and the process is exiting already
>  # This is the last of the AM retries.
> The downside here is that the .staging directory will be leaked and the job will not
> show up in the history server on a kill from the RM in some cases.
> At least until the full set of AM cleanup issues can be addressed, probably as part of
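
The three conditions proposed in the description can be read as a single predicate. The following is only a minimal sketch of that reading, not the code in MR-4611.txt: the class name, method names, parameters, and the assumption that attempts are counted from 1 are all hypothetical and chosen for illustration.

```java
// Hypothetical sketch of the proposed cleanup rule; not taken from the attached patch.
final class AmCleanupPolicy {

    private AmCleanupPolicy() {}

    /**
     * True when this attempt is the final one allowed.
     * Assumption: attempts are numbered starting at 1.
     */
    static boolean isLastRetry(int attemptNumber, int maxAttempts) {
        return attemptNumber >= maxAttempts;
    }

    /**
     * Clean up (for example, remove the staging directory) only if
     * the client asked for the kill, the job already finished cleanly,
     * or no further AM attempt will follow.
     */
    static boolean shouldCleanUp(boolean killRequestedByClient,
                                 boolean jobFinishedCleanly,
                                 int attemptNumber,
                                 int maxAttempts) {
        return killRequestedByClient
                || jobFinishedCleanly
                || isLastRetry(attemptNumber, maxAttempts);
    }

    public static void main(String[] args) {
        // Kill signal on a decommissioned node, first of two attempts:
        // none of the three conditions holds, so state is left for the retry.
        System.out.println("signal, attempt 1 of 2: cleanUp="
                + shouldCleanUp(false, false, 1, 2));

        // Same signal on the last attempt: nothing will retry, so clean up.
        System.out.println("signal, attempt 2 of 2: cleanUp="
                + shouldCleanUp(false, false, 2, 2));
    }
}
```

Under this reading, an unexplained kill signal on a non-final attempt skips cleanup, which matches the stated downside: if that signal really did come from the RM, the .staging directory is left behind.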

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
