hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4611) MR AM dies badly when Node is decommissioned
Date Thu, 30 Aug 2012 14:40:07 GMT
Robert Joseph Evans created MAPREDUCE-4611:
----------------------------------------------

             Summary: MR AM dies badly when Node is decommissioned
                 Key: MAPREDUCE-4611
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4611
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.0.0-alpha, 0.23.3, 3.0.0
            Reporter: Robert Joseph Evans
            Assignee: Robert Joseph Evans


The MR AM always assumes that it is being killed by the RM when it receives a kill signal
before it has finished processing.  In reality the RM sends a kill signal only when the client
cannot communicate directly with the AM, which probably means that the AM is already in a bad
state.  The much more common case is that the node has been marked as unhealthy or decommissioned.

I propose that in the short term the AM clean up only if one of the following holds:

 # The process has been asked by the client to exit (kill)
 # The job has finished cleanly and the process is already exiting
 # This is the last of the AM's retries
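
The three conditions above can be sketched as a single predicate. This is only an illustration of the proposed rule, not code from the MR AM: the class name AmCleanupPolicy, the method shouldCleanUp, and all of its parameters are hypothetical, and the 1-based attempt counter is an assumption.

```java
/**
 * Hypothetical sketch of the proposed short-term cleanup rule.
 * None of these names come from the Hadoop code base.
 */
final class AmCleanupPolicy {

    private AmCleanupPolicy() {
        // Static helper only; not meant to be instantiated.
    }

    /**
     * Decides whether the AM should clean up when it is shutting down.
     *
     * @param killedByClient     the client explicitly asked the AM to exit (kill)
     * @param jobFinishedCleanly the job finished cleanly and the AM is already exiting
     * @param attempt            1-based number of the current AM attempt (assumed)
     * @param maxAttempts        maximum number of AM attempts allowed (assumed)
     * @return true if at least one of the three proposed conditions holds
     */
    static boolean shouldCleanUp(boolean killedByClient,
                                 boolean jobFinishedCleanly,
                                 int attempt,
                                 int maxAttempts) {
        // Condition 3: no further retry will follow this attempt.
        boolean lastRetry = attempt >= maxAttempts;
        return killedByClient || jobFinishedCleanly || lastRetry;
    }
}
```

Under this sketch, a kill signal that arrives on an earlier attempt with neither of the first two conditions set (for example, because the node was decommissioned) yields false, so that attempt skips cleanup.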

The downside is that, in some cases, a kill from the RM will leak the .staging directory and
the job will not show up in the history server.

This would apply at least until the full set of AM cleanup issues can be addressed, probably as part of MAPREDUCE-4428.

