hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-3228) MR AM hangs when one node goes bad
Date Thu, 20 Oct 2011 12:13:10 GMT
MR AM hangs when one node goes bad
----------------------------------

                 Key: MAPREDUCE-3228
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3228
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster, mrv2
    Affects Versions: 0.23.0
            Reporter: Vinod Kumar Vavilapalli
            Priority: Blocker
             Fix For: 0.23.0


Found this on one of the gridmix runs, again. One of the nodes went really bad while the job had
three containers running on it. Eventually, the AM marked the tasks as timed out and initiated
cleanup of the failed containers via {{stopContainer()}}. The latter got stuck at the faulty
node, so the tasks are stuck in the FAIL_CONTAINER_CLEANUP stage and the job sits there waiting
forever.
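
To illustrate the failure mode, here is a minimal sketch of bounding a blocking cleanup call with a timeout so that one unresponsive node cannot stall the caller indefinitely. This is not the actual fix for this issue and not Hadoop code: {{BoundedCleanup}}, {{StopCall}} and {{stopWithTimeout}} are hypothetical names, and the "faulty node" is simulated with a long sleep.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCleanup {
    /** Hypothetical stand-in for the remote stopContainer() call; not a Hadoop API. */
    public interface StopCall {
        void stop() throws Exception;
    }

    /**
     * Runs the call on a worker thread and waits at most timeoutMs.
     * Returns true if the call completed, false if it timed out or failed.
     */
    public static boolean stopWithTimeout(StopCall call, long timeoutMs) {
        // Daemon thread so a wedged call cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "container-cleanup");
            t.setDaemon(true);
            return t;
        });
        Future<?> result = pool.submit(() -> {
            call.stop();
            return null;
        });
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The node did not answer in time: give up instead of waiting forever.
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The call itself failed.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A healthy node answers immediately.
        boolean healthy = stopWithTimeout(() -> { }, 1000);
        // A faulty node is simulated by a call that blocks far longer than the timeout.
        boolean faulty = stopWithTimeout(() -> Thread.sleep(60_000), 200);
        System.out.println("healthy node cleaned up: " + healthy);
        System.out.println("faulty node cleaned up: " + faulty);
    }
}
```

With a bound like this, the caller gets a definite answer for the faulty node and could move the task out of the cleanup stage rather than waiting on it. How the real AM should react to such a timeout is outside the scope of this sketch.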

Thanks to [~Karams] for helping with this.
