hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3228) MR AM hangs when one node goes bad
Date Fri, 28 Oct 2011 13:48:41 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138368#comment-13138368

Hudson commented on MAPREDUCE-3228:

Integrated in Hadoop-Hdfs-trunk #846 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/846/])
    MAPREDUCE-3228. Fixed MR AM to timeout RPCs to bad NodeManagers. Contributed by Vinod
K V.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1189879
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFail.java

> MR AM hangs when one node goes bad
> ----------------------------------
>                 Key: MAPREDUCE-3228
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3228
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-3228-20111020.txt, MAPREDUCE-3228-20111027.txt
> Found this on one of the gridmix runs, again. One of the nodes went real bad, the job
had three containers running on the node. Eventually, AM marked the tasks as timedout and
initiated cleanup of the failed containers via {{stopContainer()}}. The later got stuck at
the faulty node, the tasks are stuck in FAIL_CONTAINER_CLEANUP stage and the job lies in there
waiting for ever.
> Thanks to [~Karams] for helping with this.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message