hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases
Date Fri, 10 Feb 2012 22:52:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205824#comment-13205824 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3846:
----------------------------------------------------

Sharad, I think MAPREDUCE-3802 is different even though the exception trace is the same.

The problem here lies within the second AM generation itself. The erring task has multiple
attempts, and one of them never gets logged to JobHistory because the TaskAttempt fails
before launch. Today we log TaskAttempts and set start times only after the real JVM launch
(do you know why? Maybe we can change this?). Because of this, JobHistory knows about, say,
attempts 0, 1 and 3. When we replay the completed tasks, the attempt numbers are assigned
as 0, 1 and 2, so the replayed numbering no longer matches what JobHistory recorded, and we
get the NPE.
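
To make the numbering mismatch concrete, here is a minimal, self-contained sketch. It is not
the actual recovery code: the class, the method names, and the map standing in for the
recovered JobHistory data are all hypothetical, and it assumes the NPE comes from
dereferencing a lookup that returned null for an attempt id that was never logged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AttemptReplaySketch {

    // Hypothetical stand-in for what recovery reads back from JobHistory:
    // attempt id -> recorded info, containing only attempts that were logged.
    static Map<Integer, String> recoveredAttempts(int... loggedIds) {
        Map<Integer, String> recovered = new HashMap<>();
        for (int id : loggedIds) {
            recovered.put(id, "history-for-attempt-" + id);
        }
        return recovered;
    }

    // Replay hands out attempt ids sequentially (0, 1, 2, ...), one per
    // recovered attempt. Returns the replayed ids that have no JobHistory
    // entry; in this sketch those are the lookups that would come back null.
    static List<Integer> missingOnReplay(Map<Integer, String> recovered) {
        List<Integer> missing = new ArrayList<>();
        for (int replayedId = 0; replayedId < recovered.size(); replayedId++) {
            if (recovered.get(replayedId) == null) {
                missing.add(replayedId);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Attempt 2 failed before launch, so it was never logged.
        Map<Integer, String> recovered = recoveredAttempts(0, 1, 3);
        System.out.println(missingOnReplay(recovered)); // prints [2]
    }
}
```

With attempts 0, 1 and 3 logged, the replay asks for 0, 1 and 2, and the lookup for 2 finds
nothing. If every attempt were logged at creation time rather than after JVM launch, the
recorded ids would be contiguous and the list above would be empty.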
                
> Restarted+Recovered AM hangs in some corner cases
> -------------------------------------------------
>
>                 Key: MAPREDUCE-3846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>         Attachments: MAPREDUCE-3846-20120210.txt
>
>
> [~karams] found this while testing the AM restart/recovery feature. After the first generation
> AM crashes (manually killed by kill -9), the second generation AM starts, but hangs after
> a while.

