hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases
Date Fri, 10 Feb 2012 22:53:01 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3846:

    Attachment: MAPREDUCE-3846-20120210.txt

Logging all TaskAttempts (even before launch) might avoid this, but I am not sure.
So for now, I changed attempt-number generation during recovery to first reuse the numbers
from the previous generation and then jump ahead once all of those numbers are exhausted.
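
To illustrate the numbering idea described above, here is a minimal sketch. It is not the code in the attached patch: the class name, the constructor arguments, and the jump target are all hypothetical, and how the real AM picks the number to jump to is an assumption left to the caller.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop: hands out attempt numbers for one
// task while a restarted AM is recovering.
public class RecoveringAttemptNumbers {
    // Numbers seen in the previous generation, smallest first.
    private final Deque<Integer> recovered;
    // Next number to use once the recovered ones run out.
    private int nextFresh;

    // previousNumbers: attempt numbers read back from the previous generation.
    // jumpTo: assumed start of the fresh range; must lie above every old number.
    public RecoveringAttemptNumbers(Collection<Integer> previousNumbers, int jumpTo) {
        TreeSet<Integer> sorted = new TreeSet<>(previousNumbers);
        if (!sorted.isEmpty() && jumpTo <= sorted.last()) {
            throw new IllegalArgumentException("jumpTo must exceed all previous numbers");
        }
        this.recovered = new ArrayDeque<>(sorted);
        this.nextFresh = jumpTo;
    }

    // Reuse the previous generation's numbers first, then jump to the fresh range.
    public int next() {
        Integer reused = recovered.pollFirst();
        return reused != null ? reused : nextFresh++;
    }
}
```

With previous numbers {0, 1} and an assumed jump target of 1000, successive calls to next() would yield 0, 1, 1000, 1001.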

I also made sure that attempts are replayed in the order of their original start times;
otherwise, as my test revealed, we may replay them in the wrong order with the wrong times.
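
The replay ordering described above could be sketched roughly as follows. Again, this is only an illustration and not the patch itself: RecoveredAttempt and its fields are hypothetical stand-ins for whatever the recovery code reads back from the previous generation, and the tie-break on attempt number is an added assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not part of Hadoop: order recovered attempts before replay.
public class ReplayOrder {
    // Stand-in for one attempt recovered from the previous generation.
    public static final class RecoveredAttempt {
        public final int attemptNumber;
        public final long startTimeMillis;

        public RecoveredAttempt(int attemptNumber, long startTimeMillis) {
            this.attemptNumber = attemptNumber;
            this.startTimeMillis = startTimeMillis;
        }
    }

    // Returns a copy sorted by original start time, so replay follows the order
    // in which the attempts actually started. Ties fall back to attempt number
    // (an assumption made here to keep the order deterministic).
    public static List<RecoveredAttempt> inReplayOrder(List<RecoveredAttempt> attempts) {
        List<RecoveredAttempt> sorted = new ArrayList<>(attempts);
        sorted.sort(Comparator
                .comparingLong((RecoveredAttempt a) -> a.startTimeMillis)
                .thenComparingInt(a -> a.attemptNumber));
        return sorted;
    }
}
```

Sorting a copy leaves the caller's list untouched, which keeps the sketch free of side effects.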

The test fails without the patch and passes with it.

Sharad, can you please look at the patch and see if it makes sense? Thanks in advance!

> Restarted+Recovered AM hangs in some corner cases
> -------------------------------------------------
>                 Key: MAPREDUCE-3846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Critical
>         Attachments: MAPREDUCE-3846-20120210.txt
>
> [~karams] found this while testing the AM restart/recovery feature. After the first generation
> AM crashes (manually killed by kill -9), the second generation AM starts, but hangs after
> a while.


