hadoop-mapreduce-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5718) MR AM should tolerate RM restart/failover during commit
Date Wed, 15 Jan 2014 18:07:29 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872340#comment-13872340 ]

Karthik Kambatla commented on MAPREDUCE-5718:

Thanks for chiming in, Jason. 

Please correct me if I am wrong. Not being able to tolerate node failures (slaves/master)
seems like a major regression from MR1, which tolerates slave failures. I am wondering if there
is a way to solve the crashed-commit issue for all jobs. For MR, what do you think
of committing to an intermediate location and then renaming it to the output location? If the
output location is missing, the commit can be retried.
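
To make the idea concrete, here is a rough sketch of commit-to-intermediate-then-rename. It uses the local filesystem as a stand-in for HDFS and assumes the rename is atomic there; the function name commit_via_rename and the paths are hypothetical and are not taken from the attached patch or from any Hadoop API.

```python
import os
import shutil
import tempfile
from pathlib import Path


def commit_via_rename(write_output, staging_dir: Path, output_dir: Path) -> bool:
    """Write into staging_dir, then publish it by renaming to output_dir.

    Returns True if this call performed the commit, and False if output_dir
    already existed, meaning an earlier attempt had finished its rename.
    """
    if output_dir.exists():
        # A previous attempt completed the rename; nothing to redo.
        return False
    # The output is missing, so any staged data is from a crashed attempt.
    shutil.rmtree(staging_dir, ignore_errors=True)
    staging_dir.mkdir(parents=True)
    write_output(staging_dir)
    # Publish step: a single rename (assumed atomic on this filesystem).
    os.rename(staging_dir, output_dir)
    return True


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())

    def write_parts(directory: Path) -> None:
        (directory / "part-00000").write_text("result\n")

    print(commit_via_rename(write_parts, root / "_staging", root / "output"))
    # A retry after a successful commit finds the output and does nothing.
    print(commit_via_rename(write_parts, root / "_staging", root / "output"))
    shutil.rmtree(root)
```

With this shape, a restarted attempt only has to check whether the output location exists: if it does, the commit is done; if not, it discards the staged data and commits again.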

> MR AM should tolerate RM restart/failover during commit
> -------------------------------------------------------
>                 Key: MAPREDUCE-5718
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5718
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.4.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>              Labels: ha
>         Attachments: mr-5718-0.patch
> While testing RM HA, we ran into this issue where if the RM fails over while an MR AM
> is in the middle of a commit, the subsequent AM gets spawned but dies with a diagnostic message
> - "We crashed durring a commit".

