hadoop-mapreduce-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
Date Thu, 01 May 2014 22:43:19 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987115#comment-13987115 ]

Ming Ma commented on MAPREDUCE-5652:

Sounds good, we can use a new jira to cover "best effort" work.

The patch looks good. Just to confirm, protobuf should be backward compatible, e.g., store
state serialized with version 2.4 should be readable by an NM/MR compiled with version 2.5.

On an unrelated note, given how the NM's AuxServices' serviceStart handles errors from each
AuxService's serviceStart, if one AuxService throws an exception, serviceStart for the remaining
AuxServices is skipped. That isn't important today, given that we only have one AuxService.
Perhaps there should be a policy around that as well: should the NM skip a failed AuxService
and continue? In general, we might need to improve AuxService handling if other AuxServices
are added.
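
To make the two policies concrete, here is a minimal sketch in plain Java. It is not the NodeManager code: the `Service` interface, the class name `AuxStartSketch`, and the service names are all hypothetical stand-ins. It only contrasts a start loop that lets the first exception propagate with one that records the failure and continues.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AuxStartSketch {
    /** Hypothetical stand-in for an aux service; NOT the real YARN AuxiliaryService API. */
    interface Service {
        void start() throws Exception;
    }

    /** Fail-fast: the first exception propagates, so later services are never started. */
    static void startFailFast(Map<String, Service> services, List<String> started)
            throws Exception {
        for (Map.Entry<String, Service> e : services.entrySet()) {
            e.getValue().start();
            started.add(e.getKey());
        }
    }

    /** Skip-on-failure: record the failed service and keep starting the rest. */
    static void startSkippingFailures(Map<String, Service> services,
                                      List<String> started, List<String> failed) {
        for (Map.Entry<String, Service> e : services.entrySet()) {
            try {
                e.getValue().start();
                started.add(e.getKey());
            } catch (Exception ex) {
                failed.add(e.getKey());
            }
        }
    }

    /** Three hypothetical services in insertion order; the middle one fails to start. */
    static Map<String, Service> sample() {
        Map<String, Service> m = new LinkedHashMap<>();
        m.put("first", () -> { });
        m.put("broken", () -> { throw new IllegalStateException("start failed"); });
        m.put("last", () -> { });
        return m;
    }

    public static void main(String[] args) {
        List<String> started = new ArrayList<>();
        try {
            startFailFast(sample(), started);
        } catch (Exception ex) {
            System.out.println("fail-fast started " + started + " then threw: " + ex.getMessage());
        }
        List<String> started2 = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        startSkippingFailures(sample(), started2, failed);
        System.out.println("skip policy started " + started2 + ", failed " + failed);
    }
}
```

With the fail-fast loop, the service registered after the failing one never starts. The skip variant starts it anyway and leaves the caller to decide what to do with the recorded failures, which is the policy question raised above.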

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch,
> MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch,
> MAPREDUCE-5652-v9-and-YARN-1987.patch, MAPREDUCE-5652.patch
>
> ShuffleHandler should work across NM restarts and not require re-running map-tasks. On
> NM restart, the map outputs are cleaned up requiring re-execution of map tasks and should
> be avoided.
