hadoop-mapreduce-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
Date Wed, 23 Apr 2014 07:32:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977938#comment-13977938 ]

Ming Ma commented on MAPREDUCE-5652:

Nice work. Jason, I would like to clarify how the following scenarios are handled. Perhaps
they are covered at the YARN layer as part of https://issues.apache.org/jira/browse/YARN-1336.

1. NM crash scenario. There is a corner case: after the RM notifies the NM that a specific
application has completed, but before AuxServices get the chance to process the event, the NM
crashes. The app entry won't be removed from the recovery store after the NM restarts, because
APPLICATION_STOP won't be delivered to the NM for that application after the restart.

2. NM graceful shutdown. It seems ContainerManagerImpl's serviceStop will generate a ContainerManagerEventType.FINISH_APPS
event. That means AuxServices could clean up and remove the app entries from the recovery store
as part of NM shutdown.
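The two event flows above can be modeled in a small, self-contained Java sketch. The classes `AuxService` and `NodeManager` and all their methods are hypothetical stand-ins, not the real YARN `AuxServices` or `ContainerManagerImpl` APIs. An in-memory `Set` plays the role of the persistent recovery store, and the crash is modeled simply by never running the stop path.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the two scenarios in the comment. None of these
// classes are Hadoop APIs; they only mimic the event flow being discussed.
public class NmRecoveryScenarios {

    // Stand-in for an aux service that persists per-app state in a recovery store.
    static class AuxService {
        final Set<String> recoveryStore;
        AuxService(Set<String> recoveryStore) { this.recoveryStore = recoveryStore; }
        void initializeApplication(String appId) { recoveryStore.add(appId); }
        // Handling of APPLICATION_STOP: drop the app's entry from the store.
        void stopApplication(String appId) { recoveryStore.remove(appId); }
    }

    // Stand-in for the NM, tracking which apps it still considers running.
    static class NodeManager {
        final List<String> runningApps = new ArrayList<>();
        final AuxService aux;
        NodeManager(AuxService aux) { this.aux = aux; }

        void startApp(String appId) {
            runningApps.add(appId);
            aux.initializeApplication(appId);
        }

        // Models serviceStop(): a FINISH_APPS-style pass over every running
        // app, ending with APPLICATION_STOP delivered to the aux service.
        void serviceStop() {
            for (String appId : new ArrayList<>(runningApps)) {
                aux.stopApplication(appId);
                runningApps.remove(appId);
            }
        }
    }

    // Scenario 1: the RM reports app_1 complete, but the NM crashes before
    // APPLICATION_STOP reaches the aux service. The store outlives the process.
    public static Set<String> crashBeforeStop() {
        Set<String> store = new HashSet<>();
        NodeManager nm = new NodeManager(new AuxService(store));
        nm.startApp("app_1");
        // Crash: neither serviceStop() nor stopApplication("app_1") runs.
        // After restart, APPLICATION_STOP for app_1 is not redelivered, so
        // nothing in this model ever removes the entry.
        return store;
    }

    // Scenario 2: graceful shutdown. serviceStop() delivers the stop events,
    // so the aux service removes its entries before the NM exits.
    public static Set<String> gracefulShutdown() {
        Set<String> store = new HashSet<>();
        NodeManager nm = new NodeManager(new AuxService(store));
        nm.startApp("app_1");
        nm.startApp("app_2");
        nm.serviceStop();
        return store;
    }

    public static void main(String[] args) {
        System.out.println("after crash: " + crashBeforeStop());          // stale app_1 entry
        System.out.println("after graceful stop: " + gracefulShutdown()); // empty store
    }
}
```

Under these assumptions, scenario 1 leaves a stale `app_1` entry behind, while scenario 2 empties the store before the NM exits, so there would be nothing left to recover when the NM comes back up.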

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652.patch
> ShuffleHandler should work across NM restarts and not require re-running map tasks. On NM restart, the map outputs are cleaned up, requiring re-execution of map tasks; this should be avoided.

This message was sent by Atlassian JIRA
