hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5652) ShuffleHandler should handle NM restarts
Date Thu, 09 Jan 2014 14:04:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866655#comment-13866655 ]

Jason Lowe commented on MAPREDUCE-5652:

I've largely implemented this as part of the prototype for YARN-1336.  I actually have two
versions, one that uses FileSystem to store the shuffle tokens and job-to-user mappings and
another that uses leveldb.  (The prototype currently has a leveldb back-end store to simplify
some of the race conditions during store and recovery.)  It shouldn't be too much effort to
extricate just the ShuffleHandler changes, although there aren't any unit tests for it yet.
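
To illustrate the general idea only (this is not the prototype's code), here is a minimal sketch
of a file-backed store that persists job-to-user mappings and shuffle tokens and reloads them
after a restart.  The class name, method names, and the properties-file format are all
hypothetical; it uses only the JDK rather than the Hadoop FileSystem or leveldb APIs, and it
does not try to address the race conditions mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch: keeps per-job shuffle state on disk so it survives a restart. */
final class ShuffleStateStoreSketch {
    private static final String USER_PREFIX = "user.";
    private static final String TOKEN_PREFIX = "token.";

    private final Path stateFile;
    private final Properties state = new Properties();

    /** Opens the store and reloads any state left behind by a previous run. */
    ShuffleStateStoreSketch(Path stateFile) throws IOException {
        this.stateFile = stateFile;
        if (Files.exists(stateFile)) {
            try (InputStream in = Files.newInputStream(stateFile)) {
                state.load(in);
            }
        }
    }

    /** Records the owning user and shuffle token for a job, then writes to disk. */
    synchronized void recordJob(String jobId, String user, byte[] token) throws IOException {
        state.setProperty(USER_PREFIX + jobId, user);
        state.setProperty(TOKEN_PREFIX + jobId, Base64.getEncoder().encodeToString(token));
        flush();
    }

    /** Forgets a job once its shuffle data is no longer needed. */
    synchronized void removeJob(String jobId) throws IOException {
        state.remove(USER_PREFIX + jobId);
        state.remove(TOKEN_PREFIX + jobId);
        flush();
    }

    /** Returns the recovered job-to-user mappings. */
    synchronized Map<String, String> recoverJobUsers() {
        Map<String, String> result = new HashMap<>();
        for (String key : state.stringPropertyNames()) {
            if (key.startsWith(USER_PREFIX)) {
                result.put(key.substring(USER_PREFIX.length()), state.getProperty(key));
            }
        }
        return result;
    }

    /** Returns the recovered token bytes for a job, or null if the job is unknown. */
    synchronized byte[] recoverToken(String jobId) {
        String encoded = state.getProperty(TOKEN_PREFIX + jobId);
        return encoded == null ? null : Base64.getDecoder().decode(encoded);
    }

    /** Writes to a temporary file first so a crash mid-write leaves the old file intact. */
    private void flush() throws IOException {
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            state.store(out, "shuffle recovery state (sketch)");
        }
        Files.move(tmp, stateFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In this sketch a restarted process would construct the store against the same file and call
recoverJobUsers() and recoverToken() before serving shuffle requests again.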

As Alejandro pointed out, it also needs some help from the NodeManager to keep it from cleaning
up the local directories and removing the shuffle output after restarting.  That's also been
done as part of the prototype and is relatively straightforward, but we're still missing a
mechanism for distinguishing the restart case vs. shutdown/decommission case and some other

> ShuffleHandler should handle NM restarts
> ----------------------------------------
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>              Labels: shuffle
> ShuffleHandler should work across NM restarts and not require re-running map tasks. On
> NM restart, the map outputs are currently cleaned up, which forces re-execution of map
> tasks; this should be avoided.
