hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
Date Thu, 01 May 2014 14:12:18 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5652:

    Attachment: MAPREDUCE-5652-v9-and-YARN-1987.patch

Filed YARN-1987 to cover the DBIterator wrapper and updated the patch to use that new wrapper
class.  Note that the patch includes YARN-1987 so Jenkins can comment.

bq. If ShuffleHandler gets DBException during recoverState as part of serviceStart, should
ShuffleHandler ignore the exception and continue like the store doesn't exist?

Failure to recover should be rare: it means the DB is corrupted or inaccessible, or there is
a schema incompatibility between versions because an upgrade occurred during the NM downtime.
It should be investigated and corrected.  If we ignore the exception, the errors will likely be
glossed over and we will continue to fail to shuffle across NM restarts from that point forward,
even though the user has asked for recovery.

We could add a config to request a "best effort" mode where it will continue despite the inability
to recover, but is that an NM-wide config, a config just for the shuffle handler, or something
else?  If we want a config to control this, I propose we address it in a follow-up JIRA.
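
To make the two behaviors concrete, here is a minimal, self-contained sketch of fail-fast
recovery versus a "best effort" mode.  Everything in it is hypothetical and for illustration
only: RecoverySketch, StateStore, RecoveryException (a stand-in for DBException) and the
bestEffort flag are not taken from the patch, and no such config exists today.

```java
import java.util.HashMap;
import java.util.Map;

public class RecoverySketch {
    /** Hypothetical stand-in for the DBException raised when the store cannot be read. */
    public static class RecoveryException extends RuntimeException {
        public RecoveryException(String msg) { super(msg); }
    }

    /** Hypothetical state store: load() returns job id -> user for jobs known before restart. */
    public interface StateStore {
        Map<String, String> load();
    }

    /** Hypothetical service whose start() recovers state, loosely mirroring serviceStart. */
    public static class Service {
        private final StateStore store;
        private final boolean bestEffort;   // assumed config flag, not an existing Hadoop setting
        private final Map<String, String> jobs = new HashMap<>();
        private boolean recovered;

        public Service(StateStore store, boolean bestEffort) {
            this.store = store;
            this.bestEffort = bestEffort;
        }

        public void start() {
            try {
                jobs.putAll(store.load());
                recovered = true;
            } catch (RecoveryException e) {
                if (!bestEffort) {
                    // Fail fast: surface the problem so it gets investigated and corrected.
                    throw e;
                }
                // Best effort: log loudly and continue as if the store did not exist.
                System.err.println("Recovery failed, starting with empty state: " + e.getMessage());
                jobs.clear();
                recovered = false;
            }
        }

        public boolean isRecovered() { return recovered; }
        public int jobCount() { return jobs.size(); }
    }

    public static void main(String[] args) {
        StateStore broken = () -> { throw new RecoveryException("store is corrupted"); };

        Service lenient = new Service(broken, true);
        lenient.start();
        System.out.println("best effort recovered: " + lenient.isRecovered());

        try {
            new Service(broken, false).start();
        } catch (RecoveryException e) {
            System.out.println("fail-fast start aborted: " + e.getMessage());
        }
    }
}
```

The open question above is only where the equivalent of the bestEffort flag would live
(NM-wide, shuffle-handler-specific, or elsewhere); the sketch does not take a position on that.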

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch,
MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch,
MAPREDUCE-5652-v9-and-YARN-1987.patch, MAPREDUCE-5652.patch
> ShuffleHandler should work across NM restarts and not require re-running map tasks. Currently,
on NM restart, the map outputs are cleaned up, which forces re-execution of map tasks; this
should be avoided.

This message was sent by Atlassian JIRA
