hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts
Date Thu, 24 Apr 2014 15:54:19 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jason Lowe updated MAPREDUCE-5652:

    Attachment: MAPREDUCE-5652-v7.patch

bq. 1. Does LevelDB's delete method throw an exception? JNI has some exception handling, and the
caller needs to retrieve the exceptions, etc.

Nice catch!  I didn't notice there were _two_ DBExceptions flying around in the leveldb code.
org.fusesource.leveldbjni.internal.NativeDB.DBException comes from the JNI layer and derives
from IOException, and it was the one I was familiar with.  However, the wrapper code around
the JNI layer catches that exception and rethrows it as org.iq80.leveldb.DBException, which
is a RuntimeException.  That means we need to wrap all calls that can throw the runtime form
and either handle the exception directly or rethrow it as an IOException if it's not appropriate
to let the RuntimeException leak out of the method.

Updated the patch to deal with the runtime DBException when necessary.  I'll also have to
make similar changes in the NMLevelDBStateStore for the other NM restart patches.
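
For illustration only, here is a minimal sketch of the wrapping pattern described above. It is not the code from the patch. The nested DBException class is a local stand-in for the unchecked org.iq80.leveldb.DBException so the example compiles without the leveldb jars. KeyValueDb and deleteKey are hypothetical names.

```java
import java.io.IOException;

public class DbExceptionWrapping {
    // Local stand-in for org.iq80.leveldb.DBException, which is unchecked.
    static class DBException extends RuntimeException {
        DBException(String message) { super(message); }
    }

    // Hypothetical minimal view of the database; delete may throw the
    // unchecked DBException.
    interface KeyValueDb {
        void delete(byte[] key);
    }

    // Hypothetical helper: catch the runtime exception at the call site and
    // rethrow it as a checked IOException so it cannot leak out unnoticed.
    static void deleteKey(KeyValueDb db, byte[] key) throws IOException {
        try {
            db.delete(key);
        } catch (DBException e) {
            throw new IOException("Unable to delete key from state store", e);
        }
    }

    public static void main(String[] args) {
        // Simulate a database whose delete always fails.
        KeyValueDb failing = key -> { throw new DBException("simulated failure"); };
        try {
            deleteKey(failing, new byte[] {1});
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println("caught IOException: " + e.getMessage());
        }
    }
}
```

Keeping the original exception as the cause preserves the leveldb stack trace for debugging while letting callers deal only with IOException.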

bq. 2. It seems like recover/restore are common in NM/RM restart. Any abstract interface defined
for that?

They both support recovery, but the forms it takes are very different (e.g., the types
of state persisted are significantly different, the backing store types have no overlap, etc.).
There could be a generic Recoverable interface that supports a recover() method, but I'm
not sure what value that adds.  Did you have a particular interface in mind or ideas on how
it would be used?
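
To make the question concrete, here is a rough sketch of what such a generic interface might look like. Only the Recoverable name and the recover() method come from the discussion above. NamedComponent and recoverAll are hypothetical and do not correspond to existing NM or RM classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RecoverableSketch {
    // Hypothetical interface with the single method mentioned above.
    interface Recoverable {
        void recover() throws IOException;
    }

    // Hypothetical component that only records whether it was recovered.
    static class NamedComponent implements Recoverable {
        final String name;
        boolean recovered = false;

        NamedComponent(String name) { this.name = name; }

        @Override
        public void recover() { recovered = true; }
    }

    // About the only thing generic code can do with the interface:
    // invoke recover() on each component in turn.
    static void recoverAll(List<? extends Recoverable> components) throws IOException {
        for (Recoverable c : components) {
            c.recover();
        }
    }

    public static void main(String[] args) throws IOException {
        List<NamedComponent> components = new ArrayList<>();
        components.add(new NamedComponent("shuffle"));
        components.add(new NamedComponent("state-store"));
        recoverAll(components);
        for (NamedComponent c : components) {
            System.out.println(c.name + " recovered=" + c.recovered);
        }
    }
}
```

The interface says nothing about what state is persisted or where, so a caller gains little beyond being able to loop over components.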

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch,
MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652-v7.patch, MAPREDUCE-5652.patch
> ShuffleHandler should work across NM restarts and not require re-running map tasks. On
NM restart, the map outputs are cleaned up, which forces re-execution of map tasks; this should
be avoided.

This message was sent by Atlassian JIRA
