hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4819) AM can rerun job after reporting final job status to the client
Date Thu, 03 Jan 2013 16:22:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543034#comment-13543034 ]

Robert Joseph Evans commented on MAPREDUCE-4819:

Wow, lots of comments.  Thanks to everyone for looking at the patch.

bq. I had observed that if I made my AM crash (by putting an exit(1) in shutdownJob()) then
the history files would get orphaned and not cleaned up. Or something like that.

Thanks for the heads up. I will look into that.

bq. Why not end in success if the staging dir was cleaned up by the last attempt?

Because we crashed somewhere after staging was cleaned up and before we unregistered.  Crashing
seems like an error to me, but I suppose we could change it.  As for what the client ultimately
sees for success or failure, we will rely on the history server to report that.

bq. I am guessing that this code won't be necessary after we move the unregister to RM before
the staging dir cleanup in MAPREDUCE-4841, right?

Yes and no.  Once MAPREDUCE-4841 goes in there is an increased possibility of leaking staging
directories.  I have seen users in 1.0 blow away their staging directory to clean up, which
caused jobs to fail.  Granted, they are more likely to get errors from the distributed cache
not finding the files it needs, but in either case I would like to be paranoid and guard against

bq. Why are we only eating/ignoring the JobEvents in the dispatcher? So that the JobImpl state
machine is not triggered?

In the new code path we have not wired up everything.  JobImpl is created but the JobEventDispatcher
is not.  I did not want to have to deal with recovering the complete state of the job, which
in some cases may not even be possible.  This is also why I am not bringing up the RPC server.
Now that you mention it, I probably also need to update the UI/client to deal with that
appropriately.  The typo you found was just there for debugging this situation.  (I'll fix
the typo, by the way.)

bq. This might be a question of personal preference. I think an explicit transition from
the INIT to final state is cleaner than overriding the state in the getter.

I actually wanted to put in a stubbed out Job instead, but there are too many places that
Job is cast to JobImpl just to get the state, making it difficult to do so.  I will look again
to see if I can split the two apart, or add in a state transition.

bq. Oozie handles duplicate notifications correctly doing a NOP.
Great.  I will look at the javadocs for job end notification again to be sure that we can
default to notify instead.
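
As a rough illustration of what "handling duplicate notifications as a NOP" means on the receiving side, here is a minimal sketch.  It is not Oozie's code; the class name, method names, and job ID format are all hypothetical.  The receiver remembers which job IDs it has already processed and silently ignores repeats, which is what would make it safe for a re-attempted AM to notify again.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical receiver of job-end notifications; not taken from Oozie.
class IdempotentNotificationReceiver {
    // Job IDs for which a notification has already been processed.
    private final Set<String> seen = new HashSet<>();
    private int processed = 0;

    /**
     * Handles a job-end notification.
     * Returns true if it was processed, false if it was a duplicate (NOP).
     */
    public synchronized boolean onJobEnd(String jobId, String status) {
        if (!seen.add(jobId)) {
            // Already handled this job: ignore the duplicate.
            return false;
        }
        processed++;
        return true;
    }

    public synchronized int processedCount() {
        return processed;
    }
}
```

With a receiver like this, sending the notification from both the crashed AM and the recovering AM has the same effect as sending it once.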

bq. Using separate files for marking success / failure - am guessing this is to have a smaller
chance of a failing persist, as compared to persisting events via the HistoryFile, which may
already have a backlog of events?

It was also a much smaller change to make.  The HistoryFile would be preferable if we wanted
to guarantee at most once commit of the tasks, because there are so many of them.
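
To make the marker-file idea concrete, here is a minimal sketch of how a restarted AM might inspect the staging directory.  It is only an illustration: it uses java.nio on a local directory as a stand-in for HDFS, and the class name, enum, and marker file names are hypothetical rather than taken from the patch.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; marker names and layout are assumptions for illustration.
class CommitMarkerCheck {
    public enum PreviousOutcome { SUCCEEDED, FAILED, NO_MARKER }

    // Assumed marker file names, not the ones used by the actual patch.
    public static final String SUCCESS_MARKER = "COMMIT_SUCCESS";
    public static final String FAILURE_MARKER = "COMMIT_FAIL";

    /** Reports what a previous attempt recorded in the given staging directory. */
    public static PreviousOutcome inspect(Path stagingDir) {
        // Only one of the two markers is expected to exist; success is checked first.
        if (Files.exists(stagingDir.resolve(SUCCESS_MARKER))) {
            return PreviousOutcome.SUCCEEDED;
        }
        if (Files.exists(stagingDir.resolve(FAILURE_MARKER))) {
            return PreviousOutcome.FAILED;
        }
        // No final marker: the previous attempt never recorded a commit result.
        return PreviousOutcome.NO_MARKER;
    }
}
```

A new attempt that sees SUCCEEDED or FAILED would skip rerunning the job and only finish the remaining cleanup, while NO_MARKER would fall through to the regular recovery path.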

bq. Wondering if it's possible to achieve the same checks via the CommitterEventHandler instead
of checking in the MRAppMaster class. i.e. follow the regular recovery path - except the CommitHandler
emits success / failed / abort events depending on the presence of these files / (history
bq. Alternately, the current implementation could be simplified by using a custom RMCommunicator
- which does not depend on JobImpl. i.e. the history copier and an RMCommunicator to unregister
from the RM.
Both of those seem like valid things to investigate.  I feel like I am close on this and want
to get this working as is first and then I will look at the other approaches you suggested.
 I do like the first one as it seems like it would be a lot simpler to implement, but I want
a backup that I know functions before making drastic changes to the design.

bq. If the last AM attempt were to crash - data exists since the SUCCESS file exists, RPC
will not see SUCCESS.
We have a lot of problems in general if the last AM were to crash.  It is possible that the
history server would have no knowledge of the job whatsoever, even if it finished successfully.
This patch is not attempting to address those problems.

bq. While the new AM is running - it will not be able to handle status, counter etc requests.
This seems a little problematic if a success has been reported over RPC from the previous
AM. Since this AM is dealing with the history file - could possibly have it return information
from the history file ? History commit before SUCCESS may help with the previous 2 points.

Yes, a history commit before returning success would help with those problems.  I will look into
it as an alternative approach.  My initial thought was to update the client/UI to wait for
the AM to report a valid address so that no client is trying to get counters etc. from an AM
in this situation.

bq. If the recovered AppMaster is not the last retry - looks like the RM unregistration will
not happen. (isLastAMRetry)
isLastAMRetry is set in a number of places, including in the init method if we notice that
the previous Job ended but the AM crashed.

bq. Is a KILLED status also required - KILLED during commit should not be reported as FAILED.
That would be nice.  We would have to put it in as part of CommitterEventHandler.cancelJobCommit().
I will look into that.

bq. CommitEventHandler.touchz could throw an exception if the file already exists - to prevent
lost AMs from committing. (maybe not required after MAPREDUCE-4832 ?)
I think it already will.  We are not opening the file for append, we are trying to create
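
The create-versus-append distinction can be sketched as follows.  This is a local-filesystem stand-in using java.nio, where Files.createFile fails if the file already exists; on HDFS the analogous behavior would come from creating the file with overwrite disabled.  The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating an exclusive "touch" of a marker file.
class ExclusiveMarker {
    /**
     * Tries to create the marker file.
     * Returns true if this caller created it, false if it already existed.
     */
    public static boolean tryCreate(Path marker) throws IOException {
        try {
            // createFile is a create-only operation: it never opens an existing file.
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Someone else (for example another AM attempt) got there first.
            return false;
        }
    }
}
```

A lost AM that calls tryCreate after the active AM has already written the marker gets false back and can refuse to commit.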

bq. historyService creation - can move into the common if (copyHistory) check

bq. Don't think "AMStartedEvent" can be ignored - the history server will have no info
about past AMs. I think only the current AM needs to be ignored.

The AMStartedEvent is ignored by the copy service but not by the MRAppMaster.  The MRAppMaster
will read the history file just like it did before and extract the AMStartedEvents, it will
add in another one for itself, and then the copyHistoryService will read the rest of the history

bq. Wondering if it's possible to use HDFS dirs and timestamps to co-ordinate between an active
AM and lost AMs.
Also, are hdfs dir operations cheaper than file create operations (NN only / NN +DN) ? Not
sure if mkdir / 0 length file creation are NN only ops.

I thought that they were NN only ops, but I will check with an HDFS person to know for sure.

> AM can rerun job after reporting final job status to the client
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-4819
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4819
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Assignee: Bikas Saha
>            Priority: Critical
>         Attachments: MAPREDUCE-4819.1.patch, MAPREDUCE-4819.2.patch, MAPREDUCE-4819.3.patch,
MR-4819-bobby-trunk.txt, MR-4819-bobby-trunk.txt
> If the AM reports final job status to the client but then crashes before unregistering
with the RM then the RM can run another AM attempt.  Currently AM re-attempts assume that
the previous attempts did not reach a final job state, and that causes the job to rerun (from
scratch, if the output format doesn't support recovery).
> Re-running the job when we've already told the client the final status of the job is
bad for a number of reasons.  If the job failed, it's confusing at best since the client was
already told the job failed but the subsequent attempt could succeed.  If the job succeeded
there could be data loss, as a subsequent job launched by the client tries to consume the
job's output as input just as the re-attempt starts removing output files in preparation for
the output commit.

