hadoop-mapreduce-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
Date Wed, 11 Nov 2015 19:17:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000916#comment-15000916 ]

Junping Du commented on MAPREDUCE-5485:

bq. About the overall test. The main overall change is to allow the retry AM to continue after
seeing an in-progress commit from the previous AM. It seems incomplete to not have a test
for that. 
I agree that it is better to cover as many cases as possible in unit tests. But due to limitations
of our current unit test framework, we can miss many functional tests, especially those related
to MR AM failure/restart. For example, in the rolling upgrade story, we don't have tests that check
AM failover during NM/RM restart. Instead, we may have to split the whole functionality into pieces
and test each piece. Unfortunately, this is sometimes not good enough, which is why we
still need to verify that the feature works end to end on a real cluster.

bq. However, if you think that we don't have existing infra for that code path then we should
create a follow-up jira to add that infra and relevant tests. I have not followed the MR AM
code changes for a while and so I cannot recall off the top of my head any existing test
cases. Maybe other committers have some ideas.
I just filed MAPREDUCE-6545 to track the follow-up test work.

bq. With that caveat, the latest patch looks good to me. Thanks for your patience through
the reviews.
Thanks, Bikas, for your careful review.

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, MAPREDUCE-5485-v1.patch,
MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch,
MAPREDUCE-5485-v4.patch, MAPREDUCE-5485-v5.patch
> The MRAppMaster may crash during job commit, or a NodeManager restart may cause
> the committing AM to exit because its container expires. In these cases, the job fails.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> A better choice is to let clients tell the AM whether redoing the commit is allowed.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.
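
The recovery decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the class `CommitRecoverySketch`, the `RepeatableCommitter` interface, the `Action` enum, and the boolean flags are all hypothetical names, and the flags stand in for whatever record the previous AM attempt left behind about its commit.

```java
// Hypothetical sketch: none of these names are Hadoop's actual classes or methods.
final class CommitRecoverySketch {

    /** What a retry AM could do after inspecting the previous attempt's commit state. */
    enum Action { REPORT_SUCCESS, REPORT_FAILURE, RUN_NORMALLY, REDO_COMMIT, FAIL_JOB }

    /** Stand-in for the committer extension: the client's committer says whether commit may be repeated. */
    interface RepeatableCommitter {
        boolean isCommitRepeatable();
    }

    /**
     * Decide how a retry AM proceeds.
     *
     * @param commitStarted   the previous AM recorded that it began committing
     * @param commitSucceeded the previous AM recorded a finished, successful commit
     * @param commitFailed    the previous AM recorded a failed commit
     * @param committer       committer supplied by the client
     */
    static Action decide(boolean commitStarted,
                         boolean commitSucceeded,
                         boolean commitFailed,
                         RepeatableCommitter committer) {
        if (commitSucceeded) {
            return Action.REPORT_SUCCESS;   // commit already done, nothing to redo
        }
        if (commitFailed) {
            return Action.REPORT_FAILURE;   // commit ended with a recorded failure
        }
        if (!commitStarted) {
            return Action.RUN_NORMALLY;     // previous AM never reached the commit phase
        }
        // In-progress commit: the previous AM died (or its container expired) mid-commit.
        // Redo the commit only if the client's committer allows it; otherwise fail the job.
        return committer.isCommitRepeatable() ? Action.REDO_COMMIT : Action.FAIL_JOB;
    }
}
```

For example, `CommitRecoverySketch.decide(true, false, false, () -> true)` yields `REDO_COMMIT`, while the same state with `() -> false` yields `FAIL_JOB`, which corresponds to the behavior the issue describes as unnecessary for jobs that can redo commit.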

This message was sent by Atlassian JIRA
