hadoop-mapreduce-issues mailing list archives

From "Peter Bacsko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
Date Fri, 04 Aug 2017 10:58:01 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114240#comment-16114240 ]

Peter Bacsko commented on MAPREDUCE-6870:

What's your suggestion for the variable names?
My ideas: {{finishJobWhenReducersDone}}, {{MRJobConfig.FINISH_JOB_WHEN_REDUCERS_DONE}}

Preventing {{TA_KILL}} events: basically I just store a piece of state in each {{MapTaskImpl}}.
But that's unnecessary, since the same information can be kept in a single variable that is
set after sending the kill events. So your approach is better.
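
To make the single-variable idea concrete, here is a minimal, self-contained sketch. It is a toy model and not Hadoop code: the class {{KillOnceSketch}}, its methods, and the string "events" are all hypothetical stand-ins, and only the name {{finishJobWhenReducersDone}} is taken from the proposal above. It assumes the job-level check may be invoked more than once after the last reducer finishes, and shows how one boolean prevents the kill events from being sent twice.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Hadoop code): kill running mappers exactly once when all reducers are done. */
class KillOnceSketch {
    // Stand-in for the proposed finishJobWhenReducersDone setting.
    private final boolean finishJobWhenReducersDone;
    private final int totalReducers;
    // Hypothetical IDs of mappers that are still running.
    private final List<String> runningMappers;
    // Records the kill "events" this toy job has emitted.
    private final List<String> sentKillEvents = new ArrayList<>();
    private int completedReducers = 0;
    // Single job-level flag; replaces per-map-task bookkeeping.
    private boolean killEventsSent = false;

    KillOnceSketch(boolean finishJobWhenReducersDone, int totalReducers,
                   List<String> runningMappers) {
        this.finishJobWhenReducersDone = finishJobWhenReducersDone;
        this.totalReducers = totalReducers;
        this.runningMappers = new ArrayList<>(runningMappers);
    }

    void onReducerCompleted() {
        completedReducers++;
        maybeKillMappers();
    }

    // Safe to call repeatedly: the flag makes every call after the first a no-op.
    void maybeKillMappers() {
        if (!finishJobWhenReducersDone || killEventsSent) {
            return;
        }
        if (completedReducers < totalReducers) {
            return;
        }
        for (String mapper : runningMappers) {
            sentKillEvents.add("KILL " + mapper);
        }
        killEventsSent = true;
    }

    List<String> sentKillEvents() {
        return sentKillEvents;
    }
}
```

With the setting enabled, two reducers, and two running mappers, the second reducer completion produces two kill events, and any later call to {{maybeKillMappers()}} adds none. With the setting disabled, no kill events are produced at all.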

New test: in {{TestJobImpl}}, certain events come from {{TaskImpl}} and {{TaskAttemptImpl}}.
However, these are mocked inside {{JobImpl}}, so you have to generate the events manually. To properly
test the behavior of this change, it might make sense to use the real impl classes instead
of mocks.

bq. Also, do we expect the job to succeed even when killMappers is set to false?

Only if we send the completion events. If we don't, then of course it stays in RUNNING. I
took the idea of finishing mappers/reducers from this test: https://github.com/apache/hadoop/blob/78b487bde175544ebe40e4dafab35569baa1d79e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java#L597-L625
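
The point about completion events can be illustrated with a small toy model. This is a sketch under stated assumptions, not the logic of {{JobImpl}} or of the linked test: the class {{CompletionSketch}} and its methods are hypothetical, and it simply assumes a job leaves RUNNING only once a completion event has been delivered for every map and reduce task.

```java
/** Toy model (not Hadoop code): the job succeeds only after all completion events arrive. */
class CompletionSketch {
    enum State { RUNNING, SUCCEEDED }

    private final int totalMaps;
    private final int totalReduces;
    private int completedMaps = 0;
    private int completedReduces = 0;
    private State state = State.RUNNING;

    CompletionSketch(int totalMaps, int totalReduces) {
        this.totalMaps = totalMaps;
        this.totalReduces = totalReduces;
    }

    // Stand-in for delivering a map task's completion event to the job.
    void mapCompleted() {
        completedMaps++;
        checkDone();
    }

    // Stand-in for delivering a reduce task's completion event to the job.
    void reduceCompleted() {
        completedReduces++;
        checkDone();
    }

    private void checkDone() {
        if (completedMaps == totalMaps && completedReduces == totalReduces) {
            state = State.SUCCEEDED;
        }
    }

    State state() {
        return state;
    }
}
```

In this model, a job with two maps and one reduce stays in RUNNING if only the reduce completion is delivered, and moves to SUCCEEDED once both map completions are delivered as well.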

> Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
> ----------------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before
> all reducers are complete, but those mappers run for a long time, even after all reducers are
> complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate
> data to reducers. In that case, the job owner should have the config option to finish the
> job once all reducers are complete.
