hadoop-mapreduce-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
Date Fri, 11 Aug 2017 18:19:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123784#comment-16123784 ]

Erik Krogen commented on MAPREDUCE-6870:

Given its rarity, and given that the worst-case scenario is {{(expected execution time) + (single
mapper execution time)}}, I would not consider it a severe issue, which leans me towards compatibility.
However, the current behavior is pretty confusing for an average user, so it is a tough call.

We would like to backport this to older release lines, in which case we definitely need to
maintain compatibility and thus have default = false. As for trunk, I am on the fence.

> Add configuration for MR job to finish when all reducers are complete (even with unfinished
> ----------------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch,
> MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before
> all reducers are complete, but those mappers run for a long time, even after all reducers are
> complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate
> data to reducers. In that case, the job owner should have the config option to finish the
> job once all reducers are complete.
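
A rough sketch of the completion rule under discussion, with the opt-in flag defaulting to false so that existing behavior is preserved. The class, method, and flag names here are hypothetical and are not taken from the attached patches; the sketch also assumes the job has at least one reducer. The second method only restates the worst-case estimate from the comment above.

```java
public class JobCompletionSketch {
    // Hypothetical opt-in flag; false keeps the existing (compatible) behavior.
    static final boolean DEFAULT_FINISH_WHEN_REDUCERS_DONE = false;

    // Decides whether the job may be declared finished.
    // Assumes the job has at least one reducer (map-only jobs are out of scope here).
    static boolean isJobDone(int runningMappers, int runningReducers,
                             boolean finishWhenReducersDone) {
        if (runningReducers > 0) {
            return false; // reducers always have to complete
        }
        // With the flag off, leftover mappers still hold the job open.
        return finishWhenReducersDone || runningMappers == 0;
    }

    // Worst case with the flag off: expected time plus one extra mapper run.
    static long worstCaseMillis(long expectedMillis, long singleMapperMillis) {
        return expectedMillis + singleMapperMillis;
    }

    public static void main(String[] args) {
        // One mapper still running, all reducers complete.
        System.out.println(isJobDone(1, 0, DEFAULT_FINISH_WHEN_REDUCERS_DONE)); // false
        System.out.println(isJobDone(1, 0, true));                              // true
        // 10-minute job delayed by one 45-second mapper.
        System.out.println(worstCaseMillis(600_000L, 45_000L));                 // 645000
    }
}
```

With the flag off, the only cost is the bounded delay computed by worstCaseMillis, which is why keeping default = false is the compatible choice for backports.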

