hadoop-mapreduce-issues mailing list archives

From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
Date Tue, 08 Aug 2017 17:17:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118661#comment-16118661 ]

Haibo Chen commented on MAPREDUCE-6870:
---------------------------------------

A few nits:
1) Use a block comment (/* ... */) for checkReadyForCompletionWhenAllReducersDone()
2) We can avoid iterating over all map tasks if job.completingJob is already true, that is,
{code}
if (totalReduces > 0 && totalReduces == completedReduces) {
    if (!job.completingJob) {
        for (Task task : mapTasks) {
            // kill the task if it has not finished
        }
        job.completingJob = true;
    }
}
{code}
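
To make the intent of 2) concrete, here is a minimal, self-contained sketch of the guard. It is not the actual JobImpl code: the Job and MapTask classes, their fields, and the mapScans counter are hypothetical stand-ins invented for illustration, and "kill" is modeled as setting a flag instead of sending a task kill event.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the guard; not the real JobImpl. */
class CompletionGuardSketch {

    /** Stand-in for a map task (assumption: only two flags matter here). */
    static class MapTask {
        final boolean finished;
        boolean killed;

        MapTask(boolean finished) {
            this.finished = finished;
        }
    }

    /** Stand-in for the job state referenced in the pseudocode. */
    static class Job {
        final int totalReduces;
        int completedReduces;
        boolean completingJob;
        final List<MapTask> mapTasks = new ArrayList<>();
        int mapScans; // counts how often the map task list is iterated

        Job(int totalReduces) {
            this.totalReduces = totalReduces;
        }
    }

    static void checkReadyForCompletionWhenAllReducersDone(Job job) {
        if (job.totalReduces > 0 && job.totalReduces == job.completedReduces) {
            // Only scan the map tasks the first time all reducers are done.
            if (!job.completingJob) {
                job.mapScans++;
                for (MapTask task : job.mapTasks) {
                    if (!task.finished) {
                        task.killed = true; // the real code would send a kill event
                    }
                }
                job.completingJob = true;
            }
        }
    }

    public static void main(String[] args) {
        Job job = new Job(2);
        job.mapTasks.add(new MapTask(true));
        job.mapTasks.add(new MapTask(false));
        job.completedReduces = 2;

        // Calling the check twice should iterate over the map tasks only once.
        checkReadyForCompletionWhenAllReducersDone(job);
        checkReadyForCompletionWhenAllReducersDone(job);
        System.out.println("map scans: " + job.mapScans); // prints "map scans: 1"
    }
}
```

With the flag check in place, repeated invocations after the last reducer finishes become cheap no-ops instead of rescanning every map task.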

3) Can we remove "assertJobState(job, JobStateInternal.RUNNING)" in TestJobImpl.testRunningMapperPreemptionWhenReducerIsFinished()
since it is not doing anything, and add a comment before the "if(killMappers)" statement saying
that the stubbed job cannot finish and we therefore verify task kill events instead?

4) The description of mapreduce.job.finish-when-all-reducers-done in mapred-default.xml still
says that it terminates running map tasks. I think we should say something like
'Specifies whether the job should complete once all reducers have finished, regardless of
whether there are still running mappers', which is closer to what really matters to end users.
Related to this, we can rename testRunningMappersPreemptedWhenReducerIsFinished and testRunningMappersNotPreemptedWhenReducerIsFinished
to something like 'testJobCompletedWhenAllReducersAreFinished' and 'testJobNotCompletedWhenAllReducersAreFinished'.


> Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.


