hadoop-mapreduce-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
Date Wed, 31 Mar 2010 17:33:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851983#action_12851983 ]

Scott Chen commented on MAPREDUCE-1463:
---------------------------------------

I think improving the timing of reducer launches matters for more than just small jobs.
In the case of FairScheduler, for larger jobs with 10000+ mappers, the mappers need several
batches to be fully scheduled.
In this case, if we launch the reducers when 5% of the mappers have finished, those reducers
will just sit idle.
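
To make the batching point concrete, here is a rough back-of-the-envelope sketch. The
slot count (500) is an assumed figure chosen for illustration, and MapWaves is a
hypothetical helper, not anything in Hadoop or FairScheduler.

```java
// Illustrative arithmetic only; MapWaves is a hypothetical helper, not Hadoop code.
class MapWaves {
    // Number of scheduling batches ("waves") needed to run all mappers.
    static int waves(int mappers, int mapSlots) {
        return (mappers + mapSlots - 1) / mapSlots; // ceiling division
    }

    // Waves that must complete before the given percentage of mappers has finished.
    static int wavesUntilPercent(int mappers, int mapSlots, int percent) {
        int needed = (mappers * percent + 99) / 100; // ceiling of mappers * percent / 100
        return waves(needed, mapSlots);
    }

    public static void main(String[] args) {
        int mappers = 10000;
        int mapSlots = 500; // assumed number of map slots the job gets at once
        int total = waves(mappers, mapSlots);
        int atLaunch = wavesUntilPercent(mappers, mapSlots, 5);
        System.out.println("total waves: " + total);
        System.out.println("waves done when reducers launch at 5%: " + atLaunch);
        System.out.println("waves the reducers still hold slots for: " + (total - atLaunch));
    }
}
```

Under these assumed numbers the reducers would be launched after the first of 20 waves
and would hold their slots through the remaining 19.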

Here is the trade-off.
If we launch the reducers too late, we lose the overlap between mapper execution and
reducer shuffling.
But if we launch the reducers too early, we waste reducer slots because they have to wait
for the mappers to finish.

The optimal case is to launch the reducers as late as possible while still having the reducer
shuffle phase finish right after the last mapper finishes.

The goal is to estimate the mapper finish time from the information we have and launch the
reducers at the right moment.
I think this decision should depend on the TaskScheduler, because different scheduling
policies affect the mapper finish time.
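
One possible shape for such an estimate, purely as a sketch: ReducerLaunchEstimator and
its methods are hypothetical names, not part of the attached patches or of any Hadoop
API. It assumes mappers run in waves of a fixed number of slots, that the average map
duration and the shuffle duration are already known, and it ignores the progress of
maps that are currently running.

```java
// Hypothetical sketch; not the TaskScheduler API and not taken from the patches.
class ReducerLaunchEstimator {
    // Estimated time at which the last mapper finishes, assuming the pending maps
    // run in waves of mapSlots tasks with a roughly constant average duration.
    static long estimateMapFinishMillis(long nowMillis, int pendingMaps,
                                        int mapSlots, long avgMapMillis) {
        if (pendingMaps <= 0) {
            return nowMillis;
        }
        long waves = (pendingMaps + mapSlots - 1) / mapSlots; // ceiling division
        return nowMillis + waves * avgMapMillis;
    }

    // Latest launch time for which the shuffle is still expected to end about
    // when the last mapper finishes; never earlier than "now".
    static long reducerLaunchMillis(long nowMillis, long mapFinishMillis,
                                    long shuffleMillis) {
        return Math.max(nowMillis, mapFinishMillis - shuffleMillis);
    }

    public static void main(String[] args) {
        long now = 0L;
        long avgMapMillis = 30_000L;   // assumed average map duration
        long shuffleMillis = 120_000L; // assumed shuffle duration

        // Large job: 10000 pending maps over 500 assumed slots.
        long bigFinish = estimateMapFinishMillis(now, 10000, 500, avgMapMillis);
        System.out.println(reducerLaunchMillis(now, bigFinish, shuffleMillis)); // prints 480000

        // Small job: 10 pending maps fit in a single wave.
        long smallFinish = estimateMapFinishMillis(now, 10, 500, avgMapMillis);
        System.out.println(reducerLaunchMillis(now, smallFinish, shuffleMillis)); // prints 0
    }
}
```

With these assumed inputs the large job's reducers would wait until most waves are done,
while the small job's reducers would launch immediately. The slot count is the part a
scheduling policy would have to supply.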

Thoughts?

> Reducer should start faster for smaller jobs
> --------------------------------------------
>
>                 Key: MAPREDUCE-1463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, MAPREDUCE-1463-v3.patch
>
>
> Our users often complain about the slowness of smaller ad-hoc jobs.
> The overhead to wait for the reducers to start in this case is significant.
> It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

