hadoop-mapreduce-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
Date Wed, 10 Feb 2010 01:10:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831797#action_12831797

Scott Chen commented on MAPREDUCE-1463:

Yes, you're right. The logic in the patch is wrong; the one you posted is correct.
Sorry about the mistake.

How do you define small jobs? Shouldn't it be based on the total number of tasks instead
of considering maps and reduces individually?
We want to start the reducers faster in both the few-mapper and few-reducer cases: with few
reducers, starting them earlier is cheap anyway, and with few mappers, the map phase finishes
quickly, so the wait costs proportionally more.
That said, it may not be a bad idea to use the total instead (it is simpler, at least).
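As a concrete illustration of the heuristic being discussed, here is a minimal sketch based on the total task count. The class name, method name, and threshold are my own placeholders, not from the attached patches:

```java
// Hypothetical sketch of the small-job heuristic discussed above.
// All names and the threshold value are assumptions, not from the patch.
public class SlowstartHeuristic {

    // Assumed cutoff; a real implementation would likely read this
    // from a configurable mapreduce.job.* property.
    static final int SMALL_JOB_TOTAL_TASKS = 10;

    /**
     * Returns the fraction of maps that must complete before
     * reducers are scheduled.
     */
    static float slowstartFraction(int numMaps, int numReduces, float configured) {
        // For small jobs, start reducers immediately: with few maps the
        // wait dominates the runtime, and with few reduces starting
        // early is cheap anyway.
        if (numMaps + numReduces <= SMALL_JOB_TOTAL_TASKS) {
            return 0.0f;
        }
        // Otherwise honor the configured slowstart value.
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(slowstartFraction(4, 2, 0.05f));    // small job
        System.out.println(slowstartFraction(500, 20, 0.05f)); // large job
    }
}
```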

Why do we need a special case for small jobs? If it's for fairness, then this piece of code
rightly belongs in contrib/fairscheduler, no?
If not for fairness, then what is the problem with the current framework w.r.t. small jobs?
Handling small jobs as a special case decreases their overall latency, which gives users a
better experience.
Can this be fixed by simple (configuration-like) tweaking? If not, then what's the right fix?
For experienced users, setting completedmaps=0 does fix this problem. But it would be nice
if this could be done automatically for users who do not know how to configure Hadoop.

Thanks for the comments. I agree: tweaking mapreduce.job.reduce.slowstart.completedmaps on
the job client side would be a cleaner way to do this. For experienced users, setting
completedmaps to 0 on the client side will make their small jobs finish faster. But it would
be nice if the decision could be made automatically, so that ordinary users don't have to
learn an extra configuration parameter.
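For reference, the client-side workaround would look something like this. This is only a sketch: the property name is the one discussed above, but the surrounding job-submission code is elided:

```java
import org.apache.hadoop.conf.Configuration;

// Client-side workaround: start reducers immediately rather than
// waiting for a fraction of the maps to complete.
Configuration conf = new Configuration();
conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.0f);
// ... build and submit the job with this configuration as usual
```

Jobs written against the Tool interface can also pass the same setting on the command line via -D mapreduce.job.reduce.slowstart.completedmaps=0.0, without any code change.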

The point here is that in some cases (small jobs, with a small number of mappers or reducers),
we should not spend time waiting for the reducers to start, because the wait is significant
relative to the job's runtime (or because starting the reducers earlier is cheap). Reducing
this latency automatically makes our users happy.

> Reducer should start faster for smaller jobs
> --------------------------------------------
>                 Key: MAPREDUCE-1463
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch
> Our users often complain about the slowness of smaller ad-hoc jobs.
> The overhead of waiting for the reducers to start is significant in this case.
> It would be good if we could start the reducers sooner for such jobs.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
