hadoop-mapreduce-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-936) Allow a load difference in fairshare scheduler
Date Fri, 28 Aug 2009 23:20:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749020#action_12749020 ]

Matei Zaharia commented on MAPREDUCE-936:

Hi Zheng,

For issue 1, the provided patch looks good. It might be nice to add a unit test for it though.

For issue 2, I believe the implementation of locality waits in MAPREDUCE-706 has solved the
issue. In that implementation, once a job has launched a non-local task, it can keep launching
non-local tasks right away without further waits. However, if it ever manages to launch a
local task again, it must wait again before launching more non-local tasks. The reasoning for this
is that maybe the job had just been unlucky earlier and still has lots of tasks left to launch,
and we don't want it to stay stuck at the non-local level.
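
The behavior described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the MAPREDUCE-706 code: the class name, field names, method names, and the single wait threshold are all assumptions made for illustration (the real scheduler distinguishes node-local and rack-local waits).

```java
// Hypothetical, simplified model of the locality wait behavior described above.
// It is NOT the MAPREDUCE-706 code; class, field and method names are invented.
class LocalityWaitSketch {
    private final long waitMillis;   // assumed single locality wait threshold
    private long waitStartMillis;    // when the current wait period began
    private boolean nonLocalAllowed; // true once a non-local task has been launched

    LocalityWaitSketch(long waitMillis, long nowMillis) {
        this.waitMillis = waitMillis;
        this.waitStartMillis = nowMillis;
        this.nonLocalAllowed = false;
    }

    /** True if the job may launch a non-local task at time nowMillis. */
    boolean mayLaunchNonLocal(long nowMillis) {
        return nonLocalAllowed || nowMillis - waitStartMillis >= waitMillis;
    }

    /** A non-local launch lets later non-local launches proceed without waiting. */
    void onNonLocalLaunch() {
        nonLocalAllowed = true;
    }

    /** A local launch sends the job back to waiting before it may go non-local again. */
    void onLocalLaunch(long nowMillis) {
        nonLocalAllowed = false;
        waitStartMillis = nowMillis;
    }

    public static void main(String[] args) {
        LocalityWaitSketch job = new LocalityWaitSketch(10_000L, 0L);
        System.out.println(job.mayLaunchNonLocal(5_000L));   // queried before the wait has elapsed
        System.out.println(job.mayLaunchNonLocal(10_000L));  // queried once the wait has elapsed
        job.onNonLocalLaunch();
        System.out.println(job.mayLaunchNonLocal(10_001L));  // queried right after a non-local launch
        job.onLocalLaunch(11_000L);
        System.out.println(job.mayLaunchNonLocal(12_000L));  // queried shortly after a local launch
    }
}
```

The point of the sketch is only that the wait is paid once per "unlucky streak" rather than once per non-local task.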

I think the locality wait code you guys are running at Facebook is much older than the one
in MAPREDUCE-706, so it would be nice if you could upgrade to MAPREDUCE-706 when you upgrade
Hadoop in general. I believe it would not be too difficult to port the trunk version of the
fair scheduler to 0.20 and get all the architectural changes and improvements in 706 with


> Allow a load difference in fairshare scheduler
> ----------------------------------------------
>                 Key: MAPREDUCE-936
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-936
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>            Reporter: Zheng Shao
>         Attachments: MAPREDUCE-936.1.patch
> The problem we are facing: It takes a long time for all tasks of a job to get scheduled
> on the cluster, even if the cluster is almost empty.
> There are two reasons that together lead to this situation:
> 1. The load factor makes sure each TT runs the same number of tasks. (This is the part
> that this patch tries to change).
> 2. The scheduler tries to schedule map tasks locally (first node-local, then rack-local).
> There is a wait time (mapred.fairscheduler.localitywait.node and mapred.fairscheduler.localitywait.rack,
> both are around 10 sec in our conf), and accumulated wait time (JobInfo.localityWait). The
> accumulated wait time is reset to 0 whenever a non-local map task is scheduled. That means
> it takes N * wait_time to schedule N non-local map tasks.
> Because of 1, a lot of TT will not be able to take more tasks, even if they have free
> slots. As a result, a lot of the map tasks cannot be scheduled locally.
> Because of 2, it's really hard to schedule a non-local task.
> As a result, sometimes we are seeing that it takes more than 2 minutes to schedule all
> the mappers of a job.
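
The N * wait_time estimate in the quoted description can be worked through with a small sketch. The class and method names are hypothetical, and the task count of 12 is an assumed example value, not a figure from the report; only the roughly 10 second wait comes from the quoted configuration.

```java
// Illustrative arithmetic only; names are hypothetical and not taken from the fair scheduler.
class LocalityWaitCost {
    /** Total delay when the accumulated wait is reset to 0 after every non-local launch. */
    static long delayWithResetMillis(int nonLocalTasks, long waitMillis) {
        return (long) nonLocalTasks * waitMillis;
    }

    public static void main(String[] args) {
        // Assumed example: 12 non-local map tasks with a 10 s locality wait.
        long delayMillis = delayWithResetMillis(12, 10_000L);
        System.out.println(delayMillis / 1000 + " s");
    }
}
```

With these assumed values the delay is 12 * 10 s = 120 s, which is on the same scale as the "more than 2 minutes" mentioned in the description.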

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
