hadoop-common-dev mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4667) Global scheduling in the Fair Scheduler
Date Thu, 08 Jan 2009 07:04:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-4667:
----------------------------------

    Attachment: fs-global-v0.patch

Here's a preliminary version of this patch. It includes the patches for HADOOP-4789 and
HADOOP-4665 (https://issues.apache.org/jira/browse/HADOOP-4665) because it depends on them.
This may make it confusing to read, but I will post simpler versions once those patches are
in. The code of interest here is really just assignTasks and getAllowedLocalityLevel in
FairScheduler.java. In addition, this requires a change to the JobInProgress API: a version
of obtainNewMapTask that takes a locality level (distance up the topology). This was already
used internally in findMapTask, but there is now a package-visible method that exposes it
to the scheduler.
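
To illustrate what a "locality level" check might look like, here is a minimal sketch. It is
not the code in the patch: the class name, the Level enum, and the allowedLevel signature are
all hypothetical, and it assumes (as one reading of the issue description quoted below) that a
job waits a shorter time before it may run rack-local tasks and a longer time before it may
run tasks anywhere.

```java
// Hypothetical sketch, not the code from fs-global-v0.patch.
class LocalityLevelSketch {
    // Distance up the topology, from closest to farthest (names are assumptions).
    enum Level { NODE, RACK, ANY }

    /**
     * Returns the farthest locality level a job may use, given how long it has
     * been skipped while waiting for a local task.
     *
     * @param skippedMs  time the job has been skipped, in milliseconds
     * @param nodeWaitMs wait before giving up on node-local tasks (assumed parameter)
     * @param rackWaitMs longer wait before giving up on rack-local tasks (assumed parameter)
     */
    static Level allowedLevel(long skippedMs, long nodeWaitMs, long rackWaitMs) {
        if (skippedMs < nodeWaitMs) {
            return Level.NODE;   // still insisting on a data-local task
        }
        if (skippedMs < rackWaitMs) {
            return Level.RACK;   // may fall back to a rack-local task
        }
        return Level.ANY;        // waited long enough: any node is acceptable
    }
}
```

With waits of 5 and 15 seconds, for example, a job skipped for 7 seconds would be allowed
rack-local tasks but not off-rack ones.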

> Global scheduling in the Fair Scheduler
> ---------------------------------------
>
>                 Key: HADOOP-4667
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4667
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/fair-share
>            Reporter: Matei Zaharia
>         Attachments: fs-global-v0.patch
>
>
> The current schedulers in Hadoop all examine a single job on every heartbeat when choosing
> which tasks to assign, choosing the job based on FIFO or fair sharing. There are inherent
> limitations to this approach. For example, if the job at the front of the queue is small (e.g.
> 10 maps, in a cluster of 100 nodes), then on average it will launch only one local map on
> the first 10 heartbeats while it is at the head of the queue. This leads to very poor locality
> for small jobs. Instead, we need a more "global" view of scheduling that can look at multiple
> jobs. To resolve the locality problem, we will use the following algorithm:
> - If the job at the head of the queue has no local task to launch, skip it and look through
>   other jobs.
> - If a job has been skipped for at least T seconds while waiting for a local task, stop
>   skipping it and allow it to launch non-local tasks.
> - If no job can launch a task at all, return to the head of the queue and launch a non-local
>   task from the first job.
> This algorithm improves locality while bounding the delay that any job experiences in
> launching a task.
> We will actually provide two values of T - one for data-local tasks and a longer wait
> for rack-local tasks. It also turns out that whether waiting is useful depends on how many
> tasks are left in the job, since that determines the probability of getting a heartbeat from
> a node with a local task. Thus there may be logic for removing the wait on the last few tasks
> in the job.
> As a related issue, once we allow global scheduling, we can launch multiple tasks per
> heartbeat, as in HADOOP-3136. The initial implementation of HADOOP-3136 adversely affected
> performance because it only launched multiple tasks from the same job, but with the wait rule
> above, we will only do this for jobs that are allowed to launch non-local tasks.
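
The three rules in the description above can be sketched roughly as follows. This is only an
illustration under simplifying assumptions, not the FairScheduler code: the Job class, its
fields, and the pick method are hypothetical, a single wait T (waitMs) is used instead of two,
and whether a job has a local task for the heartbeating node is reduced to a boolean.

```java
import java.util.List;

// Hypothetical sketch of the skip-and-wait rule; not the code from the patch.
class GlobalSchedulingSketch {
    /** Stand-in for a job as seen from one heartbeating node (all fields are assumptions). */
    static class Job {
        final String name;
        boolean hasLocalTask;      // a task whose input is local to this node
        boolean hasAnyTask;        // any pending task at all
        long skippedSinceMs = -1;  // when the job first got skipped; -1 if not waiting

        Job(String name, boolean hasLocalTask, boolean hasAnyTask) {
            this.name = name;
            this.hasLocalTask = hasLocalTask;
            this.hasAnyTask = hasAnyTask;
        }
    }

    /**
     * Chooses a job for one task slot. Returns "name:local", "name:non-local",
     * or null if no job has a pending task. The queue is assumed to be already
     * sorted by the scheduling policy (FIFO or fair sharing).
     */
    static String pick(List<Job> queue, long nowMs, long waitMs) {
        for (Job job : queue) {
            if (job.hasLocalTask) {
                job.skippedSinceMs = -1;           // got a local task: stop waiting
                return job.name + ":local";
            }
            if (!job.hasAnyTask) {
                continue;                          // nothing to launch from this job
            }
            if (job.skippedSinceMs < 0) {
                job.skippedSinceMs = nowMs;        // rule 1: start skipping this job
            }
            if (nowMs - job.skippedSinceMs >= waitMs) {
                return job.name + ":non-local";    // rule 2: skipped long enough
            }
        }
        // Rule 3: nothing could launch, so go back to the head of the queue and
        // launch a non-local task from the first job that has one.
        for (Job job : queue) {
            if (job.hasAnyTask) {
                return job.name + ":non-local";
            }
        }
        return null;
    }
}
```

In this sketch a job's wait clock starts the first time it is skipped and is reset whenever it
launches a local task, which is one way to bound the delay a job can experience. A real
scheduler would also need to evaluate locality per node and per rack rather than with a single
flag.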

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

