hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5199) A proposal to merge common functionality of various Schedulers
Date Mon, 09 Feb 2009 12:40:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671830#action_12671830 ]

Vivek Ratan commented on HADOOP-5199:
-------------------------------------

The big difference I see between the three Schedulers is in the way each sorts jobs when
deciding which job to look at next. The Fair Share Scheduler (FS) sorts jobs globally, based
on a number of parameters that affect the weight of a job. The Capacity Scheduler (CS) sorts
queues based on need, then sorts jobs within a queue based on priority and/or job submission
time. The Default Scheduler (DS) sorts jobs based on priority and job submission time.
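
To make the CS/DS ordering concrete, here is a minimal sketch of a priority-then-submission-time
comparator. _JobInfo_ is a hypothetical stand-in for the real JobInProgress, with illustrative
field names:

{code}
import java.util.Comparator;

// Hypothetical stand-in for JobInProgress; field names are illustrative only.
class JobInfo {
  int priority;      // lower value = higher priority
  long submitTime;   // when the job was submitted
}

// CS/DS-style ordering: by priority first, ties broken by submission time.
class PriorityThenSubmitTime implements Comparator<JobInfo> {
  public int compare(JobInfo a, JobInfo b) {
    if (a.priority != b.priority) {
      return (a.priority < b.priority) ? -1 : 1;
    }
    if (a.submitTime != b.submitTime) {
      return (a.submitTime < b.submitTime) ? -1 : 1;
    }
    return 0;
  }
}
{code}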

I suggest we have one scheduler: call it the Hadoop Scheduler (HS). HS has a notion of pools
similar to that in FS, where a pool is a collection of jobs. As in FS, pools can be mapped
to any job property. Every pool has a capacity. This notion exists in both CS and FS, and
is quite similar in both. We may choose to make this capacity 'guaranteed', à la CS, if we
implement preemption.
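
As a rough sketch of the pool abstraction (class and field names here are hypothetical, not
the actual FS/CS classes; _JobInfo_ is the stand-in from the sketch above):

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical pool: a named collection of jobs with a capacity.
class Pool {
  final String name;
  final int capacity;      // slots this pool is entitled to, a la the CS notion
  int runningTasks;        // tasks currently running across the pool's jobs
  final List<JobInfo> jobs = new ArrayList<JobInfo>();

  Pool(String name, int capacity) {
    this.name = name;
    this.capacity = capacity;
  }

  // Fraction of capacity in use; the pool lagging furthest behind is neediest.
  double usageRatio() {
    return (double) runningTasks / capacity;
  }
}
{code}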

If we have multiple pools, it probably makes sense, when processing a heartbeat, to first
decide which pool to look at, and then order jobs in that pool based on some ordering criteria,
rather than looking at a global ordering of jobs. Sorting jobs only within a single pool per
heartbeat provides much better performance, especially for large clusters with lots of submitted
jobs.

The Hadoop Scheduler's _assignTasks()_ method is the central scheduling algorithm. In
pseudocode, it looks as follows:
{code}
assignTasks() {

  figure out how many M/R tasks to assign;
  // this is already done by JobQueueTaskScheduler.assignTasks(), which computes cluster
  // M/R loads and uses a padding constant to determine how many map and reduce tasks to
  // assign in this heartbeat. FS does something similar, though there are suggestions to
  // improve this (see the global scheduling Jira - HADOOP-4667)

  for (each task to assign) {

    sort the pools based on some criteria;
    // we can use the criteria in CS, where we sort pools based on how much they're lagging
    // behind, i.e., based on the ratio of # of running tasks to the pool's capacity

    for (each pool in the sorted collection) {

      ask the pool for a sorted collection of jobs;
      // this is where we can use different sorting criteria. An FS-based class can return
      // jobs sorted on its notion of fairness. A class based on CS or DS sorts jobs based
      // on priorities and then on submission time. Users can pick which mechanism they
      // want, or define one of their own.

      for (each job in the sorted collection) {

        if ((user limits are OK) && (memory requirements are met)) {

          get a node-local or rack-local task;
          // see the code used in DS, where we get one or more node-local map tasks, up to
          // one rack-local map task, or up to one reduce task

          if (we have a task)
            break;
        }
      }
    }
  }
}
{code}
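
The pluggable piece of the loop above is the ordering each pool returns, plus the ordering of
the pools themselves. One way to express those hooks (hypothetical interfaces, reusing the
sketches above, not actual scheduler code):

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical hook: each scheduling policy supplies its own job ordering.
interface JobSorter {
  void sort(List<JobInfo> jobs);
}

// CS/DS-flavored policy, built on the comparator sketched earlier. An FS-flavored
// implementation would instead sort on each job's current fairness deficit.
class PrioritySorter implements JobSorter {
  public void sort(List<JobInfo> jobs) {
    Collections.sort(jobs, new PriorityThenSubmitTime());
  }
}

class PoolOrder {
  // Pools using the least of their capacity come first, as in the CS criterion.
  static final Comparator<Pool> NEEDIEST_FIRST = new Comparator<Pool>() {
    public int compare(Pool a, Pool b) {
      return Double.compare(a.usageRatio(), b.usageRatio());
    }
  };
}
{code}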

Features such as looking at free TT memory and a job's memory requirements when scheduling,
or looking at per-user job/task limits, can be enabled or disabled through configuration, but
can be part of HS' _assignTasks()_ method. For some features it may not be clear whether they
belong in the core algorithm or are specific to individual policies. In the former case, we
can put them in HS' _assignTasks()_, enabled or disabled through configuration; in the latter
case, they can be part of the functionality specific to FS- or CS-like behavior. I will attach
a patch to demonstrate how this works.
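
As a sketch of how such configuration toggles might look, with made-up property names (the
real keys would be settled in the patch):

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical feature switches; the property names are illustrative only.
class SchedulerFeatures {
  final boolean checkMemory;      // match job memory needs against free TT memory
  final boolean checkUserLimits;  // enforce per-user job/task limits

  SchedulerFeatures(Configuration conf) {
    checkMemory = conf.getBoolean("mapred.hs.check-memory", false);
    checkUserLimits = conf.getBoolean("mapred.hs.check-user-limits", false);
  }
}
{code}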




> A proposal to merge common functionality of various Schedulers
> --------------------------------------------------------------
>
>                 Key: HADOOP-5199
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5199
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Vivek Ratan
>
> There are at least 3 Schedulers in Hadoop today: Default, Capacity, and Fairshare. Over
> time, we're seeing a lot of functionality common to all three. Many bug fixes, improvements
> to existing functionality, and new functionality are applicable to all three schedulers. This
> trend seems to be getting stronger, as we notice similar problems, solutions, and ideas. This
> is a proposal to detect and consolidate such common functionality, while at the same time,
> allowing for differences in behavior and leaving the door open for other schedulers to be
> built, as per HADOOP-3412.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

