hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3746) A fair sharing job scheduler
Date Fri, 11 Jul 2008 14:44:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612895#action_12612895
] 

Vivek Ratan commented on HADOOP-3746:
-------------------------------------

Matei, sounds like you're extending the current Scheduler in some really beneficial ways.
Look forward to seeing your patch. 

I just wanted to add: HADOOP-3445 supports user limits (a single user can only use up to a
proportion of the queue), which, we think, will prevent starvation (when combined with capacities
for queues).  This feature should also solve HADOOP-2573.  

It'll be really interesting to see how these two schedulers handle user problems today, such
as starvation of jobs, or fairness problems. I think there're a lot of similarities in the
approaches, but perhaps enough differences that will manifest in different ways on a cluster.
Hopefully, once we have patches out for both Jiras, and enough experience using them, we can
find some common solutions to common problems with Hadoop scheduling. One of our goals, driven
by the work for 3412, is to make the scheduler for 3421 extensible, so folks can tweak algorithms
at different levels and find what works best for them, as well as enhance the default algorithm.
I hope that the same is possible with your design - it'll be a very useful feature. 

My real point is, between HADOOP-3412, HADOOP-3421 (and its implementation offshoots, including
HADOOP-3445), and HADOOP-3746, as well as other Jiras these might influence, there's a lot
of good work going on now in dealing with Hadoop scheduling and looking at effective ways
to improve the utilization and usability of Hadoop. It's about time! We need this effort.


> A fair sharing job scheduler
> ----------------------------
>
>                 Key: HADOOP-3746
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3746
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>            Priority: Minor
>             Fix For: 0.19.0
>
>
> The default job scheduler in Hadoop has a first-in-first-out queue of jobs for each priority
level. The scheduler always assigns task slots to the first job in the highest-level priority
queue that is in need of tasks. This makes it difficult to share a MapReduce cluster between
users because a large job will starve subsequent jobs in its queue, but at the same time,
giving lower priorities to large jobs would cause them to be starved by a stream of higher-priority
jobs. Today one solution to this problem is to create separate MapReduce clusters for different
user groups with Hadoop On-Demand, but this hurts system utilization because a group's cluster
may be mostly idle for long periods of time. HADOOP-3445 also addresses this problem by sharing
a cluster between different queues, but still provides only FIFO scheduling within a queue.
> This JIRA proposes a job scheduler based on fair sharing. Fair sharing splits up compute
time proportionally between jobs that have been submitted, emulating an "ideal" scheduler
that gives each job 1/Nth of the available capacity. When there is a single job running, that
job receives all the capacity. When other jobs are submitted, tasks slots that free up are
assigned to the new jobs, so that everyone gets roughly the same amount of compute time. This
lets short jobs finish in reasonable amounts of time while not starving long jobs. This is
the type of scheduling used or emulated by operating systems - e.g. the Completely Fair Scheduler
in Linux. Fair sharing can also work with job priorities - the priorities are used as weights
to determine the fraction of total compute time that a job should get. 
> In addition, the scheduler will support a way to guarantee capacity for particular jobs
or user groups. A job can be marked as belonging to a "pool" using a parameter in the jobconf.
An "allocations" file on the JobTracker can assign a minimum allocation to each pool, which
is a minimum number of map slots and reduce slots that the pool must be guaranteed to get
when it contains jobs. The scheduler will ensure that each pool gets at least its minimum
allocation when it contains jobs, but it will use fair sharing to assign any excess capacity,
as well as the capacity within each pool. This lets an organization divide a cluster between
groups similarly to the job queues in HADOOP-3445.
> *Implementation Status:*
> I've implemented this scheduler using a version of the pluggable scheduler API in HADOOP-3412
that works with Hadoop 0.17. The scheduler supports fair sharing, pools, priorities for weighing
job shares, and a text-based allocation config file that is reloaded at runtime whenever it
has changed to make it possible to change allocations without restarting the cluster. I will
also create a patch for trunk that works with the latest interface in the patch submitted
for HADOOP-3412.
> The actual implementation is simple. To implement fair sharing, the scheduler keeps track
of a "deficit" for each job - the difference between the amount of compute time it should
have gotten on an ideal scheduler, and the amount of compute time it actually got. This is
a measure of how "unfair" we've been to the job. Every few hundred milliseconds, the scheduler
updates the deficit of each job by looking at how many tasks each job had running during this
interval vs. how many it should have had given its weight and the set of jobs that were running
in this period. Whenever a task slot becomes available, it is assigned to the job with the
highest deficit - unless there were one or more jobs who were not meeting their pool capacity
guarantees, in which case we choose among those "needy" jobs based again on their deficit.
> *Extensions:*
> Once we keep track of pools, weights and deficits, we can do a lot of interesting things
with a fair scheduler. One feature I will probably add is an option to give brand new jobs
a priority boost until they have run for, say, 10 minutes, to reduce response times even further
for short jobs such as ad-hoc queries, while still being fair to longer-running jobs. It would
also be easy to add a "maximum number of tasks" cap for each job as in HADOOP-2573 (although
with priorities and pools, this JIRA reduces the need for such a cap - you can put a job in
its own pool to give it a minimum share, and set its priority to VERY_LOW so it never takes
excess capacity if there are other jobs in the cluster). Finally, I may implement "hierarchical
pools" - the ability for a group to create pools within its pool, so that it can guarantee
minimum allocations to various types of jobs but ensure that together, its jobs get capacity
equal to at least its full pool.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message