hadoop-common-dev mailing list archives

From "Thomas Sandholm (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4768) Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency
Date Sat, 06 Dec 2008 07:29:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654038#action_12654038 ]

Thomas Sandholm commented on HADOOP-4768:

Hi Matei,

when I was implementing this I experimented with a number of different approaches. The goals
were to make the dynamic scheduler as independent of the underlying schedulers as possible,
and to require as few changes to them as possible. I didn't want to add a separately deployed
service, as that introduces another point of failure and more maintenance. So instead I hooked
into the scheduler's regular event/bookkeeping loop, without even requiring a separate thread
to be spawned. As for updating the config files of the schedulers (the pools file and the
capacity-scheduler config) asynchronously without requiring any changes at all to the
schedulers: I tried that too. It was quite simple for the fairshare scheduler, but it
introduces a dependency on the XML format unless you do some XPath-like replacement (which
turned out to be both too complex and too slow for our purpose). Updating the config files
would also lead to more I/O overhead; as it is now, the shares are communicated directly to
the schedulers in memory. I don't think the reverse dependency is too bad either: the
scheduler just gets a list of queue/share values from a config property and can then use
those in whatever way makes sense to the local scheduler. My patches to the capacity
scheduler and the fairshare scheduler should be seen as examples showing scheduler
developers how to use the dynamic scheduler, rather than as final solutions.
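As one illustration of "whatever way makes sense to the local scheduler", here is a minimal sketch of how a scheduler might turn the queue/share list it receives into guaranteed slot minimums (the guaranteed-allocation approach mentioned below for the fairshare scheduler). The class `ShareToSlots`, the method `guaranteedSlots`, and the assumption that shares are percentages of the cluster are all hypothetical; this is not code from the HADOOP-4768 patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the HADOOP-4768 patches: one way a local
// scheduler could map the queue shares it receives onto guaranteed slot minimums.
class ShareToSlots {

    // Assumes each share is a percentage of the cluster (e.g. 45.0 = 45%).
    // Rounds down, so if the shares sum to at most 100 the guarantees
    // never add up to more than totalSlots.
    static Map<String, Integer> guaranteedSlots(Map<String, Float> shares, int totalSlots) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : shares.entrySet()) {
            // Multiply before dividing to avoid rounding 7.0 down to 6.999...
            double exact = e.getValue() * (double) totalSlots / 100.0;
            slots.put(e.getKey(), (int) Math.floor(exact));
        }
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Float> shares = new LinkedHashMap<>();
        shares.put("queue1", 45.0f);
        shares.put("queue2", 55.0f);
        // 40 total slots is an arbitrary example cluster size.
        System.out.println(guaranteedSlots(shares, 40)); // {queue1=18, queue2=22}
    }
}
```

A capacity-style scheduler could instead use the percentages directly as queue capacities; the point is only that the share list itself does not dictate the enforcement policy.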

The important thing is that the dynamic scheduler allows control over, and accounts for, the
budget spent on different levels of quality of service/priority. This QoS/priority can then be
enforced and implemented in any number of ways; the dynamic scheduler doesn't care, as long as
spending more currency per time unit gives you better performance.

Thanks for the more detailed info on the fairshare scheduler. I still think that the guaranteed
allocations were the best match, but if it makes sense to pay more currency for higher fair shares,
you could enforce the shares granted by the dynamic scheduler in a more sophisticated way.
I don't think the interface between the schedulers has to change for this to be done, though.

One use case is that you could hook this feature into a secure banking system where budgets
can be transferred from the user to the cluster owner automatically. We have used this approach
successfully in a system called Tycoon (http://tycoon.hpl.hp.com) but instead of allocating
map/reduce task slots it allocates virtual machine shares using Xen (like EC2 but with variable
pricing and finer-grained resource control).

Another use case is a cloud computing test bed that we are designing together with Intel and
Yahoo (that I presented at the venues mentioned in the patch description). In this scenario
researchers are granted some quota, e.g. based on their contribution to the testbed. The quota
can then be used by them to obtain resources when they need them and at a QoS level that matches
their needs.

Hope this clarifies things a bit. If you want more info on the big picture you can look at
some of the papers and presentations on the Tycoon site mentioned above or the test bed site,
www.opencirrus.org (under construction). 

> Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency
> ----------------------------------------------------------------------------------------------
>                 Key: HADOOP-4768
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4768
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/capacity-sched, contrib/fair-share
>    Affects Versions: 0.20.0
>            Reporter: Thomas Sandholm
>            Assignee: Thomas Sandholm
>             Fix For: 0.20.0
>         Attachments: HADOOP-4768-capacity-scheduler.patch, HADOOP-4768-dynamic-scheduler.patch,
> Contribution based on work presented at the Hadoop User Group meeting in Santa Clara
> in September and the HadoopCamp in New Orleans in November.
> From README:
> This package implements dynamic priority scheduling for MapReduce jobs.
> Overview
> --------
> The purpose of this scheduler is to allow users to increase and decrease
> their queue priorities continuously to meet the requirements of their
> current workloads. The scheduler is aware of the current demand and makes
> it more expensive to boost the priority under peak usage times. Thus
> users who move their workload to low usage times are rewarded with
> discounts. Priorities can only be boosted within a limited quota.
> All users are given a quota or a budget which is deducted periodically
> in configurable accounting intervals. How much of the budget is 
> deducted is determined by a per-user spending rate, which may
> be modified at any time directly by the user. The cluster slot
> share allocated to a particular user is computed as that user's
> spending rate over the sum of all spending rates in the same accounting
> period.
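The accounting rule in the Overview can be sketched as follows: in each interval every queue's budget is reduced by its spending rate, and its share is its spending rate divided by the sum of all spending rates. The names `AccountingSketch` and `allocate` are hypothetical, and capping a queue's spend at its remaining budget is an assumption about how "boosted within a limited quota" is enforced; the actual scheduler may handle exhausted budgets differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of one accounting interval as the README describes it:
// share = queue's spending rate / sum of all spending rates in the interval.
class AccountingSketch {

    // Mutates budgets in place and returns each queue's share as a fraction in [0, 1].
    static Map<String, Float> allocate(Map<String, Float> budgets, Map<String, Float> rates) {
        Map<String, Float> spent = new LinkedHashMap<>();
        float total = 0f;
        for (Map.Entry<String, Float> e : rates.entrySet()) {
            // Assumption: a queue cannot spend more than its remaining budget.
            float s = Math.min(e.getValue(), budgets.getOrDefault(e.getKey(), 0f));
            spent.put(e.getKey(), s);
            total += s;
        }
        Map<String, Float> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : spent.entrySet()) {
            String queue = e.getKey();
            // Deduct this interval's spending from the queue's budget.
            budgets.put(queue, budgets.getOrDefault(queue, 0f) - e.getValue());
            shares.put(queue, total > 0f ? e.getValue() / total : 0f);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Float> budgets = new LinkedHashMap<>();
        budgets.put("q1", 100f);
        budgets.put("q2", 100f);
        Map<String, Float> rates = new LinkedHashMap<>();
        rates.put("q1", 3f);
        rates.put("q2", 1f);
        System.out.println(allocate(budgets, rates)); // {q1=0.75, q2=0.25}
        System.out.println(budgets);                  // {q1=97.0, q2=99.0}
    }
}
```

Because shares are relative, raising q2's rate from 1 to 3 while q1 stays at 3 would cut q1's share from 0.75 to 0.5 for the same spend, which is how the design makes priority more expensive when demand is high.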
> Configuration
> -------------
> This scheduler has been designed as a meta-scheduler on top of 
> existing MapReduce schedulers, which are responsible for enforcing
> shares computed by the dynamic scheduler in the cluster. The configuration
> of the underlying MapReduce scheduler does not have to change when deploying
> the dynamic scheduler.
> Hadoop Configuration (e.g. hadoop-site.xml):
> mapred.jobtracker.taskScheduler      This needs to be set to 
>                                      org.apache.hadoop.mapred.DynamicPriorityScheduler
>                                      to use the dynamic scheduler.
> mapred.queue.names                   All queues managed by the dynamic scheduler must be listed
>                                      here (comma separated, no spaces)
> Scheduler Configuration:
> mapred.dynamic-scheduler.scheduler   The fully qualified Java class name of the MapReduce
>                                      scheduler that should enforce the allocated shares.
>                                      Has been tested with:
>                                      org.apache.hadoop.mapred.FairScheduler
>                                      and
>                                      org.apache.hadoop.mapred.CapacityTaskScheduler
> mapred.dynamic-scheduler.budgetfile  The full OS path of the file from which the
>                                      budgets are read. The syntax of this file is:
>                                      <queueName> <budget>
>                                      with one entry per line, where budget is specified
>                                      as a Java float
> mapred.dynamic-scheduler.spendfile   The full OS path of the file from which the
>                                      user/queue spending rate is read. It allows
>                                      the queue name to be placed into the path
>                                      at runtime, e.g.:
>                                      /home/%QUEUE%/.spending
>                                      Only the user(s) who submit jobs to the
>                                      specified queue should have write access
>                                      to this file. The syntax of the file is
>                                      just:
>                                      <spending rate>
>                                      where the spending rate is specified as a
>                                      Java float. If no spending rate is specified
>                                      the rate defaults to budget/1000.
> mapred.dynamic-scheduler.alloc       Allocation interval, when the scheduler rereads
>                                      spending rates and recalculates the cluster shares.
>                                      Specified as seconds between allocations.
>                                      Default is 20 seconds.
> mapred.dynamic-scheduler.budgetset   Boolean which is true if the budget should be deducted
>                                      by the scheduler and the updated budget written to the
>                                      budget file. Default is true. Setting this to false is
>                                      useful if there is a tool that controls budgets and
>                                      spending rates externally to the scheduler.
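The budget and spend file formats above are simple enough to sketch. The class `BudgetFiles` and its methods are hypothetical, and skipping blank lines in the budget file is an assumption; the file syntax, the `%QUEUE%` placeholder, and the budget/1000 default come from the README. In a deployment the lines would come from the files named by `mapred.dynamic-scheduler.budgetfile` and `mapred.dynamic-scheduler.spendfile` (for example via `java.nio.file.Files.readAllLines`); strings are used here so the logic can be run on its own.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the budget and spend file formats described above.
class BudgetFiles {

    // Parses "<queueName> <budget>" lines; blank lines are skipped (assumption).
    static Map<String, Float> parseBudgets(List<String> lines) {
        Map<String, Float> budgets = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("bad budget line: " + line);
            }
            budgets.put(parts[0], Float.parseFloat(parts[1]));
        }
        return budgets;
    }

    // Expands the %QUEUE% placeholder in the configured spend file path.
    static String spendFilePath(String template, String queue) {
        return template.replace("%QUEUE%", queue);
    }

    // Reads "<spending rate>", defaulting to budget/1000 when no rate is given.
    static float spendingRate(String fileContents, float budget) {
        String trimmed = fileContents == null ? "" : fileContents.trim();
        return trimmed.isEmpty() ? budget / 1000f : Float.parseFloat(trimmed);
    }

    public static void main(String[] args) {
        Map<String, Float> budgets = parseBudgets(Arrays.asList("queue1 500.0", "queue2 250"));
        System.out.println(budgets); // {queue1=500.0, queue2=250.0}
        System.out.println(spendFilePath("/home/%QUEUE%/.spending", "queue1")); // /home/queue1/.spending
        System.out.println(spendingRate("", 500f)); // 0.5
    }
}
```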
> Runtime Configuration:
> mapred.scheduler.shares              The shares that should be allocated to the specified
>                                      queues. The configuration property is a comma separated
>                                      list of strings where the odd positioned elements are the
>                                      queue names and the even positioned elements are the shares
>                                      (as Java floats) of the preceding queue name. It is set
>                                      for all the queues atomically in each allocation pass.
>                                      MapReduce schedulers such as the Fair and CapacityTask
>                                      schedulers are expected to read from this property
>                                      periodically.
>                                      Example property value: "queue1,45.0,queue2,55.0"
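The alternating name/share encoding of `mapred.scheduler.shares` can be round-tripped with a few lines of code. This is a minimal sketch; `SharesProperty`, `encode`, and `decode` are hypothetical names, and a real scheduler would obtain the raw string from the `mapred.scheduler.shares` property rather than from a literal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical encoder/decoder for the "queue1,45.0,queue2,55.0" property format.
class SharesProperty {

    // Writes queue names and shares as alternating comma-separated elements.
    static String encode(Map<String, Float> shares) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, Float> e : shares.entrySet()) {
            joiner.add(e.getKey());
            joiner.add(Float.toString(e.getValue()));
        }
        return joiner.toString();
    }

    // Reads the alternating list back into an insertion-ordered map.
    static Map<String, Float> decode(String value) {
        Map<String, Float> shares = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return shares;
        }
        String[] parts = value.split(",");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of elements: " + value);
        }
        for (int i = 0; i < parts.length; i += 2) {
            shares.put(parts[i], Float.parseFloat(parts[i + 1]));
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Float> shares = decode("queue1,45.0,queue2,55.0");
        System.out.println(shares);         // {queue1=45.0, queue2=55.0}
        System.out.println(encode(shares)); // queue1,45.0,queue2,55.0
    }
}
```

Rejecting a list with an odd number of elements keeps a truncated value from silently dropping the last queue's share.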

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
