hadoop-common-dev mailing list archives

From "Thomas Sandholm (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4768) Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency
Date Tue, 07 Apr 2009 18:14:13 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Sandholm updated HADOOP-4768:
------------------------------------

          Description: 
Dynamic (economic) priority scheduler based on work presented at the Hadoop User Group meeting
in Santa Clara in September and the HadoopCamp in New Orleans in November 2008.



  was:
Contribution based on work presented at the Hadoop User Group meeting in Santa Clara in September
and the HadoopCamp in New Orleans in November.

From README:
This package implements dynamic priority scheduling for MapReduce jobs.

Overview
--------
The purpose of this scheduler is to allow users to increase and decrease
their queue priorities continuously to meet the requirements of their
current workloads. The scheduler is aware of the current demand and makes
it more expensive to boost the priority under peak usage times. Thus
users who move their workload to low usage times are rewarded with
discounts. Priorities can only be boosted within a limited quota.
All users are given a quota or a budget which is deducted periodically
in configurable accounting intervals. How much of the budget is 
deducted is determined by a per-user spending rate, which may
be modified at any time directly by the user. The share of cluster slots
allocated to a particular user is computed as that user's
spending rate divided by the sum of all spending rates in the same accounting
period.
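
The share rule in the last sentence can be sketched in a few lines of Python.
This is only an illustration of the formula, not the scheduler's code: the
function name compute_shares and the choice to give every queue a zero share
when nobody is spending are assumptions.

```python
def compute_shares(spending_rates):
    """Map each queue to its fraction of cluster slots for one accounting period.

    spending_rates: dict of queue name -> spending rate (hypothetical input shape).
    """
    total = sum(spending_rates.values())
    if total <= 0:
        # Assumption: with no spending at all there is no basis for a split.
        return {queue: 0.0 for queue in spending_rates}
    # Share = this queue's spending rate over the sum of all spending rates.
    return {queue: rate / total for queue, rate in spending_rates.items()}


shares = compute_shares({"queue1": 1.0, "queue2": 3.0})
print(shares)
```

With these rates queue2 spends three times as much as queue1 and so receives
three quarters of the slots.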

Configuration
-------------
This scheduler has been designed as a meta-scheduler on top of 
existing MapReduce schedulers, which are responsible for enforcing
shares computed by the dynamic scheduler in the cluster. The configuration
of the underlying MapReduce scheduler does not have to change when deploying
the dynamic scheduler.

Hadoop Configuration (e.g. hadoop-site.xml):
mapred.jobtracker.taskScheduler      This needs to be set to 
                                     org.apache.hadoop.mapred.DynamicPriorityScheduler
                                     to use the dynamic scheduler.
mapred.queue.names                   All queues managed by the dynamic scheduler must be listed
                                     here (comma-separated, no spaces)
Scheduler Configuration:
mapred.dynamic-scheduler.scheduler   The Java path of the MapReduce scheduler that should
                                     enforce the allocated shares.
                                     Has been tested with:
                                     org.apache.hadoop.mapred.FairScheduler
                                     and
                                     org.apache.hadoop.mapred.CapacityTaskScheduler

mapred.dynamic-scheduler.budgetfile  The full OS path of the file from which the
                                     budgets are read. The syntax of this file is:
                                     <queueName> <budget>
                                     separated by newlines where budget can be specified
                                     as a Java float

mapred.dynamic-scheduler.spendfile   The full OS path of the file from which the
                                     user/queue spending rate is read. It allows
                                     the queue name to be placed into the path
                                     at runtime, e.g.:
                                     /home/%QUEUE%/.spending
                                     Only the user(s) who submit jobs to the
                                     specified queue should have write access
                                     to this file. The syntax of the file is
                                     just:
                                     <spending rate>
                                     where the spending rate is specified as a
                                     Java float. If no spending rate is specified
                                     the rate defaults to budget/1000.
mapred.dynamic-scheduler.alloc       Allocation interval, when the scheduler rereads the
                                     spending rates and recalculates the cluster shares.
                                     Specified as seconds between allocations.
                                     Default is 20 seconds.
mapred.dynamic-scheduler.budgetset   Boolean which is true if the budget should be deducted
                                     by the scheduler and the updated budget written to the
                                     budget file. Default is true. Setting this to false is
                                     useful if there is a tool that controls budgets and
                                     spending rates externally to the scheduler.
Runtime Configuration:
mapred.scheduler.shares              The shares that should be allocated to the specified
                                     queues. The configuration property is a comma-separated
                                     list of strings where the odd-positioned elements are
                                     the queue names and the even-positioned elements are
                                     the shares, as Java floats, of the preceding queue name.
                                     It is updated for all the queues atomically in each
                                     allocation pass. MapReduce schedulers such as the Fair
                                     and CapacityTask schedulers are expected to read from
                                     this property periodically.
                                     Example property value: "queue1,45.0,queue2,55.0"
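
As an illustration of the mapred.scheduler.shares value format, the following
Python sketch converts between the comma-separated string and a mapping of
queue names to shares. The helper names parse_shares and format_shares are
hypothetical, and Python's float() stands in for Java float parsing, which
accepts a slightly different set of literals.

```python
def parse_shares(value):
    """Turn "queue1,45.0,queue2,55.0" into {"queue1": 45.0, "queue2": 55.0}."""
    parts = value.split(",")
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating queue names and shares: %r" % value)
    # Odd-positioned elements are queue names, even-positioned ones are shares.
    return {parts[i]: float(parts[i + 1]) for i in range(0, len(parts), 2)}


def format_shares(shares):
    """Inverse of parse_shares: one string covering all queues at once."""
    return ",".join(f"{queue},{share}" for queue, share in shares.items())


parsed = parse_shares("queue1,45.0,queue2,55.0")
print(parsed)
print(format_shares(parsed))
```

Building the whole string in one step mirrors the description above, where the
property is updated for all queues together in each allocation pass.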

    Affects Version/s:     (was: 0.20.0)
                       0.21.0
         Release Note: 
This package implements dynamic priority scheduling for MapReduce jobs.

Overview
--------
The purpose of this scheduler is to allow users to increase and decrease
their queue priorities continuously to meet the requirements of their
current workloads. The scheduler is aware of the current demand and makes
it more expensive to boost the priority under peak usage times. Thus
users who move their workload to low usage times are rewarded with
discounts. Priorities can only be boosted within a limited quota.
All users are given a quota or a budget which is deducted periodically
in configurable accounting intervals. How much of the budget is
deducted is determined by a per-user spending rate, which may
be modified at any time directly by the user. The share of cluster slots
allocated to a particular user is computed as that user's
spending rate divided by the sum of all spending rates in the same accounting
period.
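
The periodic budget deduction described above could look roughly like the
Python sketch below. The text does not spell out the exact billing formula or
what happens when a budget runs out, so two things here are assumptions: each
accounting interval deducts exactly the spending rate, and a queue whose
remaining budget cannot cover its rate is not charged and takes no part in
that interval. The name accounting_pass is hypothetical.

```python
def accounting_pass(budgets, spending_rates):
    """Run one accounting interval.

    Returns (new_budgets, active_rates), where active_rates holds only the
    queues that could pay for this interval (assumed policy, see lead-in).
    """
    new_budgets = {}
    active_rates = {}
    for queue, budget in budgets.items():
        rate = spending_rates.get(queue, 0.0)
        if rate > 0 and budget >= rate:
            # The queue can afford its rate: bill it and let it compete.
            new_budgets[queue] = budget - rate
            active_rates[queue] = rate
        else:
            # Not spending, or over quota: budget is left untouched.
            new_budgets[queue] = budget
    return new_budgets, active_rates


budgets, active = accounting_pass({"a": 10.0, "b": 1.0}, {"a": 2.0, "b": 5.0})
print(budgets, active)
```

In this example queue "a" is billed 2.0 and stays active, while queue "b"
cannot cover its rate of 5.0 and is skipped.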

Configuration
-------------
This scheduler comprises two components, an accounting or resource allocation part that
manages and bills for queue shares, and a scheduler that
enforces the queue shares in the form of map and reduce slots of running jobs.

Hadoop Configuration (e.g. hadoop-site.xml):
mapred.jobtracker.taskScheduler
    This needs to be set to
    org.apache.hadoop.mapred.DynamicPriorityScheduler
    to use the dynamic scheduler.
Scheduler Configuration:
mapred.dynamic-scheduler.scheduler
    The Java path of the MapReduce scheduler that should
    enforce the allocated shares.
    Has been tested with the following scheduler, which is also the default:
    org.apache.hadoop.mapred.PriorityScheduler
mapred.priority-scheduler.acl-file
    Full path of ACL with syntax:
      <user> <role> <secret key>
    separated by line feeds
mapred.dynamic-scheduler.budget-file
    The full OS path of the file from which the
    budgets are read and stored. The syntax of this file is:
    <queue name> <budget> <spending rate>
    separated by newlines where budget can be specified
    as a Java float. While the server is running, the file
    should not be edited directly but through the
    servlet API, to ensure proper synchronization.

mapred.dynamic-scheduler.alloc-interval
    Allocation interval, when the scheduler rereads the
    spending rates and recalculates the cluster shares.
    Specified as seconds between allocations.
    Default is 20 seconds.
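
For reference, the budget file syntax given above (one
"<queue name> <budget> <spending rate>" entry per line) can be read with a
short Python sketch like the one below. It is meant for offline inspection
only, since a running server expects changes to go through the servlet API.
The function name parse_budget_file, the whitespace-splitting rule, and the
skipping of blank lines are assumptions, and Python's float() is used in
place of Java float parsing.

```python
def parse_budget_file(text):
    """Parse budget file contents into {queue: (budget, spending_rate)}."""
    entries = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # Assumption: blank lines are ignored.
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        queue, budget, rate = fields
        entries[queue] = (float(budget), float(rate))
    return entries


sample = "queue1 1000.0 1.5\nqueue2 500 0.5\n"
print(parse_budget_file(sample))
```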



  was:Dynamic priority scheduler allowing shares in a scheduler such as the CapacityTaskScheduler
and the FairScheduler to be controlled dynamically by a currency (consumable quota)


latest version tested with 0.21.0 trunk

> Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4768
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4768
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/capacity-sched, contrib/fair-share
>    Affects Versions: 0.21.0
>            Reporter: Thomas Sandholm
>            Assignee: Thomas Sandholm
>         Attachments: HADOOP-4768-2.patch, HADOOP-4768-capacity-scheduler.patch, HADOOP-4768-dynamic-scheduler.patch, HADOOP-4768-fairshare.patch, HADOOP-4768.patch
>
>
> Dynamic (economic) priority scheduler based on work presented at the Hadoop User Group meeting in Santa Clara in September and the HadoopCamp in New Orleans in November 2008.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

