Author: yhemanth
Date: Tue May 5 08:42:35 2009
New Revision: 771622
URL: http://svn.apache.org/viewvc?rev=771622&view=rev
Log:
HADOOP-5736. Update the capacity scheduler documentation for features like memory based scheduling, job initialization and removal of pre-emption. Contributed by Sreekanth Ramakrishnan.
Modified:
hadoop/core/trunk/CHANGES.txt
hadoop/core/trunk/conf/capacity-scheduler.xml.template
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
Modified: hadoop/core/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=771622&r1=771621&r2=771622&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Tue May 5 08:42:35 2009
@@ -536,6 +536,10 @@
HADOOP-5711. Change Namenode file close log to info. (szetszwo)
+ HADOOP-5736. Update the capacity scheduler documentation for features
+ like memory based scheduling, job initialization and removal of pre-emption.
+ (Sreekanth Ramakrishnan via yhemanth)
+
OPTIMIZATIONS
BUG FIXES
Modified: hadoop/core/trunk/conf/capacity-scheduler.xml.template
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/conf/capacity-scheduler.xml.template?rev=771622&r1=771621&r2=771622&view=diff
==============================================================================
--- hadoop/core/trunk/conf/capacity-scheduler.xml.template (original)
+++ hadoop/core/trunk/conf/capacity-scheduler.xml.template Tue May 5 08:42:35 2009
@@ -60,16 +60,13 @@
mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem
-1
- If mapred.task.maxpmem is set to -1, this configuration will
- be used to calculate job's physical memory requirements as a percentage of
- the job's virtual memory requirements set via mapred.task.maxvmem. This
- property thus provides default value of physical memory for job's that
- don't explicitly specify physical memory requirements.
-
- If not explicitly set to a valid value, scheduler will not consider
- physical memory for scheduling even if virtual memory based scheduling is
- enabled(by setting valid values for both mapred.task.default.maxvmem and
- mapred.task.limit.maxvmem).
+ A percentage (float) of the default VM limit for jobs
+ (mapred.task.default.maxvm). This is the default RAM task-limit
+ associated with a task. Unless overridden by a job's setting, this
+ number defines the RAM task-limit.
+
+ If this property is missing, or set to an invalid value, scheduling
+ based on physical memory, RAM, is disabled.
Modified: hadoop/core/trunk/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml?rev=771622&r1=771621&r2=771622&view=diff
==============================================================================
--- hadoop/core/trunk/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml (original)
+++ hadoop/core/trunk/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml Tue May 5 08:42:35 2009
@@ -29,7 +29,9 @@
Purpose
- This document describes the Capacity Scheduler, a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.
+ This document describes the Capacity Scheduler, a pluggable
+ Map/Reduce scheduler for Hadoop which provides a way to share
+ large clusters.
@@ -41,19 +43,17 @@
Support for multiple queues, where a job is submitted to a queue.
- Queues are guaranteed a fraction of the capacity of the grid (their
- 'guaranteed capacity') in the sense that a certain capacity of
- resources will be at their disposal. All jobs submitted to a
- queue will have access to the capacity guaranteed to the queue.
+ Queues are allocated a fraction of the capacity of the grid in the
+ sense that a certain capacity of resources will be at their
+ disposal. All jobs submitted to a queue will have access to the
+ capacity allocated to the queue.
- Free resources can be allocated to any queue beyond its guaranteed
- capacity. These excess allocated resources can be reclaimed and made
- available to another queue in order to meet its capacity guarantee.
-
-
- The scheduler guarantees that excess resources taken from a queue
- will be restored to it within N minutes of its need for them.
+ Free resources can be allocated to any queue beyond it's capacity.
+ When there is demand for these resources from queues running below
+ capacity at a future point in time, as tasks scheduled on these
+ resources complete, they will be assigned to jobs on queues
+ running below the capacity.
Queues optionally support job priorities (disabled by default).
@@ -61,7 +61,9 @@
Within a queue, jobs with higher priority will have access to the
queue's resources before jobs with lower priority. However, once a
- job is running, it will not be preempted for a higher priority job.
+ job is running, it will not be preempted for a higher priority job,
+ though new tasks from the higher priority job will be
+ preferentially scheduled.
In order to prevent one or more users from monopolizing its
@@ -84,59 +86,34 @@
Note that many of these steps can be, and will be, enhanced over time
to provide better algorithms.
- Whenever a TaskTracker is free, the Capacity Scheduler first picks a
- queue that needs to reclaim any resources the earliest (this is a queue
- whose resources were temporarily being used by some other queue and now
- needs access to those resources). If no such queue is found, it then picks
+
Whenever a TaskTracker is free, the Capacity Scheduler picks
a queue which has most free space (whose ratio of # of running slots to
- guaranteed capacity is the lowest).
+ capacity is the lowest).
- Once a queue is selected, the scheduler picks a job in the queue. Jobs
+
Once a queue is selected, the Scheduler picks a job in the queue. Jobs
are sorted based on when they're submitted and their priorities (if the
queue supports priorities). Jobs are considered in order, and a job is
selected if its user is within the user-quota for the queue, i.e., the
user is not already using queue resources above his/her limit. The
- scheduler also makes sure that there is enough free memory in the
+ Scheduler also makes sure that there is enough free memory in the
TaskTracker to tun the job's task, in case the job has special memory
requirements.
- Once a job is selected, the scheduler picks a task to run. This logic
+
Once a job is selected, the Scheduler picks a task to run. This logic
to pick a task remains unchanged from earlier versions.
- Reclaiming capacity
-
- Periodically, the scheduler determines:
-
- -
- if a queue needs to reclaim capacity. This happens when a queue has
- at least one task pending and part of its guaranteed capacity is
- being used by some other queue. If this happens, the scheduler notes
- the amount of resources it needs to reclaim for this queue within a
- specified period of time (the reclaim time).
-
- -
- if a queue has not received all the resources it needed to reclaim,
- and its reclaim time is about to expire. In this case, the scheduler
- needs to kill tasks from queues running over capacity. This it does
- by killing the tasks that started the latest.
-
-
-
-
-
-
Installation
- The capacity scheduler is available as a JAR file in the Hadoop
+
The Capacity Scheduler is available as a JAR file in the Hadoop
tarball under the contrib/capacity-scheduler directory. The name of
the JAR file would be on the lines of hadoop-*-capacity-scheduler.jar.
- You can also build the scheduler from source by executing
+
You can also build the Scheduler from source by executing
ant package, in which case it would be available under
build/contrib/capacity-scheduler.
- To run the capacity scheduler in your Hadoop installation, you need
+
To run the Capacity Scheduler in your Hadoop installation, you need
to put it on the CLASSPATH. The easiest way is to copy the
hadoop-*-capacity-scheduler.jar from
to HADOOP_HOME/lib. Alternatively, you can modify
@@ -148,9 +125,9 @@
Configuration
- Using the capacity scheduler
+ Using the Capacity Scheduler
- To make the Hadoop framework use the capacity scheduler, set up
+ To make the Hadoop framework use the Capacity Scheduler, set up
the following property in the site configuration:
@@ -168,7 +145,7 @@
Setting up queues
You can define multiple queues to which users can submit jobs with
- the capacity scheduler. To define multiple queues, you should edit
+ the Capacity Scheduler. To define multiple queues, you should edit
the site configuration for Hadoop and modify the
mapred.queue.names property.
@@ -186,8 +163,8 @@
Configuring properties for queues
- The capacity scheduler can be configured with several properties
- for each queue that control the behavior of the scheduler. This
+
The Capacity Scheduler can be configured with several properties
+ for each queue that control the behavior of the Scheduler. This
configuration is in the conf/capacity-scheduler.xml. By
default, the configuration is set up for one queue, named
default.
@@ -195,10 +172,10 @@
configuration, you should use the property name as
mapred.capacity-scheduler.queue.<queue-name>.<property-name>.
- For example, to define the property guaranteed-capacity
+
For example, to define the property capacity
for queue named research, you should specify the property
name as
- mapred.capacity-scheduler.queue.research.guaranteed-capacity.
+ mapred.capacity-scheduler.queue.research.capacity.
The properties defined for queues and their descriptions are
@@ -206,15 +183,10 @@
| Name | Description |
- | mapred.capacity-scheduler.queue.<queue-name>.guaranteed-capacity |
- Percentage of the number of slots in the cluster that are
- guaranteed to be available for jobs in this queue.
- The sum of guaranteed capacities for all queues should be less
- than or equal 100. |
-
- | mapred.capacity-scheduler.queue.<queue-name>.reclaim-time-limit |
- The amount of time, in seconds, before which resources
- distributed to other queues will be reclaimed. |
+
| mapred.capacity-scheduler.queue.<queue-name>.capacity |
+ Percentage of the number of slots in the cluster that are made
+ to be available for jobs in this queue. The sum of capacities
+ for all queues should be less than or equal 100. |
| mapred.capacity-scheduler.queue.<queue-name>.supports-priority |
If true, priorities of jobs will be taken into account in scheduling
@@ -237,27 +209,133 @@
- Configuring the capacity scheduler
- The capacity scheduler's behavior can be controlled through the
- following properties.
+ Memory management
+
+ The Capacity Scheduler supports scheduling of tasks on a
+ TaskTracker(TT) based on a job's memory requirements
+ and the availability of RAM and Virtual Memory (VMEM) on the TT node.
+ See the Hadoop
+ Map/Reduce tutorial for details on how the TT monitors
+ memory usage.
+ Currently the memory based scheduling is only supported
+ in Linux platform.
+ Memory-based scheduling works as follows:
+
+ - The absence of any one or more of three config parameters
+ or -1 being set as value of any of the parameters,
+
mapred.tasktracker.vmem.reserved,
+ mapred.task.default.maxvmem, or
+ mapred.task.limit.maxvmem, disables memory-based
+ scheduling, just as it disables memory monitoring for a TT. These
+ config parameters are described in the
+ Hadoop Map/Reduce
+ tutorial. The value of
+ mapred.tasktracker.vmem.reserved is
+ obtained from the TT via its heartbeat.
+
+ - If all the three mandatory parameters are set, the Scheduler
+ enables VMEM-based scheduling. First, the Scheduler computes the free
+ VMEM on the TT. This is the difference between the available VMEM on the
+ TT (the node's total VMEM minus the offset, both of which are sent by
+ the TT on each heartbeat)and the sum of VMs already allocated to
+ running tasks (i.e., sum of the VMEM task-limits). Next, the Scheduler
+ looks at the VMEM requirements for the job that's first in line to
+ run. If the job's VMEM requirements are less than the available VMEM on
+ the node, the job's task can be scheduled. If not, the Scheduler
+ ensures that the TT does not get a task to run (provided the job
+ has tasks to run). This way, the Scheduler ensures that jobs with
+ high memory requirements are not starved, as eventually, the TT
+ will have enough VMEM available. If the high-mem job does not have
+ any task to run, the Scheduler moves on to the next job.
+
+ - In addition to VMEM, the Capacity Scheduler can also consider
+ RAM on the TT node. RAM is considered the same way as VMEM. TTs report
+ the total RAM available on their node, and an offset. If both are
+ set, the Scheduler computes the available RAM on the node. Next,
+ the Scheduler figures out the RAM requirements of the job, if any.
+ As with VMEM, users can optionally specify a RAM limit for their job
+ (
mapred.task.maxpmem, described in the Map/Reduce
+ tutorial). The Scheduler also maintains a limit for this value
+ (mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem,
+ described below). All these three values must be set for the
+ Scheduler to schedule tasks based on RAM constraints.
+
+ - The Scheduler ensures that jobs cannot ask for RAM or VMEM higher
+ than configured limits. If this happens, the job is failed when it
+ is submitted.
+
+
+
+ As described above, the additional scheduler-based config
+ parameters are as follows:
+
+
+ | Name | Description |
+ | mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem |
+ A percentage of the default VMEM limit for jobs
+ (mapred.task.default.maxvmem). This is the default
+ RAM task-limit associated with a task. Unless overridden by a
+ job's setting, this number defines the RAM task-limit. |
+
+ | mapred.capacity-scheduler.task.limit.maxpmem |
+ Configuration which provides an upper limit to maximum physical
+ memory which can be specified by a job. If a job requires more
+ physical memory than what is specified in this limit then the same
+ is rejected. |
+
+
+
+
+ Job Initialization Parameters
+ Capacity scheduler lazily initializes the jobs before they are
+ scheduled, for reducing the memory footprint on jobtracker.
+ Following are the parameters, by which you can control the laziness
+ of the job initialization. The following parameters can be
+ configured in capacity-scheduler.xml
+
+ | Name | Description |
+
+ |
+ mapred.capacity-scheduler.queue.<queue-name>.maximum-initialized-jobs-per-user
+ |
+
+ Maximum number of jobs which are allowed to be pre-initialized for
+ a particular user in the queue. Once a job is scheduled, i.e.
+ it starts running, then that job is not considered
+ while scheduler computes the maximum job a user is allowed to
+ initialize.
+ |
+
- | Name | Description |
+
+ mapred.capacity-scheduler.init-poll-interval
+ |
+
+ Amount of time in miliseconds which is used to poll the scheduler
+ job queue to look for jobs to be initialized.
+ |
- | mapred.capacity-scheduler.reclaimCapacity.interval |
- The time interval, in seconds, between which the scheduler
- periodically determines whether capacity needs to be reclaimed for
- any queue. The default value is 5 seconds.
- |
+
+ mapred.capacity-scheduler.init-worker-threads
+ |
+
+ Number of worker threads which would be used by Initialization
+ poller to initialize jobs in a set of queue. If number mentioned
+ in property is equal to number of job queues then a thread is
+ assigned jobs from one queue. If the number configured is lesser than
+ number of queues, then a thread can get jobs from more than one queue
+ which it initializes in a round robin fashion. If the number configured
+ is greater than number of queues, then number of threads spawned
+ would be equal to number of job queues.
+ |
-
-
-
+
- Reviewing the configuration of the capacity scheduler
+ Reviewing the configuration of the Capacity Scheduler
Once the installation and configuration is completed, you can review
it after starting the Map/Reduce cluster from the admin UI.
@@ -271,7 +349,8 @@
Information column against each queue.
-
+
+
|