hadoop-mapreduce-commits mailing list archives

From ma...@apache.org
Subject svn commit: r795454 - in /hadoop/mapreduce/trunk: CHANGES.txt conf/fair-scheduler.xml.template src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java src/docs/src/documentation/content/xdocs/fair_scheduler.xml
Date Sun, 19 Jul 2009 00:27:06 GMT
Author: matei
Date: Sun Jul 19 00:27:05 2009
New Revision: 795454

URL: http://svn.apache.org/viewvc?rev=795454&view=rev
Log:
MAPREDUCE-546. Provide sample fair scheduler config file in conf/ and use it by default if
no other config file is specified.


Modified:
    hadoop/mapreduce/trunk/CHANGES.txt
    hadoop/mapreduce/trunk/conf/fair-scheduler.xml.template
    hadoop/mapreduce/trunk/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java
    hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/fair_scheduler.xml

Modified: hadoop/mapreduce/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/CHANGES.txt?rev=795454&r1=795453&r2=795454&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/CHANGES.txt (original)
+++ hadoop/mapreduce/trunk/CHANGES.txt Sun Jul 19 00:27:05 2009
@@ -13,6 +13,9 @@
 
   NEW FEATURES
 
+    MAPREDUCE-546. Provide sample fair scheduler config file in conf/ and use
+    it by default if no other config file is specified. (Matei Zaharia)
+
     MAPREDUCE-551. Preemption support in the Fair Scheduler. (Matei Zaharia)
 
     HADOOP-5887. Sqoop should create tables in Hive metastore after importing

Modified: hadoop/mapreduce/trunk/conf/fair-scheduler.xml.template
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/conf/fair-scheduler.xml.template?rev=795454&r1=795453&r2=795454&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/conf/fair-scheduler.xml.template (original)
+++ hadoop/mapreduce/trunk/conf/fair-scheduler.xml.template Sun Jul 19 00:27:05 2009
@@ -1,70 +1,12 @@
 <?xml version="1.0"?>
 
 <!--
-  This is a sample configuration file for the Fair Scheduler. For details
-  on the options, please refer to the fair scheduler documentation at
+  This file contains pool and user allocations for the Fair Scheduler.
+  Its format is explained in the Fair Scheduler documentation at
   http://hadoop.apache.org/core/docs/r0.21.0/fair_scheduler.html.
-
-  To create your own configuration, copy this file to conf/fair-scheduler.xml
-  and add the following property in mapred-site.xml to point Hadoop to the
-  file, replacing [HADOOP_HOME] with the path to your installation directory:
-    <property>
-      <name>mapred.fairscheduler.allocation.file</name>
-      <value>[HADOOP_HOME]/conf/fair-scheduler.xml</value>
-    </property>
-
-  Note that all the parameters in the configuration file below are optional,
-  including the parameters inside <pool> and <user> elements. It is only
-  necessary to set the ones you want to differ from the defaults.
+  The documentation also includes a sample config file.
 -->
 
 <allocations>
 
-  <!-- Example element for configuring a pool -->
-  <pool name="pool1">
-    <!-- Minimum shares of map and reduce slots. Defaults to 0. -->
-    <minMaps>10</minMaps>
-    <minReduces>5</minReduces>
-
-    <!-- Limit on running jobs in the pool. If more jobs are submitted,
-      only the first <maxRunningJobs> will be scheduled at any given time.
-      Defaults to infinity or the global poolMaxJobsDefault value below. -->
-    <maxRunningJobs>5</maxRunningJobs>
-
-    <!-- Number of seconds after which the pool can preempt other pools'
-      tasks to achieve its min share. Requires preemption to be enabled in
-      mapred-site.xml by setting mapred.fairscheduler.preemption to true.
-      Defaults to infinity (no preemption). -->
-    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
-
-    <!-- Pool's weight in fair sharing calculations. Defaulti is 1.0. -->
-    <weight>1.0</weight>
-  </pool>
-
-  <!-- Example element for configuring a user -->
-  <user name="user1">
-    <!-- Limit on running jobs for the user across all pools. If more
-      jobs than this are submitted, only the first <maxRunningJobs> will
-      be scheduled at any given time. Defaults to infinity or the
-      userMaxJobsDefault value set below. -->
-    <maxRunningJobs>10</maxRunningJobs>
-  </user>
-
-  <!-- Default running job limit pools where it is not explicitly set. -->
-  <poolMaxJobsDefault>20</poolMaxJobsDefault>
-
-  <!-- Default running job limit users where it is not explicitly set. -->
-  <userMaxJobsDefault>10</userMaxJobsDefault>
-
-  <!-- Default min share preemption timeout for pools where it is not
-    explicitly configured, in seconds. Requires mapred.fairscheduler.preemption
-    to be set to true in your mapred-site.xml. -->
-  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
-
-  <!-- Preemption timeout for jobs below their fair share, in seconds. 
-    If a job is below half its fair share for this amount of time, it
-    is allowed to kill tasks from other jobs to go up to its fair share.
-    Requires mapred.fairscheduler.preemption to be true in mapred-site.xml. -->
-  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
-
 </allocations>

Modified: hadoop/mapreduce/trunk/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java?rev=795454&r1=795453&r2=795454&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java (original)
+++ hadoop/mapreduce/trunk/src/contrib/fairscheduler/src/java/org/apache/hadoop/mapred/PoolManager.java Sun Jul 19 00:27:05 2009
@@ -20,6 +20,8 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.net.URL;
+import java.net.URLConnection;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
@@ -87,7 +89,11 @@
   // below half its fair share for this long, it is allowed to preempt tasks.
   private long fairSharePreemptionTimeout = Long.MAX_VALUE;
   
-  private String allocFile; // Path to XML file containing allocations
+  private Object allocFile; // Path to XML file containing allocations. This
+                            // is either a URL to specify a classpath resource
+                            // (if the fair-scheduler.xml on the classpath is
+                            // used) or a String to specify an absolute path (if
+                            // mapred.fairscheduler.allocation.file is used).
   private String poolNameProperty; // Jobconf property to use for determining a
                                    // job's pool name (default: user.name)
   
@@ -103,8 +109,14 @@
         "mapred.fairscheduler.poolnameproperty", "user.name");
     this.allocFile = conf.get("mapred.fairscheduler.allocation.file");
     if (allocFile == null) {
-      LOG.warn("No mapred.fairscheduler.allocation.file given in jobconf - " +
-          "the fair scheduler will not use any queues.");
+      // No allocation file specified in jobconf. Use the default allocation
+      // file, fair-scheduler.xml, looking for it on the classpath.
+      allocFile = new Configuration().getResource("fair-scheduler.xml");
+      if (allocFile == null) {
+        LOG.error("The fair scheduler allocation file fair-scheduler.xml was "
+            + "not found on the classpath, and no other config file is given "
+            + "through mapred.fairscheduler.allocation.file.");
+      }
     }
     reloadAllocs();
     lastSuccessfulReload = System.currentTimeMillis();
@@ -133,8 +145,16 @@
     if (time > lastReloadAttempt + ALLOC_RELOAD_INTERVAL) {
       lastReloadAttempt = time;
       try {
-        File file = new File(allocFile);
-        long lastModified = file.lastModified();
+        // Get last modified time of alloc file depending whether it's a String
+        // (for a path name) or an URL (for a classloader resource)
+        long lastModified;
+        if (allocFile instanceof String) {
+          File file = new File((String) allocFile);
+          lastModified = file.lastModified();
+        } else { // allocFile is an URL
+          URLConnection conn = ((URL) allocFile).openConnection();
+          lastModified = conn.getLastModified();
+        }
         if (lastModified > lastSuccessfulReload &&
             time > lastModified + ALLOC_RELOAD_WAIT) {
           reloadAllocs();
@@ -197,7 +217,12 @@
       DocumentBuilderFactory.newInstance();
     docBuilderFactory.setIgnoringComments(true);
     DocumentBuilder builder = docBuilderFactory.newDocumentBuilder();
-    Document doc = builder.parse(new File(allocFile));
+    Document doc;
+    if (allocFile instanceof String) {
+      doc = builder.parse(new File((String) allocFile));
+    } else {
+      doc = builder.parse(allocFile.toString());
+    }
     Element root = doc.getDocumentElement();
     if (!"allocations".equals(root.getTagName()))
       throw new AllocationConfigurationException("Bad fair scheduler config " + 
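
For readers following the PoolManager.java change above, here is a minimal stand-alone sketch of the lookup order and reload check it introduces, using only the JDK. The class AllocFileSketch and its method names are hypothetical and not part of Hadoop. ClassLoader.getResource stands in for the Configuration.getResource call in the commit, and the reload interval and wait are passed in as parameters because the values of ALLOC_RELOAD_INTERVAL and ALLOC_RELOAD_WAIT are not shown in this diff.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical stand-alone sketch; not the actual PoolManager code. */
final class AllocFileSketch {

  /**
   * Returns the configured absolute path (a String) if one is given,
   * otherwise the URL of fair-scheduler.xml on the classpath, or null
   * if neither is available.
   */
  static Object locate(String configuredPath, ClassLoader loader) {
    if (configuredPath != null) {
      return configuredPath;
    }
    return loader.getResource("fair-scheduler.xml");
  }

  /** Last-modified time in milliseconds, or 0 if it cannot be determined. */
  static long lastModified(Object allocFile) throws IOException {
    if (allocFile instanceof String) {
      // Absolute path given through configuration.
      return new File((String) allocFile).lastModified();
    }
    // Otherwise a classpath resource, addressed by URL.
    URLConnection conn = ((URL) allocFile).openConnection();
    return conn.getLastModified();
  }

  /**
   * Mirrors the two conditions in the diff: enough time has passed since
   * the last reload attempt, and the file changed after the last successful
   * reload but has been left untouched for at least waitMs.
   */
  static boolean shouldReload(long now, long lastReloadAttempt,
      long lastModified, long lastSuccessfulReload,
      long intervalMs, long waitMs) {
    return now > lastReloadAttempt + intervalMs
        && lastModified > lastSuccessfulReload
        && now > lastModified + waitMs;
  }

  public static void main(String[] args) throws IOException {
    String configured = args.length > 0 ? args[0] : null;
    Object allocFile =
        locate(configured, AllocFileSketch.class.getClassLoader());
    if (allocFile == null) {
      System.out.println("No allocation file found");
    } else {
      System.out.println(allocFile + " last modified at "
          + lastModified(allocFile));
    }
  }
}
```

Run with an absolute path as the only argument to exercise the String branch, or with no argument to fall back to a fair-scheduler.xml on the classpath.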

Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/fair_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/fair_scheduler.xml?rev=795454&r1=795453&r2=795454&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/fair_scheduler.xml (original)
+++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/fair_scheduler.xml Sun Jul 19 00:27:05 2009
@@ -110,12 +110,14 @@
       <p>
        You will also need to set the following property in the Hadoop config 
        file  <em>HADOOP_CONF_DIR/mapred-site.xml</em> to have Hadoop use 
-       the fair scheduler: <br/><br/>
-       <code>&lt;property&gt;</code><br/> 
-       <code>&nbsp;&nbsp;&lt;name&gt;mapred.jobtracker.taskScheduler&lt;/name&gt;</code><br/>
-       <code>&nbsp;&nbsp;&lt;value&gt;org.apache.hadoop.mapred.FairScheduler&lt;/value&gt;</code><br/>
-       <code>&lt;/property&gt;</code>
+       the fair scheduler:
       </p>
+<source>
+&lt;property&gt;
+  &lt;name&gt;mapred.jobtracker.taskScheduler&lt;/name&gt;
+  &lt;value&gt;org.apache.hadoop.mapred.FairScheduler&lt;/value&gt;
+&lt;/property&gt;
+</source>
       <p>
         Once you restart the cluster, you can check that the fair scheduler 
         is running by going to <em>http://&lt;jobtracker URL&gt;/scheduler</em>

@@ -133,8 +135,10 @@
       <title>Configuration</title>
       <p>
         The Fair Scheduler contains configuration in two places -- algorithm
-        parameters are set in <em>mapred-site.xml</em>, while a separate XML
-        file called the <em>allocation file</em> can be used to configure
+        parameters are set in <em>HADOOP_CONF_DIR/mapred-site.xml</em>, while
+        a separate XML file called the <em>allocation file</em>, 
+        located by default in
+        <em>HADOOP_CONF_DIR/fair-scheduler.xml</em>, is used to configure
         pools, minimum shares, running job limits and preemption timeouts.
         The allocation file is reloaded periodically at runtime, 
         allowing you to change pool settings without restarting 
@@ -142,10 +146,7 @@
       </p>
       <p>
         For a minimal installation, to just get equal sharing between users,
-        you will not need to set up an allocation file. If you do set up an
-        allocation file, you will need to tell the scheduler where to
-        find it by setting the <em>mapred.fairscheduler.allocation.file</em>
-        parameter in <em>mapred-site.xml</em> as described below.
+        you will not need to edit the allocation file.
       </p>
       <section>
       <title>Scheduler Parameters in mapred-site.xml</title>
@@ -160,19 +161,6 @@
           </tr>
           <tr>
           <td>
-            mapred.fairscheduler.allocation.file
-          </td>
-          <td>
-            Specifies an absolute path to an XML file which contains minimum
-            shares for each pool, per-pool and per-user limits on number of
-            running jobs, and preemption timeouts. If this property is not 
-            set, these features are not used.
-            The <a href="#Allocation+File+Format">allocation file
-            format</a> is described later.
-          </td>
-          </tr>
-          <tr>
-          <td>
             mapred.fairscheduler.preemption
           </td>
           <td>
@@ -202,6 +190,16 @@
             pool.
           </td>
           </tr>
+          <tr>
+          <td>
+            mapred.fairscheduler.allocation.file
+          </td>
+          <td>
+            Can be used to have the scheduler use a different allocation file
+            than the default one (<em>HADOOP_CONF_DIR/fair-scheduler.xml</em>).
+            Must be an absolute path to the allocation file.
+          </td>
+          </tr>
         </table>
         <p><strong>Advanced Parameters:</strong></p>
         <table>
@@ -342,13 +340,15 @@
         </table>
       </section>  
       <section>
-        <title>Allocation File Format</title>
+        <title>Allocation File (fair-scheduler.xml)</title>
         <p>
         The allocation file configures minimum shares, running job
         limits, weights and preemption timeouts for each pool.
-        An example is provided in 
-        <em>HADOOP_HOME/conf/fair-scheduler.xml.template</em>.
-        The allocation file can contain the following types of elements:
+        Only users/pools whose values differ from the defaults need to be
+        explicitly configured in this file.
+        The allocation file is located in
+        <em>HADOOP_HOME/conf/fair-scheduler.xml</em>.
+        It can contain the following types of elements:
         </p>
         <ul>
         <li><em>pool</em> elements, which configure each pool.
@@ -360,7 +360,7 @@
           to limit the number of jobs from the 
           pool to run at once (defaults to infinite).</li>
           <li><em>weight</em>, to share the cluster 
-          non-proportionally with other pools (defaults to 1.0).</li>
+          non-proportionally with other pools. For example, a pool with weight 2.0 will get a 2x higher share than other pools. The default weight is 1.0.</li>
           <li><em>minSharePreemptionTimeout</em>, the
             number of seconds the pool will wait before
             killing other pools' tasks if it is below its minimum share
@@ -391,35 +391,41 @@
         </p>
         <p>
         An example allocation file is given below : </p>
-        <p>
-        <code>&lt;?xml version="1.0"?&gt; </code> <br/>
-        <code>&lt;allocations&gt;</code> <br/> 
-        <code>&nbsp;&nbsp;&lt;pool name="sample_pool"&gt;</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;minMaps&gt;5&lt;/minMaps&gt;</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;minReduces&gt;5&lt;/minReduces&gt;</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;weight&gt;2.0&lt;/weight&gt;</code><br/>
-        <code>&nbsp;&nbsp;&lt;/pool&gt;</code><br/>
-        <code>&nbsp;&nbsp;&lt;user name="sample_user"&gt;</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;maxRunningJobs&gt;6&lt;/maxRunningJobs&gt;</code><br/>
-        <code>&nbsp;&nbsp;&lt;/user&gt;</code><br/>
-        <code>&nbsp;&nbsp;&lt;userMaxJobsDefault&gt;3&lt;/userMaxJobsDefault&gt;</code><br/>
-        <code>&lt;/allocations&gt;</code>
-        </p>
+<source>
+&lt;?xml version="1.0"?&gt;  
+&lt;allocations&gt;  
+  &lt;pool name="sample_pool"&gt;
+    &lt;minMaps&gt;5&lt;/minMaps&gt;
+    &lt;minReduces&gt;5&lt;/minReduces&gt;
+    &lt;minSharePreemptionTimeout&gt;300&lt;/minSharePreemptionTimeout&gt;
+  &lt;/pool&gt;
+  &lt;user name="sample_user"&gt;
+    &lt;maxRunningJobs&gt;6&lt;/maxRunningJobs&gt;
+  &lt;/user&gt;
+  &lt;userMaxJobsDefault&gt;3&lt;/userMaxJobsDefault&gt;
+  &lt;fairSharePreemptionTimeout&gt;600&lt;/fairSharePreemptionTimeout&gt;
+&lt;/allocations&gt;
+</source>
         <p>
         This example creates a pool sample_pool with a guarantee of 5 map 
-        slots and 5 reduce slots. The pool also has a weight of 2.0, meaning 
-        it has a 2x higher share of the cluster than other pools (the default 
-        weight is 1). Finally, the example limits the number of running jobs 
+        slots and 5 reduce slots. The pool also has a minimum share preemption
+        timeout of 300 seconds (5 minutes), meaning that if it does not get its
+        guaranteed share within this time, it is allowed to kill tasks from
+        other pools to achieve its share.
+        The example also limits the number of running jobs 
         per user to 3, except for sample_user, who can run 6 jobs concurrently. 
+        Finally, the example sets a fair share preemption timeout of 600 seconds
+        (10 minutes). If a job is below half its fair share for 10 minutes, it
+        will be allowed to kill tasks from other jobs to achieve its share.
+        Note that the preemption settings require preemption to be
+        enabled in <em>mapred-site.xml</em> as described earlier.
+        </p>
+        <p>
         Any pool not defined in the allocation file will have no guaranteed 
-        capacity and a weight of 1.0. Also, any pool or user with no max 
+        capacity and no preemption timeout. Also, any pool or user with no max 
         running jobs set in the file will be allowed to run an unlimited 
         number of jobs.
         </p>
-        <p>
-        A more detailed example file, setting preemption timeouts as well,
-        is available in <em>HADOOP_HOME/conf/fair-scheduler.xml.template</em>.
-        </p>
       </section>
     </section>
     <section>
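
To make the example allocation file from the documentation change above concrete, here is a rough sketch that parses it with the JDK's DOM parser and flattens it into key/value pairs. The class AllocationFileSketch, its readSettings method and the dotted key format are hypothetical illustrations, not the parser PoolManager uses. Only the root-element check and setIgnoringComments(true) are taken from the diff.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch; not the actual fair scheduler parser. */
final class AllocationFileSketch {

  // The sample allocation file from the updated documentation.
  static final String SAMPLE =
      "<?xml version=\"1.0\"?>\n"
      + "<allocations>\n"
      + "  <pool name=\"sample_pool\">\n"
      + "    <minMaps>5</minMaps>\n"
      + "    <minReduces>5</minReduces>\n"
      + "    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>\n"
      + "  </pool>\n"
      + "  <user name=\"sample_user\">\n"
      + "    <maxRunningJobs>6</maxRunningJobs>\n"
      + "  </user>\n"
      + "  <userMaxJobsDefault>3</userMaxJobsDefault>\n"
      + "  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>\n"
      + "</allocations>\n";

  /** Flattens the file into keys such as "pool.sample_pool.minMaps". */
  static Map<String, String> readSettings(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setIgnoringComments(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    Element root = doc.getDocumentElement();
    if (!"allocations".equals(root.getTagName())) {
      throw new IllegalArgumentException("Top-level element is not <allocations>");
    }
    Map<String, String> settings = new LinkedHashMap<>();
    NodeList children = root.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node node = children.item(i);
      if (!(node instanceof Element)) {
        continue; // skip whitespace text nodes
      }
      Element element = (Element) node;
      String tag = element.getTagName();
      if ("pool".equals(tag) || "user".equals(tag)) {
        // Per-pool and per-user settings are nested one level down.
        String prefix = tag + "." + element.getAttribute("name") + ".";
        NodeList fields = element.getChildNodes();
        for (int j = 0; j < fields.getLength(); j++) {
          Node field = fields.item(j);
          if (field instanceof Element) {
            settings.put(prefix + ((Element) field).getTagName(),
                field.getTextContent().trim());
          }
        }
      } else {
        // Global defaults such as userMaxJobsDefault.
        settings.put(tag, element.getTextContent().trim());
      }
    }
    return settings;
  }

  public static void main(String[] args) throws Exception {
    for (Map.Entry<String, String> e : readSettings(SAMPLE).entrySet()) {
      System.out.println(e.getKey() + " = " + e.getValue());
    }
  }
}
```

Settings that are absent from the file do not appear in the map, which matches the documentation's point that only values differing from the defaults need to be configured.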


