hadoop-common-commits mailing list archives

From tomwh...@apache.org
Subject svn commit: r951482 - in /hadoop/common/branches/branch-0.21: ./ src/docs/src/documentation/content/xdocs/
Date Fri, 04 Jun 2010 16:35:48 GMT
Author: tomwhite
Date: Fri Jun  4 16:35:47 2010
New Revision: 951482

URL: http://svn.apache.org/viewvc?rev=951482&view=rev
Log:
Merge -r 951479:951480 from trunk to branch-0.21. Fixes: HADOOP-6738

Added:
    hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/commands_manual.xml
      - copied unchanged from r951480, hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml
    hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/hod_scheduler.xml
      - copied unchanged from r951480, hadoop/common/trunk/src/docs/src/documentation/content/xdocs/hod_scheduler.xml
Modified:
    hadoop/common/branches/branch-0.21/CHANGES.txt
    hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml
    hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml
    hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml

Modified: hadoop/common/branches/branch-0.21/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/CHANGES.txt?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.21/CHANGES.txt Fri Jun  4 16:35:47 2010
@@ -867,6 +867,9 @@ Release 0.21.0 - Unreleased
     HADOOP-6585.  Add FileStatus#isDirectory and isFile.  (Eli Collins via
     tomwhite)
 
+    HADOOP-6738.  Move cluster_setup.xml from MapReduce to Common.
+    (Tom White via tomwhite)
+
   OPTIMIZATIONS
 
     HADOOP-5595. NameNode does not need to run a replicator to choose a

Modified: hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml (original)
+++ hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/cluster_setup.xml Fri Jun  4 16:35:47 2010
@@ -33,20 +33,20 @@
       Hadoop clusters ranging from a few nodes to extremely large clusters with 
       thousands of nodes.</p>
       <p>
-      To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Single Node Setup</a>).
+      To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Hadoop Quick Start</a>).
       </p>
     </section>
     
     <section>
-      <title>Prerequisites</title>
+      <title>Pre-requisites</title>
       
       <ol>
         <li>
-          Make sure all <a href="single_node_setup.html#PreReqs">required software</a> 
+          Make sure all <a href="single_node_setup.html#PreReqs">requisite</a> software 
           is installed on all nodes in your cluster.
         </li>
         <li>
-          <a href="single_node_setup.html#Download">Download</a> the Hadoop software.
+          <a href="single_node_setup.html#Download">Get</a> the Hadoop software.
         </li>
       </ol>
     </section>
@@ -81,21 +81,23 @@
         <ol>
           <li>
             Read-only default configuration - 
-            <a href="ext:common-default">src/common/common-default.xml</a>, 
-            <a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a> and 
-            <a href="ext:mapred-default">src/mapred/mapred-default.xml</a>.
+            <a href="ext:common-default">src/core/core-default.xml</a>, 
+            <a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a>, 
+            <a href="ext:mapred-default">src/mapred/mapred-default.xml</a> and
+            <a href="ext:mapred-queues">conf/mapred-queues.xml.template</a>.
           </li>
           <li>
             Site-specific configuration - 
-            <em>conf/core-site.xml</em>, 
-            <em>conf/hdfs-site.xml</em> and 
-            <em>conf/mapred-site.xml</em>.
+            <a href="#core-site.xml">conf/core-site.xml</a>, 
+            <a href="#hdfs-site.xml">conf/hdfs-site.xml</a>, 
+            <a href="#mapred-site.xml">conf/mapred-site.xml</a> and
+            <a href="#mapred-queues.xml">conf/mapred-queues.xml</a>.
           </li>
         </ol>
       
         <p>To learn more about how the Hadoop framework is controlled by these 
-        configuration files see
-        <a href="ext:api/org/apache/hadoop/conf/configuration">Class Configuration</a>.</p>
+        configuration files, look 
+        <a href="ext:api/org/apache/hadoop/conf/configuration">here</a>.</p>
       
         <p>Additionally, you can control the Hadoop scripts found in the 
         <code>bin/</code> directory of the distribution, by setting site-specific 
@@ -163,9 +165,8 @@
           <title>Configuring the Hadoop Daemons</title>
           
           <p>This section deals with important parameters to be specified in the
-          following:
-          <br/>
-          <code>conf/core-site.xml</code>:</p>
+          following:</p>
+          <anchor id="core-site.xml"/><p><code>conf/core-site.xml</code>:</p>
 
 		  <table>
   		    <tr>
@@ -180,7 +181,7 @@
             </tr>
           </table>
 
-      <p><br/><code>conf/hdfs-site.xml</code>:</p>
+      <anchor id="hdfs-site.xml"/><p><code>conf/hdfs-site.xml</code>:</p>
           
       <table>   
         <tr>
@@ -212,7 +213,7 @@
 		    </tr>
       </table>
 
-      <p><br/><code>conf/mapred-site.xml</code>:</p>
+      <anchor id="mapred-site.xml"/><p><code>conf/mapred-site.xml</code>:</p>
 
       <table>
           <tr>
@@ -221,12 +222,12 @@
           <th>Notes</th>
         </tr>
         <tr>
-          <td>mapred.job.tracker</td>
+          <td>mapreduce.jobtracker.address</td>
           <td>Host or IP and port of <code>JobTracker</code>.</td>
           <td><em>host:port</em> pair.</td>
         </tr>
 		    <tr>
-		      <td>mapred.system.dir</td>
+		      <td>mapreduce.jobtracker.system.dir</td>
 		      <td>
 		        Path on the HDFS where where the Map/Reduce framework stores 
 		        system files e.g. <code>/hadoop/mapred/system/</code>.
@@ -237,7 +238,7 @@
 		      </td>
 		    </tr>
 		    <tr>
-		      <td>mapred.local.dir</td>
+		      <td>mapreduce.cluster.local.dir</td>
 		      <td>
 		        Comma-separated list of paths on the local filesystem where 
 		        temporary Map/Reduce data is written.
@@ -264,7 +265,7 @@
 		      </td>
 		    </tr>
 		    <tr>
-		      <td>mapred.hosts/mapred.hosts.exclude</td>
+		      <td>mapreduce.jobtracker.hosts.filename/mapreduce.jobtracker.hosts.exclude.filename</td>
 		      <td>List of permitted/excluded TaskTrackers.</td>
 		      <td>
 		        If necessary, use these files to control the list of allowable 
@@ -272,82 +273,331 @@
 		      </td>
   		    </tr>
         <tr>
-          <td>mapred.queue.names</td>
-          <td>Comma separated list of queues to which jobs can be submitted.</td>
+          <td>mapreduce.cluster.job-authorization-enabled</td>
+          <td>Boolean, specifying whether job ACLs are supported for 
+              authorizing view and modification of a job</td>
           <td>
-            The Map/Reduce system always supports atleast one queue
-            with the name as <em>default</em>. Hence, this parameter's
-            value should always contain the string <em>default</em>.
-            Some job schedulers supported in Hadoop, like the 
-            <a href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html">Capacity Scheduler</a>, 
-            support multiple queues. If such a scheduler is
-            being used, the list of configured queue names must be
-            specified here. Once queues are defined, users can submit
-            jobs to a queue using the property name 
-            <em>mapred.job.queue.name</em> in the job configuration.
-            There could be a separate 
-            configuration file for configuring properties of these 
-            queues that is managed by the scheduler. 
-            Refer to the documentation of the scheduler for information on 
-            the same.
+            If <em>true</em>, job ACLs would be checked while viewing or
+            modifying a job. More details are available at 
+            <a href ="ext:mapred-tutorial/JobAuthorization">Job Authorization</a>. 
           </td>
         </tr>
-        <tr>
-          <td>mapred.acls.enabled</td>
-          <td>Specifies whether ACLs are supported for controlling job
-              submission and administration</td>
-          <td>
-            If <em>true</em>, ACLs would be checked while submitting
-            and administering jobs. ACLs can be specified using the
-            configuration parameters of the form
-            <em>mapred.queue.queue-name.acl-name</em>, defined below.
-          </td>
-        </tr>
-		  </table>
-      
-      <p><br/><code> conf/mapred-queue-acls.xml</code></p>
-      
-      <table>
-       <tr>
-          <th>Parameter</th>
-          <th>Value</th> 
-          <th>Notes</th>
-       </tr>
-        <tr>
-          <td>mapred.queue.<em>queue-name</em>.acl-submit-job</td>
-          <td>List of users and groups that can submit jobs to the
-              specified <em>queue-name</em>.</td>
-          <td>
-            The list of users and groups are both comma separated
-            list of names. The two lists are separated by a blank.
-            Example: <em>user1,user2 group1,group2</em>.
-            If you wish to define only a list of groups, provide
-            a blank at the beginning of the value.
-          </td>
-        </tr>
-        <tr>
-          <td>mapred.queue.<em>queue-name</em>.acl-administer-job</td>
-          <td>List of users and groups that can change the priority
-              or kill jobs that have been submitted to the
-              specified <em>queue-name</em>.</td>
-          <td>
-            The list of users and groups are both comma separated
-            list of names. The two lists are separated by a blank.
-            Example: <em>user1,user2 group1,group2</em>.
-            If you wish to define only a list of groups, provide
-            a blank at the beginning of the value. Note that an
-            owner of a job can always change the priority or kill
-            his/her own job, irrespective of the ACLs.
-          </td>
-        </tr>
-      </table>
-      
+  		    
+		  </table>      
 
           <p>Typically all the above parameters are marked as 
           <a href="ext:api/org/apache/hadoop/conf/configuration/final_parameters">
           final</a> to ensure that they cannot be overriden by user-applications.
           </p>
 
+          <anchor id="mapred-queues.xml"/><p><code>conf/mapred-queues.xml
+          </code>:</p>
+          <p>This file is used to configure the queues in the Map/Reduce
+          system. Queues are abstract entities in the JobTracker that can be
+          used to manage collections of jobs. They provide a way for 
+          administrators to organize jobs in specific ways and to enforce 
+          certain policies on such collections, thus providing varying
+          levels of administrative control and management functions on jobs.
+          </p> 
+          <p>One can imagine the following sample scenarios:</p>
+          <ul>
+            <li> Jobs submitted by a particular group of users can all be 
+            submitted to one queue. </li> 
+            <li> Long running jobs in an organization can be submitted to a
+            queue. </li>
+            <li> Short running jobs can be submitted to a queue and the number
+            of jobs that can run concurrently can be restricted. </li> 
+          </ul> 
+          <p>The usage of queues is closely tied to the scheduler configured
+          at the JobTracker via <em>mapreduce.jobtracker.taskscheduler</em>.
+          The degree of support of queues depends on the scheduler used. Some
+          schedulers support a single queue, while others support more complex
+          configurations. Schedulers also implement the policies that apply 
+          to jobs in a queue. Some schedulers, such as the Fairshare scheduler,
+          implement their own mechanisms for collections of jobs and do not rely
+          on queues provided by the framework. The administrators are 
+          encouraged to refer to the documentation of the scheduler they are
+          interested in for determining the level of support for queues.</p>
+          <p>The Map/Reduce framework supports some basic operations on queues
+          such as job submission to a specific queue, access control for queues,
+          queue states, viewing configured queues and their properties
+          and refresh of queue properties. In order to fully implement some of
+          these operations, the framework takes the help of the configured
+          scheduler.</p>
+          <p>The following types of queue configurations are possible:</p>
+          <ul>
+            <li> Single queue: The default configuration in Map/Reduce consists
+            of a single queue, as supported by the default scheduler. All jobs
+            are submitted to this default queue which maintains jobs in a priority
+            based FIFO order.</li>
+            <li> Multiple single level queues: Multiple queues are defined, and
+            jobs can be submitted to any of these queues. Different policies
+            can be applied to these queues by schedulers that support this 
+            configuration to provide a better level of support. For example,
+            the <a href="ext:capacity-scheduler">capacity scheduler</a>
+            provides ways of configuring different 
+            capacity and fairness guarantees on these queues.</li>
+            <li> Hierarchical queues: Hierarchical queues are a configuration in
+            which queues can contain other queues within them recursively. The
+            queues that contain other queues are referred to as 
+            container queues. Queues that do not contain other queues are 
+            referred to as leaf or job queues. Jobs can only be submitted to leaf
+            queues. Hierarchical queues can potentially offer a higher level 
+            of control to administrators, as schedulers can now build a
+            hierarchy of policies where policies applicable to a container
+            queue can provide context for policies applicable to queues it
+            contains. It also opens up possibilities for delegating queue
+            administration where administration of queues in a container queue
+            can be turned over to a different set of administrators, within
+            the context provided by the container queue. For example, the
+            <a href="ext:capacity-scheduler">capacity scheduler</a>
+            uses hierarchical queues to partition capacity of a cluster
+            among container queues, and allows the queues they contain to divide
+            that capacity in more ways.</li> 
+          </ul>
+
+          <p>Most of the configuration of the queues can be refreshed/reloaded
+          without restarting the Map/Reduce sub-system by editing this
+          configuration file as described in the section on
+          <a href="commands_manual.html#RefreshQueues">reloading queue 
+          configuration</a>.
+          Not all configuration properties can be reloaded,
+          as the description of each property below explains.</p>
+
+          <p>The format of conf/mapred-queues.xml is different from the other 
+          configuration files, supporting nested configuration
+          elements to support hierarchical queues. The format is as follows:
+          </p>
+
+          <source>
+          &lt;queues aclsEnabled="$aclsEnabled"&gt;
+            &lt;queue&gt;
+              &lt;name&gt;$queue-name&lt;/name&gt;
+              &lt;state&gt;$state&lt;/state&gt;
+              &lt;queue&gt;
+                &lt;name&gt;$child-queue1&lt;/name&gt;
+                &lt;properties&gt;
+                   &lt;property key="$key" value="$value"/&gt;
+                   ...
+                &lt;/properties&gt;
+                &lt;queue&gt;
+                  &lt;name&gt;$grand-child-queue1&lt;/name&gt;
+                  ...
+                &lt;/queue&gt;
+              &lt;/queue&gt;
+              &lt;queue&gt;
+                &lt;name&gt;$child-queue2&lt;/name&gt;
+                ...
+              &lt;/queue&gt;
+              ...
+              ...
+              ...
+              &lt;queue&gt;
+                &lt;name&gt;$leaf-queue&lt;/name&gt;
+                &lt;acl-submit-job&gt;$acls&lt;/acl-submit-job&gt;
+                &lt;acl-administer-jobs&gt;$acls&lt;/acl-administer-jobs&gt;
+                &lt;properties&gt;
+                   &lt;property key="$key" value="$value"/&gt;
+                   ...
+                &lt;/properties&gt;
+              &lt;/queue&gt;
+            &lt;/queue&gt;
+          &lt;/queues&gt;
+          </source>
+          <table>
+            <tr>
+              <th>Tag/Attribute</th>
+              <th>Value</th>
+              <th>
+              	<a href="commands_manual.html#RefreshQueues">Refresh-able?</a>
+              </th>
+              <th>Notes</th>
+            </tr>
+
+            <tr>
+              <td><anchor id="queues_tag"/>queues</td>
+              <td>Root element of the configuration file.</td>
+              <td>Not applicable</td>
+              <td>All the queues are nested inside this root element of the
+              file. There can be only one root queues element in the file.</td>
+            </tr>
+
+            <tr>
+              <td>aclsEnabled</td>
+              <td>Boolean attribute to the
+              <a href="#queues_tag"><em>&lt;queues&gt;</em></a> tag
+              specifying whether ACLs are supported for controlling job
+              submission and administration for <em>all</em> the queues
+              configured.
+              </td>
+              <td>Yes</td>
+              <td>If <em>false</em>, ACLs are ignored for <em>all</em> the
+              configured queues. <br/><br/>
+              If <em>true</em>, the user and group details of the user
+              are checked against the configured ACLs of the corresponding
+              job-queue while submitting and administering jobs. ACLs can be
+              specified for each queue using the queue-specific tags
+              "acl-$acl_name", defined below. ACLs are checked only against
+              the job-queues, i.e. the leaf-level queues; ACLs configured
+              for the rest of the queues in the hierarchy are ignored.
+              </td>
+            </tr>
+
+            <tr>
+              <td><anchor id="queue_tag"/>queue</td>
+              <td>A child element of the
+              <a href="#queues_tag"><em>&lt;queues&gt;</em></a> tag or another
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a>. Denotes a queue
+              in the system.
+              </td>
+              <td>Not applicable</td>
+              <td>Queues can be hierarchical and so this element can contain
+              children of this same type.</td>
+            </tr>
+
+            <tr>
+              <td>name</td>
+              <td>Child element of a 
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              name of the queue.</td>
+              <td>No</td>
+              <td>Name of the queue cannot contain the character <em>":"</em>
+              which is reserved as the queue-name delimiter when addressing a
+              queue in a hierarchy.</td>
+            </tr>
+
+            <tr>
+              <td>state</td>
+              <td>Child element of a
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              state of the queue.
+              </td>
+              <td>Yes</td>
+              <td>Each queue has a corresponding state. A queue in
+              <em>'running'</em> state can accept new jobs, while a queue in
+              <em>'stopped'</em> state will stop accepting any new jobs. State
+              is defined and respected by the framework only for the
+              leaf-level queues and is ignored for all other queues.
+              <br/><br/>
+              The state of the queue can be viewed from the command line using
+              <code>'bin/mapred queue'</code> command and also on the Web
+              UI.<br/><br/>
+              Administrators can stop and start queues at runtime using the
+              feature of <a href="commands_manual.html#RefreshQueues">reloading
+              queue configuration</a>. If a queue is stopped at runtime, it
+              will complete all the existing running jobs and will stop
+              accepting any new jobs.
+              </td>
+            </tr>
+
+            <tr>
+              <td>acl-submit-job</td>
+              <td>Child element of a
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              list of users and groups that can submit jobs to the specified
+              queue.</td>
+              <td>Yes</td>
+              <td>
+              Applicable only to leaf-queues.<br/><br/>
+              The lists of users and groups are both comma-separated
+              lists of names. The two lists are separated by a blank.
+              Example: <em>user1,user2 group1,group2</em>.
+              If you wish to define only a list of groups, provide
+              a blank at the beginning of the value.
+              <br/><br/>
+              </td>
+            </tr>
+
+            <tr>
+              <td>acl-administer-jobs</td>
+              <td>Child element of a
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              list of users and groups that can change the priority of a job
+              or kill a job that has been submitted to the specified queue.
+              </td>
+              <td>Yes</td>
+              <td>
+              Applicable only to leaf-queues.<br/><br/>
+              The lists of users and groups are both comma-separated
+              lists of names. The two lists are separated by a blank.
+              Example: <em>user1,user2 group1,group2</em>.
+              If you wish to define only a list of groups, provide
+              a blank at the beginning of the value. Note that an
+              owner of a job can always change the priority or kill
+              his/her own job, irrespective of the ACLs.
+              </td>
+            </tr>
+
+            <tr>
+              <td><anchor id="properties_tag"/>properties</td>
+              <td>Child element of a 
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              scheduler specific properties.</td>
+              <td>Not applicable</td>
+              <td>The scheduler specific properties are the children of this
+              element specified as a group of &lt;property&gt; tags described
+              below. The JobTracker completely ignores these properties. These
+              can be used as per-queue properties needed by the scheduler
+              being configured. Please look at the scheduler specific
+              documentation as to how these properties are used by that
+              particular scheduler.
+              </td>
+            </tr>
+
+            <tr>
+              <td><anchor id="property_tag"/>property</td>
+              <td>Child element of
+              <a href="#properties_tag"><em>&lt;properties&gt;</em></a> for a
+              specific queue.</td>
+              <td>Not applicable</td>
+              <td>A single scheduler specific queue-property. Ignored by
+              the JobTracker and used by the scheduler that is configured.</td>
+            </tr>
+
+            <tr>
+              <td>key</td>
+              <td>Attribute of a
+              <a href="#property_tag"><em>&lt;property&gt;</em></a> for a
+              specific queue.</td>
+              <td>Scheduler-specific</td>
+              <td>The name of a single scheduler specific queue-property.</td>
+            </tr>
+
+            <tr>
+              <td>value</td>
+              <td>Attribute of a
+              <a href="#property_tag"><em>&lt;property&gt;</em></a> for a
+              specific queue.</td>
+              <td>Scheduler-specific</td>
+              <td>The value of a single scheduler specific queue-property.
+              The value can be anything that is left for the proper
+              interpretation by the scheduler that is configured.</td>
+            </tr>
+
+         </table>
+
+          <p>Once the queues are configured properly and the Map/Reduce
+          system is up and running, from the command line one can
+          <a href="commands_manual.html#QueuesList">get the list
+          of queues</a> and
+          <a href="commands_manual.html#QueuesInfo">obtain
+          information specific to each queue</a>. This information is also
+          available from the web UI. On the web UI, queue information can be
+          seen by going to queueinfo.jsp, linked to from the queues table-cell
+          in the cluster-summary table. The queueinfo.jsp prints the hierarchy
+          of queues as well as the specific information for each queue.
+          </p>
+
+          <p> Users can submit jobs only to a
+          leaf-level queue by specifying the fully-qualified queue-name for
+          the property name <em>mapreduce.job.queuename</em> in the job
+          configuration. The character ':' is the queue-name delimiter. For
+          example, if one wants to submit to a configured job-queue 'Queue-C'
+          which is one of the sub-queues of 'Queue-B' which in-turn is a
+          sub-queue of 'Queue-A', then the job configuration should contain
+          the property <em>mapreduce.job.queuename</em> set to <em>
+          &lt;value&gt;Queue-A:Queue-B:Queue-C&lt;/value&gt;</em>.</p>
+         </section>
           <section>
             <title>Real-World Cluster Configurations</title>
             
@@ -383,7 +633,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.parallel.copies</td>
+                    <td>mapreduce.reduce.shuffle.parallelcopies</td>
                     <td>20</td>
                     <td>
                       Higher number of parallel copies run by reduces to fetch
@@ -392,7 +642,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.map.child.java.opts</td>
+                    <td>mapreduce.map.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of maps. 
@@ -400,7 +650,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.child.java.opts</td>
+                    <td>mapreduce.reduce.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of reduces. 
@@ -417,13 +667,13 @@
                   </tr>
                   <tr>
                     <td>conf/core-site.xml</td>
-                    <td>io.sort.factor</td>
+                    <td>mapreduce.task.io.sort.factor</td>
                     <td>100</td>
                     <td>More streams merged at once while sorting files.</td>
                   </tr>
                   <tr>
                     <td>conf/core-site.xml</td>
-                    <td>io.sort.mb</td>
+                    <td>mapreduce.task.io.sort.mb</td>
                     <td>200</td>
                     <td>Higher memory-limit while sorting data.</td>
                   </tr>
@@ -448,7 +698,7 @@
 		          </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.job.tracker.handler.count</td>
+                    <td>mapreduce.jobtracker.handler.count</td>
                     <td>60</td>
                     <td>
                       More JobTracker server threads to handle RPCs from large 
@@ -457,13 +707,13 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.parallel.copies</td>
+                    <td>mapreduce.reduce.shuffle.parallelcopies</td>
                     <td>50</td>
                     <td></td>
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>tasktracker.http.threads</td>
+                    <td>mapreduce.tasktracker.http.threads</td>
                     <td>50</td>
                     <td>
                       More worker threads for the TaskTracker's http server. The
@@ -473,7 +723,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.map.child.java.opts</td>
+                    <td>mapreduce.map.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of maps. 
@@ -481,7 +731,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.child.java.opts</td>
+                    <td>mapreduce.reduce.java.opts</td>
                     <td>-Xmx1024M</td>
                     <td>Larger heap-size for child jvms of reduces.</td>
                   </tr>
@@ -500,11 +750,11 @@
         or equal to the -Xmx passed to JavaVM, else the VM might not start. 
         </p>
         
-        <p>Note: <code>mapred.child.java.opts</code> are used only for 
+        <p>Note: <code>mapred.{map|reduce}.child.java.opts</code> are used only for 
         configuring the launched child tasks from task tracker. Configuring 
-        the memory options for daemons is documented under 
+        the memory options for daemons is documented in 
         <a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
-        Configuring the Environment of the Hadoop Daemons</a>.</p>
+        cluster_setup.html</a>.</p>
         
         <p>The memory available to some parts of the framework is also
         configurable. In map and reduce tasks, performance may be influenced
@@ -558,7 +808,7 @@
 
     <table>
           <tr><th>Name</th><th>Type</th><th>Description</th></tr>
-          <tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
+          <tr><td>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</td>
             <td>long</td>
             <td>The time interval, in milliseconds, between which the TT 
             checks for any memory violation. The default value is 5000 msec
@@ -668,10 +918,11 @@
             the tasks. For maximum security, this task controller 
             sets up restricted permissions and user/group ownership of
             local files and directories used by the tasks such as the
-            job jar files, intermediate files and task log files. Currently
-            permissions on distributed cache files are opened up to be
-            accessible by all users. In future, it is expected that stricter
-            file permissions are set for these files too.
+            job jar files, intermediate files, task log files and distributed
+            cache files. Particularly note that, because of this, except the
+            job owner and tasktracker, no other user can access any of the
+            local files/directories including those localized as part of the
+            distributed cache.
             </td>
             </tr>
             </table>
@@ -684,7 +935,7 @@
             <th>Property</th><th>Value</th><th>Notes</th>
             </tr>
             <tr>
-            <td>mapred.task.tracker.task-controller</td>
+            <td>mapreduce.tasktracker.taskcontroller</td>
             <td>Fully qualified class name of the task controller class</td>
             <td>Currently there are two implementations of task controller
             in the Hadoop system, DefaultTaskController and LinuxTaskController.
@@ -715,21 +966,35 @@
             <p>
             The executable must have specific permissions as follows. The
             executable should have <em>6050 or --Sr-s---</em> permissions
-            user-owned by root(super-user) and group-owned by a group 
-            of which only the TaskTracker's user is the sole group member. 
+            user-owned by root(super-user) and group-owned by a special group 
+            of which the TaskTracker's user is the group member and no job 
+            submitter is. If any job submitter belongs to this special group,
+            security will be compromised. This special group name should be
+            specified for the configuration property 
+            <em>"mapreduce.tasktracker.group"</em> in both mapred-site.xml and 
+            <a href="#task-controller.cfg">task-controller.cfg</a>.  
             For example, let's say that the TaskTracker is run as user
             <em>mapred</em> who is part of the groups <em>users</em> and
-            <em>mapredGroup</em> any of them being the primary group.
+            <em>specialGroup</em> any of them being the primary group.
             Let also be that <em>users</em> has both <em>mapred</em> and
-            another user <em>X</em> as its members, while <em>mapredGroup</em>
-            has only <em>mapred</em> as its member. Going by the above
+            another user (job submitter) <em>X</em> as its members, and X does
+            not belong to <em>specialGroup</em>. Going by the above
             description, the setuid/setgid executable should be set
             <em>6050 or --Sr-s---</em> with user-owner as <em>mapred</em> and
-            group-owner as <em>mapredGroup</em> which has
-            only <em>mapred</em> as its member(and not <em>users</em> which has
+            group-owner as <em>specialGroup</em> which has
+            <em>mapred</em> as its member(and not <em>users</em> which has
             <em>X</em> also as its member besides <em>mapred</em>).
             </p>
+
+            <p>
+            The LinuxTaskController requires that paths including and leading up
+            to the directories specified in
+            <em>mapreduce.cluster.local.dir</em> and <em>hadoop.log.dir</em> to
+            be set 755 permissions.
+            </p>
             
+            <section>
+            <title>task-controller.cfg</title>
             <p>The executable requires a configuration file called 
             <em>taskcontroller.cfg</em> to be
             present in the configuration directory passed to the ant target 
@@ -747,8 +1012,8 @@
             </p>
             <table><tr><th>Name</th><th>Description</th></tr>
             <tr>
-            <td>mapred.local.dir</td>
-            <td>Path to mapred local directories. Should be same as the value 
+            <td>mapreduce.cluster.local.dir</td>
+            <td>Path to mapreduce.cluster.local.directories. Should be same as the value 
             which was provided to key in mapred-site.xml. This is required to
             validate paths passed to the setuid executable in order to prevent
             arbitrary paths being passed to it.</td>
@@ -760,14 +1025,16 @@
             permissions on the log files so that they can be written to by the user's
             tasks and read by the TaskTracker for serving on the web UI.</td>
             </tr>
+            <tr>
+            <td>mapreduce.tasktracker.group</td>
+            <td>Group to which the TaskTracker belongs. The group owner of the
+            taskcontroller binary should be this group. Should be same as
+            the value with which the TaskTracker is configured. This 
+            configuration is required for validating the secure access of the
+            task-controller binary.</td>
+            </tr>
             </table>
-
-            <p>
-            The LinuxTaskController requires that paths including and leading up to
-            the directories specified in
-            <em>mapred.local.dir</em> and <em>hadoop.log.dir</em> to be set 755
-            permissions.
-            </p>
+            </section>
             </section>
             
           </section>
@@ -800,7 +1067,7 @@
             monitoring script in <em>mapred-site.xml</em>.</p>
             <table>
             <tr><th>Name</th><th>Description</th></tr>
-            <tr><td><code>mapred.healthChecker.script.path</code></td>
+            <tr><td><code>mapreduce.tasktracker.healthchecker.script.path</code></td>
             <td>Absolute path to the script which is periodically run by the 
             TaskTracker to determine if the node is 
             healthy or not. The file should be executable by the TaskTracker.
@@ -809,18 +1076,18 @@
             is not started.</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.interval</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.interval</code></td>
             <td>Frequency at which the node health script is run, 
             in milliseconds</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.script.timeout</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.script.timeout</code></td>
             <td>Time after which the node health script will be killed by
             the TaskTracker if unresponsive.
             The node is marked unhealthy. if node health script times out.</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.script.args</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.script.args</code></td>
             <td>Extra arguments that can be passed to the node health script 
             when launched.
             These should be comma separated list of arguments. </td>
@@ -857,17 +1124,17 @@
             <title>History Logging</title>
             
             <p> The job history files are stored in central location 
-            <code> hadoop.job.history.location </code> which can be on DFS also,
+            <code> mapreduce.jobtracker.jobhistory.location </code> which can be on DFS also,
             whose default value is <code>${HADOOP_LOG_DIR}/history</code>. 
             The history web UI is accessible from job tracker web UI.</p>
             
             <p> The history files are also logged to user specified directory
-            <code>hadoop.job.history.user.location</code> 
+            <code>mapreduce.job.userhistorylocation</code> 
             which defaults to job output directory. The files are stored in
             "_logs/history/" in the specified directory. Hence, by default 
-            they will be in "mapred.output.dir/_logs/history/". User can stop
+            they will be in "mapreduce.output.fileoutputformat.outputdir/_logs/history/". User can stop
             logging by giving the value <code>none</code> for 
-            <code>hadoop.job.history.user.location</code> </p>
+            <code>mapreduce.job.userhistorylocation</code> </p>
             
             <p> User can view the history logs summary in specified directory 
             using the following command <br/>
@@ -880,7 +1147,6 @@
             <code>$ bin/hadoop job -history all output-dir</code><br/></p> 
           </section>
         </section>
-      </section>
       
       <p>Once all the necessary configuration is complete, distribute the files
       to the <code>HADOOP_CONF_DIR</code> directory on all the machines, 
@@ -891,9 +1157,9 @@
       <section>
         <title>Map/Reduce</title>
         <p>The job tracker restart can recover running jobs if 
-        <code>mapred.jobtracker.restart.recover</code> is set true and 
+        <code>mapreduce.jobtracker.restart.recover</code> is set true and 
         <a href="#Logging">JobHistory logging</a> is enabled. Also 
-        <code>mapred.jobtracker.job.history.block.size</code> value should be 
+        <code>mapreduce.jobtracker.jobhistory.block.size</code> value should be 
         set to an optimal value to dump job history to disk as soon as 
         possible, the typical value is 3145728(3MB).</p>
       </section>
@@ -951,7 +1217,7 @@
       and starts the <code>TaskTracker</code> daemon on all the listed slaves.
       </p>
     </section>
-    
+
     <section>
       <title>Hadoop Shutdown</title>
       

Modified: hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml (original)
+++ hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/single_node_setup.xml Fri Jun  4 16:35:47 2010
@@ -97,7 +97,7 @@
       
     </section>
     
-    <section>
+    <section id="Download">
       <title>Download</title>
       
       <p>

Modified: hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml?rev=951482&r1=951481&r2=951482&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/common/branches/branch-0.21/src/docs/src/documentation/content/xdocs/site.xml Fri Jun  4 16:35:47 2010
@@ -39,9 +39,11 @@ See http://forrest.apache.org/docs/linki
   </docs>	
 		
  <docs label="Guides">
+		<commands_manual 				label="Hadoop Commands"  href="commands_manual.html" />
 		<fsshell				        label="File System Shell"               href="file_system_shell.html" />
 		<SLA					 	label="Service Level Authorization" 	href="service_level_auth.html"/>
 		<native_lib    				label="Native Libraries" 					href="native_libraries.html" />
+		<hod_scheduler 			label="Hadoop On Demand"            href="hod_scheduler.html"/>
    </docs>
 
    <docs label="Miscellaneous"> 
@@ -68,6 +70,15 @@ See http://forrest.apache.org/docs/linki
     <hdfs-default href="http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html" />
     <mapred-default href="http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html" />
     
+    <mapred-queues href="http://hadoop.apache.org/mapreduce/docs/current/mapred_queues.xml" />
+    <capacity-scheduler href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html" />
+    <mapred-tutorial href="http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html" >
+        <JobAuthorization href="#Job+Authorization" />
+    </mapred-tutorial>
+    <streaming href="http://hadoop.apache.org/mapreduce/docs/current/streaming.html" />
+    <distcp href="http://hadoop.apache.org/mapreduce/docs/current/distcp.html" />
+    <hadoop-archives href="http://hadoop.apache.org/mapreduce/docs/current/hadoop_archives.html" />
+    
     <zlib      href="http://www.zlib.net/" />
     <gzip      href="http://www.gzip.org/" />
     <bzip      href="http://www.bzip.org/" />


