hadoop-common-commits mailing list archives

From tomwh...@apache.org
Subject svn commit: r951480 [1/2] - in /hadoop/common/trunk: ./ src/docs/src/documentation/content/xdocs/
Date Fri, 04 Jun 2010 16:34:18 GMT
Author: tomwhite
Date: Fri Jun  4 16:34:18 2010
New Revision: 951480

URL: http://svn.apache.org/viewvc?rev=951480&view=rev
Log:
HADOOP-6738.  Move cluster_setup.xml, hod_scheduler, commands_manual from MapReduce to Common.

Added:
    hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml   (with props)
    hadoop/common/trunk/src/docs/src/documentation/content/xdocs/hod_scheduler.xml   (with props)
Modified:
    hadoop/common/trunk/CHANGES.txt
    hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml
    hadoop/common/trunk/src/docs/src/documentation/content/xdocs/single_node_setup.xml
    hadoop/common/trunk/src/docs/src/documentation/content/xdocs/site.xml

Modified: hadoop/common/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/CHANGES.txt?rev=951480&r1=951479&r2=951480&view=diff
==============================================================================
--- hadoop/common/trunk/CHANGES.txt (original)
+++ hadoop/common/trunk/CHANGES.txt Fri Jun  4 16:34:18 2010
@@ -932,6 +932,9 @@ Release 0.21.0 - Unreleased
     HADOOP-6585.  Add FileStatus#isDirectory and isFile.  (Eli Collins via
     tomwhite)
 
+    HADOOP-6738.  Move cluster_setup.xml, hod_scheduler and commands_manual
+    from MapReduce to Common.
+    (Tom White via tomwhite)
+
   OPTIMIZATIONS
 
     HADOOP-5595. NameNode does not need to run a replicator to choose a

Modified: hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml?rev=951480&r1=951479&r2=951480&view=diff
==============================================================================
--- hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml (original)
+++ hadoop/common/trunk/src/docs/src/documentation/content/xdocs/cluster_setup.xml Fri Jun  4 16:34:18 2010
@@ -33,20 +33,20 @@
       Hadoop clusters ranging from a few nodes to extremely large clusters with 
       thousands of nodes.</p>
       <p>
-      To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Single Node Setup</a>).
+      To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="single_node_setup.html"> Hadoop Quick Start</a>).
       </p>
     </section>
     
     <section>
-      <title>Prerequisites</title>
+      <title>Pre-requisites</title>
       
       <ol>
         <li>
-          Make sure all <a href="single_node_setup.html#PreReqs">required software</a> 
+          Make sure all <a href="single_node_setup.html#PreReqs">requisite</a> software 
           is installed on all nodes in your cluster.
         </li>
         <li>
-          <a href="single_node_setup.html#Download">Download</a> the Hadoop software.
+          <a href="single_node_setup.html#Download">Get</a> the Hadoop software.
         </li>
       </ol>
     </section>
@@ -81,21 +81,23 @@
         <ol>
           <li>
             Read-only default configuration - 
-            <a href="ext:common-default">src/common/common-default.xml</a>, 
-            <a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a> and 
-            <a href="ext:mapred-default">src/mapred/mapred-default.xml</a>.
+            <a href="ext:common-default">src/core/core-default.xml</a>, 
+            <a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a>, 
+            <a href="ext:mapred-default">src/mapred/mapred-default.xml</a> and
+            <a href="ext:mapred-queues">conf/mapred-queues.xml.template</a>.
           </li>
           <li>
             Site-specific configuration - 
-            <em>conf/core-site.xml</em>, 
-            <em>conf/hdfs-site.xml</em> and 
-            <em>conf/mapred-site.xml</em>.
+            <a href="#core-site.xml">conf/core-site.xml</a>, 
+            <a href="#hdfs-site.xml">conf/hdfs-site.xml</a>, 
+            <a href="#mapred-site.xml">conf/mapred-site.xml</a> and
+            <a href="#mapred-queues.xml">conf/mapred-queues.xml</a>.
           </li>
         </ol>
       
         <p>To learn more about how the Hadoop framework is controlled by these 
-        configuration files see
-        <a href="ext:api/org/apache/hadoop/conf/configuration">Class Configuration</a>.</p>
+        configuration files, look 
+        <a href="ext:api/org/apache/hadoop/conf/configuration">here</a>.</p>
       
         <p>Additionally, you can control the Hadoop scripts found in the 
         <code>bin/</code> directory of the distribution, by setting site-specific 
@@ -163,9 +165,8 @@
           <title>Configuring the Hadoop Daemons</title>
           
           <p>This section deals with important parameters to be specified in the
-          following:
-          <br/>
-          <code>conf/core-site.xml</code>:</p>
+          following:</p>
+          <anchor id="core-site.xml"/><p><code>conf/core-site.xml</code>:</p>
 
 		  <table>
   		    <tr>
@@ -180,7 +181,7 @@
             </tr>
           </table>
 
-      <p><br/><code>conf/hdfs-site.xml</code>:</p>
+      <anchor id="hdfs-site.xml"/><p><code>conf/hdfs-site.xml</code>:</p>
           
       <table>   
         <tr>
@@ -212,7 +213,7 @@
 		    </tr>
       </table>
 
-      <p><br/><code>conf/mapred-site.xml</code>:</p>
+      <anchor id="mapred-site.xml"/><p><code>conf/mapred-site.xml</code>:</p>
 
       <table>
           <tr>
@@ -221,12 +222,12 @@
           <th>Notes</th>
         </tr>
         <tr>
-          <td>mapred.job.tracker</td>
+          <td>mapreduce.jobtracker.address</td>
           <td>Host or IP and port of <code>JobTracker</code>.</td>
           <td><em>host:port</em> pair.</td>
         </tr>
 		    <tr>
-		      <td>mapred.system.dir</td>
+		      <td>mapreduce.jobtracker.system.dir</td>
 		      <td>
 		        Path on the HDFS where the Map/Reduce framework stores 
 		        system files e.g. <code>/hadoop/mapred/system/</code>.
@@ -237,7 +238,7 @@
 		      </td>
 		    </tr>
 		    <tr>
-		      <td>mapred.local.dir</td>
+		      <td>mapreduce.cluster.local.dir</td>
 		      <td>
 		        Comma-separated list of paths on the local filesystem where 
 		        temporary Map/Reduce data is written.
@@ -264,7 +265,7 @@
 		      </td>
 		    </tr>
 		    <tr>
-		      <td>mapred.hosts/mapred.hosts.exclude</td>
+		      <td>mapreduce.jobtracker.hosts.filename/mapreduce.jobtracker.hosts.exclude.filename</td>
 		      <td>List of permitted/excluded TaskTrackers.</td>
 		      <td>
 		        If necessary, use these files to control the list of allowable 
@@ -272,82 +273,331 @@
 		      </td>
   		    </tr>
         <tr>
-          <td>mapred.queue.names</td>
-          <td>Comma separated list of queues to which jobs can be submitted.</td>
+          <td>mapreduce.cluster.job-authorization-enabled</td>
+          <td>Boolean, specifying whether job ACLs are supported for 
+              authorizing view and modification of a job</td>
           <td>
-            The Map/Reduce system always supports atleast one queue
-            with the name as <em>default</em>. Hence, this parameter's
-            value should always contain the string <em>default</em>.
-            Some job schedulers supported in Hadoop, like the 
-            <a href="http://hadoop.apache.org/mapreduce/docs/current/capacity_scheduler.html">Capacity Scheduler</a>, 
-            support multiple queues. If such a scheduler is
-            being used, the list of configured queue names must be
-            specified here. Once queues are defined, users can submit
-            jobs to a queue using the property name 
-            <em>mapred.job.queue.name</em> in the job configuration.
-            There could be a separate 
-            configuration file for configuring properties of these 
-            queues that is managed by the scheduler. 
-            Refer to the documentation of the scheduler for information on 
-            the same.
+            If <em>true</em>, job ACLs would be checked while viewing or
+            modifying a job. More details are available at 
+            <a href ="ext:mapred-tutorial/JobAuthorization">Job Authorization</a>. 
           </td>
         </tr>
-        <tr>
-          <td>mapred.acls.enabled</td>
-          <td>Specifies whether ACLs are supported for controlling job
-              submission and administration</td>
-          <td>
-            If <em>true</em>, ACLs would be checked while submitting
-            and administering jobs. ACLs can be specified using the
-            configuration parameters of the form
-            <em>mapred.queue.queue-name.acl-name</em>, defined below.
-          </td>
-        </tr>
-		  </table>
-      
-      <p><br/><code> conf/mapred-queue-acls.xml</code></p>
-      
-      <table>
-       <tr>
-          <th>Parameter</th>
-          <th>Value</th> 
-          <th>Notes</th>
-       </tr>
-        <tr>
-          <td>mapred.queue.<em>queue-name</em>.acl-submit-job</td>
-          <td>List of users and groups that can submit jobs to the
-              specified <em>queue-name</em>.</td>
-          <td>
-            The list of users and groups are both comma separated
-            list of names. The two lists are separated by a blank.
-            Example: <em>user1,user2 group1,group2</em>.
-            If you wish to define only a list of groups, provide
-            a blank at the beginning of the value.
-          </td>
-        </tr>
-        <tr>
-          <td>mapred.queue.<em>queue-name</em>.acl-administer-job</td>
-          <td>List of users and groups that can change the priority
-              or kill jobs that have been submitted to the
-              specified <em>queue-name</em>.</td>
-          <td>
-            The list of users and groups are both comma separated
-            list of names. The two lists are separated by a blank.
-            Example: <em>user1,user2 group1,group2</em>.
-            If you wish to define only a list of groups, provide
-            a blank at the beginning of the value. Note that an
-            owner of a job can always change the priority or kill
-            his/her own job, irrespective of the ACLs.
-          </td>
-        </tr>
-      </table>
-      
+  		    
+		  </table>      
 
           <p>Typically all the above parameters are marked as 
           <a href="ext:api/org/apache/hadoop/conf/configuration/final_parameters">
          final</a> to ensure that they cannot be overridden by user-applications.
           </p>
 
+          <anchor id="mapred-queues.xml"/><p><code>conf/mapred-queues.xml
+          </code>:</p>
+          <p>This file is used to configure the queues in the Map/Reduce
+          system. Queues are abstract entities in the JobTracker that can be
+          used to manage collections of jobs. They provide a way for 
+          administrators to organize jobs in specific ways and to enforce 
+          certain policies on such collections, thus providing varying
+          levels of administrative control and management functions on jobs.
+          </p> 
+          <p>One can imagine the following sample scenarios:</p>
+          <ul>
+            <li> Jobs submitted by a particular group of users can all be 
+            submitted to one queue. </li> 
+            <li> Long running jobs in an organization can be submitted to a
+            queue. </li>
+            <li> Short running jobs can be submitted to a queue and the number
+            of jobs that can run concurrently can be restricted. </li> 
+          </ul> 
+          <p>The usage of queues is closely tied to the scheduler configured
+          at the JobTracker via <em>mapreduce.jobtracker.taskscheduler</em>.
+          The degree of support for queues depends on the scheduler used. Some
+          schedulers support a single queue, while others support more complex
+          configurations. Schedulers also implement the policies that apply 
+          to jobs in a queue. Some schedulers, such as the Fairshare scheduler,
+          implement their own mechanisms for collections of jobs and do not rely
+          on queues provided by the framework. Administrators are 
+          encouraged to refer to the documentation of the scheduler they are
+          interested in to determine the level of support for queues.</p>
+          <p>The Map/Reduce framework supports some basic operations on queues
+          such as job submission to a specific queue, access control for queues,
+          queue states, viewing configured queues and their properties
+          and refresh of queue properties. In order to fully implement some of
+          these operations, the framework relies on the configured
+          scheduler.</p>
+          <p>The following types of queue configurations are possible:</p>
+          <ul>
+            <li> Single queue: The default configuration in Map/Reduce consists
+            of a single queue, as supported by the default scheduler. All jobs
+            are submitted to this default queue, which maintains jobs in a
+            priority-based FIFO order.</li>
+            <li> Multiple single level queues: Multiple queues are defined, and
+            jobs can be submitted to any of these queues. Different policies
+            can be applied to these queues by schedulers that support this 
+            configuration to provide a better level of support. For example,
+            the <a href="ext:capacity-scheduler">capacity scheduler</a>
+            provides ways of configuring different 
+            capacity and fairness guarantees on these queues.</li>
+            <li> Hierarchical queues: Hierarchical queues are a configuration in
+            which queues can contain other queues within them recursively. The
+            queues that contain other queues are referred to as 
+            container queues. Queues that do not contain other queues are 
+            referred to as leaf or job queues. Jobs can only be submitted to leaf
+            queues. Hierarchical queues can potentially offer a higher level 
+            of control to administrators, as schedulers can now build a
+            hierarchy of policies where policies applicable to a container
+            queue can provide context for policies applicable to queues it
+            contains. It also opens up possibilities for delegating queue
+            administration where administration of queues in a container queue
+            can be turned over to a different set of administrators, within
+            the context provided by the container queue. For example, the
+            <a href="ext:capacity-scheduler">capacity scheduler</a>
+            uses hierarchical queues to partition the capacity of a cluster
+            among container queues, allowing the queues they contain to divide
+            that capacity further.</li> 
+          </ul>
+
+          <p>Most of the configuration of the queues can be refreshed/reloaded
+          without restarting the Map/Reduce sub-system by editing this
+          configuration file as described in the section on
+          <a href="commands_manual.html#RefreshQueues">reloading queue 
+          configuration</a>.
+          Not all configuration properties can be reloaded, of course,
+          as the description of each property below explains.</p>
+
+          <p>The format of conf/mapred-queues.xml is different from that of the 
+          other configuration files: it uses nested configuration
+          elements to express hierarchical queues. The format is as follows:
+          </p>
+
+          <source>
+          &lt;queues aclsEnabled="$aclsEnabled"&gt;
+            &lt;queue&gt;
+              &lt;name&gt;$queue-name&lt;/name&gt;
+              &lt;state&gt;$state&lt;/state&gt;
+              &lt;queue&gt;
+                &lt;name&gt;$child-queue1&lt;/name&gt;
+                &lt;properties&gt;
+                   &lt;property key="$key" value="$value"/&gt;
+                   ...
+                &lt;/properties&gt;
+                &lt;queue&gt;
+                  &lt;name&gt;$grand-child-queue1&lt;/name&gt;
+                  ...
+                &lt;/queue&gt;
+              &lt;/queue&gt;
+              &lt;queue&gt;
+                &lt;name&gt;$child-queue2&lt;/name&gt;
+                ...
+              &lt;/queue&gt;
+              ...
+              ...
+              ...
+              &lt;queue&gt;
+                &lt;name&gt;$leaf-queue&lt;/name&gt;
+                &lt;acl-submit-job&gt;$acls&lt;/acl-submit-job&gt;
+                &lt;acl-administer-jobs&gt;$acls&lt;/acl-administer-jobs&gt;
+                &lt;properties&gt;
+                   &lt;property key="$key" value="$value"/&gt;
+                   ...
+                &lt;/properties&gt;
+              &lt;/queue&gt;
+            &lt;/queue&gt;
+          &lt;/queues&gt;
+          </source>
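+          <p>For instance, a sketch of a configuration with one container
+          queue and two leaf queues (all queue names below are purely
+          illustrative) could look like this:</p>
+          <source>
+          &lt;queues aclsEnabled="true"&gt;
+            &lt;queue&gt;
+              &lt;name&gt;research&lt;/name&gt;
+              &lt;state&gt;running&lt;/state&gt;
+              &lt;queue&gt;
+                &lt;name&gt;short-jobs&lt;/name&gt;
+                &lt;acl-submit-job&gt;user1,user2 group1&lt;/acl-submit-job&gt;
+              &lt;/queue&gt;
+              &lt;queue&gt;
+                &lt;name&gt;long-jobs&lt;/name&gt;
+                &lt;state&gt;stopped&lt;/state&gt;
+              &lt;/queue&gt;
+            &lt;/queue&gt;
+          &lt;/queues&gt;
+          </source>
+          <p>Here, jobs would be submitted to the leaf queues
+          <em>research:short-jobs</em> and <em>research:long-jobs</em>.</p>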
+          <table>
+            <tr>
+              <th>Tag/Attribute</th>
+              <th>Value</th>
+              <th>
+              	<a href="commands_manual.html#RefreshQueues">Refresh-able?</a>
+              </th>
+              <th>Notes</th>
+            </tr>
+
+            <tr>
+              <td><anchor id="queues_tag"/>queues</td>
+              <td>Root element of the configuration file.</td>
+              <td>Not applicable</td>
+              <td>All the queues are nested inside this root element of the
+              file. There can be only one root queues element in the file.</td>
+            </tr>
+
+            <tr>
+              <td>aclsEnabled</td>
+              <td>Boolean attribute to the
+              <a href="#queues_tag"><em>&lt;queues&gt;</em></a> tag
+              specifying whether ACLs are supported for controlling job
+              submission and administration for <em>all</em> the queues
+              configured.
+              </td>
+              <td>Yes</td>
+              <td>If <em>false</em>, ACLs are ignored for <em>all</em> the
+              configured queues. <br/><br/>
+              If <em>true</em>, the user and group details of the user
+              are checked against the configured ACLs of the corresponding
+              job-queue while submitting and administering jobs. ACLs can be
+              specified for each queue using the queue-specific tags
+              "acl-$acl_name", defined below. ACLs are checked only against
+              the job-queues, i.e. the leaf-level queues; ACLs configured
+              for the rest of the queues in the hierarchy are ignored.
+              </td>
+            </tr>
+
+            <tr>
+              <td><anchor id="queue_tag"/>queue</td>
+              <td>A child element of the
+              <a href="#queues_tag"><em>&lt;queues&gt;</em></a> tag or another
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a>. Denotes a queue
+              in the system.
+              </td>
+              <td>Not applicable</td>
+              <td>Queues can be hierarchical and so this element can contain
+              children of this same type.</td>
+            </tr>
+
+            <tr>
+              <td>name</td>
+              <td>Child element of a 
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              name of the queue.</td>
+              <td>No</td>
+              <td>The name of the queue cannot contain the character <em>":"</em>,
+              which is reserved as the queue-name delimiter when addressing a
+              queue in a hierarchy.</td>
+            </tr>
+
+            <tr>
+              <td>state</td>
+              <td>Child element of a
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              state of the queue.
+              </td>
+              <td>Yes</td>
+              <td>Each queue has a corresponding state. A queue in
+              <em>'running'</em> state can accept new jobs, while a queue in
+              <em>'stopped'</em> state will stop accepting any new jobs. State
+              is defined and respected by the framework only for the
+              leaf-level queues and is ignored for all other queues.
+              <br/><br/>
+              The state of the queue can be viewed from the command line using
+              the <code>'bin/mapred queue'</code> command and also on the Web
+              UI.<br/><br/>
+              Administrators can stop and start queues at runtime using the
+              feature of <a href="commands_manual.html#RefreshQueues">reloading
+              queue configuration</a>. If a queue is stopped at runtime, it
+              will complete all the existing running jobs and will stop
+              accepting any new jobs.
+              </td>
+            </tr>
+
+            <tr>
+              <td>acl-submit-job</td>
+              <td>Child element of a
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              list of users and groups that can submit jobs to the specified
+              queue.</td>
+              <td>Yes</td>
+              <td>
+              Applicable only to leaf-queues.<br/><br/>
+              The lists of users and groups are both comma-separated
+              lists of names. The two lists are separated by a blank.
+              Example: <em>user1,user2 group1,group2</em>.
+              If you wish to define only a list of groups, provide
+              a blank at the beginning of the value.
+              <br/><br/>
+              </td>
+            </tr>
+
+            <tr>
+              <td>acl-administer-job</td>
+              <td>Child element of a
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              list of users and groups that can change the priority of a job
+              or kill a job that has been submitted to the specified queue.
+              </td>
+              <td>Yes</td>
+              <td>
+              Applicable only to leaf-queues.<br/><br/>
+              The lists of users and groups are both comma-separated
+              lists of names. The two lists are separated by a blank.
+              Example: <em>user1,user2 group1,group2</em>.
+              If you wish to define only a list of groups, provide
+              a blank at the beginning of the value. Note that an
+              owner of a job can always change the priority or kill
+              his/her own job, irrespective of the ACLs.
+              </td>
+            </tr>
+
+            <tr>
+              <td><anchor id="properties_tag"/>properties</td>
+              <td>Child element of a 
+              <a href="#queue_tag"><em>&lt;queue&gt;</em></a> specifying the
+              scheduler specific properties.</td>
+              <td>Not applicable</td>
+              <td>The scheduler-specific properties are the children of this
+              element, specified as a group of &lt;property&gt; tags described
+              below. The JobTracker completely ignores these properties. These
+              can be used as per-queue properties needed by the scheduler
+              being configured. Refer to the scheduler-specific
+              documentation for how these properties are used by that
+              particular scheduler.
+              </td>
+            </tr>
+
+            <tr>
+              <td><anchor id="property_tag"/>property</td>
+              <td>Child element of
+              <a href="#properties_tag"><em>&lt;properties&gt;</em></a> for a
+              specific queue.</td>
+              <td>Not applicable</td>
+              <td>A single scheduler specific queue-property. Ignored by
+              the JobTracker and used by the scheduler that is configured.</td>
+            </tr>
+
+            <tr>
+              <td>key</td>
+              <td>Attribute of a
+              <a href="#property_tag"><em>&lt;property&gt;</em></a> for a
+              specific queue.</td>
+              <td>Scheduler-specific</td>
+              <td>The name of a single scheduler specific queue-property.</td>
+            </tr>
+
+            <tr>
+              <td>value</td>
+              <td>Attribute of a
+              <a href="#property_tag"><em>&lt;property&gt;</em></a> for a
+              specific queue.</td>
+              <td>Scheduler-specific</td>
+              <td>The value of a single scheduler-specific queue-property.
+              The value is left to be interpreted by the scheduler that is
+              configured.</td>
+            </tr>
+
+         </table>
+
+          <p>Once the queues are configured properly and the Map/Reduce
+          system is up and running, from the command line one can
+          <a href="commands_manual.html#QueuesList">get the list
+          of queues</a> and
+          <a href="commands_manual.html#QueuesInfo">obtain
+          information specific to each queue</a>. This information is also
+          available from the web UI. On the web UI, queue information can be
+          seen by going to queueinfo.jsp, linked to from the queues table-cell
+          in the cluster-summary table. The queueinfo.jsp prints the hierarchy
+          of queues as well as the specific information for each queue.
+          </p>
+
+          <p> Users can submit jobs only to a
+          leaf-level queue by specifying the fully-qualified queue-name for
+          the property name <em>mapreduce.job.queuename</em> in the job
+          configuration. The character ':' is the queue-name delimiter, so,
+          for example, to submit to a configured job-queue 'Queue-C'
+          that is a sub-queue of 'Queue-B', which in turn is a
+          sub-queue of 'Queue-A', the job configuration should contain the
+          property <em>mapreduce.job.queuename</em> set to <em>
+          &lt;value&gt;Queue-A:Queue-B:Queue-C&lt;/value&gt;</em>.</p>
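+          <p>As an illustration, assuming a job that implements
+          <em>Tool</em> (the jar and class names below are hypothetical),
+          the target queue can be set from the command line via the generic
+          <code>-D</code> option:</p>
+          <source>
+          $ bin/hadoop jar myjob.jar org.example.MyJob \
+              -Dmapreduce.job.queuename=Queue-A:Queue-B:Queue-C input output
+          </source>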
+         </section>
           <section>
             <title>Real-World Cluster Configurations</title>
             
@@ -383,7 +633,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.parallel.copies</td>
+                    <td>mapreduce.reduce.shuffle.parallelcopies</td>
                     <td>20</td>
                     <td>
                       Higher number of parallel copies run by reduces to fetch
@@ -392,7 +642,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.map.child.java.opts</td>
+                    <td>mapreduce.map.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of maps. 
@@ -400,7 +650,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.child.java.opts</td>
+                    <td>mapreduce.reduce.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of reduces. 
@@ -417,13 +667,13 @@
                   </tr>
                   <tr>
                     <td>conf/core-site.xml</td>
-                    <td>io.sort.factor</td>
+                    <td>mapreduce.task.io.sort.factor</td>
                     <td>100</td>
                     <td>More streams merged at once while sorting files.</td>
                   </tr>
                   <tr>
                     <td>conf/core-site.xml</td>
-                    <td>io.sort.mb</td>
+                    <td>mapreduce.task.io.sort.mb</td>
                     <td>200</td>
                     <td>Higher memory-limit while sorting data.</td>
                   </tr>
@@ -448,7 +698,7 @@
 		          </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.job.tracker.handler.count</td>
+                    <td>mapreduce.jobtracker.handler.count</td>
                     <td>60</td>
                     <td>
                       More JobTracker server threads to handle RPCs from large 
@@ -457,13 +707,13 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.parallel.copies</td>
+                    <td>mapreduce.reduce.shuffle.parallelcopies</td>
                     <td>50</td>
                     <td></td>
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>tasktracker.http.threads</td>
+                    <td>mapreduce.tasktracker.http.threads</td>
                     <td>50</td>
                     <td>
                       More worker threads for the TaskTracker's http server. The
@@ -473,7 +723,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.map.child.java.opts</td>
+                    <td>mapreduce.map.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of maps. 
@@ -481,7 +731,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.child.java.opts</td>
+                    <td>mapreduce.reduce.java.opts</td>
                     <td>-Xmx1024M</td>
                     <td>Larger heap-size for child jvms of reduces.</td>
                   </tr>
@@ -500,11 +750,11 @@
         or equal to the -Xmx passed to JavaVM, else the VM might not start. 
         </p>
         
-        <p>Note: <code>mapred.child.java.opts</code> are used only for 
+        <p>Note: <code>mapred.{map|reduce}.child.java.opts</code> are used only for 
         configuring the launched child tasks from task tracker. Configuring 
-        the memory options for daemons is documented under 
+        the memory options for daemons is documented in 
         <a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
-        Configuring the Environment of the Hadoop Daemons</a>.</p>
+        cluster_setup.html </a></p>
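+        <p>For illustration, a larger heap for map tasks could be configured
+        in <code>conf/mapred-site.xml</code> as follows (the value shown is
+        only an example):</p>
+        <source>
+        &lt;property&gt;
+          &lt;name&gt;mapreduce.map.java.opts&lt;/name&gt;
+          &lt;value&gt;-Xmx512M&lt;/value&gt;
+        &lt;/property&gt;
+        </source>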
         
         <p>The memory available to some parts of the framework is also
         configurable. In map and reduce tasks, performance may be influenced
@@ -558,7 +808,7 @@
 
     <table>
           <tr><th>Name</th><th>Type</th><th>Description</th></tr>
-          <tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
+          <tr><td>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</td>
             <td>long</td>
             <td>The time interval, in milliseconds, between which the TT 
             checks for any memory violation. The default value is 5000 msec
@@ -668,10 +918,11 @@
             the tasks. For maximum security, this task controller 
             sets up restricted permissions and user/group ownership of
             local files and directories used by the tasks such as the
-            job jar files, intermediate files and task log files. Currently
-            permissions on distributed cache files are opened up to be
-            accessible by all users. In future, it is expected that stricter
-            file permissions are set for these files too.
+            job jar files, intermediate files, task log files and distributed
+            cache files. Note in particular that, because of this, no user
+            other than the job owner and the tasktracker can access any of the
+            local files/directories, including those localized as part of the
+            distributed cache.
             </td>
             </tr>
             </table>
@@ -684,7 +935,7 @@
             <th>Property</th><th>Value</th><th>Notes</th>
             </tr>
             <tr>
-            <td>mapred.task.tracker.task-controller</td>
+            <td>mapreduce.tasktracker.taskcontroller</td>
             <td>Fully qualified class name of the task controller class</td>
             <td>Currently there are two implementations of task controller
             in the Hadoop system, DefaultTaskController and LinuxTaskController.
@@ -715,21 +966,35 @@
             <p>
             The executable must have specific permissions as follows. The
             executable should have <em>6050 or --Sr-s---</em> permissions
-            user-owned by root(super-user) and group-owned by a group 
-            of which only the TaskTracker's user is the sole group member. 
+            user-owned by root (super-user) and group-owned by a special group 
+            of which the TaskTracker's user is a member and no job 
+            submitter is. If any job submitter belongs to this special group,
+            security will be compromised. This special group name should be
+            specified for the configuration property 
+            <em>"mapreduce.tasktracker.group"</em> in both mapred-site.xml and 
+            <a href="#task-controller.cfg">task-controller.cfg</a>.  
             For example, let's say that the TaskTracker is run as user
             <em>mapred</em> who is part of the groups <em>users</em> and
-            <em>mapredGroup</em> any of them being the primary group.
+            <em>specialGroup</em>, either of which may be its primary group.
             Let us also say that <em>users</em> has both <em>mapred</em> and
-            another user <em>X</em> as its members, while <em>mapredGroup</em>
-            has only <em>mapred</em> as its member. Going by the above
+            another user (job submitter) <em>X</em> as its members, and X does
+            not belong to <em>specialGroup</em>. Going by the above
             description, the setuid/setgid executable should be set
             <em>6050 or --Sr-s---</em> with user-owner as <em>mapred</em> and
-            group-owner as <em>mapredGroup</em> which has
-            only <em>mapred</em> as its member(and not <em>users</em> which has
+            group-owner as <em>specialGroup</em>, which has
+            <em>mapred</em> as its member (and not <em>users</em>, which has
             <em>X</em> also as its member besides <em>mapred</em>).
             </p>
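+
+            <p>
+            Concretely, under the assumptions above (the user and group names
+            are illustrative), the ownership and permissions could be set with:
+            </p>
+            <source>
+            $ chown mapred:specialGroup task-controller
+            $ chmod 6050 task-controller
+            </source>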
+
+            <p>
+            The LinuxTaskController requires that the paths leading up to
+            and including the directories specified in
+            <em>mapreduce.cluster.local.dir</em> and <em>hadoop.log.dir</em>
+            have 755 permissions.
+            </p>
             
+            <section>
+            <title>task-controller.cfg</title>
             <p>The executable requires a configuration file called 
             <em>taskcontroller.cfg</em> to be
             present in the configuration directory passed to the ant target 
@@ -747,8 +1012,8 @@
             </p>
             <table><tr><th>Name</th><th>Description</th></tr>
             <tr>
-            <td>mapred.local.dir</td>
-            <td>Path to mapred local directories. Should be same as the value 
+            <td>mapreduce.cluster.local.dir</td>
+            <td>Path to the mapreduce.cluster.local.dir directories. Should be the same as the value 
             which was provided for the key in mapred-site.xml. This is required to
             validate paths passed to the setuid executable in order to prevent
             arbitrary paths being passed to it.</td>
@@ -760,14 +1025,16 @@
             permissions on the log files so that they can be written to by the user's
             tasks and read by the TaskTracker for serving on the web UI.</td>
             </tr>
+            <tr>
+            <td>mapreduce.tasktracker.group</td>
+            <td>Group to which the TaskTracker belongs. The group owner of the
+            taskcontroller binary should be this group. Should be the same as
+            the value with which the TaskTracker is configured. This 
+            configuration is required for validating the secure access of the
+            task-controller binary.</td>
+            </tr>
             </table>
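+
+            <p>A sample <em>taskcontroller.cfg</em> might therefore look like
+            the following (all paths and the group name are examples only):</p>
+            <source>
+            mapreduce.cluster.local.dir=/grid/mapred/local
+            hadoop.log.dir=/var/log/hadoop
+            mapreduce.tasktracker.group=specialGroup
+            </source>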
-
-            <p>
-            The LinuxTaskController requires that paths including and leading up to
-            the directories specified in
-            <em>mapred.local.dir</em> and <em>hadoop.log.dir</em> to be set 755
-            permissions.
-            </p>
+            </section>
             </section>
             
           </section>
@@ -800,7 +1067,7 @@
             monitoring script in <em>mapred-site.xml</em>.</p>
             <table>
             <tr><th>Name</th><th>Description</th></tr>
-            <tr><td><code>mapred.healthChecker.script.path</code></td>
+            <tr><td><code>mapreduce.tasktracker.healthchecker.script.path</code></td>
             <td>Absolute path to the script which is periodically run by the 
             TaskTracker to determine if the node is 
             healthy or not. The file should be executable by the TaskTracker.
@@ -809,18 +1076,18 @@
             is not started.</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.interval</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.interval</code></td>
             <td>Frequency at which the node health script is run, 
             in milliseconds</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.script.timeout</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.script.timeout</code></td>
             <td>Time after which the node health script will be killed by
             the TaskTracker if unresponsive.
             The node is marked unhealthy if the node health script times out.</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.script.args</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.script.args</code></td>
             <td>Extra arguments that can be passed to the node health script 
             when launched.
             These should be comma separated list of arguments. </td>
@@ -857,17 +1124,17 @@
             <title>History Logging</title>
             
             <p> The job history files are stored in a central location 
-            <code> hadoop.job.history.location </code> which can be on DFS also,
+            <code> mapreduce.jobtracker.jobhistory.location </code>, which can also be on DFS;
             its default value is <code>${HADOOP_LOG_DIR}/history</code>. 
             The history web UI is accessible from the job tracker web UI.</p>
             
             <p> The history files are also logged to a user-specified directory
-            <code>hadoop.job.history.user.location</code> 
+            <code>mapreduce.job.userhistorylocation</code> 
             which defaults to job output directory. The files are stored in
             "_logs/history/" in the specified directory. Hence, by default 
-            they will be in "mapred.output.dir/_logs/history/". User can stop
+            they will be in "mapreduce.output.fileoutputformat.outputdir/_logs/history/". Users can stop
             logging by giving the value <code>none</code> for 
-            <code>hadoop.job.history.user.location</code> </p>
+            <code>mapreduce.job.userhistorylocation</code> </p>
             
             <p> Users can view the history logs summary in the specified directory 
             using the following command <br/>
@@ -880,7 +1147,6 @@
             <code>$ bin/hadoop job -history all output-dir</code><br/></p> 
           </section>
         </section>
-      </section>
       
       <p>Once all the necessary configuration is complete, distribute the files
       to the <code>HADOOP_CONF_DIR</code> directory on all the machines, 
@@ -891,9 +1157,9 @@
       <section>
         <title>Map/Reduce</title>
         <p>The job tracker restart can recover running jobs if 
-        <code>mapred.jobtracker.restart.recover</code> is set true and 
+        <code>mapreduce.jobtracker.restart.recover</code> is set true and 
         <a href="#Logging">JobHistory logging</a> is enabled. Also 
-        <code>mapred.jobtracker.job.history.block.size</code> value should be 
+        <code>mapreduce.jobtracker.jobhistory.block.size</code> value should be 
         set to an optimal value to dump job history to disk as soon as 
         possible; a typical value is 3145728 (3 MB).</p>
       </section>
@@ -951,7 +1217,7 @@
       and starts the <code>TaskTracker</code> daemon on all the listed slaves.
       </p>
     </section>
-    
+
     <section>
       <title>Hadoop Shutdown</title>
       

Added: hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml?rev=951480&view=auto
==============================================================================
--- hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml (added)
+++ hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml Fri Jun  4 16:34:18 2010
@@ -0,0 +1,772 @@
+<?xml version="1.0"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+	<header>
+		<title>Hadoop Commands Guide</title>
+	</header>
+	
+	<body>
+		<section>
+			<title>Overview</title>
+			<p>
+				All Hadoop commands are invoked by the bin/hadoop script. Running the Hadoop
+				script without any arguments prints the description for all commands.
+			</p>
+			<p>
+				<code>Usage: hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]</code>
+			</p>
+			<p>
+				Hadoop has an option parsing framework that supports parsing generic options as well as running classes.
+			</p>
+			<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>--config confdir</code></td>
+			            <td>Overrides the default configuration directory. Default is ${HADOOP_HOME}/conf.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>GENERIC_OPTIONS</code></td>
+			            <td>The common set of options supported by multiple commands.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>COMMAND</code><br/><code>COMMAND_OPTIONS</code></td>
+			            <td>Various commands with their options are described in the following sections. The commands 
+			            have been grouped into <a href="commands_manual.html#User+Commands">User Commands</a> 
+			            and <a href="commands_manual.html#Administration+Commands">Administration Commands</a>.</td>
+			           </tr>
+			     </table>
+			 <section>
+				<title>Generic Options</title>
+				<p>
+				  The following options are supported by <a href="commands_manual.html#dfsadmin">dfsadmin</a>, 
+				  <a href="commands_manual.html#fs">fs</a>, <a href="commands_manual.html#fsck">fsck</a> and 
+				  <a href="commands_manual.html#job">job</a>. 
+				  Applications should implement 
+				  <a href="ext:api/org/apache/hadoop/util/tool">Tool</a> to support
+				  <a href="ext:api/org/apache/hadoop/util/genericoptionsparser">
+				  GenericOptions</a>.
+				</p>
+			     <table>
+			          <tr><th> GENERIC_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>-conf &lt;configuration file&gt;</code></td>
+			            <td>Specify an application configuration file.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-D &lt;property=value&gt;</code></td>
+			            <td>Use value for given property.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-fs &lt;local|namenode:port&gt;</code></td>
+			            <td>Specify a namenode.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-jt &lt;local|jobtracker:port&gt;</code></td>
+			            <td>Specify a job tracker. Applies only to <a href="commands_manual.html#job">job</a>.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-files &lt;comma separated list of files&gt;</code></td>
+			            <td>Specify comma separated files to be copied to the map reduce cluster. 
+			            Applies only to <a href="commands_manual.html#job">job</a>.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-libjars &lt;comma separated list of jars&gt;</code></td>
+			            <td>Specify comma separated jar files to include in the classpath. 
+			            Applies only to <a href="commands_manual.html#job">job</a>.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-archives &lt;comma separated list of archives&gt;</code></td>
+			            <td>Specify comma separated archives to be unarchived on the compute machines. 
+			            Applies only to <a href="commands_manual.html#job">job</a>.</td>
+			           </tr>
+				</table>
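+				<p>
+				  For example (the host name below is a placeholder), a generic
+				  option can point <code>dfsadmin</code> at a particular namenode:
+				</p>
+				<p>
+				  <code>$ bin/hadoop dfsadmin -fs namenode.example.com:9000 -report</code>
+				</p>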
+			</section>	   
+		</section>
+		
+		<section>
+			<title> User Commands </title>
+			<p>Commands useful for users of a Hadoop cluster.</p>
+			<section>
+				<title> archive </title>
+				<p>
+					Creates a Hadoop archive. For more information, see the <a href="ext:hadoop-archives">Hadoop Archives Guide</a>.
+				</p>
+				<p>
+					<code>Usage: hadoop archive -archiveName NAME &lt;src&gt;* &lt;dest&gt;</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+					   <tr>
+			          	<td><code>-archiveName NAME</code></td>
+			            <td>Name of the archive to be created.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>src</code></td>
+			            <td>Filesystem pathnames which work as usual with regular expressions.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>dest</code></td>
+			            <td>Destination directory which would contain the archive.</td>
+			           </tr>
+			     </table>
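+				<p>
+					For example (the paths are illustrative):
+				</p>
+				<p>
+					<code>hadoop archive -archiveName foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/zoo/</code>
+				</p>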
+			</section>
+			
+			<section>
+				<title> distcp </title>
+				<p>
+					Copies files or directories recursively. More information can be found in the <a href="ext:distcp">DistCp Guide</a>.
+				</p>
+				<p>
+					<code>Usage: hadoop distcp &lt;srcurl&gt; &lt;desturl&gt;</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>srcurl</code></td>
+			            <td>Source URL</td>
+			           </tr>
+			           <tr>
+			          	<td><code>desturl</code></td>
+			            <td>Destination URL</td>
+			           </tr>
+			     </table>
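+				<p>
+					For example (the namenode host names are placeholders):
+				</p>
+				<p>
+					<code>hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo</code>
+				</p>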
+			</section>
+			       
+			<section>
+				<title> fs </title>
+				<p>
+					Runs a generic filesystem user client.
+				</p>
+				<p>
+					<code>Usage: hadoop fs [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] 
+					[COMMAND_OPTIONS]</code>
+				</p>
+				<p>
+					The various COMMAND_OPTIONS can be found at 
+					<a href="file_system_shell.html">File System Shell Guide</a>.
+				</p>   
+			</section>
+			
+			<section>
+				<title> fsck </title>
+				<p>
+					Runs an HDFS filesystem checking utility. See <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Fsck">Fsck</a> for more info.
+				</p> 
+				<p><code>Usage: hadoop fsck [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] 
+				&lt;path&gt; [-move | -delete | -openforwrite] [-files [-blocks 
+				[-locations | -racks]]]</code></p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			          <tr>
+			            <td><code>&lt;path&gt;</code></td>
+			            <td>Start checking from this path.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-move</code></td>
+			            <td>Move corrupted files to /lost+found</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-delete</code></td>
+			            <td>Delete corrupted files.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-openforwrite</code></td>
+			            <td>Print out files opened for write.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-files</code></td>
+			            <td>Print out files being checked.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-blocks</code></td>
+			            <td>Print out block report.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-locations</code></td>
+			            <td>Print out locations for every block.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-racks</code></td>
+			            <td>Print out network topology for data-node locations.</td>
+			           </tr>
+					</table>
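+				<p>
+					For example, to check a directory and print its files and
+					blocks (the path is illustrative):
+				</p>
+				<p>
+					<code>hadoop fsck /user/hadoop -files -blocks</code>
+				</p>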
+			</section>
+			
+			<section>
+				<title> jar </title>
+				<p>
+					Runs a jar file. Users can bundle their Map Reduce code in a jar file and execute it using this command.
+				</p> 
+				<p>
+					<code>Usage: hadoop jar &lt;jar&gt; [mainClass] args...</code>
+				</p>
+				<p>
+					The streaming jobs are run via this command. For examples, see 
+					<a href="ext:streaming">Hadoop Streaming</a>.
+				</p>
+				<p>
+					The WordCount example is also run using the jar command. For examples, see the
+					<a href="ext:mapred-tutorial">MapReduce Tutorial</a>.
+				</p>
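+				<p>
+					For example, a job bundled in a jar can be run as follows
+					(the jar, class and path names are hypothetical):
+				</p>
+				<p>
+					<code>hadoop jar myjob.jar org.example.MyJob input output</code>
+				</p>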
+			</section>
+			
+			<section>
+				<title> job </title>
+				<p>
+					Command to interact with Map Reduce Jobs.
+				</p>
+				<p>
+					<code>Usage: hadoop job [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] 
+					[-submit &lt;job-file&gt;] | [-status &lt;job-id&gt;] | 
+					[-counter &lt;job-id&gt; &lt;group-name&gt; &lt;counter-name&gt;] | [-kill &lt;job-id&gt;] | 
+					[-events &lt;job-id&gt; &lt;from-event-#&gt; &lt;#-of-events&gt;] | [-history [all] &lt;historyFile&gt;] |
+					[-list [all]] | [-kill-task &lt;task-id&gt;] | [-fail-task &lt;task-id&gt;] | 
+          [-set-priority &lt;job-id&gt; &lt;priority&gt;]</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>-submit &lt;job-file&gt;</code></td>
+			            <td>Submits the job.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-status &lt;job-id&gt;</code></td>
+			            <td>Prints the map and reduce completion percentage and all job counters.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-counter &lt;job-id&gt; &lt;group-name&gt; &lt;counter-name&gt;</code></td>
+			            <td>Prints the counter value.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-kill &lt;job-id&gt;</code></td>
+			            <td>Kills the job.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-events &lt;job-id&gt; &lt;from-event-#&gt; &lt;#-of-events&gt;</code></td>
+			            <td>Prints the events' details received by jobtracker for the given range.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-history [all] &lt;historyFile&gt;</code></td>
+			            <td>-history &lt;historyFile&gt; prints job details, failed and killed tip details. More details 
+			            about the job such as successful tasks and task attempts made for each task can be viewed by 
+			            specifying the [all] option. </td>
+			           </tr>
+			           <tr>
+			          	<td><code>-list [all]</code></td>
+			            <td>-list all displays all jobs. -list displays only jobs which are yet to complete.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-kill-task &lt;task-id&gt;</code></td>
+			            <td>Kills the task. Killed tasks are NOT counted against failed attempts.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-fail-task &lt;task-id&gt;</code></td>
+			            <td>Fails the task. Failed tasks are counted against failed attempts.</td>
+			           </tr>
+                 <tr>
+                  <td><code>-set-priority &lt;job-id&gt; &lt;priority&gt;</code></td>
+                  <td>Changes the priority of the job. 
+                  Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW</td>
+                 </tr>
+					</table>
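+				<p>
+					For example, to print the status of a job (the job id shown
+					is a made-up example):
+				</p>
+				<p>
+					<code>hadoop job -status job_200912310001_0001</code>
+				</p>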
+			</section>
+			
+			<section>
+				<title> pipes </title>
+				<p>
+					Runs a pipes job.
+				</p>
+				<p>
+					<code>Usage: hadoop pipes [-conf &lt;path&gt;] [-jobconf &lt;key=value&gt;, &lt;key=value&gt;, ...] 
+					[-input &lt;path&gt;] [-output &lt;path&gt;] [-jar &lt;jar file&gt;] [-inputformat &lt;class&gt;] 
+					[-map &lt;class&gt;] [-partitioner &lt;class&gt;] [-reduce &lt;class&gt;] [-writer &lt;class&gt;] 
+					[-program &lt;executable&gt;] [-reduces &lt;num&gt;] </code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			          <tr>
+			          	<td><code>-conf &lt;path&gt;</code></td>
+			            <td>Configuration for job</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-jobconf &lt;key=value&gt;, &lt;key=value&gt;, ...</code></td>
+			            <td>Add/override configuration for job</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-input &lt;path&gt;</code></td>
+			            <td>Input directory</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-output &lt;path&gt;</code></td>
+			            <td>Output directory</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-jar &lt;jar file&gt;</code></td>
+			            <td>Jar filename</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-inputformat &lt;class&gt;</code></td>
+			            <td>InputFormat class</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-map &lt;class&gt;</code></td>
+			            <td>Java Map class</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-partitioner &lt;class&gt;</code></td>
+			            <td>Java Partitioner</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-reduce &lt;class&gt;</code></td>
+			            <td>Java Reduce class</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-writer &lt;class&gt;</code></td>
+			            <td>Java RecordWriter</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-program &lt;executable&gt;</code></td>
+			            <td>Executable URI</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-reduces &lt;num&gt;</code></td>
+			            <td>Number of reduces</td>
+			           </tr>
+					</table>
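+				<p>
+					For example, a pipes job could be launched as follows (all
+					paths are illustrative):
+				</p>
+				<p>
+					<code>hadoop pipes -input in-dir -output out-dir -program bin/cpp-wordcount</code>
+				</p>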
+			</section>
+      <section>
+        <title> queue </title>
+        <p>
+          Command to interact with and view Job Queue information.
+        </p>
+        <p>
+          <code>Usage: hadoop queue [-list] | [-info &lt;job-queue-name&gt; [-showJobs]] | [-showacls]</code>
+        </p>
+        <table>
+        <tr>
+          <th> COMMAND_OPTION </th><th> Description </th>
+        </tr>
+        <tr>
+          <td><anchor id="QueuesList"/><code>-list</code> </td>
+          <td>Gets the list of Job Queues configured in the system, along with
+          the scheduling information associated with them.
+          </td>
+        </tr>
+        <tr>
+          <td><anchor id="QueuesInfo"/><code>-info &lt;job-queue-name&gt; [-showJobs]</code></td>
+          <td>
+           Displays the job queue information and associated scheduling information of a particular
+           job queue. If the -showJobs option is present, a list of jobs submitted to that
+           queue is displayed. 
+          </td>
+        </tr>
+        <tr>
+          <td><code>-showacls</code></td>
+          <td>Displays the queue name and associated queue operations allowed for the current user.
+          The list consists of only those queues to which the user has access.
+          </td>
+          </tr>
+        </table>
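+        <p>
+          For example, to display the scheduling information and submitted jobs of a queue 
+          (the queue name <em>default</em> is illustrative):
+        </p>
+        <p>
+          <code>hadoop queue -info default -showJobs</code>
+        </p>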
+      </section>  	
+			<section>
+				<title> version </title>
+				<p>
+					Prints the version.
+				</p> 
+				<p>
+					<code>Usage: hadoop version</code>
+				</p>
+			</section>
+			<section>
+				<title> CLASSNAME </title>
+				<p>
+					 The Hadoop script can be used to invoke any class.
+				</p>
+				<p>
+					 Runs the class named CLASSNAME.
+				</p>
+
+				<p>
+					<code>Usage: hadoop CLASSNAME</code>
+				</p>
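+				<p>
+					For example, to invoke a class that is on the Hadoop classpath and has a main method 
+					(the class and arguments below are illustrative):
+				</p>
+				<p>
+					<code>hadoop org.apache.hadoop.fs.FsShell -ls /</code>
+				</p>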
+
+			</section>
+    </section>
+		<section>
+			<title> Administration Commands </title>
+			<p>Commands useful for administrators of a Hadoop cluster.</p>
+			<section>
+				<title> balancer </title>
+				<p>
+					Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the 
+					rebalancing process. For more details see 
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Rebalancer">Rebalancer</a>.
+				</p>
+				<p>
+					<code>Usage: hadoop balancer [-threshold &lt;threshold&gt;]</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>-threshold &lt;threshold&gt;</code></td>
+			            <td>Percentage of disk capacity. This overwrites the default threshold.</td>
+			           </tr>
+			     </table>
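+				<p>
+					For example, to rebalance until each datanode's utilization is within 5 percent of the 
+					average cluster utilization (the threshold value is illustrative):
+				</p>
+				<p>
+					<code>hadoop balancer -threshold 5</code>
+				</p>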
+			</section>
+			
+			<section>
+				<title> daemonlog </title>
+				<p>
+					 Get/Set the log level for each daemon.
+				</p> 
+				<p>
+					<code>Usage: hadoop daemonlog  -getlevel &lt;host:port&gt; &lt;name&gt;</code><br/>
+					<code>Usage: hadoop daemonlog  -setlevel &lt;host:port&gt; &lt;name&gt; &lt;level&gt;</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>-getlevel &lt;host:port&gt; &lt;name&gt;</code></td>
+			            <td>Prints the log level of the daemon running at &lt;host:port&gt;. 
+			            This command internally connects to http://&lt;host:port&gt;/logLevel?log=&lt;name&gt;</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-setlevel &lt;host:port&gt; &lt;name&gt; &lt;level&gt;</code></td>
+			            <td>Sets the log level of the daemon running at &lt;host:port&gt;. 
+			            This command internally connects to http://&lt;host:port&gt;/logLevel?log=&lt;name&gt;</td>
+			           </tr>
+			     </table>
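+				<p>
+					For example, to query and then raise the log level of a daemon (the host, port, and 
+					logger name below are illustrative):
+				</p>
+				<p>
+					<code>hadoop daemonlog -getlevel nn-host:50070 org.apache.hadoop.hdfs.server.namenode.NameNode</code><br/>
+					<code>hadoop daemonlog -setlevel nn-host:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG</code>
+				</p>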
+			</section>
+			
+			<section>
+				<title> datanode</title>
+				<p>
+					Runs an HDFS datanode.
+				</p> 
+				<p>
+					<code>Usage: hadoop datanode [-rollback]</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>-rollback</code></td>
+			            <td>Rolls back the datanode to the previous version. This should be used after stopping the datanode 
+			            and distributing the old Hadoop version.</td>
+			           </tr>
+			     </table>
+			</section>
+			
+			<section>
+				<title> dfsadmin </title>
+				<p>
+					Runs an HDFS dfsadmin client.
+				</p> 
+				<p>
+					<code>Usage: hadoop dfsadmin  [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] [-report] [-safemode enter | leave | get | wait] [-refreshNodes]
+					 [-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename] 
+					 [-setQuota &lt;quota&gt; &lt;dirname&gt;...&lt;dirname&gt;] [-clrQuota &lt;dirname&gt;...&lt;dirname&gt;] 
+					 [-restoreFailedStorage true|false|check] [-printTopology] 
+					 [-help [cmd]]</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>-report</code></td>
+			            <td>Reports basic filesystem information and statistics.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-safemode enter | leave | get | wait</code></td>
+			            <td>Safe mode maintenance command.
+                Safe mode is a Namenode state in which it <br/>
+                        1.  does not accept changes to the name space (read-only) <br/> 
+                        2.  does not replicate or delete blocks. <br/>
+                The Namenode enters safe mode automatically at startup, and
+                leaves safe mode automatically once the configured minimum
+                percentage of blocks satisfies the minimum replication
+                condition.  Safe mode can also be entered manually, but then
+                it can only be turned off manually as well.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-refreshNodes</code></td>
+			            <td>Re-read the hosts and exclude files to update the set
+                of Datanodes that are allowed to connect to the Namenode
+                and those that should be decommissioned or recommissioned.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-finalizeUpgrade</code></td>
+			            <td>Finalize upgrade of HDFS.
+                Datanodes delete their previous version working directories,
+                followed by Namenode doing the same.
+                This completes the upgrade process.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-printTopology</code></td>
+			            <td>Print a tree of the rack/datanode topology of the
+                 cluster as seen by the NameNode.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-upgradeProgress status | details | force</code></td>
+			            <td>Request current distributed upgrade status,
+                a detailed status or force the upgrade to proceed.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-metasave filename</code></td>
+			            <td>Save Namenode's primary data structures
+                to &lt;filename&gt; in the directory specified by the hadoop.log.dir property.
+                &lt;filename&gt; will contain one line for each of the following <br/>
+                        1. Datanodes heart beating with Namenode<br/>
+                        2. Blocks waiting to be replicated<br/>
+                        3. Blocks currently being replicated<br/>
+                        4. Blocks waiting to be deleted</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-setQuota &lt;quota&gt; &lt;dirname&gt;...&lt;dirname&gt;</code></td>
+			            <td>Set the quota &lt;quota&gt; for each directory &lt;dirname&gt;.
+                The directory quota is a long integer that puts a hard limit on the number of names in the directory tree.<br/>
+                Best effort for the directory, with faults reported if<br/>
+                1. the quota is not a positive integer, or<br/>
+                2. user is not an administrator, or<br/>
+                3. the directory does not exist or is a file, or<br/>
+                4. the directory would immediately exceed the new quota.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-clrQuota &lt;dirname&gt;...&lt;dirname&gt;</code></td>
+			            <td>Clear the quota for each directory &lt;dirname&gt;.<br/>
+                Best effort for the directory, with faults reported if<br/>
+                1. the directory does not exist or is a file, or<br/>
+                2. user is not an administrator.<br/>
+                It does not fault if the directory has no quota.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-restoreFailedStorage true | false | check</code></td>
+			            <td>Turns the automatic attempt to restore failed storage replicas on or off. 
+			            If a failed storage location becomes available again, the system will attempt to restore 
+			            edits and/or the fsimage during a checkpoint. The 'check' option returns the current setting.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-help [cmd]</code></td>
+			            <td> Displays help for the given command or all commands if none
+                is specified.</td>
+			           </tr>
+			     </table>
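+				<p>
+					For example, to check whether the namenode is in safe mode, and to limit a directory tree 
+					to at most 10000 names (the path and quota value are illustrative):
+				</p>
+				<p>
+					<code>hadoop dfsadmin -safemode get</code><br/>
+					<code>hadoop dfsadmin -setQuota 10000 /user/alice</code>
+				</p>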
+			</section>
+			<section>
+        <title>mradmin</title>
+        <p>Runs the MapReduce admin client.</p>
+        <p><code>Usage: hadoop mradmin  [</code>
+        <a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a>
+        <code>] [-refreshServiceAcl] [-refreshQueues] [-refreshNodes] [-help [cmd]] </code></p>
+        <table>
+        <tr>
+        <th> COMMAND_OPTION </th><th> Description </th>
+        </tr>
+        <tr>
+        <td><code>-refreshServiceAcl</code></td>
+        <td> Reload the service-level authorization policies. The JobTracker
+         will reload the authorization policy file.</td>
+        </tr>
+        <tr>
+        <td><anchor id="RefreshQueues"/><code>-refreshQueues</code></td>
+        <td><p> Reload the queues' configuration at the JobTracker.
+          Most of the configuration of the queues can be refreshed/reloaded
+          without restarting the Map/Reduce sub-system. Administrators
+          typically own the
+          <a href="cluster_setup.html#mapred-queues.xml">
+          <em>conf/mapred-queues.xml</em></a>
+          file, can edit it while the JobTracker is still running, and can do
+          a reload by running this command.</p>
+          <p>It should be noted that while trying to refresh queues'
+          configuration, one cannot change the hierarchy of queues itself.
+          This means no operation that involves a change in either the
+          hierarchy structure itself or the queues' names will be allowed.
+          Only selected properties of queues can be changed during refresh.
+          For example, new queues cannot be added dynamically, nor can an
+          existing queue be deleted.</p>
+          <p>If a syntactic or semantic error is made while editing the
+          configuration file, the refresh command fails with an exception
+          that is printed on its standard output, informing the requester
+          of what went wrong during the edit/reload. Importantly, the
+          existing queue configuration is untouched and the system is left
+          in a consistent state.
+          </p>
+          <p>As described in the
+          <a href="cluster_setup.html#mapred-queues.xml"><em>
+          conf/mapred-queues.xml</em></a> section, the
+          <a href="cluster_setup.html#properties_tag"><em>
+          &lt;properties&gt;</em></a> tag in the queue configuration file can
+          also be used to specify per-queue properties needed by the scheduler.
+          When the framework's queue configuration is reloaded using this
+          command, this scheduler-specific configuration will also be reloaded,
+          provided the scheduler being configured supports the reload.
+          Please see the documentation of the particular scheduler in use.</p>
+          </td>
+        </tr>
+        <tr>
+        <td><code>-refreshNodes</code></td>
+        <td> Refresh the hosts information at the JobTracker.</td>
+        </tr>
+        <tr>
+        <td><code>-help [cmd]</code></td>
+        <td>Displays help for the given command or all commands if none
+                is specified.</td>
+        </tr>
+        </table>
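+        <p>
+          For example, after editing <em>conf/mapred-queues.xml</em>, an administrator can reload the 
+          queue configuration without restarting the JobTracker:
+        </p>
+        <p>
+          <code>hadoop mradmin -refreshQueues</code>
+        </p>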
+      </section>
+			<section>
+				<title> jobtracker </title>
+				<p>
+					Runs the MapReduce JobTracker node.
+				</p> 
+				<p>
+					<code>Usage: hadoop jobtracker [-dumpConfiguration]</code>
+					</p>
+          <table>
+          <tr>
+          <th>COMMAND_OPTION</th><th> Description</th>
+          </tr>
+          <tr>
+          <td><code>-dumpConfiguration</code></td>
+          <td>Dumps the configuration used by the JobTracker, along with the
+          queue configuration, in JSON format to standard output, and then
+          exits.</td>
+          </tr>
+          </table>
+				
+			</section>
+			
+			<section>
+				<title> namenode </title>
+				<p>
+					Runs the namenode. For more information about upgrade, rollback and finalize see 
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Upgrade+and+Rollback">Upgrade and Rollback</a>.
+				</p>
+				<p>
+					<code>Usage: hadoop namenode [-regular] | [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-checkpoint] | [-backup]</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+                <tr>
+                  <td><code>-regular</code></td>
+                  <td>Start namenode in standard, active role rather than as backup or checkpoint node. This is the default role.</td>
+                </tr>
+                <tr>
+                  <td><code>-checkpoint</code></td>
+                  <td>Start namenode in checkpoint role, creating periodic checkpoints of the active namenode metadata 
+                  (see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node">Checkpoint Node</a>).</td>
+                </tr>
+                <tr>
+                  <td><code>-backup</code></td>
+                  <td>Start namenode in backup role, maintaining an up-to-date in-memory copy of the namespace and creating periodic checkpoints 
+                  (see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Backup+Node">Backup Node</a>).</td>
+                </tr>
+			           <tr>
+			          	<td><code>-format</code></td>
+			            <td>Formats the namenode. It starts the namenode, formats it, and then shuts it down.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-upgrade</code></td>
+			            <td>The namenode should be started with the upgrade option after the distribution of a new Hadoop version.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-rollback</code></td>
+			            <td>Rolls back the namenode to the previous version. This should be used after stopping the cluster 
+			            and distributing the old Hadoop version.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-finalize</code></td>
+			            <td>Finalize removes the previous state of the file system. The most recent upgrade becomes permanent, 
+			            and the rollback option is no longer available. After finalization it shuts the namenode down.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-importCheckpoint</code></td>
+			            <td>Loads the image from a checkpoint directory and saves it into the current one. The checkpoint directory 
+			            is read from the property fs.checkpoint.dir
+			            (see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Import+checkpoint">Import Checkpoint</a>).
+			            </td>
+			           </tr>
+			     </table>
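+				<p>
+					For example, to start the namenode with the upgrade option after distributing a new Hadoop version:
+				</p>
+				<p>
+					<code>hadoop namenode -upgrade</code>
+				</p>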
+			</section>
+			
+			<section>
+				<title> secondarynamenode </title>
+				<note>
+					The Secondary NameNode has been deprecated. Instead, consider using the
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node">Checkpoint Node</a> or 
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Backup+Node">Backup Node</a>. 
+				</note>
+				<p>	
+					Runs the HDFS secondary 
+					namenode. See <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Secondary+NameNode">Secondary NameNode</a> 
+					for more info.
+				</p>
+				<p>
+					<code>Usage: hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]</code>
+				</p>
+				<table>
+			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+			
+			           <tr>
+			          	<td><code>-checkpoint [force]</code></td>
+			            <td>Checkpoints the secondary namenode if the EditLog size is >= fs.checkpoint.size. 
+			            If force is used, the checkpoint is performed irrespective of the EditLog size.</td>
+			           </tr>
+			           <tr>
+			          	<td><code>-geteditsize</code></td>
+			            <td>Prints the EditLog size.</td>
+			           </tr>
+			     </table>
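+				<p>
+					For example, to force a checkpoint irrespective of the current EditLog size:
+				</p>
+				<p>
+					<code>hadoop secondarynamenode -checkpoint force</code>
+				</p>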
+			</section>
+			
+			<section>
+				<title> tasktracker </title>
+				<p>
+					Runs a MapReduce TaskTracker node.
+				</p> 
+				<p>
+					<code>Usage: hadoop tasktracker</code>
+				</p>
+			</section>
+			
+		</section>
+
+	</body>
+</document>

Propchange: hadoop/common/trunk/src/docs/src/documentation/content/xdocs/commands_manual.xml
------------------------------------------------------------------------------
    svn:eol-style = native


