hadoop-mapreduce-commits mailing list archives

From: ste...@apache.org
Subject: svn commit: r885145 [11/34] - in /hadoop/mapreduce/branches/MAPREDUCE-233: ./ .eclipse.templates/ .eclipse.templates/.launches/ conf/ ivy/ lib/ src/benchmarks/gridmix/ src/benchmarks/gridmix/pipesort/ src/benchmarks/gridmix2/ src/benchmarks/gridmix2/sr...
Date: Sat, 28 Nov 2009 20:26:22 GMT
Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/capacity_scheduler.xml Sat Nov 28 20:26:01 2009
@@ -21,7 +21,7 @@
 <document>
   
   <header>
-    <title>Capacity Scheduler Guide</title>
+    <title>Capacity Scheduler</title>
   </header>
   
   <body>
@@ -30,7 +30,7 @@
       <title>Purpose</title>
       
       <p>This document describes the Capacity Scheduler, a pluggable 
-      Map/Reduce scheduler for Hadoop which provides a way to share 
+      MapReduce scheduler for Hadoop which provides a way to share 
       large clusters.</p>
     </section>
     
@@ -40,7 +40,7 @@
       <p>The Capacity Scheduler supports the following features:</p> 
       <ul>
         <li>
-          Support for multiple queues, where a job is submitted to a queue.
+          Multiple queues, where a job is submitted to a queue.
         </li>
         <li>
           Queues are allocated a fraction of the capacity of the grid in the 
@@ -81,7 +81,7 @@
     </section>
     
     <section>
-      <title>Picking a task to run</title>
+      <title>Picking a Task to Run</title>
       
       <p>Note that many of these steps can be, and will be, enhanced over time
       to provide better algorithms.</p>
@@ -131,37 +131,36 @@
           the following property in the site configuration:</p>
           <table>
             <tr>
-              <td>Property</td>
-              <td>Value</td>
+              <th>Name</th>
+              <th>Value</th>
             </tr>
             <tr>
-              <td>mapred.jobtracker.taskScheduler</td>
+              <td>mapreduce.jobtracker.taskscheduler</td>
               <td>org.apache.hadoop.mapred.CapacityTaskScheduler</td>
             </tr>
           </table>
       </section>
 
       <section>
-        <title>Setting up queues</title>
+        <title>Setting Up Queues</title>
         <p>
           You can define multiple queues to which users can submit jobs with
           the Capacity Scheduler. To define multiple queues, you should edit
           the site configuration for Hadoop and modify the
-          <em>mapred.queue.names</em> property.
+          <em>mapreduce.jobtracker.taskscheduler.queue.names</em> property.
         </p>
         <p>
           You can also configure ACLs for controlling which users or groups
           have access to the queues.
         </p>
         <p>
-          For more details, refer to
-          <a href="cluster_setup.html#Configuring+the+Hadoop+Daemons">Cluster 
-          Setup</a> documentation.
+          For more details, see
+          <a href="http://hadoop.apache.org/common/docs/current/cluster_setup.html#Configuring+the+Hadoop+Daemons">Configuring the Hadoop Daemons</a>.
         </p>
       </section>
   
       <section>
-        <title>Configuring properties for queues</title>
+        <title>Configuring Properties for Queues</title>
 
         <p>The Capacity Scheduler can be configured with several properties
         for each queue that control the behavior of the Scheduler. This
@@ -183,16 +182,16 @@
 
         <table>
           <tr><th>Name</th><th>Description</th></tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.capacity</td>
+          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-<br/>name&gt;.capacity</td>
           	<td>Percentage of the number of slots in the cluster that are made 
             to be available for jobs in this queue. The sum of capacities 
             for all queues should be less than or equal 100.</td>
           </tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.supports-priority</td>
+          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-<br/>name&gt;.supports-priority</td>
           	<td>If true, priorities of jobs will be taken into account in scheduling 
           	decisions.</td>
           </tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.minimum-user-limit-percent</td>
+          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-<br/>name&gt;.minimum-user-limit-percent</td>
           	<td>Each queue enforces a limit on the percentage of resources 
           	allocated to a user at any given time, if there is competition 
           	for them. This user limit can vary between a minimum and maximum 
@@ -205,52 +204,19 @@
           	users, no user can use more than 25% of the queue's resources. A 
           	value of 100 implies no user limits are imposed.</td>
           </tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.max.map.slots</td>
-          	<td>
-		    This value is the maximum max slots that can be used in a
-		    queue at any point of time. So for example assuming above config value
-		    is 100 , not more than 100 tasks would be in the queue at any point of
-		    time, assuming each task takes one slot.
-
-		    Default value of -1 would disable this capping feature
-
-		    Typically the queue capacity should be equal to this limit.
-		    If queue capacity is more than this limit, excess capacity will be
-		    used by the other queues. If queue capacity is less than the above
-		    limit , then the limit would be the queue capacity - as in the current
-		    implementation
-                </td>
-          </tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.max.reduce.slots</td>
-          	<td>
-		    This value is the maximum reduce slots that can be used in a
-		    queue at any point of time. So for example assuming above config value
-		    is 100 , not more than 100 tasks would be in the queue at any point of
-		    time, assuming each task takes one slot.
-
-		    Default value of -1 would disable this capping feature
-
-		    Typically the queue capacity should be equal to this limit.
-		    If queue capacity is more than this limit, excess capacity will be
-		    used by the other queues. If queue capacity is less than the above
-		    limit , then the limit would be the queue capacity - as in the current
-		    implementation
-                </td>
-          </tr>
         </table>
       </section>
       
       <section>
-        <title>Memory management</title>
+        <title>Memory Management</title>
       
         <p>The Capacity Scheduler supports scheduling of tasks on a
         <code>TaskTracker</code>(TT) based on a job's memory requirements
         and the availability of RAM and Virtual Memory (VMEM) on the TT node.
-        See the <a href="mapred_tutorial.html#Memory+monitoring">Hadoop 
-        Map/Reduce tutorial</a> for details on how the TT monitors
-        memory usage.</p>
-        <p>Currently the memory based scheduling is only supported
-        in Linux platform.</p>
+        See the 
+        <a href="mapred_tutorial.html">MapReduce Tutorial</a> 
+        for details on how the TT monitors memory usage.</p>
+        <p>Currently, memory-based scheduling is only supported on the Linux platform.</p>
         <p>Memory-based scheduling works as follows:</p>
         <ol>
           <li>The absence of any one or more of three config parameters 
@@ -260,8 +226,8 @@
           <code>mapred.task.limit.maxvmem</code>, disables memory-based
           scheduling, just as it disables memory monitoring for a TT. These
           config parameters are described in the 
-          <a href="mapred_tutorial.html#Memory+monitoring">Hadoop Map/Reduce 
-          tutorial</a>. The value of  
+          <a href="mapred_tutorial.html">MapReduce Tutorial</a>. 
+          The value of  
           <code>mapred.tasktracker.vmem.reserved</code> is 
           obtained from the TT via its heartbeat. 
           </li>
@@ -286,7 +252,7 @@
           set, the Scheduler computes the available RAM on the node. Next, 
           the Scheduler figures out the RAM requirements of the job, if any. 
           As with VMEM, users can optionally specify a RAM limit for their job
-          (<code>mapred.task.maxpmem</code>, described in the Map/Reduce 
+          (<code>mapred.task.maxpmem</code>, described in the MapReduce 
           tutorial). The Scheduler also maintains a limit for this value 
           (<code>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</code>, 
           described below). All these three values must be set for the 
@@ -303,7 +269,7 @@
 
         <table>
           <tr><th>Name</th><th>Description</th></tr>
-          <tr><td>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</td>
+          <tr><td>mapred.capacity-scheduler.task.default-pmem-<br/>percentage-in-vmem</td>
           	<td>A percentage of the default VMEM limit for jobs
           	(<code>mapred.task.default.maxvmem</code>). This is the default 
           	RAM task-limit associated with a task. Unless overridden by a 
@@ -323,14 +289,14 @@
         scheduled, for reducing the memory footprint on jobtracker. 
         Following are the parameters, by which you can control the laziness
         of the job initialization. The following parameters can be 
-        configured in capacity-scheduler.xml
+        configured in capacity-scheduler.xml:
         </p>
         
         <table>
           <tr><th>Name</th><th>Description</th></tr>
           <tr>
             <td>
-              mapred.capacity-scheduler.queue.&lt;queue-name&gt;.maximum-initialized-jobs-per-user
+              mapred.capacity-scheduler.queue.&lt;queue-<br/>name&gt;.maximum-initialized-jobs-per-user
             </td>
             <td>
               Maximum number of jobs which are allowed to be pre-initialized for
@@ -367,13 +333,13 @@
         </table>
       </section>   
       <section>
-        <title>Reviewing the configuration of the Capacity Scheduler</title>
+        <title>Reviewing the Configuration of the Capacity Scheduler</title>
         <p>
           Once the installation and configuration is completed, you can review
-          it after starting the Map/Reduce cluster from the admin UI.
+          it after starting the MapReduce cluster from the admin UI.
         </p>
         <ul>
-          <li>Start the Map/Reduce cluster as usual.</li>
+          <li>Start the MapReduce cluster as usual.</li>
           <li>Open the JobTracker web UI.</li>
           <li>The queues you have configured should be listed under the <em>Scheduling
               Information</em> section of the page.</li>
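
For reference, wiring up the Capacity Scheduler with the renamed properties above amounts to a few site-configuration entries. The fragment below is a minimal sketch: the queue names and the capacity percentage are illustrative placeholders, not shipped defaults.

<!-- mapred-site.xml (sketch): plug in the Capacity Scheduler and define queues.
     The queue names below are illustrative placeholders. -->
<property>
  <name>mapreduce.jobtracker.taskscheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapreduce.jobtracker.taskscheduler.queue.names</name>
  <value>default,research</value>
</property>

<!-- capacity-scheduler.xml (sketch): per-queue capacity, following the
     mapred.capacity-scheduler.queue.<queue-name>.capacity pattern from the table above.
     Capacities across all queues should sum to at most 100. -->
<property>
  <name>mapred.capacity-scheduler.queue.research.capacity</name>
  <value>30</value>
</property>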

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/cluster_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/cluster_setup.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/cluster_setup.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/cluster_setup.xml Sat Nov 28 20:26:01 2009
@@ -221,12 +221,12 @@
           <th>Notes</th>
         </tr>
         <tr>
-          <td>mapred.job.tracker</td>
+          <td>mapreduce.jobtracker.address</td>
           <td>Host or IP and port of <code>JobTracker</code>.</td>
           <td><em>host:port</em> pair.</td>
         </tr>
 		    <tr>
-		      <td>mapred.system.dir</td>
+		      <td>mapreduce.jobtracker.system.dir</td>
 		      <td>
 		        Path on the HDFS where the Map/Reduce framework stores 
 		        system files e.g. <code>/hadoop/mapred/system/</code>.
@@ -237,7 +237,7 @@
 		      </td>
 		    </tr>
 		    <tr>
-		      <td>mapred.local.dir</td>
+		      <td>mapreduce.cluster.local.dir</td>
 		      <td>
 		        Comma-separated list of paths on the local filesystem where 
 		        temporary Map/Reduce data is written.
@@ -264,7 +264,7 @@
 		      </td>
 		    </tr>
 		    <tr>
-		      <td>mapred.hosts/mapred.hosts.exclude</td>
+		      <td>mapreduce.jobtracker.hosts.filename/mapreduce.jobtracker.hosts.exclude.filename</td>
 		      <td>List of permitted/excluded TaskTrackers.</td>
 		      <td>
 		        If necessary, use these files to control the list of allowable 
@@ -284,7 +284,7 @@
             being used, the list of configured queue names must be
             specified here. Once queues are defined, users can submit
             jobs to a queue using the property name 
-            <em>mapred.job.queue.name</em> in the job configuration.
+            <em>mapreduce.job.queuename</em> in the job configuration.
             There could be a separate 
             configuration file for configuring properties of these 
             queues that is managed by the scheduler. 
@@ -383,7 +383,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.parallel.copies</td>
+                    <td>mapreduce.reduce.shuffle.parallelcopies</td>
                     <td>20</td>
                     <td>
                       Higher number of parallel copies run by reduces to fetch
@@ -392,7 +392,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.map.child.java.opts</td>
+                    <td>mapreduce.map.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of maps. 
@@ -400,7 +400,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.child.java.opts</td>
+                    <td>mapreduce.reduce.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of reduces. 
@@ -417,13 +417,13 @@
                   </tr>
                   <tr>
                     <td>conf/core-site.xml</td>
-                    <td>io.sort.factor</td>
+                    <td>mapreduce.task.io.sort.factor</td>
                     <td>100</td>
                     <td>More streams merged at once while sorting files.</td>
                   </tr>
                   <tr>
                     <td>conf/core-site.xml</td>
-                    <td>io.sort.mb</td>
+                    <td>mapreduce.task.io.sort.mb</td>
                     <td>200</td>
                     <td>Higher memory-limit while sorting data.</td>
                   </tr>
@@ -448,7 +448,7 @@
 		          </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.job.tracker.handler.count</td>
+                    <td>mapreduce.jobtracker.handler.count</td>
                     <td>60</td>
                     <td>
                       More JobTracker server threads to handle RPCs from large 
@@ -457,13 +457,13 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.parallel.copies</td>
+                    <td>mapreduce.reduce.shuffle.parallelcopies</td>
                     <td>50</td>
                     <td></td>
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>tasktracker.http.threads</td>
+                    <td>mapreduce.tasktracker.http.threads</td>
                     <td>50</td>
                     <td>
                       More worker threads for the TaskTracker's http server. The
@@ -473,7 +473,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.map.child.java.opts</td>
+                    <td>mapreduce.map.java.opts</td>
                     <td>-Xmx512M</td>
                     <td>
                       Larger heap-size for child jvms of maps. 
@@ -481,7 +481,7 @@
                   </tr>
                   <tr>
                     <td>conf/mapred-site.xml</td>
-                    <td>mapred.reduce.child.java.opts</td>
+                    <td>mapreduce.reduce.java.opts</td>
                     <td>-Xmx1024M</td>
                     <td>Larger heap-size for child jvms of reduces.</td>
                   </tr>
@@ -558,7 +558,7 @@
 
     <table>
           <tr><th>Name</th><th>Type</th><th>Description</th></tr>
-          <tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
+          <tr><td>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</td>
             <td>long</td>
             <td>The time interval, in milliseconds, between which the TT 
             checks for any memory violation. The default value is 5000 msec
@@ -668,10 +668,11 @@
             the tasks. For maximum security, this task controller 
             sets up restricted permissions and user/group ownership of
             local files and directories used by the tasks such as the
-            job jar files, intermediate files and task log files. Currently
-            permissions on distributed cache files are opened up to be
-            accessible by all users. In future, it is expected that stricter
-            file permissions are set for these files too.
+            job jar files, intermediate files, task log files and distributed
+            cache files. In particular, note that because of this, no user other
+            than the job owner and the tasktracker can access any of the local
+            files/directories, including those localized as part of the
+            distributed cache.
             </td>
             </tr>
             </table>
@@ -684,7 +685,7 @@
             <th>Property</th><th>Value</th><th>Notes</th>
             </tr>
             <tr>
-            <td>mapred.task.tracker.task-controller</td>
+            <td>mapreduce.tasktracker.taskcontroller</td>
             <td>Fully qualified class name of the task controller class</td>
             <td>Currently there are two implementations of task controller
             in the Hadoop system, DefaultTaskController and LinuxTaskController.
@@ -747,8 +748,8 @@
             </p>
             <table><tr><th>Name</th><th>Description</th></tr>
             <tr>
-            <td>mapred.local.dir</td>
-            <td>Path to mapred local directories. Should be same as the value 
+            <td>mapreduce.cluster.local.dir</td>
+            <td>Path to the local directories specified by mapreduce.cluster.local.dir. Should be the same as the value 
             which was provided to key in mapred-site.xml. This is required to
             validate paths passed to the setuid executable in order to prevent
             arbitrary paths being passed to it.</td>
@@ -765,7 +766,7 @@
             <p>
             The LinuxTaskController requires that paths including and leading up to
             the directories specified in
-            <em>mapred.local.dir</em> and <em>hadoop.log.dir</em> to be set 755
+            <em>mapreduce.cluster.local.dir</em> and <em>hadoop.log.dir</em> be set to 755
             permissions.
             </p>
             </section>
@@ -800,7 +801,7 @@
             monitoring script in <em>mapred-site.xml</em>.</p>
             <table>
             <tr><th>Name</th><th>Description</th></tr>
-            <tr><td><code>mapred.healthChecker.script.path</code></td>
+            <tr><td><code>mapreduce.tasktracker.healthchecker.script.path</code></td>
             <td>Absolute path to the script which is periodically run by the 
             TaskTracker to determine if the node is 
             healthy or not. The file should be executable by the TaskTracker.
@@ -809,18 +810,18 @@
             is not started.</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.interval</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.interval</code></td>
             <td>Frequency at which the node health script is run, 
             in milliseconds</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.script.timeout</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.script.timeout</code></td>
             <td>Time after which the node health script will be killed by
             the TaskTracker if unresponsive.
             The node is marked unhealthy if the node health script times out.</td>
             </tr>
             <tr>
-            <td><code>mapred.healthChecker.script.args</code></td>
+            <td><code>mapreduce.tasktracker.healthchecker.script.args</code></td>
             <td>Extra arguments that can be passed to the node health script 
             when launched.
             These should be comma separated list of arguments. </td>
@@ -857,17 +858,17 @@
             <title>History Logging</title>
             
             <p> The job history files are stored in central location 
-            <code> hadoop.job.history.location </code> which can be on DFS also,
+            <code> mapreduce.jobtracker.jobhistory.location </code> which can be on DFS also,
             whose default value is <code>${HADOOP_LOG_DIR}/history</code>. 
             The history web UI is accessible from job tracker web UI.</p>
             
             <p> The history files are also logged to user specified directory
-            <code>hadoop.job.history.user.location</code> 
+            <code>mapreduce.job.userhistorylocation</code> 
             which defaults to job output directory. The files are stored in
             "_logs/history/" in the specified directory. Hence, by default 
-            they will be in "mapred.output.dir/_logs/history/". User can stop
+            they will be in "mapreduce.output.fileoutputformat.outputdir/_logs/history/". User can stop
             logging by giving the value <code>none</code> for 
-            <code>hadoop.job.history.user.location</code> </p>
+            <code>mapreduce.job.userhistorylocation</code> </p>
             
             <p> User can view the history logs summary in specified directory 
             using the following command <br/>
@@ -891,9 +892,9 @@
       <section>
         <title>Map/Reduce</title>
         <p>The job tracker restart can recover running jobs if 
-        <code>mapred.jobtracker.restart.recover</code> is set true and 
+        <code>mapreduce.jobtracker.restart.recover</code> is set true and 
         <a href="#Logging">JobHistory logging</a> is enabled. Also 
-        <code>mapred.jobtracker.job.history.block.size</code> value should be 
+        <code>mapreduce.jobtracker.jobhistory.block.size</code> value should be 
         set to an optimal value to dump job history to disk as soon as 
         possible, the typical value is 3145728(3MB).</p>
       </section>
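
To make the renamed keys above concrete, a mapred-site.xml fragment written with the new property names might look like the following sketch; the host name, port and local paths are placeholders.

<!-- mapred-site.xml (sketch): new-style property names from the tables above;
     the host:port pair and the local paths are placeholders -->
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>jobtracker.example.com:9001</value>
</property>
<property>
  <name>mapreduce.jobtracker.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/disk1/mapred/local,/disk2/mapred/local</value>
</property>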

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/commands_manual.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/commands_manual.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/commands_manual.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/commands_manual.xml Sat Nov 28 20:26:01 2009
@@ -19,14 +19,14 @@
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
 <document>
 	<header>
-		<title>Commands Guide</title>
+		<title>Hadoop Commands Guide</title>
 	</header>
 	
 	<body>
 		<section>
 			<title>Overview</title>
 			<p>
-				All hadoop commands are invoked by the bin/hadoop script. Running the hadoop
+				All Hadoop commands are invoked by the bin/hadoop script. Running the Hadoop
 				script without any arguments prints the description for all commands.
 			</p>
 			<p>
@@ -104,11 +104,11 @@
 		
 		<section>
 			<title> User Commands </title>
-			<p>Commands useful for users of a hadoop cluster.</p>
+			<p>Commands useful for users of a Hadoop cluster.</p>
 			<section>
 				<title> archive </title>
 				<p>
-					Creates a hadoop archive. More information can be found at <a href="hadoop_archives.html">Hadoop Archives</a>.
+					Creates a Hadoop archive. For more information, see the <a href="hadoop_archives.html">Hadoop Archives Guide</a>.
 				</p>
 				<p>
 					<code>Usage: hadoop archive -archiveName NAME &lt;src&gt;* &lt;dest&gt;</code>
@@ -133,7 +133,7 @@
 			<section>
 				<title> distcp </title>
 				<p>
-					Copy file or directories recursively. More information can be found at <a href="distcp.html">Hadoop DistCp Guide</a>.
+					Copies files or directories recursively. For more information, see the <a href="distcp.html">DistCp Guide</a>.
 				</p>
 				<p>
 					<code>Usage: hadoop distcp &lt;srcurl&gt; &lt;desturl&gt;</code>
@@ -155,21 +155,22 @@
 			<section>
 				<title> fs </title>
 				<p>
-					<code>Usage: hadoop fs [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] 
-					[COMMAND_OPTIONS]</code>
+					Runs a generic filesystem user client.
 				</p>
 				<p>
-					Runs a generic filesystem user client.
+					<code>Usage: hadoop fs [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] 
+					[COMMAND_OPTIONS]</code>
 				</p>
 				<p>
-					The various COMMAND_OPTIONS can be found at <a href="hdfs_shell.html">Hadoop FS Shell Guide</a>.
+					The various COMMAND_OPTIONS can be found at 
+					<a href="http://hadoop.apache.org/common/docs/current/file_system_shell.html">File System Shell Guide</a>.
 				</p>   
 			</section>
 			
 			<section>
 				<title> fsck </title>
 				<p>
-					Runs a HDFS filesystem checking utility. See <a href="hdfs_user_guide.html#Fsck">Fsck</a> for more info.
+					Runs an HDFS filesystem checking utility. See <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Fsck">Fsck</a> for more info.
 				</p> 
 				<p><code>Usage: hadoop fsck [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] 
 				&lt;path&gt; [-move | -delete | -openforwrite] [-files [-blocks 
@@ -220,12 +221,12 @@
 					<code>Usage: hadoop jar &lt;jar&gt; [mainClass] args...</code>
 				</p>
 				<p>
-					The streaming jobs are run via this command. Examples can be referred from 
-					<a href="streaming.html#More+usage+examples">Streaming examples</a>
+					Streaming jobs are run via this command. For examples, see 
+					<a href="streaming.html">Hadoop Streaming</a>.
 				</p>
 				<p>
-					Word count example is also run using jar command. It can be referred from
-					<a href="mapred_tutorial.html#Usage">Wordcount example</a>
+					The WordCount example is also run using the jar command. For examples, see the
+					<a href="mapred_tutorial.html">MapReduce Tutorial</a>.
 				</p>
 			</section>
 			
@@ -238,7 +239,7 @@
 					<code>Usage: hadoop job [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>] 
 					[-submit &lt;job-file&gt;] | [-status &lt;job-id&gt;] | 
 					[-counter &lt;job-id&gt; &lt;group-name&gt; &lt;counter-name&gt;] | [-kill &lt;job-id&gt;] | 
-					[-events &lt;job-id&gt; &lt;from-event-#&gt; &lt;#-of-events&gt;] | [-history [all] &lt;jobOutputDir&gt;] |
+					[-events &lt;job-id&gt; &lt;from-event-#&gt; &lt;#-of-events&gt;] | [-history [all] &lt;historyFile&gt;] |
 					[-list [all]] | [-kill-task &lt;task-id&gt;] | [-fail-task &lt;task-id&gt;] | 
           [-set-priority &lt;job-id&gt; &lt;priority&gt;]</code>
 				</p>
@@ -266,8 +267,8 @@
 			            <td>Prints the events' details received by jobtracker for the given range.</td>
 			           </tr>
 			           <tr>
-			          	<td><code>-history [all] &lt;jobOutputDir&gt;</code></td>
-			            <td>-history &lt;jobOutputDir&gt; prints job details, failed and killed tip details. More details 
+			          	<td><code>-history [all] &lt;historyFile&gt;</code></td>
+			            <td>-history &lt;historyFile&gt; prints job details, failed and killed tip details. More details 
 			            about the job such as successful tasks and task attempts made for each task can be viewed by 
 			            specifying the [all] option. </td>
 			           </tr>
@@ -401,24 +402,27 @@
 			<section>
 				<title> CLASSNAME </title>
 				<p>
-					 hadoop script can be used to invoke any class.
+					 The Hadoop script can be used to invoke any class.
 				</p>
 				<p>
-					<code>Usage: hadoop CLASSNAME</code>
+					 Runs the class named CLASSNAME.
 				</p>
+
 				<p>
-					 Runs the class named CLASSNAME.
+					<code>Usage: hadoop CLASSNAME</code>
 				</p>
+
 			</section>
     </section>
 		<section>
 			<title> Administration Commands </title>
-			<p>Commands useful for administrators of a hadoop cluster.</p>
+			<p>Commands useful for administrators of a Hadoop cluster.</p>
 			<section>
 				<title> balancer </title>
 				<p>
 					Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the 
-					rebalancing process. See <a href="hdfs_user_guide.html#Rebalancer">Rebalancer</a> for more details.
+					rebalancing process. For more details, see 
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Rebalancer">Rebalancer</a>.
 				</p>
 				<p>
 					<code>Usage: hadoop balancer [-threshold &lt;threshold&gt;]</code>
@@ -472,7 +476,7 @@
 			           <tr>
 			          	<td><code>-rollback</code></td>
 			            <td>Rollsback the datanode to the previous version. This should be used after stopping the datanode 
-			            and distributing the old hadoop version.</td>
+			            and distributing the old Hadoop version.</td>
 			           </tr>
 			     </table>
 			</section>
@@ -584,7 +588,7 @@
         </tr>
         <tr>
         <td><code>-refreshQueueAcls</code></td>
-        <td> Refresh the queue acls used by hadoop, to check access during submissions
+        <td> Refresh the queue acls used by Hadoop, to check access during submissions
         and administration of the job by the user. The properties present in
         <code>mapred-queue-acls.xml</code> is reloaded by the queue manager.</td>
         </tr>
@@ -615,11 +619,11 @@
 			<section>
 				<title> namenode </title>
 				<p>
-					Runs the namenode. More info about the upgrade, rollback and finalize is at 
-					<a href="hdfs_user_guide.html#Upgrade+and+Rollback">Upgrade Rollback</a>
+					Runs the namenode. For more information about upgrade, rollback and finalize, see 
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Upgrade+and+Rollback">Upgrade and Rollback</a>.
 				</p>
 				<p>
-					<code>Usage: hadoop namenode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]</code>
+					<code>Usage: hadoop namenode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-checkpoint] | [-backup]</code>
 				</p>
 				<table>
 			          <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
@@ -642,12 +646,12 @@
 			           </tr>
 			           <tr>
 			          	<td><code>-upgrade</code></td>
-			            <td>Namenode should be started with upgrade option after the distribution of new hadoop version.</td>
+			            <td>The namenode should be started with the upgrade option after the distribution of a new Hadoop version.</td>
 			           </tr>
 			           <tr>
 			          	<td><code>-rollback</code></td>
 			            <td>Rollsback the namenode to the previous version. This should be used after stopping the cluster 
-			            and distributing the old hadoop version.</td>
+			            and distributing the old Hadoop version.</td>
 			           </tr>
 			           <tr>
 			          	<td><code>-finalize</code></td>
@@ -657,18 +661,33 @@
 			           <tr>
 			          	<td><code>-importCheckpoint</code></td>
 			            <td>Loads image from a checkpoint directory and saves it into the current one. Checkpoint directory 
-			            is read from property fs.checkpoint.dir</td>
+			            is read from property fs.checkpoint.dir
+			            (see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Import+checkpoint">Import Checkpoint</a>).
+			            </td>
+			           </tr>
+			            <tr>
+			          	<td><code>-checkpoint</code></td>
+			            <td>Enables checkpointing 
+			            (see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node">Checkpoint Node</a>).</td>
+			           </tr>
+			            <tr>
+			          	<td><code>-backup</code></td>
+			            <td>Enables checkpointing and maintains an in-memory, up-to-date copy of the file system namespace 
+			            (see <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Backup+Node">Backup Node</a>).</td>
 			           </tr>
 			     </table>
 			</section>
 			
 			<section>
 				<title> secondarynamenode </title>
-				<p>
-					Use of the Secondary NameNode has been deprecated. Instead, consider using a 
-					<a href="hdfs_user_guide.html#Checkpoint+node">Checkpoint node</a> or 
-					<a href="hdfs_user_guide.html#Backup+node">Backup node</a>. Runs the HDFS secondary 
-					namenode. See <a href="hdfs_user_guide.html#Secondary+NameNode">Secondary NameNode</a> 
+				<note>
+					The Secondary NameNode has been deprecated. Instead, consider using the
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Checkpoint+Node">Checkpoint Node</a> or 
+					<a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Backup+Node">Backup Node</a>. 
+				</note>
+				<p>	
+					Runs the HDFS secondary 
+					namenode. See <a href="http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Secondary+NameNode">Secondary NameNode</a> 
 					for more info.
 				</p>
 				<p>
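
As a quick illustration of the command forms documented above, typical invocations look like the following; the history file, HDFS paths and archive name are placeholders.

# list all jobs known to the jobtracker
bash$ hadoop job -list all
# print details for a job from its history file (placeholder path)
bash$ hadoop job -history /user/hadoop/output/_logs/history/history_file
# check HDFS health under a path (placeholder path)
bash$ hadoop fsck /user/hadoop -files -blocks
# create a Hadoop archive (example from the Hadoop Archives Guide)
bash$ hadoop archive -archiveName foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/zoo/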

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/distcp.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/distcp.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/distcp.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/distcp.xml Sat Nov 28 20:26:01 2009
@@ -30,10 +30,10 @@
       <title>Overview</title>
 
       <p>DistCp (distributed copy) is a tool used for large inter/intra-cluster
-      copying. It uses Map/Reduce to effect its distribution, error
+      copying. It uses MapReduce to effect its distribution, error
       handling and recovery, and reporting. It expands a list of files and
       directories into input to map tasks, each of which will copy a partition
-      of the files specified in the source list. Its Map/Reduce pedigree has
+      of the files specified in the source list. Its MapReduce pedigree has
       endowed it with some quirks in both its semantics and execution. The
       purpose of this document is to offer guidance for common tasks and to
       elucidate its model.</p>
@@ -45,36 +45,35 @@
 
       <section>
         <title>Basic</title>
-        <p>The most common invocation of DistCp is an inter-cluster copy:</p>
-        <p><code>bash$ hadoop distcp hdfs://nn1:8020/foo/bar \</code><br/>
-           <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-                 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-                 hdfs://nn2:8020/bar/foo</code></p>
+    <p>The most common invocation of DistCp is an inter-cluster copy:</p>
+<source>
+bash$ hadoop distcp hdfs://nn1:8020/foo/bar \ 
+            hdfs://nn2:8020/bar/foo 
+</source>             
 
         <p>This will expand the namespace under <code>/foo/bar</code> on nn1
         into a temporary file, partition its contents among a set of map
         tasks, and start a copy on each TaskTracker from nn1 to nn2. Note
         that DistCp expects absolute paths.</p>
 
-        <p>One can also specify multiple source directories on the command
-        line:</p>
-        <p><code>bash$ hadoop distcp hdfs://nn1:8020/foo/a \</code><br/>
-           <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-                 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-                 hdfs://nn1:8020/foo/b \</code><br/>
-           <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-                 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-                 hdfs://nn2:8020/bar/foo</code></p>
-
-        <p>Or, equivalently, from a file using the <code>-f</code> option:<br/>
-        <code>bash$ hadoop distcp -f hdfs://nn1:8020/srclist \</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-              &nbsp;hdfs://nn2:8020/bar/foo</code><br/></p>
-
-        <p>Where <code>srclist</code> contains<br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/a</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/b</code></p>
+    <p>One can also specify multiple source directories on the command line:</p>
+<source>
+bash$ hadoop distcp hdfs://nn1:8020/foo/a \ 
+            hdfs://nn1:8020/foo/b \ 
+            hdfs://nn2:8020/bar/foo 
+</source>             
+
+<p>Or, equivalently, from a file using the <code>-f</code> option:</p>
+<source>
+bash$ hadoop distcp -f hdfs://nn1:8020/srclist \ 
+            hdfs://nn2:8020/bar/foo 
+</source>          
+
+<p>Where <code>srclist</code> contains:</p> 
+<source>
+hdfs://nn1:8020/foo/a 
+hdfs://nn1:8020/foo/b 
+</source>
 
         <p>When copying from multiple sources, DistCp will abort the copy with
         an error message if two sources collide, but collisions at the
@@ -89,11 +88,11 @@
         both the source and destination file systems. For HDFS, both the source
         and destination must be running the same version of the protocol or use
         a backwards-compatible protocol (see <a href="#cpver">Copying Between
-        Versions</a>).</p>
+        Versions of HDFS</a>).</p>
 
         <p>After a copy, it is recommended that one generates and cross-checks
         a listing of the source and destination to verify that the copy was
-        truly successful. Since DistCp employs both Map/Reduce and the
+        truly successful. Since DistCp employs both MapReduce and the
         FileSystem API, issues in or between any of the three could adversely
         and silently affect the copy. Some have had success running with
         <code>-update</code> enabled to perform a second pass, but users should
@@ -107,11 +106,13 @@
 
       </section> <!-- Basic -->
 
+
       <section id="options">
         <title>Options</title>
 
         <section>
         <title>Option Index</title>
+        <p></p>
         <table>
           <tr><th> Flag </th><th> Description </th><th> Notes </th></tr>
 
@@ -150,7 +151,7 @@
               <td>Overwrite destination</td>
               <td>If a map fails and <code>-i</code> is not specified, all the
               files in the split, not only those that failed, will be recopied.
-              As discussed in the <a href="#uo">following</a>, it also changes
+              As discussed in <a href="#uo">Update and Overwrite</a>, it also changes
               the semantics for generating destination paths, so users should
               use this carefully.
               </td></tr>
@@ -159,8 +160,8 @@
               <td>As noted in the preceding, this is not a &quot;sync&quot;
               operation. The only criterion examined is the source and
               destination file sizes; if they differ, the source file
-              replaces the destination file. As discussed in the
-              <a href="#uo">following</a>, it also changes the semantics for
+              replaces the destination file. As discussed in 
+              <a href="#uo">Update and Overwrite</a>, it also changes the semantics for
               generating destination paths, so users should use this carefully.
               </td></tr>
           <tr><td><code>-f &lt;urilist_uri&gt;</code></td>
@@ -187,7 +188,9 @@
 
         </table>
 
-      </section>
+      </section> <!-- Option Index -->
+
+
 
       <section id="Symbolic-Representations">
         <title>Symbolic Representations</title>
@@ -200,7 +203,7 @@
           <li>1230k = 1230 * 1024 = 1259520</li>
           <li>891g = 891 * 1024^3 = 956703965184</li>
         </ul>
-      </section>
+      </section> <!-- Symbolic-Representations -->
 
       <section id="uo">
         <title>Update and Overwrite</title>
@@ -210,12 +213,15 @@
         <code>/foo/b</code> to <code>/bar/foo</code>, where the sources contain
         the following:</p>
 
-        <p><code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/a</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/a/aa</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/a/ab</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/b</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/b/ba</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/b/ab</code></p>
+        
+<source>
+    hdfs://nn1:8020/foo/a 
+    hdfs://nn1:8020/foo/a/aa 
+    hdfs://nn1:8020/foo/a/ab 
+    hdfs://nn1:8020/foo/b 
+    hdfs://nn1:8020/foo/b/ba 
+    hdfs://nn1:8020/foo/b/ab 
+</source>
 
         <p>If either <code>-update</code> or <code>-overwrite</code> is set,
         then both sources will map an entry to <code>/bar/foo/ab</code> at the
@@ -226,46 +232,51 @@
         <p>In the default case, both <code>/bar/foo/a</code> and
         <code>/bar/foo/b</code> will be created and neither will collide.</p>
 
-        <p>Now consider a legal copy using <code>-update</code>:<br/>
-        <code>distcp -update hdfs://nn1:8020/foo/a \</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-              hdfs://nn1:8020/foo/b \</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-              hdfs://nn2:8020/bar</code></p>
-
-        <p>With sources/sizes:</p>
-
-        <p><code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/a</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/a/aa 32</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/a/ab 32</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/b</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/b/ba 64</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn1:8020/foo/b/bb 32</code></p>
-
-        <p>And destination/sizes:</p>
-
-        <p><code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar/aa 32</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar/ba 32</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar/bb 64</code></p>
-
-        <p>Will effect:</p>
-
-        <p><code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar/aa 32</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar/ab 32</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar/ba 64</code><br/>
-        <code>&nbsp;&nbsp;&nbsp;&nbsp;hdfs://nn2:8020/bar/bb 32</code></p>
+<p>Now consider a legal copy using <code>-update</code>:</p>
+<source>
+distcp -update hdfs://nn1:8020/foo/a \ 
+    hdfs://nn1:8020/foo/b \ 
+    hdfs://nn1:8020/foo/file1 \
+    hdfs://nn2:8020/bar 
+</source>
+
+<p>With sources/sizes:</p>
+<source>
+    hdfs://nn1:8020/foo/a 
+    hdfs://nn1:8020/foo/a/aa 32 
+    hdfs://nn1:8020/foo/a/ab 32 
+    hdfs://nn1:8020/foo/b 
+    hdfs://nn1:8020/foo/b/ba 64 
+    hdfs://nn1:8020/foo/b/bb 32 
+    hdfs://nn1:8020/foo/file1 20
+</source>
+
+<p>And destination/sizes:</p>
+<source>
+    hdfs://nn2:8020/bar 
+    hdfs://nn2:8020/bar/aa 32 
+    hdfs://nn2:8020/bar/ba 32 
+    hdfs://nn2:8020/bar/bb 64 
+    hdfs://nn1:8020/foo/file1 15
+</source>
+
+<p>Will effect:</p>
+<source>
+    hdfs://nn2:8020/bar 
+    hdfs://nn2:8020/bar/aa 32 
+    hdfs://nn2:8020/bar/ab 32 
+    hdfs://nn2:8020/bar/ba 64 
+    hdfs://nn2:8020/bar/bb 32 
+    hdfs://nn1:8020/foo/file1 20
+</source>
 
         <p>Only <code>aa</code> is not overwritten on nn2. If
         <code>-overwrite</code> were specified, all elements would be
         overwritten.</p>
 
-      </section> <!-- Update and Overwrite -->
+    </section> <!-- Update and Overwrite -->
 
-      </section> <!-- Options -->
+    </section> <!-- Options -->
 
     </section> <!-- Usage -->
 
@@ -273,7 +284,7 @@
       <title>Appendix</title>
 
       <section>
-        <title>Map sizing</title>
+        <title>Map Sizing</title>
 
           <p>DistCp makes a faint attempt to size each map comparably so that
           each copies roughly the same number of bytes. Note that files are the
@@ -293,7 +304,7 @@
       </section>
 
       <section id="cpver">
-        <title>Copying between versions of HDFS</title>
+        <title>Copying Between Versions of HDFS</title>
 
         <p>For copying between two different versions of Hadoop, one will
         usually use HftpFileSystem. This is a read-only FileSystem, so DistCp
@@ -306,7 +317,37 @@
       </section>
 
       <section>
-        <title>Map/Reduce and other side-effects</title>
+        <title>Copying to S3</title>
+
+        <p>DistCp can be used to copy data between HDFS and other filesystems,
+        including those backed by S3. The <code>s3n</code> FileSystem
+        implementation allows DistCp (and Hadoop in general) to use an S3
+        bucket as a source or target for transfers. To transfer data from
+        HDFS to an S3 bucket, invoke DistCp using arguments like the following:
+        </p>
+<source>
+bash$ hadoop distcp hdfs://nn:8020/foo/bar \
+    s3n://$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY@&lt;bucket&gt;/foo/bar
+</source>
+
+        <p><code>$AWS_ACCESS_KEY_ID</code> and
+        <code>$AWS_SECRET_ACCESS_KEY</code> are environment variables holding
+        S3 access credentials.</p>
+
+        <p>Some FileSystem operations take longer on S3 than on HDFS. If you
+        are transferring large files to S3 (e.g., 1 GB and up), you may
+        experience timeouts during your job. To prevent this, you should set
+        the task timeout to a larger interval than is typically used:
+        </p>
+<source>
+bash$ hadoop distcp -D mapred.task.timeout=1800000 \
+    hdfs://nn:8020/foo/bar \
+    s3n://$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY@&lt;bucket&gt;/foo/bar
+</source>
+      </section>
+
+      <section>
+        <title>MapReduce and Other Side-effects</title>
 
         <p>As has been mentioned in the preceding, should a map fail to copy
         one of its inputs, there will be several side-effects.</p>
@@ -320,7 +361,7 @@
           copied by a previous map on a re-execution will be marked as
           &quot;skipped&quot;.</li>
 
-          <li>If a map fails <code>mapred.map.max.attempts</code> times, the
+          <li>If a map fails <code>mapreduce.map.maxattempts</code> times, the
           remaining map tasks will be killed (unless <code>-i</code> is
           set).</li>
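
Pulling the options above together, an update-mode copy that tolerates individual map failures and uses the renamed retry property can be invoked roughly as follows; the namenode hosts, paths and attempt count are placeholders.

# sketch: -update replaces destination files whose sizes differ, -i ignores
# failed maps, and the renamed mapreduce.map.maxattempts bounds per-map retries
bash$ hadoop distcp -D mapreduce.map.maxattempts=4 \
          -update -i \
          hdfs://nn1:8020/foo/bar \
          hdfs://nn2:8020/bar/foo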
 

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/fair_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/fair_scheduler.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/fair_scheduler.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/fair_scheduler.xml Sat Nov 28 20:26:01 2009
@@ -18,7 +18,7 @@
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
 <document>
   <header>
-    <title>Fair Scheduler Guide</title>
+    <title>Fair Scheduler</title>
   </header>
   <body>
 
@@ -26,7 +26,7 @@
       <title>Purpose</title>
 
       <p>This document describes the Fair Scheduler, a pluggable
-        Map/Reduce scheduler for Hadoop which provides a way to share
+        MapReduce scheduler for Hadoop which provides a way to share
         large clusters.</p>
     </section>
 
@@ -108,7 +108,7 @@
       </p>
 <source>
 &lt;property&gt;
-  &lt;name&gt;mapred.jobtracker.taskScheduler&lt;/name&gt;
+  &lt;name&gt;mapreduce.jobtracker.taskscheduler&lt;/name&gt;
   &lt;value&gt;org.apache.hadoop.mapred.FairScheduler&lt;/value&gt;
 &lt;/property&gt;
 </source>
@@ -148,7 +148,7 @@
           The following parameters can be set in <em>mapred-site.xml</em>
           to affect the behavior of the fair scheduler:
         </p>
-        <p><strong>Basic Parameters:</strong></p>
+        <p><strong>Basic Parameters</strong></p>
         <table>
           <tr>
           <th>Name</th><th>Description</th>
@@ -163,25 +163,25 @@
           </tr>
           <tr>
           <td>
+            mapred.fairscheduler.pool
+          </td>
+          <td>
+            Specify the pool that a job belongs in.  
+            If this is specified then mapred.fairscheduler.poolnameproperty is ignored.
+          </td>
+          </tr>
+          <tr>
+          <td>
             mapred.fairscheduler.poolnameproperty
           </td>
           <td>
             Specify which jobconf property is used to determine the pool that a
-            job belongs in. String, default: <em>user.name</em>
+            job belongs in. String, default: <em>mapreduce.job.user.name</em>
             (i.e. one pool for each user). 
             Another useful value is <em>group.name</em> to create a
             pool per Unix group.
-            Finally, a common setting is to use a non-standard property
-            such as <em>pool.name</em> as the pool name property, and make it
-            default to <em>user.name</em> through the following setting:<br/>
-            <code>&lt;property&gt;</code><br/> 
-            <code>&nbsp;&nbsp;&lt;name&gt;pool.name&lt;/name&gt;</code><br/>
-            <code>&nbsp;&nbsp;&lt;value&gt;${user.name}&lt;/value&gt;</code><br/>
-            <code>&lt;/property&gt;</code><br/>
-            This allows you to specify the pool name explicitly for some jobs
-            through the jobconf (e.g. passing <em>-Dpool.name=&lt;name&gt;</em>
-            to <em>bin/hadoop jar</em>, while having the default be the user's
-            pool.
+            mapred.fairscheduler.poolnameproperty is used only for jobs in which 
+            mapred.fairscheduler.pool is not explicitly set.
           </td>
           </tr>
           <tr>
@@ -195,7 +195,8 @@
           </td>
           </tr>
         </table>
-        <p><strong>Advanced Parameters:</strong></p>
+        <p> <br></br></p>
+        <p><strong>Advanced Parameters</strong> </p>
         <table>
           <tr>
           <th>Name</th><th>Description</th>
@@ -399,7 +400,7 @@
     &lt;minReduces&gt;5&lt;/minReduces&gt;
     &lt;minSharePreemptionTimeout&gt;300&lt;/minSharePreemptionTimeout&gt;
   &lt;/pool&gt;
   &lt;user name="sample_user"&gt;
     &lt;maxRunningJobs&gt;6&lt;/maxRunningJobs&gt;
   &lt;/user&gt;
   &lt;userMaxJobsDefault&gt;3&lt;/userMaxJobsDefault&gt;
@@ -532,7 +533,7 @@
      implementing a "shortest job first" policy which reduces response
      times for interactive jobs even further.
      These extension points are listed in
-     <a href="#Advanced+Parameters">advanced mapred-site.xml properties</a>.
+     <a href="#Scheduler+Parameters+in+mapred-site.xml">Advanced Parameters</a>.
      </p>
     </section>
     -->
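
For reference, plugging in the Fair Scheduler with the renamed property above, and grouping jobs into one pool per Unix group, comes down to a mapred-site.xml fragment along these lines (a minimal sketch):

<!-- mapred-site.xml (sketch): plug in the Fair Scheduler -->
<property>
  <name>mapreduce.jobtracker.taskscheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<!-- optional: derive the pool from the submitting user's Unix group,
     as described in the parameter table above -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>group.name</value>
</property>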

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/hadoop_archives.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/hadoop_archives.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/hadoop_archives.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/hadoop_archives.xml Sat Nov 28 20:26:01 2009
@@ -18,11 +18,11 @@
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
 <document>
         <header>
-        <title>Archives Guide</title>
+        <title>Hadoop Archives Guide</title>
         </header>
         <body>
         <section>
-        <title> What are Hadoop archives? </title>
+        <title>Overview</title>
         <p>
         Hadoop archives are special format archives. A Hadoop archive
         maps to a file system directory. A Hadoop archive always has a *.har
@@ -32,8 +32,9 @@
         within the part files. 
         </p>
         </section>
+        
         <section>
-        <title> How to create an archive? </title>
+        <title> How to Create an Archive </title>
         <p>
         <code>Usage: hadoop archive -archiveName name &lt;src&gt;* &lt;dest&gt;</code>
         </p>
@@ -42,8 +43,8 @@
         An example would be foo.har. The name should have a *.har extension. 
         The inputs are file system pathnames which work as usual with regular
         expressions. The destination directory would contain the archive.
-        Note that this is a Map/Reduce job that creates the archives. You would
-        need a map reduce cluster to run this. The following is an example:</p>
+        Note that this is a MapReduce job that creates the archives. You would
+        need a MapReduce cluster to run this. The following is an example:</p>
         <p>
         <code>hadoop archive -archiveName foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/zoo/</code>
         </p><p>
@@ -52,28 +53,29 @@
         The sources are not changed or removed when an archive is created.
         </p>
         </section>
+        
         <section>
-        <title> How to look up files in archives? </title>
+        <title> How to Look Up Files in Archives </title>
         <p>
         The archive exposes itself as a file system layer. So all the fs shell
         commands in the archives work but with a different URI. Also, note that
-        archives are immutable. So, rename's, deletes and creates return
-        an error. URI for Hadoop Archives is 
+        archives are immutable. So, rename, delete and create will return
+        an error. The URI for Hadoop Archives is:
         </p><p><code>har://scheme-hostname:port/archivepath/fileinarchive</code></p><p>
         If no scheme is provided it assumes the underlying filesystem. 
-        In that case the URI would look like 
+        In that case the URI would look like this:
         </p><p><code>
         har:///archivepath/fileinarchive</code></p>
         <p>
         Here is an example of archive. The input to the archives is /dir. The directory dir contains 
-        files filea, fileb. To archive /dir to /user/hadoop/foo.har, the command is 
+        files filea, fileb. To archive /dir to /user/hadoop/foo.har, the command is: 
         </p>
         <p><code>hadoop archive -archiveName foo.har /dir /user/hadoop</code>
         </p><p>
-        To get file listing for files in the created archive 
+        To get file listing for files in the created archive: 
         </p>
         <p><code>hadoop dfs -lsr har:///user/hadoop/foo.har</code></p>
-        <p>To cat filea in archive -
+        <p>To cat filea in archive:
         </p><p><code>hadoop dfs -cat har:///user/hadoop/foo.har/dir/filea</code></p>
         </section>
 	</body>
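
Putting the archive commands above together, a typical create-then-read session looks like the following; the paths repeat the examples used in the text, and the namenode host and port in the fully-qualified URI form are placeholders.

# create the archive (a MapReduce job), then list and read it through the har:// layer
bash$ hadoop archive -archiveName foo.har /dir /user/hadoop
bash$ hadoop dfs -lsr har:///user/hadoop/foo.har
bash$ hadoop dfs -cat har:///user/hadoop/foo.har/dir/filea
# same file, with the underlying filesystem spelled out (scheme-hostname:port is a placeholder)
bash$ hadoop dfs -cat har://hdfs-namenode:8020/user/hadoop/foo.har/dir/filea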

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/index.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/index.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/index.xml Sat Nov 28 20:26:01 2009
@@ -26,11 +26,22 @@
   
   <body>
   <p>
-  The Hadoop Documentation provides the information you need to get started using Hadoop, the Hadoop Distributed File System (HDFS), and Hadoop on Demand (HOD).
-  </p><p>
-Begin with the <a href="quickstart.html">Hadoop Quick Start</a> which shows you how to set up a single-node Hadoop installation. Then move on to the <a href="cluster_setup.html">Hadoop Cluster Setup</a> to learn how to set up a multi-node Hadoop installation. Once your Hadoop installation is in place, try out the <a href="mapred_tutorial.html">Hadoop Map/Reduce Tutorial</a>. 
-  </p><p>
-If you have more questions, you can ask on the <a href="ext:lists">Hadoop Core Mailing Lists</a> or browse the <a href="ext:archive">Mailing List Archives</a>.
+  The Hadoop MapReduce Documentation provides the information you need to get started writing MapReduce applications. 
+  Begin with the <a href="mapred_tutorial.html">MapReduce Tutorial</a> which shows you how to write MapReduce applications using Java. 
+  To write MapReduce applications in languages other than Java, see <a href="streaming.html">Hadoop Streaming</a>, a utility that allows you to create
+  and run jobs with any executable as the mapper or reducer.
+  </p>
+  
+  <p>
+   MapReduce works in tandem with a cluster environment and a distributed file system. 
+   For information about Hadoop clusters (single or multi node) see the 
+ <a href="http://hadoop.apache.org/common/docs/current/index.html">Hadoop Common Documentation</a>.
+   For information about the Hadoop Distributed File System (HDFS) see the 
+ <a href="http://hadoop.apache.org/hdfs/docs/current/index.html">HDFS Documentation</a>.
+  </p>  
+  
+  <p>
+If you have more questions, you can ask on the <a href="ext:lists">Hadoop MapReduce Mailing Lists</a> or browse the <a href="ext:archive">Mailing List Archives</a>.
     </p>
   </body>
   


