hadoop-mapreduce-commits mailing list archives

From ste...@apache.org
Subject svn commit: r885145 [12/34] - in /hadoop/mapreduce/branches/MAPREDUCE-233: ./ .eclipse.templates/ .eclipse.templates/.launches/ conf/ ivy/ lib/ src/benchmarks/gridmix/ src/benchmarks/gridmix/pipesort/ src/benchmarks/gridmix2/ src/benchmarks/gridmix2/sr...
Date Sat, 28 Nov 2009 20:26:22 GMT
Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Sat Nov 28 20:26:01 2009
@@ -21,7 +21,7 @@
 <document>
   
   <header>
-    <title>Map/Reduce Tutorial</title>
+    <title>MapReduce Tutorial</title>
   </header>
   
   <body>
@@ -30,22 +30,21 @@
       <title>Purpose</title>
       
       <p>This document comprehensively describes all user-facing facets of the 
-      Hadoop Map/Reduce framework and serves as a tutorial.
+      Hadoop MapReduce framework and serves as a tutorial.
       </p>
     </section>
     
     <section>
-      <title>Pre-requisites</title>
+      <title>Prerequisites</title>
       
-      <p>Ensure that Hadoop is installed, configured and is running. More
-      details:</p> 
+      <p>Make sure Hadoop is installed, configured and running. See these guides:
+      </p> 
       <ul>
         <li>
-          <a href="quickstart.html">Hadoop Quick Start</a> for first-time users.
+          <a href="http://hadoop.apache.org/common/docs/current/single_node_setup.html">Single Node Setup</a> for first-time users.
         </li>
         <li>
-          <a href="cluster_setup.html">Hadoop Cluster Setup</a> for large, 
-          distributed clusters.
+          <a href="http://hadoop.apache.org/common/docs/current/cluster_setup.html">Cluster Setup</a> for large, distributed clusters.
         </li>
       </ul>
     </section>
@@ -53,12 +52,12 @@
     <section>
       <title>Overview</title>
       
-      <p>Hadoop Map/Reduce is a software framework for easily writing 
+      <p>Hadoop MapReduce is a software framework for easily writing 
       applications which process vast amounts of data (multi-terabyte data-sets) 
       in-parallel on large clusters (thousands of nodes) of commodity 
       hardware in a reliable, fault-tolerant manner.</p>
       
-      <p>A Map/Reduce <em>job</em> usually splits the input data-set into 
+      <p>A MapReduce <em>job</em> usually splits the input data-set into 
       independent chunks which are processed by the <em>map tasks</em> in a
       completely parallel manner. The framework sorts the outputs of the maps, 
       which are then input to the <em>reduce tasks</em>. Typically both the 
@@ -67,13 +66,14 @@
       tasks.</p>
       
       <p>Typically the compute nodes and the storage nodes are the same, that is, 
-      the Map/Reduce framework and the Hadoop Distributed File System (see <a href="hdfs_design.html">HDFS Architecture </a>) 
+      the MapReduce framework and the 
+      <a href="http://hadoop.apache.org/hdfs/docs/current/index.html">Hadoop Distributed File System</a> (HDFS) 
       are running on the same set of nodes. This configuration
       allows the framework to effectively schedule tasks on the nodes where data 
       is already present, resulting in very high aggregate bandwidth across the 
       cluster.</p>
       
-      <p>The Map/Reduce framework consists of a single master 
+      <p>The MapReduce framework consists of a single master 
       <code>JobTracker</code> and one slave <code>TaskTracker</code> per 
       cluster-node. The master is responsible for scheduling the jobs' component 
       tasks on the slaves, monitoring them and re-executing the failed tasks. The 
@@ -90,7 +90,7 @@
       information to the job-client.</p>
       
       <p>Although the Hadoop framework is implemented in Java<sup>TM</sup>, 
-      Map/Reduce applications need not be written in Java.</p>
+      MapReduce applications need not be written in Java.</p>
       <ul>
         <li>
           <a href="ext:api/org/apache/hadoop/streaming/package-summary">
@@ -101,7 +101,7 @@
         <li>
           <a href="ext:api/org/apache/hadoop/mapred/pipes/package-summary">
           Hadoop Pipes</a> is a <a href="http://www.swig.org/">SWIG</a>-
-          compatible <em>C++ API</em> to implement Map/Reduce applications (non 
+          compatible <em>C++ API</em> to implement MapReduce applications (non 
           JNI<sup>TM</sup> based).
         </li>
       </ul>
@@ -110,7 +110,7 @@
     <section>
       <title>Inputs and Outputs</title>
 
-      <p>The Map/Reduce framework operates exclusively on 
+      <p>The MapReduce framework operates exclusively on 
       <code>&lt;key, value&gt;</code> pairs, that is, the framework views the 
       input to the job as a set of <code>&lt;key, value&gt;</code> pairs and 
       produces a set of <code>&lt;key, value&gt;</code> pairs as the output of 
@@ -124,7 +124,7 @@
       WritableComparable</a> interface to facilitate sorting by the framework.
       </p>
 
-      <p>Input and Output types of a Map/Reduce job:</p>
+      <p>Input and Output types of a MapReduce job:</p>
       <p>
         (input) <code>&lt;k1, v1&gt;</code> 
         -&gt; 
@@ -145,14 +145,16 @@
     <section>
       <title>Example: WordCount v1.0</title>
       
-      <p>Before we jump into the details, lets walk through an example Map/Reduce 
+      <p>Before we jump into the details, let's walk through an example MapReduce 
       application to get a flavour for how they work.</p>
       
       <p><code>WordCount</code> is a simple application that counts the number of
      occurrences of each word in a given input set.</p>
       
-      <p>This works with a local-standalone, pseudo-distributed or fully-distributed 
-      Hadoop installation(see <a href="quickstart.html"> Hadoop Quick Start</a>).</p>
+      <p>This example works with a 
+      pseudo-distributed (<a href="http://hadoop.apache.org/common/docs/current/single_node_setup.html#SingleNodeSetup">Single Node Setup</a>) 
+     or fully-distributed (<a href="http://hadoop.apache.org/common/docs/current/cluster_setup.html">Cluster Setup</a>) 
+      Hadoop installation.</p>   
       
       <section>
         <title>Source Code</title>
@@ -605,17 +607,35 @@
         would be present in the current working directory of the task 
         using the option <code>-files</code>. The <code>-libjars</code>
         option allows applications to add jars to the classpaths of the maps
-        and reduces. The <code>-archives</code> allows them to pass archives
-        as arguments that are unzipped/unjarred and a link with name of the
-        jar/zip are created in the current working directory of tasks. More
+        and reduces. The option <code>-archives</code> allows them to pass a 
+        comma-separated list of archives as arguments. These archives are 
+        unarchived and a link with the name of the archive is created in 
+        the current working directory of tasks. More
         details about the command line options are available at 
-        <a href="commands_manual.html"> Hadoop Command Guide.</a></p>
+        <a href="commands_manual.html"> Hadoop Commands Guide.</a></p>
         
         <p>Running <code>wordcount</code> example with 
-        <code>-libjars</code> and <code>-files</code>:<br/>
+        <code>-libjars</code>, <code>-files</code> and <code>-archives</code>:
+        <br/>
         <code> hadoop jar hadoop-examples.jar wordcount -files cachefile.txt 
-        -libjars mylib.jar input output </code> 
-        </p>
+        -libjars mylib.jar -archives myarchive.zip input output </code> 
+         Here, myarchive.zip will be placed and unzipped into a directory 
+         named "myarchive.zip".
+        </p>
+        
+        <p> Users can specify a different symbolic name for 
+        files and archives passed through the -files and -archives options, using #.
+        </p>
+        
+        <p> For example,
+        <code> hadoop jar hadoop-examples.jar wordcount 
+        -files dir1/dict.txt#dict1,dir2/dict.txt#dict2 
+        -archives mytar.tgz#tgzdir input output </code>
+        Here, the files dir1/dict.txt and dir2/dict.txt can be accessed by 
+        tasks using the symbolic names dict1 and dict2 respectively, and the 
+        archive mytar.tgz will be placed and unarchived into a directory 
+        named tgzdir.
+        </p> 
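        <p>A minimal driver sketch is shown below; it assumes the <code>Tool</code>/<code>ToolRunner</code> 
        pattern described later in this tutorial, and the class name <code>MyDriver</code> is only a 
        placeholder. <code>ToolRunner</code> hands the generic <code>-files</code>, <code>-libjars</code> 
        and <code>-archives</code> options to <code>GenericOptionsParser</code>, so <code>run()</code> 
        only sees the remaining arguments (here the input and output paths).</p>
        <source>
// Sketch only: run as
//   hadoop jar myapp.jar MyDriver -files dict.txt -archives mytar.tgz#tgzdir in out
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), MyDriver.class);
    conf.setJobName("wordcount");
    // set Mapper/Reducer classes here as usual
    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // "in"
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // "out"
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}
        </source>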
       </section>
       
       <section>
@@ -697,10 +717,10 @@
     </section>
     
     <section>
-      <title>Map/Reduce - User Interfaces</title>
+      <title>MapReduce - User Interfaces</title>
       
       <p>This section provides a reasonable amount of detail on every user-facing 
-      aspect of the Map/Reduce framwork. This should help users implement, 
+      aspect of the MapReduce framework. This should help users implement, 
       configure and tune their jobs in a fine-grained manner. However, please 
       note that the javadoc for each class/interface remains the most 
       comprehensive documentation available; this is only meant to be a tutorial.
@@ -739,7 +759,7 @@
           to be of the same type as the input records. A given input pair may 
           map to zero or many output pairs.</p> 
  
-          <p>The Hadoop Map/Reduce framework spawns one map task for each 
+          <p>The Hadoop MapReduce framework spawns one map task for each 
           <code>InputSplit</code> generated by the <code>InputFormat</code> for 
           the job.</p>
           
@@ -898,7 +918,7 @@
  
             <p>The right number of reduces seems to be <code>0.95</code> or 
             <code>1.75</code> multiplied by (&lt;<em>no. of nodes</em>&gt; * 
-            <code>mapred.tasktracker.reduce.tasks.maximum</code>).</p>
+            <code>mapreduce.tasktracker.reduce.tasks.maximum</code>).</p>
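 
            <p>As a purely illustrative aside, the heuristic translates into a driver call 
            such as the following; the node count and per-node slot value here are 
            assumptions, not defaults.</p>
            <source>
// Sketch: inside the job driver, where conf is the job's JobConf.
int nodes = 10;                     // assumed cluster size
int reduceSlotsPerNode = 2;         // assumed mapreduce.tasktracker.reduce.tasks.maximum
conf.setNumReduceTasks((int) (0.95 * nodes * reduceSlotsPerNode));  // 19 reduces
            </source>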
  
             <p>With <code>0.95</code> all of the reduces can launch immediately 
            and start transferring map outputs as the maps finish. With 
@@ -950,7 +970,7 @@
           <title>Reporter</title>
         
           <p><a href="ext:api/org/apache/hadoop/mapred/reporter">
-          Reporter</a> is a facility for Map/Reduce applications to report 
+          Reporter</a> is a facility for MapReduce applications to report 
           progress, set application-level status messages and update 
           <code>Counters</code>.</p>
  
@@ -960,7 +980,7 @@
           significant amount of time to process individual key/value pairs, 
           this is crucial since the framework might assume that the task has 
           timed-out and kill that task. Another way to avoid this is to 
-          set the configuration parameter <code>mapred.task.timeout</code> to a
+          set the configuration parameter <code>mapreduce.task.timeout</code> to a
           high-enough value (or even set it to <em>zero</em> for no time-outs).
           </p>
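 
          <p>A small sketch of the pattern, using the <code>org.apache.hadoop.mapred</code> 
          API referenced in this section (the counter group and names are arbitrary):</p>
          <source>
// Sketch: inside a Mapper implementation, keep the framework informed
// while processing expensive records.
public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
    throws IOException {
  reporter.setStatus("processing offset " + key);    // application-level status
  for (String word : value.toString().split("\\s+")) {
    output.collect(new Text(word), new IntWritable(1));
    reporter.progress();                             // tell the framework we are alive
  }
  reporter.incrCounter("MyApp", "RECORDS", 1);       // update a custom counter
}
          </source>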
 
@@ -973,12 +993,12 @@
         
           <p><a href="ext:api/org/apache/hadoop/mapred/outputcollector">
           OutputCollector</a> is a generalization of the facility provided by
-          the Map/Reduce framework to collect data output by the 
+          the MapReduce framework to collect data output by the 
           <code>Mapper</code> or the <code>Reducer</code> (either the 
           intermediate outputs or the output of the job).</p>
         </section>
       
-        <p>Hadoop Map/Reduce comes bundled with a 
+        <p>Hadoop MapReduce comes bundled with a 
         <a href="ext:api/org/apache/hadoop/mapred/lib/package-summary">
         library</a> of generally useful mappers, reducers, and partitioners.</p>
       </section>
@@ -987,10 +1007,10 @@
         <title>Job Configuration</title>
         
         <p><a href="ext:api/org/apache/hadoop/mapred/jobconf">
-        JobConf</a> represents a Map/Reduce job configuration.</p>
+        JobConf</a> represents a MapReduce job configuration.</p>
  
         <p><code>JobConf</code> is the primary interface for a user to describe
-        a Map/Reduce job to the Hadoop framework for execution. The framework 
+        a MapReduce job to the Hadoop framework for execution. The framework 
         tries to faithfully execute the job as described by <code>JobConf</code>, 
         however:</p> 
         <ul>
@@ -1058,7 +1078,7 @@
         <code>-Djava.library.path=&lt;&gt;</code> etc. If the 
         <code>mapred.{map|reduce}.child.java.opts</code> parameters contains the 
         symbol <em>@taskid@</em> it is interpolated with value of 
-        <code>taskid</code> of the map/reduce task.</p>
+        <code>taskid</code> of the MapReduce task.</p>
         
         <p>Here is an example with multiple arguments and substitutions, 
         showing jvm GC logging, and start of a passwordless JVM JMX agent so that
@@ -1070,7 +1090,7 @@
 
         <p>
           <code>&lt;property&gt;</code><br/>
-          &nbsp;&nbsp;<code>&lt;name&gt;mapred.map.child.java.opts&lt;/name&gt;</code><br/>
+          &nbsp;&nbsp;<code>&lt;name&gt;mapreduce.map.java.opts&lt;/name&gt;</code><br/>
           &nbsp;&nbsp;<code>&lt;value&gt;</code><br/>
           &nbsp;&nbsp;&nbsp;&nbsp;<code>
                     -Xmx512M -Djava.library.path=/home/mycompany/lib
@@ -1084,7 +1104,7 @@
         
         <p>
           <code>&lt;property&gt;</code><br/>
-          &nbsp;&nbsp;<code>&lt;name&gt;mapred.reduce.child.java.opts&lt;/name&gt;</code><br/>
+          &nbsp;&nbsp;<code>&lt;name&gt;mapreduce.reduce.java.opts&lt;/name&gt;</code><br/>
           &nbsp;&nbsp;<code>&lt;value&gt;</code><br/>
           &nbsp;&nbsp;&nbsp;&nbsp;<code>
                     -Xmx1024M -Djava.library.path=/home/mycompany/lib
@@ -1109,9 +1129,9 @@
         
         <p>Note: <code>mapred.{map|reduce}.child.java.opts</code> are used only 
         for configuring the launched child tasks from task tracker. Configuring 
-        the memory options for daemons is documented in 
-        <a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
-        cluster_setup.html </a></p>
+        the memory options for daemons is documented under
+        <a href="http://hadoop.apache.org/common/docs/current/cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
+        Configuring the Environment of the Hadoop Daemons</a> (Cluster Setup).</p>
         
         <p>The memory available to some parts of the framework is also
         configurable. In map and reduce tasks, performance may be influenced
@@ -1157,26 +1177,28 @@
 
           <table>
             <tr><th>Name</th><th>Type</th><th>Description</th></tr>
-            <tr><td>io.sort.mb</td><td>int</td>
+            <tr><td>mapreduce.task.io.sort.mb</td><td>int</td>
                 <td>The cumulative size of the serialization and accounting
                 buffers storing records emitted from the map, in megabytes.
                 </td></tr>
-            <tr><td>io.sort.record.percent</td><td>float</td>
+            <tr><td>mapreduce.map.sort.record.percent</td><td>float</td>
                 <td>The ratio of serialization to accounting space can be
                 adjusted. Each serialized record requires 16 bytes of
                 accounting information in addition to its serialized size to
                 effect the sort. This percentage of space allocated from
-                <code>io.sort.mb</code> affects the probability of a spill to
+                <code>mapreduce.task.io.sort.mb</code> affects the 
+                probability of a spill to
                 disk being caused by either exhaustion of the serialization
                 buffer or the accounting space. Clearly, for a map outputting
                 small records, a higher value than the default will likely
                 decrease the number of spills to disk.</td></tr>
-            <tr><td>io.sort.spill.percent</td><td>float</td>
+            <tr><td>mapreduce.map.sort.spill.percent</td><td>float</td>
                 <td>This is the threshold for the accounting and serialization
                 buffers. When this percentage of either buffer has filled,
                 their contents will be spilled to disk in the background. Let
-                <code>io.sort.record.percent</code> be <em>r</em>,
-                <code>io.sort.mb</code> be <em>x</em>, and this value be
+                <code>mapreduce.map.sort.record.percent</code> be <em>r</em>,
+                <code>mapreduce.task.io.sort.mb</code> be <em>x</em>, 
+                and this value be
                 <em>q</em>. The maximum number of records collected before the
                 collection thread will spill is <code>r * x * q * 2^16</code>.
                 Note that a higher value may decrease the number of- or even
@@ -1216,7 +1238,7 @@
 
           <table>
             <tr><th>Name</th><th>Type</th><th>Description</th></tr>
-            <tr><td>io.sort.factor</td><td>int</td>
+            <tr><td>mapreduce.task.io.sort.factor</td><td>int</td>
                 <td>Specifies the number of segments on disk to be merged at
                 the same time. It limits the number of open files and
                 compression codecs during the merge. If the number of files
@@ -1224,7 +1246,7 @@
                 Though this limit also applies to the map, most jobs should be
                 configured so that hitting this limit is unlikely
                 there.</td></tr>
-            <tr><td>mapred.inmem.merge.threshold</td><td>int</td>
+            <tr><td>mapreduce.reduce.merge.inmem.threshold</td><td>int</td>
                 <td>The number of sorted map outputs fetched into memory
                 before being merged to disk. Like the spill thresholds in the
                 preceding note, this is not defining a unit of partition, but
@@ -1233,7 +1255,7 @@
                 less expensive than merging from disk (see notes following
                 this table). This threshold influences only the frequency of
                 in-memory merges during the shuffle.</td></tr>
-            <tr><td>mapred.job.shuffle.merge.percent</td><td>float</td>
+            <tr><td>mapreduce.reduce.shuffle.merge.percent</td><td>float</td>
                 <td>The memory threshold for fetched map outputs before an
                 in-memory merge is started, expressed as a percentage of
                 memory allocated to storing map outputs in memory. Since map
@@ -1243,14 +1265,14 @@
                 reduces whose input can fit entirely in memory. This parameter
                 influences only the frequency of in-memory merges during the
                 shuffle.</td></tr>
-            <tr><td>mapred.job.shuffle.input.buffer.percent</td><td>float</td>
+            <tr><td>mapreduce.reduce.shuffle.input.buffer.percent</td><td>float</td>
                 <td>The percentage of memory- relative to the maximum heapsize
-                as typically specified in <code>mapred.reduce.child.java.opts</code>-
+                as typically specified in <code>mapreduce.reduce.java.opts</code>-
                 that can be allocated to storing map outputs during the
                 shuffle. Though some memory should be set aside for the
                 framework, in general it is advantageous to set this high
                 enough to store large and numerous map outputs.</td></tr>
-            <tr><td>mapred.job.reduce.input.buffer.percent</td><td>float</td>
+            <tr><td>mapreduce.reduce.input.buffer.percent</td><td>float</td>
                 <td>The percentage of memory relative to the maximum heapsize
                 in which map outputs may be retained during the reduce. When
                 the reduce begins, map outputs will be merged to disk until
@@ -1275,7 +1297,8 @@
             than aggressively increasing buffer sizes.</li>
             <li>When merging in-memory map outputs to disk to begin the
             reduce, if an intermediate merge is necessary because there are
-            segments to spill and at least <code>io.sort.factor</code>
+            segments to spill and at least 
+            <code>mapreduce.task.io.sort.factor</code>
             segments already on disk, the in-memory map outputs will be part
             of the intermediate merge.</li>
           </ul>
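 
          <p>A hedged tuning sketch follows; the property names are the ones listed in the 
          tables above, while the values are placeholders rather than recommendations.</p>
          <source>
// Sketch: inside the job driver, where conf is the job's JobConf.
conf.setInt("mapreduce.task.io.sort.mb", 200);             // map-side sort buffer, in MB
conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);  // spill threshold
conf.setInt("mapreduce.task.io.sort.factor", 50);          // segments merged at once
conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);
          </source>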
@@ -1285,7 +1308,7 @@
         <section>
         <title> Directory Structure </title>
         <p>The task tracker has local directory,
-        <code> ${mapred.local.dir}/taskTracker/</code> to create localized
+        <code> ${mapreduce.cluster.local.dir}/taskTracker/</code> to create localized
         cache and localized job. It can define multiple local directories 
         (spanning multiple disks) and then each filename is assigned to a
         semi-random local directory. When the job starts, task tracker 
@@ -1293,24 +1316,24 @@
         specified in the configuration. Thus the task tracker directory 
        structure looks like the following: </p>
         <ul>
-        <li><code>${mapred.local.dir}/taskTracker/archive/</code> :
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/archive/</code> :
         The distributed cache. This directory holds the localized distributed
         cache. Thus localized distributed cache is shared among all
         the tasks and jobs </li>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/</code> :
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/</code> :
         The localized job directory 
         <ul>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/</code> 
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/work/</code> 
         : The job-specific shared directory. The tasks can use this space as 
         scratch space and share files among them. This directory is exposed
         to the users through the configuration property  
-        <code>job.local.dir</code>. The directory can accessed through 
+        <code>mapreduce.job.local.dir</code>. The directory can be accessed through 
         api <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjoblocaldir">
         JobConf.getJobLocalDir()</a>. It is available as System property also.
         So, users (streaming etc.) can call 
-        <code>System.getProperty("job.local.dir")</code> to access the 
+        <code>System.getProperty("mapreduce.job.local.dir")</code> to access the 
         directory.</li>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/</code>
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/jars/</code>
         : The jars directory, which has the job jar file and expanded jar.
         The <code>job.jar</code> is the application's jar file that is
         automatically distributed to each machine. It is expanded in jars
@@ -1319,29 +1342,29 @@
         <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjar"> 
         JobConf.getJar() </a>. To access the unjarred directory,
         JobConf.getJar().getParent() can be called.</li>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/job.xml</code>
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/job.xml</code>
         : The job.xml file, the generic job configuration, localized for 
         the job. </li>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid</code>
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid</code>
         : The task directory for each task attempt. Each task directory
         again has the following structure :
         <ul>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</code>
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</code>
         : A job.xml file, task localized job configuration, Task localization
         means that properties have been set that are specific to
         this particular task within the job. The properties localized for 
         each task are described below.</li>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</code>
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</code>
         : A directory for intermediate output files. This contains the
         temporary map reduce data generated by the framework
         such as map output files etc. </li>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</code>
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</code>
        : The current working directory of the task. 
         With <a href="#Task+JVM+Reuse">jvm reuse</a> enabled for tasks, this 
         directory will be the directory on which the jvm has started</li>
-        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</code>
+        <li><code>${mapreduce.cluster.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</code>
         : The temporary directory for the task. 
-        (User can specify the property <code>mapred.child.tmp</code> to set
+        (User can specify the property <code>mapreduce.task.tmp.dir</code> to set
         the value of temporary directory for map and reduce tasks. This 
         defaults to <code>./tmp</code>. If the value is not an absolute path,
         it is prepended with task's working directory. Otherwise, it is
@@ -1350,7 +1373,7 @@
         <code>-Djava.io.tmpdir='the absolute path of the tmp dir'</code>.
        And pipes and streaming are set with the environment variable,
         <code>TMPDIR='the absolute path of the tmp dir'</code>). This 
-        directory is created, if <code>mapred.child.tmp</code> has the value
+        directory is created, if <code>mapreduce.task.tmp.dir</code> has the value
         <code>./tmp</code> </li>
         </ul>
         </li>
@@ -1362,7 +1385,7 @@
         <section>
         <title>Task JVM Reuse</title>
         <p>Jobs can enable task JVMs to be reused by specifying the job 
-        configuration <code>mapred.job.reuse.jvm.num.tasks</code>. If the
+        configuration <code>mapreduce.job.jvm.numtasks</code>. If the
         value is 1 (the default), then JVMs are not reused 
         (i.e. 1 task per JVM). If it is -1, there is no limit to the number
         of tasks a JVM can run (of the same job). One can also specify some
@@ -1377,26 +1400,26 @@
          for each task's execution: </p>
         <table>
           <tr><th>Name</th><th>Type</th><th>Description</th></tr>
-          <tr><td>mapred.job.id</td><td>String</td><td>The job id</td></tr>
-          <tr><td>mapred.jar</td><td>String</td>
+          <tr><td>mapreduce.job.id</td><td>String</td><td>The job id</td></tr>
+          <tr><td>mapreduce.job.jar</td><td>String</td>
               <td>job.jar location in job directory</td></tr>
-          <tr><td>job.local.dir</td><td> String</td>
+          <tr><td>mapreduce.job.local.dir</td><td> String</td>
               <td> The job specific shared scratch space</td></tr>
-          <tr><td>mapred.tip.id</td><td> String</td>
+          <tr><td>mapreduce.task.id</td><td> String</td>
               <td> The task id</td></tr>
-          <tr><td>mapred.task.id</td><td> String</td>
+          <tr><td>mapreduce.task.attempt.id</td><td> String</td>
               <td> The task attempt id</td></tr>
-          <tr><td>mapred.task.is.map</td><td> boolean </td>
+          <tr><td>mapreduce.task.ismap</td><td> boolean </td>
               <td>Is this a map task</td></tr>
-          <tr><td>mapred.task.partition</td><td> int </td>
+          <tr><td>mapreduce.task.partition</td><td> int </td>
               <td>The id of the task within the job</td></tr>
-          <tr><td>map.input.file</td><td> String</td>
+          <tr><td>mapreduce.map.input.file</td><td> String</td>
               <td> The filename that the map is reading from</td></tr>
-          <tr><td>map.input.start</td><td> long</td>
+          <tr><td>mapreduce.map.input.start</td><td> long</td>
               <td> The offset of the start of the map input split</td></tr>
-          <tr><td>map.input.length </td><td>long </td>
+          <tr><td>mapreduce.map.input.length </td><td>long </td>
               <td>The number of bytes in the map input split</td></tr>
-          <tr><td>mapred.work.output.dir</td><td> String </td>
+          <tr><td>mapreduce.task.output.dir</td><td> String </td>
               <td>The task's temporary output directory</td></tr>
         </table>
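 
        <p>These values can be read from the localized job configuration in the old-API 
        <code>configure()</code> hook; the sketch below is illustrative only (imports from 
        <code>org.apache.hadoop.io</code> and <code>org.apache.hadoop.mapred</code> elided).</p>
        <source>
// Sketch: a Mapper inspecting its task-localized configuration.
public class MyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private String attemptId;
  private String inputFile;

  public void configure(JobConf job) {
    attemptId = job.get("mapreduce.task.attempt.id");
    inputFile = job.get("mapreduce.map.input.file");  // file this map is reading
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    output.collect(new Text(inputFile), new IntWritable(1));
  }
}
        </source>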
 
@@ -1404,7 +1427,7 @@
         <strong>Note:</strong>
         During the execution of a streaming job, the names of the "mapred" parameters are transformed. 
         The dots ( . ) become underscores ( _ ).
-        For example, mapred.job.id becomes mapred_job_id and mapred.jar becomes mapred_jar. 
+        For example, mapreduce.job.id becomes mapreduce_job_id and mapreduce.job.jar becomes mapreduce_job_jar. 
         To get the values in a streaming job's mapper/reducer use the parameter names with the underscores.
         </p>
         </section>
@@ -1428,9 +1451,9 @@
         System.loadLibrary</a> or 
         <a href="http://java.sun.com/javase/6/docs/api/java/lang/System.html#load(java.lang.String)">
         System.load</a>. More details on how to load shared libraries through 
-        distributed cache are documented at 
-        <a href="native_libraries.html#Loading+native+libraries+through+DistributedCache">
-        native_libraries.html</a></p>
+        distributed cache are documented under 
+        <a href="http://hadoop.apache.org/common/docs/current/native_libraries.html#Loading+Native+Libraries+Through+DistributedCache">
+        Building Native Hadoop Libraries</a>.</p>
         </section>
       </section>
       
@@ -1442,7 +1465,7 @@
         with the <code>JobTracker</code>.</p>
  
         <p><code>JobClient</code> provides facilities to submit jobs, track their 
-        progress, access component-tasks' reports and logs, get the Map/Reduce 
+        progress, access component-tasks' reports and logs, get the MapReduce 
         cluster's status information and so on.</p>
  
         <p>The job submission process involves:</p>
@@ -1454,7 +1477,7 @@
             <code>DistributedCache</code> of the job, if necessary.
           </li>
           <li>
-            Copying the job's jar and configuration to the Map/Reduce system 
+            Copying the job's jar and configuration to the MapReduce system 
             directory on the <code>FileSystem</code>.
           </li>
           <li>
@@ -1462,23 +1485,16 @@
            monitoring its status.
           </li>
         </ol>
-        <p> Job history files are also logged to user specified directory
-        <code>hadoop.job.history.user.location</code> 
-        which defaults to job output directory. The files are stored in
-        "_logs/history/" in the specified directory. Hence, by default they
-        will be in mapred.output.dir/_logs/history. User can stop
-        logging by giving the value <code>none</code> for 
-        <code>hadoop.job.history.user.location</code></p>
 
-        <p> User can view the history logs summary in specified directory 
+        <p> User can view the history log summary for a given history file
         using the following command <br/>
-        <code>$ bin/hadoop job -history output-dir</code><br/> 
+        <code>$ bin/hadoop job -history history-file</code><br/> 
         This command will print job details, failed and killed tip
         details. <br/>
         More details about the job such as successful tasks and 
         task attempts made for each task can be viewed using the  
         following command <br/>
-       <code>$ bin/hadoop job -history all output-dir</code><br/></p> 
+       <code>$ bin/hadoop job -history all history-file</code><br/></p> 
             
         <p> User can use 
         <a href="ext:api/org/apache/hadoop/mapred/outputlogfilter">OutputLogFilter</a>
@@ -1491,8 +1507,8 @@
         <section>
           <title>Job Control</title>
  
-          <p>Users may need to chain Map/Reduce jobs to accomplish complex
-          tasks which cannot be done via a single Map/Reduce job. This is fairly
+          <p>Users may need to chain MapReduce jobs to accomplish complex
+          tasks which cannot be done via a single MapReduce job. This is fairly
           easy since the output of the job typically goes to distributed 
           file-system, and the output, in turn, can be used as the input for the 
           next job.</p>
@@ -1526,10 +1542,10 @@
         <title>Job Input</title>
         
         <p><a href="ext:api/org/apache/hadoop/mapred/inputformat">
-        InputFormat</a> describes the input-specification for a Map/Reduce job.
+        InputFormat</a> describes the input-specification for a MapReduce job.
         </p> 
  
-        <p>The Map/Reduce framework relies on the <code>InputFormat</code> of 
+        <p>The MapReduce framework relies on the <code>InputFormat</code> of 
         the job to:</p>
         <ol>
           <li>Validate the input-specification of the job.</li>
@@ -1552,7 +1568,7 @@
         <code>InputSplit</code> instances based on the total size, in bytes, of 
         the input files. However, the <code>FileSystem</code> blocksize of the 
         input files is treated as an upper bound for input splits. A lower bound
-        on the split size can be set via <code>mapred.min.split.size</code>.</p>
+        on the split size can be set via <code>mapreduce.input.fileinputformat.split.minsize</code>.</p>
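 
        <p>For example, a lower bound might be set programmatically (a sketch, using the 
        property name given above; <code>conf</code> is the job's <code>JobConf</code>):</p>
        <source>
// Sketch: never create input splits smaller than 64 MB.
conf.setLong("mapreduce.input.fileinputformat.split.minsize", 64L * 1024 * 1024);
        </source>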
  
         <p>Clearly, logical splits based on input-size is insufficient for many
         applications since record boundaries must be respected. In such cases, 
@@ -1584,7 +1600,7 @@
           
           <p><a href="ext:api/org/apache/hadoop/mapred/filesplit">
           FileSplit</a> is the default <code>InputSplit</code>. It sets 
-          <code>map.input.file</code> to the path of the input file for the
+          <code>mapreduce.map.input.file</code> to the path of the input file for the
           logical split.</p>
         </section>
         
@@ -1608,10 +1624,10 @@
         <title>Job Output</title>
         
         <p><a href="ext:api/org/apache/hadoop/mapred/outputformat">
-        OutputFormat</a> describes the output-specification for a Map/Reduce 
+        OutputFormat</a> describes the output-specification for a MapReduce 
         job.</p>
 
-        <p>The Map/Reduce framework relies on the <code>OutputFormat</code> of 
+        <p>The MapReduce framework relies on the <code>OutputFormat</code> of 
         the job to:</p>
         <ol>
           <li>
@@ -1652,9 +1668,9 @@
         
         <p><a href="ext:api/org/apache/hadoop/mapred/outputcommitter">
         OutputCommitter</a> describes the commit of task output for a 
-        Map/Reduce job.</p>
+        MapReduce job.</p>
 
-        <p>The Map/Reduce framework relies on the <code>OutputCommitter</code>
+        <p>The MapReduce framework relies on the <code>OutputCommitter</code>
         of the job to:</p>
         <ol>
           <li>
@@ -1712,34 +1728,34 @@
           (using the attemptid, say <code>attempt_200709221812_0001_m_000000_0</code>), 
           not just per task.</p> 
  
-          <p>To avoid these issues the Map/Reduce framework, when the 
+          <p>To avoid these issues the MapReduce framework, when the 
           <code>OutputCommitter</code> is <code>FileOutputCommitter</code>, 
           maintains a special 
-          <code>${mapred.output.dir}/_temporary/_${taskid}</code> sub-directory
-          accessible via <code>${mapred.work.output.dir}</code>
+          <code>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</code> sub-directory
+          accessible via <code>${mapreduce.task.output.dir}</code>
           for each task-attempt on the <code>FileSystem</code> where the output
           of the task-attempt is stored. On successful completion of the 
           task-attempt, the files in the 
-          <code>${mapred.output.dir}/_temporary/_${taskid}</code> (only) 
-          are <em>promoted</em> to <code>${mapred.output.dir}</code>. Of course, 
+          <code>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</code> (only) 
+          are <em>promoted</em> to <code>${mapreduce.output.fileoutputformat.outputdir}</code>. Of course, 
           the framework discards the sub-directory of unsuccessful task-attempts. 
           This process is completely transparent to the application.</p>
  
           <p>The application-writer can take advantage of this feature by 
-          creating any side-files required in <code>${mapred.work.output.dir}</code>
+          creating any side-files required in <code>${mapreduce.task.output.dir}</code>
           during execution of a task via 
           <a href="ext:api/org/apache/hadoop/mapred/fileoutputformat/getworkoutputpath">
           FileOutputFormat.getWorkOutputPath()</a>, and the framework will promote them 
          similarly for successful task-attempts, thus eliminating the need to 
           pick unique paths per task-attempt.</p>
           
-          <p>Note: The value of <code>${mapred.work.output.dir}</code> during 
+          <p>Note: The value of <code>${mapreduce.task.output.dir}</code> during 
           execution of a particular task-attempt is actually 
-          <code>${mapred.output.dir}/_temporary/_{$taskid}</code>, and this value is 
-          set by the Map/Reduce framework. So, just create any side-files in the 
+          <code>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_{$taskid}</code>, and this value is 
+          set by the MapReduce framework. So, just create any side-files in the 
           path  returned by
           <a href="ext:api/org/apache/hadoop/mapred/fileoutputformat/getworkoutputpath">
-          FileOutputFormat.getWorkOutputPath() </a>from Map/Reduce 
+          FileOutputFormat.getWorkOutputPath() </a>from MapReduce 
           task to take advantage of this feature.</p>
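 
          <p>A sketch of the side-file pattern inside a task follows; the file name and 
          contents are arbitrary, and <code>job</code> stands for the task's 
          <code>JobConf</code>.</p>
          <source>
// Sketch: write a per-task side-file that will be promoted with the task output.
Path workDir = FileOutputFormat.getWorkOutputPath(job);
Path sideFile = new Path(workDir, "side-data.txt");
FSDataOutputStream out = sideFile.getFileSystem(job).create(sideFile);
out.writeUTF("extra per-task output");
out.close();
          </source>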
           
           <p>The entire discussion holds true for maps of jobs with 
@@ -1778,7 +1794,7 @@
           support multiple queues.</p>
           
           <p>A job defines the queue it needs to be submitted to through the
-          <code>mapred.job.queue.name</code> property, or through the
+          <code>mapreduce.job.queuename</code> property, or through the
           <a href="ext:api/org/apache/hadoop/mapred/jobconf/setqueuename">setQueueName(String)</a>
           API. Setting the queue name is optional. If a job is submitted 
           without an associated queue name, it is submitted to the 'default' 
@@ -1788,7 +1804,7 @@
           <title>Counters</title>
           
           <p><code>Counters</code> represent global counters, defined either by 
-          the Map/Reduce framework or applications. Each <code>Counter</code> can 
+          the MapReduce framework or applications. Each <code>Counter</code> can 
           be of any <code>Enum</code> type. Counters of a particular 
           <code>Enum</code> are bunched into groups of type 
           <code>Counters.Group</code>.</p>
@@ -1812,7 +1828,7 @@
           files efficiently.</p>
  
           <p><code>DistributedCache</code> is a facility provided by the 
-          Map/Reduce framework to cache files (text, archives, jars and so on) 
+          MapReduce framework to cache files (text, archives, jars and so on) 
           needed by applications.</p>
  
           <p>Applications specify the files to be cached via urls (hdfs://)
@@ -1858,7 +1874,7 @@
           directory</code> of the task via the 
           <a href="ext:api/org/apache/hadoop/filecache/distributedcache/createsymlink">
           DistributedCache.createSymlink(Configuration)</a> api. Or by setting
-          the configuration property <code>mapred.create.symlink</code>
+          the configuration property <code>mapreduce.job.cache.symlink.create</code>
           as <code>yes</code>. The DistributedCache will use the 
           <code>fragment</code> of the URI as the name of the symlink. 
           For example, the URI 
@@ -1877,10 +1893,56 @@
           can be used to cache files/jars and also add them to the 
           <em>classpath</em> of child-jvm. The same can be done by setting
           the configuration properties 
-          <code>mapred.job.classpath.{files|archives}</code>. Similarly the
+          <code>mapreduce.job.classpath.{files|archives}</code>. Similarly the
           cached files that are symlinked into the working directory of the
           task can be used to distribute native libraries and load them.</p>
           
+          <p>The <code>DistributedCache</code> tracks modification timestamps 
+          of the cache files/archives. Clearly the cache files/archives should
+          not be modified by the application or externally 
+          while the job is executing.</p>
+          
+          <p>Here is an illustrative example on how to use the 
+          <code>DistributedCache</code>:<br/>
+           // Setting up the cache for the application
+           1. Copy the requisite files to the <code>FileSystem</code>:<br/>
+            <code>$ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat</code><br/>  
+            <code>$ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip </code><br/> 
+            <code>$ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar</code><br/>
+            <code>$ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar</code><br/>
+            <code>$ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz</code><br/>
+            <code>$ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz</code><br/>
+           2. Setup the job<br/>
+            <code>Job job = new Job(conf);</code><br/>
+            <code>job.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"));</code><br/>
+            <code>job.addCacheArchive(new URI("/myapp/map.zip"));</code><br/>
+            <code>job.addFileToClassPath(new Path("/myapp/mylib.jar"));</code><br/>
+            <code>job.addCacheArchive(new URI("/myapp/mytar.tar"));</code><br/>
+            <code>job.addCacheArchive(new URI("/myapp/mytgz.tgz"));</code><br/>
+            <code>job.addCacheArchive(new URI("/myapp/mytargz.tar.gz"));</code><br/>
+      
+           3. Use the cached files in the 
+              <code>org.apache.hadoop.mapreduce.Mapper</code>
+              or <code>org.apache.hadoop.mapreduce.Reducer</code>:<br/>
+      
+              <code>public static class MapClass extends Mapper&lt;K, V, K, V&gt; {</code><br/>
+                <code>private Path[] localArchives;</code><br/>
+                <code>private Path[] localFiles;</code><br/>
+                <code>public void setup(Context context) {</code><br/>
+                 <code>// Get the cached archives/files</code><br/>
+                 <code>localArchives = context.getLocalCacheArchives();</code><br/>
+                 <code>localFiles = context.getLocalCacheFiles();</code><br/>
+              <code>}</code><br/>
+        
+              <code>public void map(K key, V value, 
+                  Context context) throws IOException {</code><br/>
+                <code>// Use data from the cached archives/files here</code><br/>
+                <code>// ...</code><br/>
+                <code>// ...</code><br/>
+                <code>context.write(k, v);</code><br/>
+              <code>}</code><br/>
+            <code>}</code></p>
+          
         </section>
         
         <section>
@@ -1890,7 +1952,7 @@
           interface supports the handling of generic Hadoop command-line options.
           </p>
           
-          <p><code>Tool</code> is the standard for any Map/Reduce tool or 
+          <p><code>Tool</code> is the standard for any MapReduce tool or 
           application. The application should delegate the handling of 
           standard command-line options to 
           <a href="ext:api/org/apache/hadoop/util/genericoptionsparser">
@@ -1923,7 +1985,7 @@
           <title>IsolationRunner</title>
           
           <p><a href="ext:api/org/apache/hadoop/mapred/isolationrunner">
-          IsolationRunner</a> is a utility to help debug Map/Reduce programs.</p>
+          IsolationRunner</a> is a utility to help debug MapReduce programs.</p>
           
           <p>To use the <code>IsolationRunner</code>, first set 
           <code>keep.failed.tasks.files</code> to <code>true</code> 
@@ -1950,7 +2012,7 @@
           
           <p>User can specify whether the system should collect profiler
           information for some of the tasks in the job by setting the
-          configuration property <code>mapred.task.profile</code>. The
+          configuration property <code>mapreduce.task.profile</code>. The
           value can be set using the api 
           <a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofileenabled">
           JobConf.setProfileEnabled(boolean)</a>. If the value is set 
@@ -1960,15 +2022,15 @@
           
           <p>Once user configures that profiling is needed, she/he can use
           the configuration property 
-          <code>mapred.task.profile.{maps|reduces}</code> to set the ranges
-          of Map/Reduce tasks to profile. The value can be set using the api 
+          <code>mapreduce.task.profile.{maps|reduces}</code> to set the ranges
+          of MapReduce tasks to profile. The value can be set using the api 
           <a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofiletaskrange">
           JobConf.setProfileTaskRange(boolean,String)</a>.
           By default, the specified range is <code>0-2</code>.</p>
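 
          <p>For example, using the <code>JobConf</code> APIs named above (a sketch; the 
          ranges simply restate the default):</p>
          <source>
// Sketch: profile the first few map and reduce task attempts.
conf.setProfileEnabled(true);            // mapreduce.task.profile
conf.setProfileTaskRange(true, "0-2");   // map tasks to profile
conf.setProfileTaskRange(false, "0-2");  // reduce tasks to profile
          </source>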
           
           <p>User can also specify the profiler configuration arguments by 
           setting the configuration property 
-          <code>mapred.task.profile.params</code>. The value can be specified 
+          <code>mapreduce.task.profile.params</code>. The value can be specified 
           using the api
           <a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofileparams">
           JobConf.setProfileParams(String)</a>. If the string contains a 
@@ -1982,8 +2044,8 @@
         
         <section>
           <title>Debugging</title>
-          <p>The Map/Reduce framework provides a facility to run user-provided 
-          scripts for debugging. When a Map/Reduce task fails, a user can run 
+          <p>The MapReduce framework provides a facility to run user-provided 
+          scripts for debugging. When a MapReduce task fails, a user can run 
           a debug script, to process task logs for example. The script is 
           given access to the task's stdout and stderr outputs, syslog and 
           jobconf. The output from the debug script's stdout and stderr is 
@@ -2003,8 +2065,8 @@
           <section>
           <title> How to submit the script: </title>
           <p> A quick way to submit the debug script is to set values for the 
-          properties <code>mapred.map.task.debug.script</code> and 
-          <code>mapred.reduce.task.debug.script</code>, for debugging map and 
+          properties <code>mapreduce.map.debug.script</code> and 
+          <code>mapreduce.reduce.debug.script</code>, for debugging map and 
           reduce tasks respectively. These properties can also be set by using APIs 
           <a href="ext:api/org/apache/hadoop/mapred/jobconf/setmapdebugscript">
           JobConf.setMapDebugScript(String) </a> and
@@ -2016,7 +2078,7 @@
             
           <p>The arguments to the script are the task's stdout, stderr, 
           syslog and jobconf files. The debug command, run on the node where
-          the Map/Reduce task failed, is: <br/>
+          the MapReduce task failed, is: <br/>
           <code> $script $stdout $stderr $syslog $jobconf </code> </p> 
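 
          <p>Wiring a script into the job configuration might look like the following 
          sketch; the script path is a placeholder that is assumed to have been 
          distributed to the nodes (for example via <code>DistributedCache</code>).</p>
          <source>
// Sketch: run ./debug.sh with the arguments shown above when a task fails.
conf.setMapDebugScript("./debug.sh");
conf.setReduceDebugScript("./debug.sh");
          </source>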
 
           <p> Pipes programs have the c++ program name as a fifth argument
@@ -2036,14 +2098,14 @@
           <title>JobControl</title>
           
           <p><a href="ext:api/org/apache/hadoop/mapred/jobcontrol/package-summary">
-          JobControl</a> is a utility which encapsulates a set of Map/Reduce jobs
+          JobControl</a> is a utility which encapsulates a set of MapReduce jobs
           and their dependencies.</p>
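 
          <p>A minimal chaining sketch, where <code>conf1</code> and <code>conf2</code> are 
          assumed to be fully configured <code>JobConf</code> instances and the second job 
          reads the first job's output:</p>
          <source>
// Sketch: run job2 only after job1 completes.
// Uses org.apache.hadoop.mapred.jobcontrol.{Job, JobControl}.
Job job1 = new Job(conf1);
Job job2 = new Job(conf2);
job2.addDependingJob(job1);

JobControl control = new JobControl("wordcount-chain");
control.addJob(job1);
control.addJob(job2);

new Thread(control).start();       // JobControl implements Runnable
while (!control.allFinished()) {
  Thread.sleep(5000);              // handle InterruptedException as appropriate
}
control.stop();
          </source>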
         </section>
         
         <section>
           <title>Data Compression</title>
           
-          <p>Hadoop Map/Reduce provides facilities for the application-writer to
+          <p>Hadoop MapReduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
           job-outputs i.e. output of the reduces. It also comes bundled with
           <a href="ext:api/org/apache/hadoop/io/compress/compressioncodec">
@@ -2052,10 +2114,11 @@
           algorithm. The <a href="ext:gzip">gzip</a> file format is also
           supported.</p>
           
-          <p>Hadoop also provides native implementations of the above compression
+         <p>Hadoop also provides native implementations of the above compression
           codecs for reasons of both performance (zlib) and non-availability of
-          Java libraries. More details on their usage and availability are
-          available <a href="native_libraries.html">here</a>.</p>
+          Java libraries. For more information see the
+          <a href="http://hadoop.apache.org/common/docs/current/native_libraries.html">Native Libraries Guide</a>.</p>
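 
          <p>For instance, compressing both the intermediate map outputs and the final job 
          output might look like the sketch below; the choice of <code>GzipCodec</code> is 
          arbitrary, and <code>conf</code> is the job's <code>JobConf</code>.</p>
          <source>
// Sketch: enable compression for intermediate and final outputs.
conf.setCompressMapOutput(true);
conf.setMapOutputCompressorClass(GzipCodec.class);                // intermediate map outputs
FileOutputFormat.setCompressOutput(conf, true);                   // job outputs
FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
          </source>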
+          
           
           <section>
             <title>Intermediate Outputs</title>
@@ -2172,13 +2235,13 @@
       <title>Example: WordCount v2.0</title>
       
       <p>Here is a more complete <code>WordCount</code> which uses many of the
-      features provided by the Map/Reduce framework we discussed so far.</p>
+      features provided by the MapReduce framework we discussed so far.</p>
       
-      <p>This needs the HDFS to be up and running, especially for the 
+      <p>This example needs the HDFS to be up and running, especially for the 
       <code>DistributedCache</code>-related features. Hence it only works with a 
-      <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
-      <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
-      Hadoop installation.</p>      
+      pseudo-distributed (<a href="http://hadoop.apache.org/common/docs/current/single_node_setup.html#SingleNodeSetup">Single Node Setup</a>) 
+     or fully-distributed (<a href="http://hadoop.apache.org/common/docs/current/cluster_setup.html#Fully-Distributed+Operation">Cluster Setup</a>) 
+      Hadoop installation.</p>     
       
       <section>
         <title>Source Code</title>
@@ -2367,7 +2430,7 @@
             <td>30.</td>
             <td>
               &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-              <code>inputFile = job.get("map.input.file");</code>
+              <code>inputFile = job.get("mapreduce.map.input.file");</code>
             </td>
           </tr>
           <tr>
@@ -3124,7 +3187,7 @@
         <title>Highlights</title>
         
         <p>The second version of <code>WordCount</code> improves upon the 
-        previous one by using some features offered by the Map/Reduce framework:
+        previous one by using some features offered by the MapReduce framework:
         </p>
         <ul>
           <li>

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/site.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/site.xml Sat Nov 28 20:26:01 2009
@@ -34,39 +34,23 @@
   
    <docs label="Getting Started"> 
 		<overview   				label="Overview" 					href="index.html" />
-		<quickstart 				label="Quick Start"        		href="quickstart.html" />
-		<setup     					label="Cluster Setup"      		href="cluster_setup.html" />
-		<mapred    				label="Map/Reduce Tutorial" 	href="mapred_tutorial.html" />
-  </docs>	
+		<mapred    				label="MapReduce Tutorial" 	href="mapred_tutorial.html" />
+		 <streaming 				label="Hadoop Streaming"  href="streaming.html" />
+   </docs>	
 		
- <docs label="Programming Guides">
-		<commands 				label="Commands"     					href="commands_manual.html" />
-		<distcp    					label="DistCp"       						href="distcp.html" />
-		<native_lib    				label="Native Libraries" 					href="native_libraries.html" />
-		<streaming 				label="Streaming"          				href="streaming.html" />
-		<fair_scheduler 			label="Fair Scheduler" 					href="fair_scheduler.html"/>
-		<cap_scheduler 		label="Capacity Scheduler" 			href="capacity_scheduler.html"/>
-		<SLA					 	label="Service Level Authorization" 	href="service_level_auth.html"/>
-		<vaidya    					label="Vaidya" 								href="vaidya.html"/>
-		<archives  				label="Archives"     						href="hadoop_archives.html"/>
+  <docs label="Guides">
+		<commands 				label="Hadoop Commands"  href="commands_manual.html" />
+		<distcp    					label="DistCp"       href="distcp.html" />
+		<vaidya    					label="Vaidya" 		href="vaidya.html"/>
+		<archives  				label="Hadoop Archives"     href="hadoop_archives.html"/>
+		<gridmix  				label="Gridmix"     href="gridmix.html"/>
    </docs>
    
-   <docs label="HDFS">
-		<hdfs_user      				label="User Guide"    							href="hdfs_user_guide.html" />
-		<hdfs_arch     				label="Architecture"  								href="hdfs_design.html" />	
-		<hdfs_fs       	 				label="File System Shell Guide"     		href="hdfs_shell.html" />
-		<hdfs_perm      				label="Permissions Guide"    					href="hdfs_permissions_guide.html" />
-		<hdfs_quotas     			label="Quotas Guide" 							href="hdfs_quota_admin_guide.html" />
-		<hdfs_SLG        			label="Synthetic Load Generator Guide"  href="SLG_user_guide.html" />
-		<hdfs_imageviewer						label="Offline Image Viewer Guide"	href="hdfs_imageviewer.html" />
-		<hdfs_libhdfs   				label="C API libhdfs"         						href="libhdfs.html" /> 
-   </docs> 
-   
-   <docs label="HOD">
-		<hod_user 	label="User Guide" 	href="hod_user_guide.html"/>
-		<hod_admin 	label="Admin Guide" 	href="hod_admin_guide.html"/>
-		<hod_config 	label="Config Guide" 	href="hod_config_guide.html"/> 
-   </docs> 
+    <docs label="Schedulers">
+        <cap_scheduler 		label="Capacity Scheduler"     href="capacity_scheduler.html"/>
+		<fair_scheduler 			label="Fair Scheduler"            href="fair_scheduler.html"/>
+		<hod_scheduler 		label="Hod Scheduler" 			href="hod_scheduler.html"/>
+    </docs>
    
    <docs label="Miscellaneous"> 
 		<api       	label="API Docs"           href="ext:api/index" />
@@ -78,19 +62,20 @@
    </docs> 
    
   <external-refs>
-    <site      href="http://hadoop.apache.org/core/"/>
-    <lists     href="http://hadoop.apache.org/core/mailing_lists.html"/>
-    <archive   href="http://mail-archives.apache.org/mod_mbox/hadoop-core-commits/"/>
-    <releases  href="http://hadoop.apache.org/core/releases.html">
-      <download href="#Download" />
+    <site      href="http://hadoop.apache.org/mapreduce/"/>
+    <lists     href="http://hadoop.apache.org/mapreduce/mailing_lists.html"/>
+    <archive   href="http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-commits/"/>
+    <releases  href="http://hadoop.apache.org/mapreduce/releases.html">
+           <download href="#Download" />
     </releases>
-    <jira      href="http://hadoop.apache.org/core/issue_tracking.html"/>
-    <wiki      href="http://wiki.apache.org/hadoop/" />
-    <faq       href="http://wiki.apache.org/hadoop/FAQ" />
-    <hadoop-default href="http://hadoop.apache.org/core/docs/current/hadoop-default.html" />
-    <core-default href="http://hadoop.apache.org/core/docs/current/core-default.html" />
-    <hdfs-default href="http://hadoop.apache.org/core/docs/current/hdfs-default.html" />
-    <mapred-default href="http://hadoop.apache.org/core/docs/current/mapred-default.html" />
+    <jira      href="http://hadoop.apache.org/mapreduce/issue_tracking.html"/>
+    <wiki      href="http://wiki.apache.org/hadoop/MapReduce" />
+    <faq       href="http://wiki.apache.org/hadoop/MapReduce/FAQ" />
+    
+    <common-default href="http://hadoop.apache.org/common/docs/current/common-default.html" />
+    <hdfs-default href="http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html" />
+    <mapred-default href="http://hadoop.apache.org/mapreduce/docs/current/mapred-default.html" />
+    
     <zlib      href="http://www.zlib.net/" />
     <gzip      href="http://www.gzip.org/" />
     <bzip      href="http://www.bzip.org/" />

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/streaming.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/streaming.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/streaming.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/streaming.xml Sat Nov 28 20:26:01 2009
@@ -30,7 +30,7 @@
 <title>Hadoop Streaming</title>
 
 <p>
-Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or 
+Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run MapReduce jobs with any executable or 
 script as the mapper and/or the reducer. For example:
 </p>
 <source>
@@ -47,7 +47,7 @@
 <title>How Streaming Works </title>
 <p>
 In the above example, both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. 
-The utility will create a Map/Reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
+The utility will create a MapReduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
 </p>
 <p>
   When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the mapper is initialized. 
@@ -63,7 +63,7 @@
 prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
 </p>
 <p>
-This is the basis for the communication protocol between the Map/Reduce framework and the streaming mapper/reducer.
+This is the basis for the communication protocol between the MapReduce framework and the streaming mapper/reducer.
 </p>
 <p>
 You can supply a Java class as the mapper and/or the reducer. The above example is equivalent to:
@@ -161,7 +161,7 @@
 <section>
 <title>Specifying Other Plugins for Jobs </title>
 <p>
-Just as with a normal Map/Reduce job, you can specify other plugins for a streaming job:
+Just as with a normal MapReduce job, you can specify other plugins for a streaming job:
 </p>
 <source>
    -inputformat JavaClassName
@@ -188,7 +188,7 @@
 <!-- GENERIC COMMAND OPTIONS-->
 <section>
 <title>Generic Command Options</title>
-<p>Streaming supports <a href="streaming.html#Streaming+Command+Options">streaming command options</a> as well as generic command options.
+<p>Streaming supports generic command options as well as <a href="streaming.html#Streaming+Command+Options">streaming command options</a>.
 The general command line syntax is shown below. </p>
 <p><strong>Note:</strong> Be sure to place the generic options before the streaming options, otherwise the command will fail. 
 For an example, see <a href="streaming.html#Making+Archives+Available+to+Tasks">Making Archives Available to Tasks</a>.</p>
@@ -201,7 +201,7 @@
 <tr><td> -D  property=value </td><td> Optional </td><td> Use value for given property </td></tr>
 <tr><td> -fs host:port or local </td><td> Optional </td><td> Specify a namenode </td></tr>
 <tr><td> -jt host:port or local </td><td> Optional </td><td> Specify a job tracker </td></tr>
-<tr><td> -files </td><td> Optional </td><td> Specify comma-separated files to be copied to the Map/Reduce cluster </td></tr>
+<tr><td> -files </td><td> Optional </td><td> Specify comma-separated files to be copied to the MapReduce cluster </td></tr>
 <tr><td> -libjars  </td><td> Optional </td><td> Specify comma-separated jar files to include in the classpath </td></tr>
 <tr><td> -archives </td><td> Optional </td><td> Specify comma-separated archives to be unarchived on the compute machines </td></tr>
 </table>
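
The generic options in the table above are handled by Hadoop's GenericOptionsParser, which runs automatically when a job driver is launched through ToolRunner. The sketch below is illustrative only (the driver class name is hypothetical); it shows where values passed with -D end up.
<source>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner/GenericOptionsParser consume the generic options (-D, -fs, -jt,
// -files, -libjars, -archives), apply them to the Configuration, and pass the
// remaining arguments on to run().
public class MyStreamingDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // A value given as "-D mapreduce.job.reduces=2" is visible here.
    System.out.println("mapreduce.job.reduces = " + conf.get("mapreduce.job.reduces"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyStreamingDriver(), args));
  }
}
</source>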
@@ -223,9 +223,9 @@
 To specify additional local temp directories use:
 </p>
 <source>
-   -D mapred.local.dir=/tmp/local
-   -D mapred.system.dir=/tmp/system
-   -D mapred.temp.dir=/tmp/temp
+   -D mapreduce.cluster.local.dir=/tmp/local
+   -D mapreduce.jobtracker.system.dir=/tmp/system
+   -D mapreduce.cluster.temp.dir=/tmp/temp
 </source>
 <p><strong>Note:</strong> For more details on jobconf parameters see:
 <a href="ext:mapred-default">mapred-default.html</a></p>
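
The same renamed keys are ordinary configuration properties, so a Java driver can also set or inspect them through the Configuration API. The fragment below is only an illustration (class name and values are arbitrary); in practice these directories are normally configured cluster-wide in mapred-site.xml rather than per job.
<source>
import org.apache.hadoop.conf.Configuration;

public class TempDirSettings {
  public static void main(String[] args) {
    // Illustrative values only; these are usually administrator settings.
    Configuration conf = new Configuration();
    conf.set("mapreduce.cluster.local.dir", "/tmp/local");
    conf.set("mapreduce.jobtracker.system.dir", "/tmp/system");
    conf.set("mapreduce.cluster.temp.dir", "/tmp/temp");
    System.out.println(conf.get("mapreduce.cluster.local.dir"));
  }
}
</source>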
@@ -234,14 +234,14 @@
 <section>
 <title>Specifying Map-Only Jobs </title>
 <p>
-Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. 
-The Map/Reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
+Often, you may want to process input data using a map function only. To do this, simply set mapreduce.job.reduces to zero. 
+The MapReduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
 </p>
 <source>
-    -D mapred.reduce.tasks=0
+    -D mapreduce.job.reduces=0
 </source>
 <p>
-To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".
+To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-D mapreduce.job.reduces=0".
 </p>
 </section>
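
For jobs written directly against the Java API, the equivalent of -D mapreduce.job.reduces=0 is Job.setNumReduceTasks(0). A minimal sketch follows; the class name and paths are placeholders.
<source>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job: with zero reduces, mapper output is written directly
// to the output directory, one file per map task.
public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration());
    job.setJarByClass(MapOnlyJob.class);
    job.setMapperClass(Mapper.class);           // identity mapper
    job.setNumReduceTasks(0);                   // same as -D mapreduce.job.reduces=0
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
</source>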
 
@@ -252,7 +252,7 @@
 </p>
 <source>
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
-    -D mapred.reduce.tasks=2 \
+    -D mapreduce.job.reduces=2 \
     -input myInputDirs \
     -output myOutputDir \
     -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
@@ -263,7 +263,7 @@
 <section>
 <title>Customizing How Lines are Split into Key/Value Pairs</title>
 <p>
-As noted earlier, when the Map/Reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. 
+As noted earlier, when the MapReduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. 
 By default, the prefix of the line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value.
 </p>
 <p>
@@ -290,7 +290,7 @@
 Similarly, you can use "-D stream.reduce.output.field.separator=SEP" and "-D stream.num.reduce.output.fields=NUM" to specify 
 the nth field separator in a line of the reduce outputs as the separator between the key and the value.
 </p>
-<p> Similarly, you can specify "stream.map.input.field.separator" and "stream.reduce.input.field.separator" as the input separator for Map/Reduce 
+<p> Similarly, you can specify "stream.map.input.field.separator" and "stream.reduce.input.field.separator" as the input separator for MapReduce 
 inputs. By default the separator is the tab character.</p>
 </section>
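
To make the splitting rule concrete, the stand-alone sketch below mimics what "stream.map.output.field.separator=." together with "stream.num.map.output.key.fields=4" means for a line such as a.b.c.d.e.f. It is an illustration of the rule only, not the streaming utility's own code, and its handling of the edge case (fewer separators than requested) is simplified.
<source>
// Illustration of the key/value splitting rule only.
public class KeyValueSplit {
  /** Split line at the nth occurrence of sep; everything before it is the key. */
  static String[] split(String line, String sep, int numKeyFields) {
    int pos = -1;
    for (int i = 0; i < numKeyFields && pos < line.length(); i++) {
      int next = line.indexOf(sep, pos + 1);
      if (next == -1) {
        // Fewer separators than requested: treat the whole line as the key.
        return new String[] { line, "" };
      }
      pos = next;
    }
    return new String[] { line.substring(0, pos), line.substring(pos + 1) };
  }

  public static void main(String[] args) {
    String[] kv = split("a.b.c.d.e.f", ".", 4);
    System.out.println("key=" + kv[0] + " value=" + kv[1]);  // key=a.b.c.d value=e.f
  }
}
</source>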
 
@@ -306,8 +306,7 @@
 <p><strong>Note:</strong>
 The -files and -archives options are generic options.
 Be sure to place the generic options before the command options, otherwise the command will fail. 
-For an example, see <a href="streaming.html#The+-archives+Option">The -archives Option</a>.
-Also see <a href="streaming.html#Other+Supported+Options">Other Supported Options</a>.
+For an example, see <a href="streaming.html#Making+Archives+Available+to+Tasks">Making Archives Available to Tasks</a>.
 </p>
 
 <section>
@@ -323,6 +322,10 @@
 <source>
 -files hdfs://host:fs_port/user/testfile.txt
 </source>
+<p> Users can specify a different symlink name for -files using #. </p>
+<source>
+-files hdfs://host:fs_port/user/testfile.txt#testfile
+</source>
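
Inside the task, the symlink appears in the current working directory, so the mapper or reducer can open it by the short name. The following is a hypothetical mapper executable written in Java (it would be passed to streaming with something like -mapper "java CacheAwareMapper", assuming the class is available on the task nodes); the file name matches the #testfile symlink above.
<source>
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStreamReader;

// Sketch of a streaming mapper that reads the distributed-cache file through
// its symlink ("testfile", created by -files ...#testfile) in the task's
// working directory, then echoes stdin lines as key/value output.
public class CacheAwareMapper {
  public static void main(String[] args) throws Exception {
    BufferedReader cached = new BufferedReader(new FileReader("testfile"));
    String header = cached.readLine();   // use the cached data as needed
    cached.close();

    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line + "\t" + header);   // key<TAB>value
    }
  }
}
</source>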
 <p>
 Multiple entries can be specified like this:
 </p>
@@ -343,6 +346,10 @@
 <source>
 -archives hdfs://host:fs_port/user/testfile.jar
 </source>
+<p> Users can specify a different symlink name for -archives using #. </p>
+<source>
+-archives hdfs://host:fs_port/user/testfile.tgz#tgzdir
+</source>
 
 <p>
 In this example, the input.txt file has two lines specifying the names of the two files: cachedir.jar/cache.txt and cachedir.jar/cache2.txt. 
@@ -351,9 +358,9 @@
 <source>
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
                   -archives 'hdfs://hadoop-nn1.example.com/user/me/samples/cachefile/cachedir.jar' \  
-                  -D mapred.map.tasks=1 \
-                  -D mapred.reduce.tasks=1 \ 
-                  -D mapred.job.name="Experiment" \
+                  -D mapreduce.job.maps=1 \
+                  -D mapreduce.job.reduces=1 \ 
+                  -D mapreduce.job.name="Experiment" \
                   -input "/user/me/samples/cachefile/input.txt"  \
                   -output "/user/me/samples/cachefile/out" \  
                   -mapper "xargs cat"  \
@@ -401,7 +408,7 @@
 <p>
 Hadoop has a library class, 
 <a href="ext:api/org/apache/hadoop/mapred/lib/keyfieldbasedpartitioner">KeyFieldBasedPartitioner</a>, 
-that is useful for many applications. This class allows the Map/Reduce 
+that is useful for many applications. This class allows the MapReduce 
 framework to partition the map outputs based on certain key fields, not
 the whole keys. For example:
 </p>
@@ -409,9 +416,9 @@
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
     -D stream.map.output.field.separator=. \
     -D stream.num.map.output.key.fields=4 \
-    -D map.output.key.field.separator=. \
-    -D mapred.text.key.partitioner.options=-k1,2 \
-    -D mapred.reduce.tasks=12 \
+    -D mapreduce.map.output.key.field.separator=. \
+    -D mapreduce.partition.keypartitioner.options=-k1,2 \
+    -D mapreduce.job.reduces=12 \
     -input myInputDirs \
     -output myOutputDir \
     -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
@@ -421,11 +428,11 @@
 <p>
 Here, <em>-D stream.map.output.field.separator=.</em> and <em>-D stream.num.map.output.key.fields=4</em> are as explained in previous example. The two variables are used by streaming to identify the key/value pair of mapper. 
 </p><p>
-The map output keys of the above Map/Reduce job normally have four fields
-separated by ".". However, the Map/Reduce framework will partition the map
+The map output keys of the above MapReduce job normally have four fields
+separated by ".". However, the MapReduce framework will partition the map
 outputs by the first two fields of the keys using the 
-<em>-D mapred.text.key.partitioner.options=-k1,2</em> option. 
-Here, <em>-D map.output.key.field.separator=.</em> specifies the separator 
+<em>-D mapreduce.partition.keypartitioner.options=-k1,2</em> option. 
+Here, <em>-D mapreduce.map.output.key.field.separator=.</em> specifies the separator 
 for the partition. This guarantees that all the key/value pairs with the 
 same first two fields in the keys will be partitioned into the same reducer.
 </p><p>
@@ -470,22 +477,22 @@
 </p>
 <source>
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
-    -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
+    -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
     -D stream.map.output.field.separator=. \
     -D stream.num.map.output.key.fields=4 \
-    -D map.output.key.field.separator=. \
-    -D mapred.text.key.comparator.options=-k2,2nr \
-    -D mapred.reduce.tasks=12 \
+    -D mapreduce.map.output.key.field.separator=. \
+    -D mapreduce.partition.keycomparator.options=-k2,2nr \
+    -D mapreduce.job.reduces=12 \
     -input myInputDirs \
     -output myOutputDir \
     -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
     -reducer org.apache.hadoop.mapred.lib.IdentityReducer 
 </source>
 <p>
-The map output keys of the above Map/Reduce job normally have four fields
-separated by ".". However, the Map/Reduce framework will sort the 
+The map output keys of the above MapReduce job normally have four fields
+separated by ".". However, the MapReduce framework will sort the 
 outputs by the second field of the keys using the 
-<em>-D mapred.text.key.comparator.options=-k2,2nr</em> option. 
+<em>-D mapreduce.partition.keycomparator.options=-k2,2nr</em> option. 
 Here, <em>-n</em> specifies that the sorting is numerical sorting and 
 <em>-r</em> specifies that the result should be reversed. A simple illustration
 is shown below:
@@ -526,7 +533,7 @@
 </p>
 <source>
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
-    -D mapred.reduce.tasks=12 \
+    -D mapreduce.job.reduces=12 \
     -input myInputDirs \
     -output myOutputDir \
     -mapper myAggregatorForKeyCount.py \
@@ -571,11 +578,11 @@
 <source>
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
     -D mapreduce.map.output.key.field.separator=. \
-    -D mapred.text.key.partitioner.options=-k1,2 \
-    -D mapred.data.field.separator=. \
-    -D map.output.key.value.fields.spec=6,5,1-3:0- \
-    -D reduce.output.key.value.fields.spec=0-2:5- \
-    -D mapred.reduce.tasks=12 \
+    -D mapreduce.partition.keypartitioner.options=-k1,2 \
+    -D mapreduce.fieldsel.data.field.separator=. \
+    -D mapreduce.fieldsel.map.output.key.value.fields.spec=6,5,1-3:0- \
+    -D mapreduce.fieldsel.reduce.output.key.value.fields.spec=0-2:5- \
+    -D mapreduce.job.reduces=12 \
     -input myInputDirs \
     -output myOutputDir \
     -mapper org.apache.hadoop.mapred.lib.FieldSelectionMapReduce \
@@ -584,13 +591,13 @@
 </source>
 
 <p>
-The option "-D map.output.key.value.fields.spec=6,5,1-3:0-" specifies key/value selection for the map outputs. 
+The option "-D mapreduce.fieldsel.mapreduce.fieldsel.map.output.key.value.fields.spec=6,5,1-3:0-" specifies key/value selection for the map outputs. 
 Key selection spec and value selection spec are separated by ":". 
 In this case, the map output key will consist of fields 6, 5, 1, 2, and 3. 
 The map output value will consist of all fields (0- means field 0 and all the subsequent fields). 
 </p>
 <p>
-The option "-D reduce.output.key.value.fields.spec=0-2:5-" specifies 
+The option "-D mapreduce.fieldsel.mapreduce.fieldsel.reduce.output.key.value.fields.spec=0-2:5-" specifies 
 key/value selection for the reduce outputs. In this case, the reduce 
 output key will consist of fields 0, 1, 2 (corresponding to the original 
 fields 6, 5, 1). The reduce output value will consist of all fields starting
@@ -653,7 +660,7 @@
 <section>
 <title>How many reducers should I use? </title>
 <p>
-See the Hadoop Wiki for details: <a href="mapred_tutorial.html#Reducer">Reducer</a>
+For details see <a href="mapred_tutorial.html#Reducer">Reducer</a>.
 </p>
 </section>
 
@@ -676,7 +683,7 @@
 dan     75
 
 $ c2='cut -f2'; $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-    -D mapred.job.name='Experiment'
+    -D mapreduce.job.name='Experiment'
     -input /user/me/samples/student_marks 
     -output /user/me/samples/student_out 
     -mapper \"$c2\" -reducer 'cat'  
@@ -735,7 +742,7 @@
 <section>
 <title>How do I generate output files with gzip format? </title>
 <p>
-Instead of plain text files, you can generate gzip files as your generated output. Pass '-D mapred.output.compress=true -D  mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec' as option to your streaming job.
+Instead of plain text files, you can generate gzip files as your output. Pass '-D mapreduce.output.fileoutputformat.compress=true -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec' as options to your streaming job.
 </p>
 </section>
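
For a job written against the Java API, the same effect is achieved through FileOutputFormat; a minimal sketch (the class name is a placeholder and the rest of the job setup is elided):
<source>
import java.io.IOException;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GzipOutputExample {
  // Equivalent of passing the compress/codec properties on the command line:
  // the job's output files are written through the gzip codec.
  static void enableGzipOutput(Job job) {
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
  }

  public static void main(String[] args) throws IOException {
    Job job = new Job();          // remaining job setup elided
    enableGzipOutput(job);
  }
}
</source>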
 
@@ -790,9 +797,9 @@
 <section>
 <title>How do I get the JobConf variables in a streaming job's mapper/reducer?</title>
 <p>
-See <a href="mapred_tutorial.html#Configured+Parameters">Configured Parameters</a>. 
+See the <a href="mapred_tutorial.html#Configured+Parameters">Configured Parameters</a> section. 
 During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( _ ).
-For example, mapred.job.id becomes mapred_job_id and mapred.jar becomes mapred_jar. In your code, use the parameter names with the underscores.
+For example, mapreduce.job.id becomes mapreduce_job_id and mapreduce.job.jar becomes mapreduce_job_jar. In your code, use the parameter names with the underscores.
 </p>
 </section>
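
For instance, a streaming mapper written in Java (the class name is hypothetical) could pick up the job id from its environment like this:
<source>
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Streaming exports the job configuration to the task environment with dots
// replaced by underscores, so mapreduce.job.id is read as mapreduce_job_id.
public class EnvAwareMapper {
  public static void main(String[] args) throws Exception {
    String jobId = System.getenv("mapreduce_job_id");

    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(jobId + "\t" + line);   // tag each record with the job id
    }
  }
}
</source>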
 

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/tabs.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/tabs.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/tabs.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/tabs.xml Sat Nov 28 20:26:01 2009
@@ -30,8 +30,8 @@
     directory (ends in '/'), in which case /index.html will be added
   -->
 
-  <tab label="Project" href="http://hadoop.apache.org/core/" />
-  <tab label="Wiki" href="http://wiki.apache.org/hadoop" />
-  <tab label="Hadoop 0.21 Documentation" dir="" />  
+  <tab label="Project" href="http://hadoop.apache.org/mapreduce/" />
+  <tab label="Wiki" href="http://wiki.apache.org/hadoop/MapReduce" />
+  <tab label="MapReduce 0.21 Documentation" dir="" />  
   
 </tabs>

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/vaidya.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/vaidya.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/vaidya.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/content/xdocs/vaidya.xml Sat Nov 28 20:26:01 2009
@@ -29,8 +29,8 @@
     <section>
       <title>Purpose</title>
       
-      <p>This document describes various user-facing facets of Hadoop Vaidya, a performance diagnostic tool for map/reduce jobs. It
-         describes how to execute a default set of rules against your map/reduce job counters and
+      <p>This document describes various user-facing facets of Hadoop Vaidya, a performance diagnostic tool for MapReduce jobs. It
+         describes how to execute a default set of rules against your MapReduce job counters and
          how to write and execute new rules to detect specific performance problems. 
       </p>
       <p>A few sample test rules are provided with the tool with the objective of growing the rules database over the time. 
@@ -41,7 +41,7 @@
     </section>
     
     <section>
-      <title>Pre-requisites</title>
+      <title>Prerequisites</title>
       
       <p>Ensure that Hadoop is installed and configured. More details:</p> 
       <ul>
@@ -59,11 +59,11 @@
       
       <p>Hadoop Vaidya (Vaidya in Sanskrit language means "one who knows", or "a physician") 
 	    is a rule based performance diagnostic tool for 
-        Map/Reduce jobs. It performs a post execution analysis of map/reduce 
+        MapReduce jobs. It performs a post-execution analysis of a MapReduce 
         job by parsing and collecting execution statistics through job history 
         and job configuration files. It runs a set of predefined tests/rules 
         against job execution statistics to diagnose various performance problems. 
-        Each test rule detects a specific performance problem with the Map/Reduce job and provides 
+        Each test rule detects a specific performance problem with the MapReduce job and provides 
         a targeted advice to the user. This tool generates an XML report based on 
         the evaluation results of individual test rules.
       </p>
@@ -75,9 +75,9 @@
 	 
 	 <p> This section describes main concepts and terminology involved with Hadoop Vaidya,</p>
 		<ul>
-			<li> <em>PostExPerformanceDiagnoser</em>: This class extends the base Diagnoser class and acts as a driver for post execution performance analysis of Map/Reduce Jobs. 
+			<li> <em>PostExPerformanceDiagnoser</em>: This class extends the base Diagnoser class and acts as a driver for post execution performance analysis of MapReduce Jobs. 
                        It detects performance inefficiencies by executing a set of performance diagnosis rules against the job execution statistics.</li>
-			<li> <em>Job Statistics</em>: This includes the job configuration information (job.xml) and various counters logged by Map/Reduce job as a part of the job history log
+			<li> <em>Job Statistics</em>: This includes the job configuration information (job.xml) and various counters logged by the MapReduce job as part of the job history log
 		           file. The counters are parsed and collected into the Job Statistics data structures, which contains global job level aggregate counters and 
 			     a set of counters for each Map and Reduce task.</li>
 			<li> <em>Diagnostic Test/Rule</em>: This is a program logic that detects the inefficiency of M/R job based on the job statistics. The
@@ -139,10 +139,10 @@
 	</section>
 	
     <section>
-		<title>How to Write and Execute your own Tests</title>
+		<title>How to Write and Execute Your Own Tests</title>
 		<p>Writing and executing your own test rules is not very hard. You can take a look at Hadoop Vaidya source code for existing set of tests. 
-		   The source code is at this <a href="http://svn.apache.org/viewvc/hadoop/core/trunk/src/contrib/vaidya/src/java/org/apache/hadoop/vaidya/">hadoop svn repository location</a>
-		   . The default set of tests are under <code>"postexdiagnosis/tests/"</code> folder.</p>
+		   The source code is at this <a href="http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/vaidya/src/java/org/apache/hadoop/vaidya/">hadoop svn repository location</a>. 
+		   The default set of tests is under the <code>"postexdiagnosis/tests/"</code> folder.</p>
 		<ul>
 		  <li>Writing a test class for your new test case should extend the <code>org.apache.hadoop.vaidya.DiagnosticTest</code> class and 
 		       it should override following three methods from the base class, 

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/skinconf.xml
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/skinconf.xml?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/skinconf.xml (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/docs/src/documentation/skinconf.xml Sat Nov 28 20:26:01 2009
@@ -68,7 +68,7 @@
   <project-name>Hadoop</project-name>
   <project-description>Scalable Computing Platform</project-description>
   <project-url>http://hadoop.apache.org/core/</project-url>
-  <project-logo>images/core-logo.gif</project-logo>
+  <project-logo>images/mapreduce-logo.jpg</project-logo>
 
   <!-- group logo -->
   <group-name>Hadoop</group-name>
@@ -146,13 +146,13 @@
     <!--Headers -->
 	#content h1 {
 	  margin-bottom: .5em;
-	  font-size: 200%; color: black;
+	  font-size: 185%; color: black;
 	  font-family: arial;
 	}  
-    h2, .h3 { font-size: 195%; color: black; font-family: arial; }
-	h3, .h4 { font-size: 140%; color: black; font-family: arial; margin-bottom: 0.5em; }
+    h2, .h3 { font-size: 175%; color: black; font-family: arial; }
+	h3, .h4 { font-size: 135%; color: black; font-family: arial; margin-bottom: 0.5em; }
 	h4, .h5 { font-size: 125%; color: black;  font-style: italic; font-weight: bold; font-family: arial; }
-	h5, h6 { font-size: 110%; color: #363636; font-weight: bold; } 
+	h5, h6 { font-size: 110%; color: #363636; font-weight: bold; }  
    
    <!--Code Background -->
     pre.code {

Propchange: hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Sat Nov 28 20:26:01 2009
@@ -1,3 +1,3 @@
 /hadoop/core/branches/branch-0.19/mapred/src/examples:713112
 /hadoop/core/trunk/src/examples:776175-784663
-/hadoop/mapreduce/trunk/src/examples:804974-807678
+/hadoop/mapreduce/trunk/src/examples:804974-884916

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/BaileyBorweinPlouffe.java
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/BaileyBorweinPlouffe.java?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/BaileyBorweinPlouffe.java (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/BaileyBorweinPlouffe.java Sat Nov 28 20:26:01 2009
@@ -68,7 +68,8 @@
   public static final String DESCRIPTION
       = "A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.";
 
-  private static final String NAME = BaileyBorweinPlouffe.class.getSimpleName();
+  private static final String NAME = "mapreduce." + 
+    BaileyBorweinPlouffe.class.getSimpleName();
 
   //custom job properties
   private static final String WORKING_DIR_PROPERTY = NAME + ".dir";
@@ -327,11 +328,11 @@
     job.setInputFormatClass(BbpInputFormat.class);
 
     // disable task timeout
-    jobconf.setLong("mapred.task.timeout", 0);
+    jobconf.setLong(JobContext.TASK_TIMEOUT, 0);
 
     // do not use speculative execution
-    jobconf.setBoolean("mapred.map.tasks.speculative.execution", false);
-    jobconf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
+    jobconf.setBoolean(JobContext.MAP_SPECULATIVE, false);
+    jobconf.setBoolean(JobContext.REDUCE_SPECULATIVE, false);
     return job;
   }
 

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/ExampleDriver.java
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/ExampleDriver.java?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/ExampleDriver.java (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/ExampleDriver.java Sat Nov 28 20:26:01 2009
@@ -59,14 +59,12 @@
       pgd.addClass("secondarysort", SecondarySort.class,
                    "An example defining a secondary sort to the reduce.");
       pgd.addClass("sudoku", Sudoku.class, "A sudoku solver.");
-      pgd.addClass("sleep", SleepJob.class, "A job that sleeps at each map and reduce task.");
       pgd.addClass("join", Join.class, "A job that effects a join over sorted, equally partitioned datasets");
       pgd.addClass("multifilewc", MultiFileWordCount.class, "A job that counts words from several files.");
       pgd.addClass("dbcount", DBCountPageView.class, "An example job that count the pageview counts from a database.");
       pgd.addClass("teragen", TeraGen.class, "Generate data for the terasort");
       pgd.addClass("terasort", TeraSort.class, "Run the terasort");
       pgd.addClass("teravalidate", TeraValidate.class, "Checking results of terasort");
-      pgd.addClass("fail", FailJob.class, "a job that always fails");
       exitCode = pgd.driver(argv);
     }
     catch(Throwable e){

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Grep.java
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Grep.java?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Grep.java (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Grep.java Sat Nov 28 20:26:01 2009
@@ -52,9 +52,9 @@
           Integer.toString(new Random().nextInt(Integer.MAX_VALUE)));
 
     Configuration conf = getConf();
-    conf.set("mapred.mapper.regex", args[2]);
+    conf.set(RegexMapper.PATTERN, args[2]);
     if (args.length == 4)
-      conf.set("mapred.mapper.regex.group", args[3]);
+      conf.set(RegexMapper.GROUP, args[3]);
 
     Job grepJob = new Job(conf);
     

Modified: hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Join.java
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Join.java?rev=885145&r1=885144&r2=885145&view=diff
==============================================================================
--- hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Join.java (original)
+++ hadoop/mapreduce/branches/MAPREDUCE-233/src/examples/org/apache/hadoop/examples/Join.java Sat Nov 28 20:26:01 2009
@@ -52,7 +52,7 @@
  *            [<i>in-dir</i>]* <i>in-dir</i> <i>out-dir</i> 
  */
 public class Join extends Configured implements Tool {
-
+  public static String REDUCES_PER_HOST = "mapreduce.join.reduces_per_host";
   static int printUsage() {
     System.out.println("join [-r <reduces>] " +
                        "[-inFormat <input format class>] " +
@@ -77,7 +77,7 @@
     JobClient client = new JobClient(conf);
     ClusterStatus cluster = client.getClusterStatus();
     int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
-    String join_reduces = conf.get("mapreduce.join.reduces_per_host");
+    String join_reduces = conf.get(REDUCES_PER_HOST);
     if (join_reduces != null) {
        num_reduces = cluster.getTaskTrackers() * 
                        Integer.parseInt(join_reduces);


