hadoop-common-commits mailing list archives

From d...@apache.org
Subject svn commit: r674834 [1/3] - in /hadoop/core/trunk: ./ docs/ src/docs/src/documentation/content/xdocs/
Date Tue, 08 Jul 2008 14:11:58 GMT
Author: ddas
Date: Tue Jul  8 07:11:58 2008
New Revision: 674834

URL: http://svn.apache.org/viewvc?rev=674834&view=rev
Log:
HADOOP-3691. Fix streaming and tutorial docs. Contributed by Jothi Padmanabhan.

Modified:
    hadoop/core/trunk/CHANGES.txt
    hadoop/core/trunk/docs/changes.html
    hadoop/core/trunk/docs/mapred_tutorial.html
    hadoop/core/trunk/docs/mapred_tutorial.pdf
    hadoop/core/trunk/docs/streaming.html
    hadoop/core/trunk/docs/streaming.pdf
    hadoop/core/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
    hadoop/core/trunk/src/docs/src/documentation/content/xdocs/streaming.xml

Modified: hadoop/core/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=674834&r1=674833&r2=674834&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Tue Jul  8 07:11:58 2008
@@ -777,6 +777,8 @@
     HADOOP-3692. Fix documentation for Cluster setup and Quick start guides. 
     (Amareshwari Sriramadasu via ddas)
 
+    HADOOP-3691. Fix streaming and tutorial docs. (Jothi Padmanabhan via ddas)
+
 Release 0.17.1 - Unreleased
 
   INCOMPATIBLE CHANGES

Modified: hadoop/core/trunk/docs/changes.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/changes.html?rev=674834&r1=674833&r2=674834&view=diff
==============================================================================
--- hadoop/core/trunk/docs/changes.html (original)
+++ hadoop/core/trunk/docs/changes.html Tue Jul  8 07:11:58 2008
@@ -378,7 +378,7 @@
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._bug_fixes_')">
 BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(113)
+</a>&nbsp;&nbsp;&nbsp;(115)
     <ol id="release_0.18.0_-_unreleased_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2905">HADOOP-2905</a>.
'fsck -move' triggers NPE in NameNode.<br />(Lohit Vjayarenu via rangadi)</li>
       <li>Increment ClientProtocol.versionID missed by <a href="http://issues.apache.org/jira/browse/HADOOP-2585">HADOOP-2585</a>.<br
/>(shv)</li>
@@ -603,6 +603,8 @@
 conform to style guidelines.<br />(Amareshwari Sriramadasu via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3653">HADOOP-3653</a>.
Fix test-patch target to properly account for Eclipse
 classpath jars.<br />(Brice Arnould via nigel)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3692">HADOOP-3692</a>.
Fix documentation for Cluster setup and Quick start guides.<br />(Amareshwari Sriramadasu
via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3691">HADOOP-3691</a>.
Fix streaming and tutorial docs.<br />(Jothi Padmanabhan via ddas)</li>
     </ol>
   </li>
 </ul>

Modified: hadoop/core/trunk/docs/mapred_tutorial.html
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/docs/mapred_tutorial.html?rev=674834&r1=674833&r2=674834&view=diff
==============================================================================
--- hadoop/core/trunk/docs/mapred_tutorial.html (original)
+++ hadoop/core/trunk/docs/mapred_tutorial.html Tue Jul  8 07:11:58 2008
@@ -5,7 +5,7 @@
 <meta content="Apache Forrest" name="Generator">
 <meta name="Forrest-version" content="0.8">
 <meta name="Forrest-skin-name" content="pelt">
-<title>Hadoop Map-Reduce Tutorial</title>
+<title>Hadoop Map/Reduce Tutorial</title>
 <link type="text/css" href="skin/basic.css" rel="stylesheet">
 <link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
 <link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
@@ -187,7 +187,7 @@
 <a class="dida" href="mapred_tutorial.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif"
class="skin"><br>
         PDF</a>
 </div>
-<h1>Hadoop Map-Reduce Tutorial</h1>
+<h1>Hadoop Map/Reduce Tutorial</h1>
 <div id="minitoc-area">
 <ul class="minitoc">
 <li>
@@ -217,7 +217,7 @@
 </ul>
 </li>
 <li>
-<a href="#Map-Reduce+-+User+Interfaces">Map-Reduce - User Interfaces</a>
+<a href="#Map%2FReduce+-+User+Interfaces">Map/Reduce - User Interfaces</a>
 <ul class="minitoc">
 <li>
 <a href="#Payload">Payload</a>
@@ -328,7 +328,7 @@
 <h2 class="h3">Purpose</h2>
 <div class="section">
 <p>This document comprehensively describes all user-facing facets of the 
-      Hadoop Map-Reduce framework and serves as a tutorial.
+      Hadoop Map/Reduce framework and serves as a tutorial.
       </p>
 </div>
     
@@ -356,11 +356,11 @@
 <a name="N10032"></a><a name="Overview"></a>
 <h2 class="h3">Overview</h2>
 <div class="section">
-<p>Hadoop Map-Reduce is a software framework for easily writing 
+<p>Hadoop Map/Reduce is a software framework for easily writing 
       applications which process vast amounts of data (multi-terabyte data-sets) 
       in-parallel on large clusters (thousands of nodes) of commodity 
       hardware in a reliable, fault-tolerant manner.</p>
-<p>A Map-Reduce <em>job</em> usually splits the input data-set into 
+<p>A Map/Reduce <em>job</em> usually splits the input data-set into 
       independent chunks which are processed by the <em>map tasks</em> in a
       completely parallel manner. The framework sorts the outputs of the maps, 
       which are then input to the <em>reduce tasks</em>. Typically both the 
@@ -368,12 +368,12 @@
       takes care of scheduling tasks, monitoring them and re-executes the failed
       tasks.</p>
 <p>Typically the compute nodes and the storage nodes are the same, that is, 
-      the Map-Reduce framework and the <a href="hdfs_design.html">Distributed 
+      the Map/Reduce framework and the <a href="hdfs_design.html">Distributed 
       FileSystem</a> are running on the same set of nodes. This configuration
       allows the framework to effectively schedule tasks on the nodes where data 
       is already present, resulting in very high aggregate bandwidth across the 
       cluster.</p>
-<p>The Map-Reduce framework consists of a single master 
+<p>The Map/Reduce framework consists of a single master 
       <span class="codefrag">JobTracker</span> and one slave <span class="codefrag">TaskTracker</span>
per 
       cluster-node. The master is responsible for scheduling the jobs' component 
       tasks on the slaves, monitoring them and re-executing the failed tasks. The 
@@ -388,7 +388,7 @@
       scheduling tasks and monitoring them, providing status and diagnostic 
       information to the job-client.</p>
 <p>Although the Hadoop framework is implemented in Java<sup>TM</sup>, 
-      Map-Reduce applications need not be written in Java.</p>
+      Map/Reduce applications need not be written in Java.</p>
 <ul>
         
 <li>
@@ -403,7 +403,7 @@
           
 <a href="api/org/apache/hadoop/mapred/pipes/package-summary.html">
           Hadoop Pipes</a> is a <a href="http://www.swig.org/">SWIG</a>-
-          compatible <em>C++ API</em> to implement Map-Reduce applications (non

+          compatible <em>C++ API</em> to implement Map/Reduce applications (non

           JNI<sup>TM</sup> based).
         </li>
       
@@ -414,7 +414,7 @@
 <a name="N1008B"></a><a name="Inputs+and+Outputs"></a>
 <h2 class="h3">Inputs and Outputs</h2>
 <div class="section">
-<p>The Map-Reduce framework operates exclusively on 
+<p>The Map/Reduce framework operates exclusively on 
       <span class="codefrag">&lt;key, value&gt;</span> pairs, that is,
the framework views the 
       input to the job as a set of <span class="codefrag">&lt;key, value&gt;</span>
pairs and 
       produces a set of <span class="codefrag">&lt;key, value&gt;</span>
pairs as the output of 
@@ -426,7 +426,7 @@
       <a href="api/org/apache/hadoop/io/WritableComparable.html">
       WritableComparable</a> interface to facilitate sorting by the framework.
       </p>
-<p>Input and Output types of a Map-Reduce job:</p>
+<p>Input and Output types of a Map/Reduce job:</p>
 <p>
         (input) <span class="codefrag">&lt;k1, v1&gt;</span> 
         -&gt; 
@@ -448,7 +448,7 @@
 <a name="N100CD"></a><a name="Example%3A+WordCount+v1.0"></a>
 <h2 class="h3">Example: WordCount v1.0</h2>
 <div class="section">
-<p>Before we jump into the details, lets walk through an example Map-Reduce 
+<p>Before we jump into the details, lets walk through an example Map/Reduce 
       application to get a flavour for how they work.</p>
 <p>
 <span class="codefrag">WordCount</span> is a simple application that counts the
number of
@@ -1226,11 +1226,11 @@
 </div>
     
     
-<a name="N105A3"></a><a name="Map-Reduce+-+User+Interfaces"></a>
-<h2 class="h3">Map-Reduce - User Interfaces</h2>
+<a name="N105A3"></a><a name="Map%2FReduce+-+User+Interfaces"></a>
+<h2 class="h3">Map/Reduce - User Interfaces</h2>
 <div class="section">
 <p>This section provides a reasonable amount of detail on every user-facing 
-      aspect of the Map-Reduce framwork. This should help users implement, 
+      aspect of the Map/Reduce framwork. This should help users implement, 
       configure and tune their jobs in a fine-grained manner. However, please 
       note that the javadoc for each class/interface remains the most 
       comprehensive documentation available; this is only meant to be a tutorial.
@@ -1260,7 +1260,7 @@
           intermediate records. The transformed intermediate records do not need
           to be of the same type as the input records. A given input pair may 
           map to zero or many output pairs.</p>
-<p>The Hadoop Map-Reduce framework spawns one map task for each 
+<p>The Hadoop Map/Reduce framework spawns one map task for each 
           <span class="codefrag">InputSplit</span> generated by the <span
class="codefrag">InputFormat</span> for 
           the job.</p>
 <p>Overall, <span class="codefrag">Mapper</span> implementations are passed
the 
@@ -1423,7 +1423,7 @@
 <h4>Reporter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/Reporter.html">
-          Reporter</a> is a facility for Map-Reduce applications to report 
+          Reporter</a> is a facility for Map/Reduce applications to report 
           progress, set application-level status messages and update 
           <span class="codefrag">Counters</span>.</p>
 <p>
@@ -1443,20 +1443,20 @@
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputCollector.html">
           OutputCollector</a> is a generalization of the facility provided by
-          the Map-Reduce framework to collect data output by the 
+          the Map/Reduce framework to collect data output by the 
           <span class="codefrag">Mapper</span> or the <span class="codefrag">Reducer</span>
(either the 
           intermediate outputs or the output of the job).</p>
-<p>Hadoop Map-Reduce comes bundled with a 
+<p>Hadoop Map/Reduce comes bundled with a 
         <a href="api/org/apache/hadoop/mapred/lib/package-summary.html">
         library</a> of generally useful mappers, reducers, and partitioners.</p>
 <a name="N107B6"></a><a name="Job+Configuration"></a>
 <h3 class="h4">Job Configuration</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobConf.html">
-        JobConf</a> represents a Map-Reduce job configuration.</p>
+        JobConf</a> represents a Map/Reduce job configuration.</p>
 <p>
 <span class="codefrag">JobConf</span> is the primary interface for a user to
describe
-        a map-reduce job to the Hadoop framework for execution. The framework 
+        a Map/Reduce job to the Hadoop framework for execution. The framework 
         tries to faithfully execute the job as described by <span class="codefrag">JobConf</span>,

         however:</p>
 <ul>
@@ -1747,7 +1747,7 @@
         with the <span class="codefrag">JobTracker</span>.</p>
 <p>
 <span class="codefrag">JobClient</span> provides facilities to submit jobs, track
their 
-        progress, access component-tasks' reports/logs, get the Map-Reduce 
+        progress, access component-tasks' reports and logs, get the Map/Reduce 
         cluster's status information and so on.</p>
 <p>The job submission process involves:</p>
 <ol>
@@ -1762,7 +1762,7 @@
           </li>
           
 <li>
-            Copying the job's jar and configuration to the map-reduce system 
+            Copying the job's jar and configuration to the Map/Reduce system 
             directory on the <span class="codefrag">FileSystem</span>.
           </li>
           
@@ -1802,8 +1802,8 @@
         <span class="codefrag">JobClient</span> to submit the job and monitor
its progress.</p>
 <a name="N10A48"></a><a name="Job+Control"></a>
 <h4>Job Control</h4>
-<p>Users may need to chain map-reduce jobs to accomplish complex
-          tasks which cannot be done via a single map-reduce job. This is fairly
+<p>Users may need to chain Map/Reduce jobs to accomplish complex
+          tasks which cannot be done via a single Map/Reduce job. This is fairly
           easy since the output of the job typically goes to distributed 
           file-system, and the output, in turn, can be used as the input for the 
           next job.</p>
@@ -1840,9 +1840,9 @@
 <h3 class="h4">Job Input</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
-        InputFormat</a> describes the input-specification for a Map-Reduce job.
+        InputFormat</a> describes the input-specification for a Map/Reduce job.
         </p>
-<p>The Map-Reduce framework relies on the <span class="codefrag">InputFormat</span>
of 
+<p>The Map/Reduce framework relies on the <span class="codefrag">InputFormat</span>
of 
         the job to:</p>
 <ol>
           
@@ -1914,9 +1914,9 @@
 <h3 class="h4">Job Output</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
-        OutputFormat</a> describes the output-specification for a Map-Reduce 
+        OutputFormat</a> describes the output-specification for a Map/Reduce 
         job.</p>
-<p>The Map-Reduce framework relies on the <span class="codefrag">OutputFormat</span>
of 
+<p>The Map/Reduce framework relies on the <span class="codefrag">OutputFormat</span>
of 
         the job to:</p>
 <ol>
           
@@ -1946,7 +1946,7 @@
           application-writer will have to pick unique names per task-attempt 
           (using the attemptid, say <span class="codefrag">attempt_200709221812_0001_m_000000_0</span>),

           not just per task.</p>
-<p>To avoid these issues the Map-Reduce framework maintains a special 
+<p>To avoid these issues the Map/Reduce framework maintains a special 
           <span class="codefrag">${mapred.output.dir}/_temporary/_${taskid}</span>
sub-directory
           accessible via <span class="codefrag">${mapred.work.output.dir}</span>
           for each task-attempt on the <span class="codefrag">FileSystem</span>
where the output
@@ -1966,7 +1966,7 @@
 <p>Note: The value of <span class="codefrag">${mapred.work.output.dir}</span>
during 
           execution of a particular task-attempt is actually 
           <span class="codefrag">${mapred.output.dir}/_temporary/_{$taskid}</span>,
and this value is 
-          set by the map-reduce framework. So, just create any side-files in the 
+          set by the Map/Reduce framework. So, just create any side-files in the 
           path  returned by
           <a href="api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)">
           FileOutputFormat.getWorkOutputPath() </a>from map/reduce 
@@ -1988,7 +1988,7 @@
 <h4>Counters</h4>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either
by 
-          the Map-Reduce framework or applications. Each <span class="codefrag">Counter</span>
can 
+          the Map/Reduce framework or applications. Each <span class="codefrag">Counter</span>
can 
           be of any <span class="codefrag">Enum</span> type. Counters of a particular

           <span class="codefrag">Enum</span> are bunched into groups of type

           <span class="codefrag">Counters.Group</span>.</p>
@@ -2009,7 +2009,7 @@
           files efficiently.</p>
 <p>
 <span class="codefrag">DistributedCache</span> is a facility provided by the

-          Map-Reduce framework to cache files (text, archives, jars and so on) 
+          Map/Reduce framework to cache files (text, archives, jars and so on) 
           needed by applications.</p>
 <p>Applications specify the files to be cached via urls (hdfs://)
           in the <span class="codefrag">JobConf</span>. The <span class="codefrag">DistributedCache</span>

@@ -2078,7 +2078,7 @@
           interface supports the handling of generic Hadoop command-line options.
           </p>
 <p>
-<span class="codefrag">Tool</span> is the standard for any Map-Reduce tool or

+<span class="codefrag">Tool</span> is the standard for any Map/Reduce tool or

           application. The application should delegate the handling of 
           standard command-line options to 
           <a href="api/org/apache/hadoop/util/GenericOptionsParser.html">
@@ -2116,7 +2116,7 @@
 <h4>IsolationRunner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
-          IsolationRunner</a> is a utility to help debug Map-Reduce programs.</p>
+          IsolationRunner</a> is a utility to help debug Map/Reduce programs.</p>
 <p>To use the <span class="codefrag">IsolationRunner</span>, first set

           <span class="codefrag">keep.failed.tasks.files</span> to <span class="codefrag">true</span>

           (also see <span class="codefrag">keep.tasks.files.pattern</span>).</p>
@@ -2219,11 +2219,11 @@
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
-          JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
+          JobControl</a> is a utility which encapsulates a set of Map/Reduce jobs
           and their dependencies.</p>
 <a name="N10D57"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
-<p>Hadoop Map-Reduce provides facilities for the application-writer to
+<p>Hadoop Map/Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
           job-outputs i.e. output of the reduces. It also comes bundled with
           <a href="api/org/apache/hadoop/io/compress/CompressionCodec.html">
@@ -2268,7 +2268,7 @@
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which
uses many of the
-      features provided by the Map-Reduce framework we discussed so far.</p>
+      features provided by the Map/Reduce framework we discussed so far.</p>
 <p>This needs the HDFS to be up and running, especially for the 
       <span class="codefrag">DistributedCache</span>-related features. Hence
it only works with a 
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
@@ -3655,7 +3655,7 @@
 <a name="N1160B"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves
upon the 
-        previous one by using some features offered by the Map-Reduce framework:
+        previous one by using some features offered by the Map/Reduce framework:
         </p>
 <ul>
           


