hadoop-common-commits mailing list archives

From: acmur...@apache.org
Subject: svn commit: r594460 [6/6] - in /lucene/hadoop/trunk: ./ docs/ src/docs/src/documentation/content/xdocs/
Date: Tue, 13 Nov 2007 09:01:13 GMT
Added: lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/quickstart.xml
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/quickstart.xml?rev=594460&view=auto
==============================================================================
--- lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/quickstart.xml (added)
+++ lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/quickstart.xml Tue Nov 13 01:01:11 2007
@@ -0,0 +1,255 @@
+<?xml version="1.0"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+  
+  <header>
+    <title>Hadoop Quickstart</title>
+  </header>
+  
+  <body>
+  
+    <section>
+      <title>Purpose</title>
+      
+      <p>The purpose of this document is to help you get a single-node Hadoop 
+      installation up and running quickly, so that you can get a flavour of 
+      the <a href="hdfs_design.html">Hadoop Distributed File System 
+      (<acronym title="Hadoop Distributed File System">HDFS</acronym>)</a> and 
+      the Map-Reduce framework, i.e. perform simple operations on HDFS and 
+      run simple example jobs.</p>
+    </section>
+    
+    <section id="PreReqs">
+      <title>Pre-requisites</title>
+      
+      <section>
+        <title>Supported Platforms</title>
+        
+        <ul>
+          <li>
+            Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
+          </li>
+          <li>
+            Win32 is supported as a <em>development platform</em>. Distributed 
+            operation has not been well tested on Win32, so this is not a 
+            <em>production platform</em>.
+          </li>
+        </ul>        
+      </section>
+      
+      <section>
+        <title>Required Software</title>
+        
+        <ol>
+          <li>
+            Java<sup>TM</sup> 1.5.x, preferably from Sun, must be installed. Set 
+            <code>JAVA_HOME</code> to the root of your Java installation.
+          </li>
+          <li>
+            <strong>ssh</strong> must be installed and <strong>sshd</strong> must 
+            be running to use the Hadoop scripts that manage remote Hadoop 
+            daemons.
+          </li>
+        </ol>
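+        
+        <p>
+          To check your environment, each of the following commands should 
+          print a version string (the exact output varies with your 
+          installation):<br/>
+          <code>$ java -version</code><br/>
+          <code>$ ssh -V</code>
+        </p>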
+        
+        <section>
+          <title>Additional requirements for Windows</title>
+          
+          <ol>
+            <li>
+              <a href="http://www.cygwin.com/">Cygwin</a> - Required for shell 
+              support in addition to the required software above. 
+            </li>
+          </ol>
+        </section>
+        
+      </section>
+
+      <section>
+        <title>Installing Software</title>
+          
+        <p>If your cluster doesn't have the requisite software you will need to
+        install it.</p>
+          
+        <p>For example, on Ubuntu Linux:</p>
+        <p>
+          <code>$ sudo apt-get install ssh</code><br/>
+          <code>$ sudo apt-get install rsync</code>
+        </p>
+          
+        <p>On Windows, if you did not install the required software when you 
+        installed Cygwin, start the Cygwin installer and select the packages:</p>
+        <ul>
+          <li>openssh - in the <em>Net</em> category</li>
+        </ul>
+      </section>
+      
+    </section>
+    
+    <section>
+      <title>Download</title>
+      
+      <p>
+        First, you need to get a Hadoop distribution: download a recent 
+        <a href="releases.html">stable release</a> and unpack it.
+      </p>
+
+      <p>
+        Once done, in the distribution edit the file 
+        <code>conf/hadoop-env.sh</code> to define at least <code>JAVA_HOME</code>.
+      </p>
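+
+      <p>
+        For example, if your JDK lives under <code>/usr/java/jdk1.5.0</code> 
+        (an illustrative path; use the root of your own Java installation), 
+        add the following line to <code>conf/hadoop-env.sh</code>:<br/>
+        <code>export JAVA_HOME=/usr/java/jdk1.5.0</code>
+      </p>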
+
+	  <p>
+	    Try the following command:<br/>
+        <code>$ bin/hadoop</code><br/>
+        This will display the usage documentation for the <strong>hadoop</strong> 
+        script.
+      </p>
+    </section>
+    
+    <section>
+      <title>Standalone Operation</title>
+      
+      <p>By default, Hadoop is configured to run things in a non-distributed 
+      mode, as a single Java process. This is useful for debugging.</p>
+      
+      <p>
+        The following example copies the unpacked <code>conf</code> directory to 
+        use as input and then finds and displays every match of the given regular 
+        expression. Output is written to the given <code>output</code> directory.
+        <br/>
+        <code>$ mkdir input</code><br/>
+        <code>$ cp conf/*.xml input</code><br/>
+        <code>
+          $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
+        </code><br/>
+        <code>$ cat output/*</code>
+      </p>
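+      
+      <p>
+        The <code>output</code> directory will contain a part file listing 
+        each matched string together with its count, one match per line; the 
+        exact matches depend on the contents of your <code>conf</code> 
+        directory.
+      </p>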
+    </section>
+    
+    <section id="SingleNodeSetup">
+      <title>Pseudo-Distributed Operation</title>
+
+	  <p>Hadoop can also be run on a single node in a pseudo-distributed mode 
+	  where each Hadoop daemon runs in a separate Java process.</p>
+	  
+      <section>
+        <title>Configuration</title>
+        <p>Use the following <code>conf/hadoop-site.xml</code>:</p>
+        <table>
+        <tr><td>&lt;configuration&gt;</td></tr>
+
+          <tr><td>&nbsp;&nbsp;&lt;property&gt;</td></tr>
+            <tr><td>&nbsp;&nbsp;&nbsp;&nbsp;&lt;name&gt;fs.default.name&lt;/name&gt;</td></tr>
+            <tr><td>&nbsp;&nbsp;&nbsp;&nbsp;&lt;value&gt;localhost:9000&lt;/value&gt;</td></tr>
+          <tr><td>&nbsp;&nbsp;&lt;/property&gt;</td></tr>
+
+          <tr><td>&nbsp;&nbsp;&lt;property&gt;</td></tr>
+            <tr><td>&nbsp;&nbsp;&nbsp;&nbsp;&lt;name&gt;mapred.job.tracker&lt;/name&gt;</td></tr>
+            <tr><td>&nbsp;&nbsp;&nbsp;&nbsp;&lt;value&gt;localhost:9001&lt;/value&gt;</td></tr>
+          <tr><td>&nbsp;&nbsp;&lt;/property&gt;</td></tr>
+
+          <tr><td>&nbsp;&nbsp;&lt;property&gt;</td></tr>
+            <tr><td>&nbsp;&nbsp;&nbsp;&nbsp;&lt;name&gt;dfs.replication&lt;/name&gt;</td></tr>
+            <tr><td>&nbsp;&nbsp;&nbsp;&nbsp;&lt;value&gt;1&lt;/value&gt;</td></tr>
+          <tr><td>&nbsp;&nbsp;&lt;/property&gt;</td></tr>
+
+        <tr><td>&lt;/configuration&gt;</td></tr>
+        </table>
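+        
+        <p>
+          Here <code>fs.default.name</code> is the address of the NameNode, 
+          <code>mapred.job.tracker</code> is the address of the JobTracker, 
+          and <code>dfs.replication</code> is the number of copies kept of 
+          each HDFS block; a replication factor of 1 is appropriate on a 
+          single node, since there is only one DataNode.
+        </p>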
+      </section>
+
+      <section>
+        <title>Setup passphraseless <em>ssh</em></title>
+        
+        <p>
+          Now check that you can ssh to localhost without a passphrase:<br/>
+          <code>$ ssh localhost</code>
+        </p>
+        
+        <p>
+          If you cannot ssh to localhost without a passphrase, execute the 
+          following commands:<br/>
+   		  <code>$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa</code><br/>
+		  <code>$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys</code>
+		</p>
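+        
+        <p>The first command generates a DSA key pair with an empty 
+        passphrase, and the second authorizes the new public key for 
+        logins to localhost.</p>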
+      </section>
+    
+      <section>
+        <title>Execution</title>
+        
+        <p>
+          Format a new distributed filesystem:<br/>
+          <code>$ bin/hadoop namenode -format</code>
+        </p>
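+
+        <p>This creates an empty HDFS namespace; by default the NameNode and 
+        DataNode data is kept under <code>hadoop.tmp.dir</code>, so 
+        re-formatting discards any existing HDFS data.</p>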
+
+		<p>
+		  Start the Hadoop daemons:<br/>
+          <code>$ bin/start-all.sh</code>
+        </p>
+
+        <p>The Hadoop daemon log output is written to the 
+        <code>${HADOOP_LOG_DIR}</code> directory (defaults to 
+        <code>${HADOOP_HOME}/logs</code>).</p>
+
+        <p>Browse the web interfaces for the NameNode and the JobTracker; by 
+        default they are available at:</p>
+        <ul>
+          <li>
+            <code>NameNode</code> - 
+            <a href="http://localhost:50070/">http://localhost:50070/</a>
+          </li>
+          <li>
+            <code>JobTracker</code> - 
+            <a href="http://localhost:50030/">http://localhost:50030/</a>
+          </li>
+        </ul>
+        
+        <p>
+          Copy the input files into the distributed filesystem:<br/>
+		  <code>$ bin/hadoop dfs -put conf input</code>
+		</p>
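+		
+        <p>
+          Relative paths such as <code>input</code> are resolved against your 
+          HDFS home directory, <code>/user/&lt;username&gt;</code>. To verify 
+          the copy, list the files:<br/>
+          <code>$ bin/hadoop dfs -ls input</code>
+        </p>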
+		
+        <p>
+          Run some of the examples provided:<br/>
+          <code>
+            $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
+          </code>
+        </p>
+        
+        <p>Examine the output files:</p>
+        <p>
+          Copy the output files from the distributed filesystem to the local 
+          filesystem and examine them:<br/>
+          <code>$ bin/hadoop dfs -get output output</code><br/>
+          <code>$ cat output/*</code>
+        </p>
+        <p> or </p>
+        <p>
+          View the output files on the distributed filesystem:<br/>
+          <code>$ bin/hadoop dfs -cat output/*</code>
+        </p>
+
+		<p>
+		  When you're done, stop the daemons with:<br/>
+		  <code>$ bin/stop-all.sh</code>
+		</p>
+      </section>
+    </section>
+    
+    <section>
+      <title>Fully-Distributed Operation</title>
+      
+	  <p>Information on setting up fully-distributed, non-trivial clusters 
+	  can be found <a href="cluster_setup.html">here</a>.</p>  
+    </section>
+    
+    <p>
+      <em>Java and JNI are trademarks or registered trademarks of 
+      Sun Microsystems, Inc. in the United States and other countries.</em>
+    </p>
+    
+  </body>
+  
+</document>

Modified: lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=594460&r1=594459&r2=594460&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ lucene/hadoop/trunk/src/docs/src/documentation/content/xdocs/site.xml Tue Nov 13 01:01:11 2007
@@ -24,9 +24,12 @@
   </project>
 
   <docs label="Documentation"> 
-    <hdfs      label="Hadoop File System" href="hdfs_design.html" />
-    <install   label="Install and Configure" href="ext:overview" />
-    <api       label="API Docs"           href="ext:api" />
+    <overview  label="Overview"           href="documentation.html" />
+    <quickstart label="Quickstart"        href="quickstart.html" />
+    <setup     label="Cluster Setup"      href="cluster_setup.html" />
+    <hdfs      label="HDFS Architecture"  href="hdfs_design.html" />
+    <mapred    label="Map-Reduce Tutorial" href="mapred_tutorial.html" />
+    <api       label="API Docs"           href="ext:api/index" />
     <wiki      label="Wiki"               href="ext:wiki" />
     <faq       label="FAQ"                href="ext:faq" />
     <usermail  label="Mailing Lists"      href="mailing_lists.html#Users" />
@@ -46,10 +49,117 @@
     <nightly   href="http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/" />
     <releases  href="http://www.apache.org/dyn/closer.cgi/lucene/hadoop/" />
     <store     href="http://www.cafepress.com/hadoop/" />
-    <api       href="api/index.html" />
-    <overview  href="api/overview-summary.html#overview_description" />
     <lucene    href="http://lucene.apache.org/" />
     <nutch     href="http://lucene.apache.org/nutch/" />
+    <hadoop-default href="http://lucene.apache.org/hadoop/hadoop-default.html" />
+    <api href="api/">
+      <index href="index.html" />
+      <org href="org/">
+        <apache href="apache/">
+          <hadoop href="hadoop/">
+            <conf href="conf/">
+              <configuration href="Configuration.html">
+                <final href="#FinalParams" />
+                <get href="#get(java.lang.String, java.lang.String)" />
+                <set href="#set(java.lang.String, java.lang.String)" />
+              </configuration>
+            </conf>
+            <filecache href="filecache/">
+              <distributedcache href="DistributedCache.html" />
+            </filecache>
+            <fs href="fs/">
+              <filesystem href="FileSystem.html" />
+            </fs>
+            <io href="io/">
+              <closeable href="Closeable.html">
+                <close href="#close()" />
+              </closeable>
+              <sequencefile href="SequenceFile.html" />
+              <writable href="Writable.html" />
+              <writablecomparable href="WritableComparable.html" />
+              <compress href="compress/">
+                <compressioncodec href="CompressionCodec.html" />
+              </compress>
+            </io>
+            <mapred href="mapred/">
+              <clusterstatus href="ClusterStatus.html" />
+              <counters href="Counters.html" />
+              <fileinputformat href="FileInputFormat.html" />
+              <filesplit href="FileSplit.html" />
+              <inputformat href="InputFormat.html" />
+              <inputsplit href="InputSplit.html" />
+              <isolationrunner href="IsolationRunner.html" />
+              <jobclient href="JobClient.html">
+                <runjob href="#runJob(org.apache.hadoop.mapred.JobConf)" />
+                <submitjob href="#submitJob(org.apache.hadoop.mapred.JobConf)" />
+              </jobclient>
+              <jobconf href="JobConf.html">
+                <setnummaptasks href="#setNumMapTasks(int)" />
+                <setnumreducetasks href="#setNumReduceTasks(int)" />
+                <setoutputkeycomparatorclass href="#setOutputKeyComparatorClass(java.lang.Class)" />
+                <setoutputvaluegroupingcomparator href="#setOutputValueGroupingComparator(java.lang.Class)" />
+                <setinputpath href="#setInputPath(org.apache.hadoop.fs.Path)" />
+                <addinputpath href="#addInputPath(org.apache.hadoop.fs.Path)" />
+                <getoutputpath href="#getOutputPath()" />
+                <setoutputpath href="#setOutputPath(org.apache.hadoop.fs.Path)" />
+                <setcombinerclass href="#setCombinerClass(java.lang.Class)" />
+                <setmapdebugscript href="#setMapDebugScript(java.lang.String)" />
+                <setreducedebugscript href="#setReduceDebugScript(java.lang.String)" />
+                <setspeculativeexecution href="#setSpeculativeExecution(boolean)" />
+                <setmaxmapattempts href="#setMaxMapAttempts(int)" />
+                <setmaxreduceattempts href="#setMaxReduceAttempts(int)" />
+                <setmaxmaptaskfailurespercent href="#setMaxMapTaskFailuresPercent(int)" />
+                <setmaxreducetaskfailurespercent href="#setMaxReduceTaskFailuresPercent(int)" />
+                <setjobendnotificationuri href="#setJobEndNotificationURI(java.lang.String)" />
+              </jobconf>
+              <jobconfigurable href="JobConfigurable.html">
+                <configure href="#configure(org.apache.hadoop.mapred.JobConf)" />
+              </jobconfigurable>
+              <jobcontrol href="jobcontrol/">
+                <package-summary href="package-summary.html" />
+              </jobcontrol>
+              <mapper href="Mapper.html">
+                <map href="#map(K1, V1, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)" />
+              </mapper>
+              <outputcollector href="OutputCollector.html">
+                <collect href="#collect(K, V)" />
+              </outputcollector>
+              <outputformat href="OutputFormat.html" />
+              <partitioner href="Partitioner.html" />
+              <recordreader href="RecordReader.html" />
+              <recordwriter href="RecordWriter.html" />
+              <reducer href="Reducer.html">
+                <reduce href="#reduce(K2, java.util.Iterator, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)" />
+              </reducer>
+              <reporter href="Reporter.html">
+                <incrcounter href="#incrCounter(java.lang.Enum, long)" />
+              </reporter>
+              <runningjob href="RunningJob.html" />
+              <textinputformat href="TextInputFormat.html" />
+              <textoutputformat href="TextOutputFormat.html" />
+              <lib href="lib/">
+                <package-summary href="package-summary.html" />
+                <hashpartitioner href="HashPartitioner.html" />
+              </lib>
+              <pipes href="pipes/">
+                <package-summary href="package-summary.html" />
+              </pipes>
+            </mapred>
+            <streaming href="streaming/">
+              <package-summary href="package-summary.html" />
+            </streaming>
+            <util href="util/">
+              <genericoptionsparser href="GenericOptionsParser.html" />
+              <progress href="Progress.html" />
+              <tool href="Tool.html" />
+              <toolrunner href="ToolRunner.html">
+                <run href="#run(org.apache.hadoop.util.Tool, java.lang.String[])" />
+              </toolrunner>
+            </util>
+          </hadoop>
+        </apache>
+      </org>
+    </api>
   </external-refs>
  
 </site>


