hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "GettingStartedWithHadoop" by SameerParanjpye
Date Mon, 18 Sep 2006 00:31:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by SameerParanjpye:
http://wiki.apache.org/lucene-hadoop/GettingStartedWithHadoop

------------------------------------------------------------------------------
  = Downloading and installing Hadoop =
- Hadoop can be downloaded from [http://www.apache.org/dyn/closer.cgi/lucene/hadoop/ here]. You may also download a nightly build from [http://cvs.apache.org/dist/lucene/hadoop/nightly/ here] or check out the code from [http://lucene.apache.org/hadoop/version_control.html subversion] and build it with [http://ant.apache.org Ant]. Select a directory to install Hadoop under (let's call it <installdir>) and untar the tarball in that directory. This will create a directory called hadoop-<version> under <installdir>. All scripts and tools needed to run Hadoop are present in the directory hadoop-<version>/bin. This directory will subsequently be referred to as "hadoop/bin" in this document.
+ 
+ Hadoop can be downloaded from [http://www.apache.org/dyn/closer.cgi/lucene/hadoop/ here]. You may also download a nightly build from [http://cvs.apache.org/dist/lucene/hadoop/nightly/ here] or check out the code from [http://lucene.apache.org/hadoop/version_control.html subversion] and build it with [http://ant.apache.org Ant]. Select a directory to install Hadoop under (let's call it ''hadoop-install'') and untar the tarball in that directory. If you downloaded version ''<ver>'' of Hadoop, untarring will create a directory called ''hadoop-<ver>'' in the ''hadoop-install'' directory. All scripts and tools used to run Hadoop will be present in the directory ''hadoop-<ver>/bin''. All configuration files for Hadoop will be present in the directory ''hadoop-<ver>/conf''. These directories will subsequently be referred to as ''hadoop/bin'' and ''hadoop/conf'' respectively in this document.
+ 
+ == Startup scripts ==
+ 
+ The ''hadoop/bin'' directory contains the scripts used to launch the Hadoop DFS and Hadoop Map/Reduce daemons. These are:
+ 
+  * ''start-all.sh'' - Starts all Hadoop daemons: the namenode, datanodes, the jobtracker and tasktrackers.
+  * ''stop-all.sh'' - Stops all Hadoop daemons.
+  * ''start-mapred.sh'' - Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
+  * ''stop-mapred.sh'' - Stops the Hadoop Map/Reduce daemons.
+  * ''start-dfs.sh'' - Starts the Hadoop DFS daemons, the namenode and datanodes.
+  * ''stop-dfs.sh'' - Stops the Hadoop DFS daemons.
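The start and stop scripts come in matched pairs, and the per-subsystem scripts are typically run with DFS first on the way up and last on the way down. A minimal sketch of that ordering (the script names are from the list above; the commands are echoed rather than executed, since actually running them requires a configured installation):

```shell
#!/bin/sh
# Echo the usual bring-up/tear-down order instead of executing it,
# since the real scripts need a configured Hadoop installation.
for step in start-dfs.sh start-mapred.sh stop-mapred.sh stop-dfs.sh; do
  echo "bin/$step"
done
```

''start-all.sh'' and ''stop-all.sh'' bundle these per-subsystem steps into single commands.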
+ 
+ == Configuration files ==
+ 
+ The ''hadoop/conf'' directory contains some configuration files for Hadoop. These are:
+ 
+  * ''hadoop-env.sh'' - This file contains environment variable settings used by Hadoop. You can use these to affect some aspects of Hadoop daemon behavior, such as where log files are stored, the maximum amount of heap used, etc. The only variable you should need to change in this file is JAVA_HOME, which specifies the path to the Java installation used by Hadoop.
+  * ''slaves'' - This file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers) will run. By default this contains the single entry ''localhost''.
+  * ''hadoop-default.xml'' - This file contains generic default settings for Hadoop daemons and Map/Reduce jobs. '''Do not modify this file.'''
+  * ''mapred-default.xml'' - This file contains site-specific settings for the Hadoop Map/Reduce daemons and jobs. The file is empty by default. Putting configuration properties in this file overrides the Map/Reduce settings in ''hadoop-default.xml''. Use this file to tailor the behavior of Map/Reduce on your site.
+  * ''hadoop-site.xml'' - This file contains site-specific settings for all Hadoop daemons and Map/Reduce jobs. This file is empty by default. Settings in this file override those in ''hadoop-default.xml'' and ''mapred-default.xml''. This file should contain settings that must be respected by all servers and clients in a Hadoop installation, for instance, the location of the namenode and the jobtracker.
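For example, a minimal ''hadoop-site.xml'' for a small cluster might pin down the locations of the namenode and the jobtracker. The property names below are the ones used by Hadoop releases of this vintage; the host name and port numbers are hypothetical examples, not required values:

```xml
<configuration>

  <property>
    <name>fs.default.name</name>
    <!-- Hypothetical host:port where the namenode runs -->
    <value>master.example.com:9000</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <!-- Hypothetical host:port where the jobtracker runs -->
    <value>master.example.com:9001</value>
  </property>

</configuration>
```

Because this file is consulted by every daemon and client, settings placed here take effect across the whole installation.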
+ 
+ More details on configuration can be found on the HowToConfigure page.
  
  = Starting Hadoop using Hadoop scripts =
  This section explains how to set up a Hadoop cluster running Hadoop DFS and Hadoop Map/Reduce. The startup scripts are in hadoop/bin. The slaves file in hadoop/conf lists, one per line, the slave nodes that will join the DFS and Map/Reduce cluster. Edit the slaves file to add nodes to your cluster; you need to edit it only on the machines you plan to run the Jobtracker and Namenode on. If you want to run a single-node cluster, you do not have to edit the slaves file. Next, edit the file hadoop-env.sh in the hadoop/conf directory and make sure JAVA_HOME is set correctly; you can change the other environment variables to suit your requirements. HADOOP_HOME is determined automatically from the location you run the hadoop scripts from.
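The two edits described above can be sketched in shell. This writes scratch copies of the files in the current directory; the host names and Java path are hypothetical examples, and on a real cluster you would edit the files under hadoop/conf instead:

```shell
#!/bin/sh
# Sketch: populate the slaves file and set JAVA_HOME in hadoop-env.sh.
# Hostnames and the Java path below are examples only.

# One slave host per line:
cat > slaves <<'EOF'
node1.example.com
node2.example.com
EOF

# Point Hadoop at the Java installation (path is an example):
echo 'export JAVA_HOME=/usr/local/java' >> hadoop-env.sh

cat slaves
```

For a single-node cluster, the default slaves file (containing only ''localhost'') is already correct, so only the JAVA_HOME edit is needed.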
