hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "GettingStartedWithHadoop" by mahadevkonar
Date Mon, 28 Aug 2006 18:55:35 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by mahadevkonar:

+ = Downloading and installing Hadoop =
+ Hadoop can be downloaded from [http://www.apache.org/dyn/closer.cgi/lucene/hadoop/ Download].
To install Hadoop, untar the tar file in your install directory. The directory structure
will then look like installdir/hadoop-[version]/. All the scripts to run Hadoop are in hadoop-[version]/bin.
I will refer to this directory as hadoop/bin from now on.
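
The install step above can be sketched as a small shell function. This is a minimal sketch, assuming the release is a gzip-compressed tar archive; the function name install_hadoop and both of its arguments are illustrative and not part of Hadoop.

```shell
#!/bin/sh
# install_hadoop TARBALL INSTALL_DIR   (hypothetical helper, not shipped with Hadoop)
# Unpacks a release tarball so the result looks like INSTALL_DIR/hadoop-[version]/.
# Assumes the tarball is gzip-compressed.
install_hadoop() {
    tarball=$1
    install_dir=$2
    # Create the install directory if it does not exist yet.
    mkdir -p "$install_dir" || return 1
    # -x extract, -z gunzip, -f read from file, -C switch to the target directory first.
    tar -xzf "$tarball" -C "$install_dir"
}
```

Calling the function with the real tarball name and your install directory should leave the scripts in installdir/hadoop-[version]/bin.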
  = Starting Hadoop using Hadoop scripts =
  This section explains how to set up a Hadoop cluster running Hadoop DFS and Hadoop MapReduce.
The startup scripts are in hadoop/bin. The slaves file in hadoop/conf lists all the slave nodes
that join the DFS and MapReduce cluster. Edit the slaves file to add nodes to your cluster.
You need to edit it only on the machines where you plan to run the Jobtracker and Namenode.
If you want to run a single-node cluster, you do not have to edit the slaves file. Next, edit
the file hadoop-env.sh in the hadoop/conf directory and make sure JAVA_HOME is set correctly.
You can change the other environment variables to suit your requirements. HADOOP_HOME is
determined automatically, based on where you run your Hadoop scripts from.
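
The two edits described above can be sketched with a pair of shell helpers. This is only an illustration: the function names, the example hostnames and the Java path in the usage note are hypothetical, and the sketch assumes the slaves file holds one hostname per line.

```shell
#!/bin/sh
# write_slaves CONF_DIR HOST...   (hypothetical helper)
# Writes one slave hostname per line to CONF_DIR/slaves, replacing the file.
write_slaves() {
    conf_dir=$1
    shift
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# set_java_home CONF_DIR JAVA_HOME_PATH   (hypothetical helper)
# Appends an export line to CONF_DIR/hadoop-env.sh. Because the file is a
# shell script, a later assignment overrides any earlier one.
set_java_home() {
    printf 'export JAVA_HOME="%s"\n' "$2" >> "$1/hadoop-env.sh"
}
```

For example, `write_slaves hadoop/conf node1.example.com node2.example.com` followed by `set_java_home hadoop/conf /usr/lib/jvm/java` would prepare a two-slave cluster; for a single-node cluster, skip the first call.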
  == Environment Variables ==
   * The only environment variable that you may need to specify is HADOOP_CONF_DIR. Set this
variable to your configuration directory, which contains hadoop-site.xml, hadoop-env.sh and the
slaves file. Set it on all the machines you plan to run Hadoop on. If you are running bash,
set it in .bashrc; if you are running csh, set it in .cshrc.
For more information on how to configure Hadoop, see the HowToConfigure section.
-  * You can get rid of this environment variable by specifying the configure directory as
a --config option for the scripts.
+  * You can avoid setting this environment variable by passing the configuration directory
to the scripts with the --config option. All the Hadoop scripts take a --config argument,
which is the configuration directory.
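
Both ways of pointing Hadoop at a configuration directory can be sketched in shell. The path $HOME/hadoop-conf and the wrapper name run_with_config are hypothetical, and placing --config before the other arguments is an assumption about argument order rather than something stated above.

```shell
#!/bin/sh
# Option 1: set the variable once, for example in ~/.bashrc.
# The path is a hypothetical example. The csh equivalent for ~/.cshrc is:
#   setenv HADOOP_CONF_DIR /path/to/conf
export HADOOP_CONF_DIR="$HOME/hadoop-conf"

# Option 2: pass the directory explicitly on every call.
# run_with_config CONF_DIR SCRIPT [ARGS...]   (hypothetical wrapper)
run_with_config() {
    conf_dir=$1
    script=$2
    shift 2
    # --config goes first, followed by whatever arguments the script takes.
    "$script" --config "$conf_dir" "$@"
}
```

With the wrapper, any script under hadoop/bin can be run against a given configuration directory without HADOOP_CONF_DIR being set in the environment.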
  == Configuration Parameters ==
  * Edit hadoop-site.xml in the configuration directory to override the default properties. Take
a look at hadoop-default.xml to see how to add properties to hadoop-site.xml. The properties
that you will most often change are the hosts and ports for the Namenode and Jobtracker. You should
propagate these changes to all the nodes in your cluster.
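
A rough sketch of this step follows. The property names fs.default.name and mapred.job.tracker are assumptions and should be confirmed against your copy of hadoop-default.xml; the helper names and the host:port values in the usage note are hypothetical. The second helper only prints the copy commands, and it assumes the configuration directory lives at the same path on every node.

```shell
#!/bin/sh
# write_site_xml CONF_DIR NAMENODE_HOST:PORT JOBTRACKER_HOST:PORT   (hypothetical helper)
# Writes CONF_DIR/hadoop-site.xml overriding the two addresses.
# Property names are assumed; check them in hadoop-default.xml.
write_site_xml() {
    cat > "$1/hadoop-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$2</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# print_copy_commands CONF_DIR   (hypothetical helper)
# Dry run: prints one scp command per host listed in CONF_DIR/slaves.
# Assumes CONF_DIR exists at the same path on every node.
print_copy_commands() {
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines in the slaves file.
        [ -n "$host" ] || continue
        echo "scp $1/hadoop-site.xml $host:$1/"
    done < "$1/slaves"
}
```

For example, `write_site_xml hadoop/conf namenode.example.com:9000 jobtracker.example.com:9001` writes the file, and `print_copy_commands hadoop/conf` lists the copies you would then run to propagate it.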
