hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "GettingStartedWithHadoop" by SameerParanjpye
Date Wed, 20 Sep 2006 07:22:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by SameerParanjpye:
http://wiki.apache.org/lucene-hadoop/GettingStartedWithHadoop

------------------------------------------------------------------------------
    * {{{mapred.local.dir}}}
  
  === Formatting the Namenode ===
- 
  The first step to starting up your Hadoop installation is formatting the filesystem. You
need to do this the first time you set up a Hadoop installation. '''Do not''' format a running
filesystem; doing so will erase all of your data. To format the filesystem, run the
command: [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format}}}
  
  === Starting a Single node cluster ===
@@ -59, +58 @@

  === Stopping a Single node cluster ===
  Run the command [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/stop-all.sh}}} [[BR]] to stop all
the daemons running on your machine.
  
+ === Separating Configuration from Installation ===
+ In the example described above, the configuration files used by the Hadoop cluster all lie
in the Hadoop installation. This can become cumbersome when upgrading to a new release since
all custom config has to be re-created in the new installation. It is possible to separate
the config from the install. To do so, select a 
+ directory to house the Hadoop configuration (say, {{{/foo/bar/hadoop-config}}}). Copy the
{{{hadoop-site.xml}}}, {{{slaves}}} and {{{hadoop-env.sh}}} files to this directory. You can either
set the {{{HADOOP_CONF_DIR}}} environment variable to refer to this directory or pass it directly
to the Hadoop scripts with the {{{--config}}} option.
+ In this case, the cluster start and stop commands specified in the above two sub-sections
become [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/start-all.sh --config /foo/bar/hadoop-config}}}
and [[BR]] {{{% $HADOOP_INSTALL/hadoop/bin/stop-all.sh --config /foo/bar/hadoop-config}}}.
[[BR]] Only the absolute path to the config directory should be passed to the scripts.
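+ For example, a minimal sketch of this separation, assuming the stock configuration files live under {{{$HADOOP_INSTALL/hadoop/conf}}}: [[BR]]
+ {{{
+ # choose a config directory and copy the three files into it
+ mkdir -p /foo/bar/hadoop-config
+ cp $HADOOP_INSTALL/hadoop/conf/hadoop-site.xml \
+    $HADOOP_INSTALL/hadoop/conf/slaves \
+    $HADOOP_INSTALL/hadoop/conf/hadoop-env.sh \
+    /foo/bar/hadoop-config
+ # either export the variable for all subsequent commands ...
+ export HADOOP_CONF_DIR=/foo/bar/hadoop-config
+ # ... or pass --config explicitly to each script
+ }}}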
+ 
- === Starting up a real cluster ===
+ === Starting up a larger cluster ===
-  * After formatting the namenode run bin/start-dfs.sh on the Namenode. This will bring up
the dfs with Namenode running on the machine you ran the command on and Datanodes  on the
machines listed in the slaves file mentioned above.
+ 
+  * Ensure that the Hadoop package is accessible from the same path on all nodes that are
to be included in the cluster. If you have separated the configuration from the install, then ensure
that the config directory is also accessible in the same way.
+  * Populate the {{{slaves}}} file with the nodes to be included in the cluster, one node
per line.
+  * Follow the steps in the ''Basic Configuration'' section above.
+  * Format the Namenode.
+  * Run the command {{{% $HADOOP_INSTALL/hadoop/bin/start-dfs.sh}}} on the node you want
the Namenode to run on. This will bring up HDFS with the Namenode running on that machine
and Datanodes on the machines listed in the {{{slaves}}} file mentioned above.
-  * Run bin/start-mapred.sh on the machine you plan to run the Jobtracker on. This will bring
up the map reduce cluster with Jobtracker running on the machine you ran the command on and
Tasktrackers running on machines listed in the slaves file.
+  * Run the command {{{% $HADOOP_INSTALL/hadoop/bin/start-mapred.sh}}} on the machine you
plan to run the Jobtracker on. This will bring up the Map/Reduce cluster with the Jobtracker
running on that machine and Tasktrackers running on the machines listed in the {{{slaves}}}
file.
+  * The above two commands can also be executed with a {{{--config}}} option (see the sketch below).
-  * In case you have not set the HADOOP_CONF_DIR variable, you can use bin/start-mapred.sh
(bin/start-dfs.sh) --config configure_directory.
-  * Try executing bin/hadoop dfs -lsr / to see if it is working.
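+ Putting the above steps together, a hypothetical end-to-end start sequence might look as follows. The config directory is the example path used earlier, and the host names in the {{{slaves}}} file are placeholders: [[BR]]
+ {{{
+ # hypothetical slaves file, one node per line:
+ cat > /foo/bar/hadoop-config/slaves <<EOF
+ node1.example.com
+ node2.example.com
+ EOF
+ # assumes HADOOP_CONF_DIR is exported as above on every machine involved
+ # on the Namenode machine (first time only):
+ $HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
+ # on the Namenode machine:
+ $HADOOP_INSTALL/hadoop/bin/start-dfs.sh
+ # on the Jobtracker machine:
+ $HADOOP_INSTALL/hadoop/bin/start-mapred.sh
+ # sanity check: recursively list the filesystem root
+ $HADOOP_INSTALL/hadoop/bin/hadoop dfs -lsr /
+ }}}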
  
  === Stopping the cluster ===
-  * You can stop the cluster by running bin/stop-mapred.sh and then bin/stop-dfs.sh on your
Jobtracker and Namenode respectively. You can specify the configure directory by using the
--config option.
+  * The cluster can be stopped by running {{{% $HADOOP_INSTALL/hadoop/bin/stop-mapred.sh}}}
and then {{{% $HADOOP_INSTALL/hadoop/bin/stop-dfs.sh}}} on your Jobtracker and Namenode, respectively.
These commands also accept the {{{--config}}} option.
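+ For example, using the separated config directory from above: [[BR]]
+ {{{
+ # on the Jobtracker machine:
+ $HADOOP_INSTALL/hadoop/bin/stop-mapred.sh --config /foo/bar/hadoop-config
+ # then on the Namenode machine:
+ $HADOOP_INSTALL/hadoop/bin/stop-dfs.sh --config /foo/bar/hadoop-config
+ }}}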
  
