hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "GettingStartedWithHadoop" by mahadevkonar
Date Fri, 25 Aug 2006 19:24:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by mahadevkonar:
http://wiki.apache.org/lucene-hadoop/GettingStartedWithHadoop

------------------------------------------------------------------------------
- = Setting Up a Single node Hadoop cluster =
+ = Starting Hadoop using Hadoop scripts =
+ This section explains how to set up a Hadoop cluster running Hadoop DFS and Hadoop MapReduce.
The startup scripts are in hadoop/bin. The slaves file in hadoop/conf lists all the slave nodes
that will join the DFS and MapReduce cluster. Edit the slaves file to add nodes to your cluster;
you only need to edit it on the machines you plan to run the Jobtracker and Namenode on. If you
want to run a single-node cluster, you do not have to edit the slaves file at all. Next, edit
the file hadoop-env.sh in the hadoop/conf directory. Make sure JAVA_HOME is set correctly; you
can change the other environment variables to suit your requirements. HADOOP_HOME is determined
automatically from the location you run your hadoop scripts from.
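+ As a minimal sketch (the hostnames and the JDK path below are only examples), preparing the
slaves file and hadoop-env.sh from the Hadoop installation directory might look like this:
+ {{{
+ # List one slave hostname per line in conf/slaves (example names).
+ cat > conf/slaves <<EOF
+ slave1.example.com
+ slave2.example.com
+ EOF
+ 
+ # Make sure JAVA_HOME points at your JDK in conf/hadoop-env.sh (example path).
+ echo 'export JAVA_HOME=/usr/lib/jvm/java' >> conf/hadoop-env.sh
+ }}}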
+ 
+ 
  == Environment Variables ==
-  * Set the environment variable HADOOP_CONF_DIR to the configure directory. For more information
on how to configure Hadoop, take a look at HowToConfigure section.
- == Starting up a single node cluster ==
-   * Change the configuration property in hadoop-default.xml in the configure directory to
localhost:port. Choose a free port for your Namenode to run on.
+  * The only environment variable that you may need to specify is HADOOP_CONF_DIR. Set this
variable to your configuration directory, which contains hadoop-site.xml, hadoop-env.sh and the
slaves file. Set this environment variable on every machine you plan to run Hadoop on. If you
run bash, you can set it in .bashrc; if you run csh, set it in .cshrc. For more information on
how to configure Hadoop, take a look at the HowToConfigure page.
+  * Alternatively, you can do without this environment variable by passing the configuration
directory to the scripts with the --config option, as in the sketch below.
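+ A quick sketch of the two alternatives (the directory path is only an example):
+ {{{
+ # Option 1: set HADOOP_CONF_DIR once per machine, e.g. in .bashrc for bash users.
+ export HADOOP_CONF_DIR=/home/hadoop/conf
+ 
+ # Option 2: pass the configuration directory explicitly to each script instead.
+ bin/start-all.sh --config /home/hadoop/conf
+ }}}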
+ 
+ == Configuration Parameters ==
+  * Change hadoop-site.xml in the configuration directory to override the default properties.
Take a look at hadoop-default.xml to see which properties exist and how to add them to
hadoop-site.xml. The properties you will most often change are the hosts and ports for the
Namenode and JobTracker. Propagate these changes to all the nodes in your cluster.
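+ For example, a hadoop-site.xml overriding the Namenode and Jobtracker addresses might be
written as follows (the hostnames and ports are only examples; check hadoop-default.xml for the
exact property names in your release):
+ {{{
+ # Write conf/hadoop-site.xml overriding the Namenode (fs.default.name) and
+ # Jobtracker (mapred.job.tracker) addresses; hosts and ports are examples.
+ cat > conf/hadoop-site.xml <<EOF
+ <configuration>
+   <property>
+     <name>fs.default.name</name>
+     <value>namenode.example.com:9000</value>
+   </property>
+   <property>
+     <name>mapred.job.tracker</name>
+     <value>jobtracker.example.com:9001</value>
+   </property>
+ </configuration>
+ EOF
+ }}}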
+   
- === Formatting the Namenode ===
+ == Formatting the Namenode ==
   * You need to format the Namenode once, for your first installation only. Do not format a
Namenode that has already been running Hadoop: doing so will erase your DFS. Run bin/hadoop
namenode -format to format your Namenode.
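+ For example, from the Hadoop installation directory:
+ {{{
+ # One-time only, on the machine that will run the Namenode.
+ # WARNING: formatting erases any existing DFS data.
+ bin/hadoop namenode -format
+ }}}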
+ 
- === Starting up the cluster ===
+ === Starting a single-node cluster ===
   * Run bin/start-all.sh. This will start up a Namenode, Datanode, Jobtracker and Tasktracker
on your machine.
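+ For example:
+ {{{
+ # Start the Namenode, Datanode, Jobtracker and Tasktracker on this machine.
+ bin/start-all.sh
+ 
+ # Check that DFS is up by recursively listing the root directory.
+ bin/hadoop dfs -lsr /
+ }}}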
- === Stopping the cluster ===
+ === Stopping a single-node cluster ===
   * Run bin/stop-all.sh to stop all the daemons running on your machine.
  
- = Setting Up a Cluster using Hadoop scripts =
- This section explains how to set up a Hadoop cluster running Hadoop DFS and Hadoop Mapreduce.
The startup scripts are in hadoop/bin. The file that contains all the slave nodes that would
join the DFS and map reduce cluster is the slaves file in hadoop/conf. Edit the slaves file
to add nodes to your cluster. You need to edit the slaves file only on the machines you plan
to run the Jobtracker and Namenode on. Next edit the file hadoop-env.sh in the hadoop/conf
directory. Make sure JAVA_HOME is set correctly. You can change the other environment variables
as per your requirements. HADOOP_HOME is automatically determined depending on where you run
your hadoop scripts from.
- 
- == Environment Variables ==
-  * The only environment variable that you may need to specify is HADOOP_CONF_DIR. Set this
variable to your configure directory which contains hadoop-site.xml, hadoop-env.sh. Set this
environment variable on all the machines you plan to run Hadoop on. In case you are running
bash, you can set it in .bashrc and in case of csh set it in .cshrc. 
-  * You can get rid of this environment variable by specifying the configure directory as
a --config option for the scripts.
- 
- == Starting up Hadoop ==
- === Formatting the Namenode ===
-  * You are required to format the Namenode for your first installation. This is true only
for your first installation. Do not format a Namenode which was already running Hadoop. It
will clear up your DFS. Run bin/hadoop namenode -format on the node you plan to run as the
Namenode.
- 
- === Starting up the cluster ===
+ === Starting up a real cluster ===
   * After formatting the Namenode, run bin/start-dfs.sh on the Namenode machine. This will
bring up DFS with the Namenode running on the machine you ran the command on and Datanodes
running on the machines listed in the slaves file mentioned above.
   * Run bin/start-mapred.sh on the machine you plan to run the Jobtracker on. This will bring
up the MapReduce cluster with the Jobtracker running on the machine you ran the command on and
Tasktrackers running on the machines listed in the slaves file.
   * If you have not set the HADOOP_CONF_DIR variable, you can pass the configuration directory
explicitly, as in bin/start-dfs.sh --config configure_directory (and likewise for
bin/start-mapred.sh).
   * Try executing bin/hadoop dfs -lsr / to check that DFS is working. The whole sequence is
sketched below.
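+ Putting the steps together, a minimal sketch (the startup scripts reach the slave machines
over ssh, so passwordless ssh from the Namenode and Jobtracker machines to the slaves is
assumed):
+ {{{
+ # On the Namenode machine: format once, then bring up DFS.
+ # start-dfs.sh also starts Datanodes on the hosts listed in conf/slaves.
+ bin/hadoop namenode -format
+ bin/start-dfs.sh
+ 
+ # On the Jobtracker machine: bring up MapReduce.
+ # start-mapred.sh starts Tasktrackers on the hosts listed in conf/slaves.
+ bin/start-mapred.sh
+ 
+ # From any machine with the same configuration: verify DFS is answering.
+ bin/hadoop dfs -lsr /
+ }}}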
  
  === Stopping the cluster ===
-  * You can stop the cluster by running bin/stop-mapred.sh and then bin/stop-dfs.sh. You
can specify the configure directory by using the --config option.
+  * You can stop the cluster by running bin/stop-mapred.sh and then bin/stop-dfs.sh on your
Jobtracker and Namenode respectively. You can specify the configuration directory with the
--config option.
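+ For example (the directory path below is only illustrative):
+ {{{
+ # On the Jobtracker machine:
+ bin/stop-mapred.sh --config /home/hadoop/conf
+ # Then on the Namenode machine:
+ bin/stop-dfs.sh --config /home/hadoop/conf
+ }}}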
  
