hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "QuickStart" by DavidBiesack
Date Fri, 11 Feb 2011 16:57:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "QuickStart" page has been changed by DavidBiesack.
The comment on this change is: Update Stage 3: Fully-distributed operation to reflect 0.21.0.
Also fix the URL of the Overview.
http://wiki.apache.org/hadoop/QuickStart?action=diff&rev1=25&rev2=26

--------------------------------------------------

  
  If you want to work exclusively with Hadoop code directly from Apache, the rest of this
document can help you get started quickly from there.
  
- Based on the docs found at the following link, but modified to work with the current distribution:
- http://hadoop.apache.org/core/api/overview-summary.html#overview_description
+ The instructions below are
+ based on the docs found at the [[http://hadoop.apache.org/common/docs/current/api/overview-summary.html
| Hadoop Overview]].
  
- Please note this was last updated to match svn version 605291. Things may have changed since
then. If they have, please update this page.
+ Please note the instructions were last updated to match Release 0.21.0. Things may have
changed since then. If they have, please update this page.
  
  == Requirements ==
   * Java 1.5.X
@@ -25, +25 @@

   * rsync
  
  == Preparatory Steps ==
- Dowload
+ Download
  
  '''Release Versions:'''
  can be found at http://hadoop.apache.org/core/releases.html
@@ -60, +60 @@

  
  {{{
  cat output/*
- 
  1	security.task.umbilical.protocol.acl
  1	security.refresh.policy.protocol.acl
  1	security.namenode.protocol.acl
@@ -138, +137 @@

  
  == Stage 3: Fully-distributed operation ==
  
- Distributed operation is just like the pseudo-distributed operation described above, except:
+ Fully distributed operation is just like the pseudo-distributed operation described above, except that you must specify:
  
-  1. Specify hostname or IP address of the master server in the values for `fs.default.name`
and `mapred.job.tracker` in `conf/hadoop-site.xml`. These are specified as `host:port` pairs.
-  2. Specify directories for `dfs.name.dir` and `dfs.data.dir` in `conf/hadoop-site.xml`.
These are used to hold distributed filesystem data on the master node and slave nodes respectively.
Note that `dfs.data.dir` may contain a space- or comma-separated list of directory names,
so that data may be stored on multiple devices.
+  1. The hostname or IP address of your master server in the value for `fs.default.name`, e.g. `hdfs://master.example.com/`, in `conf/core-site.xml`.
+  1. The host and port of your master server in the value of `mapred.job.tracker`, as `master.example.com:port`, in `conf/mapred-site.xml`.
+  1. Directories for `dfs.name.dir` and `dfs.data.dir` in `conf/hdfs-site.xml`. These are local directories used to hold distributed filesystem data on the master node and slave nodes respectively. Note that `dfs.data.dir` may contain a space- or comma-separated list of directory names, so that data may be stored on multiple local devices.
-  3. Specify `mapred.local.dir` in `conf/hadoop-site.xml`. This determines where temporary
MapReduce data is written. It also may be a list of directories.
+  1. `mapred.local.dir` in `conf/mapred-site.xml`, the local directory where temporary MapReduce data is stored. It may also be a list of directories.
-  4. Specify `mapred.map.tasks` and `mapred.reduce.tasks` in `conf/mapred-default.xml`. As
a rule of thumb, use 10x the number of slave processors for `mapred.map.tasks`, and 2x the
number of slave processors for `mapred.reduce.tasks`.
-  5. List all slave hostnames or IP addresses in your `conf/slaves` file, one per line.
+  1. `mapred.map.tasks` and `mapred.reduce.tasks` in `conf/mapred-site.xml`. As a rule of thumb, use 10x the number of slave processors for `mapred.map.tasks`, and 2x the number of slave processors for `mapred.reduce.tasks`.
+  1. Finally, list all slave hostnames or IP addresses in your `conf/slaves` file, one per line. Then format your filesystem and start your cluster on your master node, as above (see the example configuration files and commands below).
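+ 
+ For example, a minimal set of site files might look like the following sketch. The hostname `master.example.com`, the port `9001`, and the local paths under `/var/hadoop` are placeholder values; substitute the values for your own cluster:
+ 
+ {{{
+ <!-- conf/core-site.xml -->
+ <configuration>
+   <property>
+     <name>fs.default.name</name>
+     <value>hdfs://master.example.com/</value>
+   </property>
+ </configuration>
+ 
+ <!-- conf/hdfs-site.xml -->
+ <configuration>
+   <property>
+     <name>dfs.name.dir</name>
+     <value>/var/hadoop/dfs/name</value>
+   </property>
+   <property>
+     <name>dfs.data.dir</name>
+     <value>/var/hadoop/dfs/data1,/var/hadoop/dfs/data2</value>
+   </property>
+ </configuration>
+ 
+ <!-- conf/mapred-site.xml -->
+ <configuration>
+   <property>
+     <name>mapred.job.tracker</name>
+     <value>master.example.com:9001</value>
+   </property>
+   <property>
+     <name>mapred.local.dir</name>
+     <value>/var/hadoop/mapred/local</value>
+   </property>
+ </configuration>
+ }}}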
  
+ (See [[http://hadoop.apache.org/common/docs/current/api/overview-summary.html | Hadoop Overview]]
for details)
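+ 
+ A `conf/slaves` file, and the commands to format the filesystem and start the cluster on the master node, might look like this sketch (the slave hostnames are placeholders, and the script paths assume the 0.21.0 release layout):
+ 
+ {{{
+ # conf/slaves -- one slave hostname per line
+ slave1.example.com
+ slave2.example.com
+ }}}
+ 
+ {{{
+ # run from the Hadoop installation directory on the master node
+ bin/hdfs namenode -format
+ bin/start-dfs.sh
+ bin/start-mapred.sh
+ }}}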
+ 
