hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "QuickStart" by masukomi
Date Sun, 19 Aug 2007 08:36:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by masukomi:
http://wiki.apache.org/lucene-hadoop/QuickStart

The comment on the change is:
Added the Fully-distributed operation section, but it's still untested

------------------------------------------------------------------------------
  
  '''Mac Users''' You'll probably need to install something like [http://www.sshkeychain.org/ SSHKeychain] or [http://www.mothersruin.com/software/SSHChain/ SSHChain] (no idea which is better) to be able to ssh to a computer without entering the password every time. This is because ssh-agent was designed for X11 systems and OS X isn't an X11 system.
  
- == FINISH ME ==
- Will do. Or, maybe you will...
+ === Bootstrapping ===
+ A new distributed filesystem must be formatted with the following command, run on the master node:
  
+ {{{bin/hadoop namenode -format}}}
+ 
+ You should see a quick series of `STARTUP_MSG`s and a `SHUTDOWN_MSG`.
+ 
+ 
+ Then start the Hadoop daemons with:
+ 
+ {{{bin/start-all.sh}}}
+ 
+ It should notify you that it's starting the `namenode`, `datanode`, `secondarynamenode`, and `jobtracker`.
+ 
+ Input files are copied into the distributed filesystem as follows: 
+ {{{bin/hadoop dfs -put <localsrc> <dst>}}}
+ For more details, run `bin/hadoop dfs` with no options.
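
The bootstrapping steps above could be chained roughly as follows. This is only a sketch: the `run` and `bootstrap` helpers, the `DRY_RUN` variable, and the argument names are hypothetical and not part of Hadoop. It assumes it is run from the Hadoop install directory on the master node, against a new filesystem.

```shell
#!/bin/sh
# Sketch: run the bootstrapping steps in order from the Hadoop install directory.
# DRY_RUN, run, and bootstrap are hypothetical helpers, not part of Hadoop.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# bootstrap <localsrc> <dst>: format the new filesystem, start the daemons,
# then copy the input files in. Stops at the first step that fails.
bootstrap() {
  run bin/hadoop namenode -format &&
  run bin/start-all.sh &&
  run bin/hadoop dfs -put "$1" "$2"
}

# Example (on the master node): bootstrap ./input input
```

Setting `DRY_RUN=1` before calling `bootstrap` prints the three commands without running them, which may be useful for checking the arguments first.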
+ 
+ == Stage 3: Fully-distributed operation ==
+ 
+ Distributed operation is just like the pseudo-distributed operation described above, except:
+ 
+  1. Specify the hostname or IP address of the master server in the values for `fs.default.name` and `mapred.job.tracker` in `conf/hadoop-site.xml`. These are specified as `host:port` pairs.
+  2. Specify directories for `dfs.name.dir` and `dfs.data.dir` in `conf/hadoop-site.xml`. These hold distributed filesystem data on the master node and slave nodes respectively. Note that `dfs.data.dir` may contain a space- or comma-separated list of directory names, so that data may be stored on multiple devices.
+  3. Specify `mapred.local.dir` in `conf/hadoop-site.xml`. This determines where temporary MapReduce data is written. It may also be a list of directories.
+  4. Specify `mapred.map.tasks` and `mapred.reduce.tasks` in `conf/mapred-default.xml`. As a rule of thumb, use 10x the number of slave processors for `mapred.map.tasks`, and 2x the number of slave processors for `mapred.reduce.tasks`.
+  5. List all slave hostnames or IP addresses in your `conf/slaves` file, one per line.
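
As an illustration of items 1 to 4, the sketch below prints the settings as Hadoop configuration XML. Everything specific in it is an assumption for illustration only: the host name `master.example.com`, the ports 9000 and 9001, the directory paths, the slave processor count of 8, and the helper function names. Redirect the output into the files named in the list only after substituting real values.

```shell
#!/bin/sh
# Sketch: print the settings from the list above as Hadoop configuration XML.
# Host name, ports, directory paths, and helper names are placeholder assumptions.

MASTER="${MASTER:-master.example.com}"

# Rule of thumb from the text: 10x and 2x the number of slave processors.
map_tasks_for()    { echo $(( $1 * 10 )); }
reduce_tasks_for() { echo $(( $1 * 2 )); }

# Emit one <property> element with the given name and value.
emit_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Items 1-3: contents for conf/hadoop-site.xml.
render_hadoop_site() {
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  emit_property fs.default.name    "${MASTER}:9000"
  emit_property mapred.job.tracker "${MASTER}:9001"
  emit_property dfs.name.dir       /hadoop/dfs/name
  emit_property dfs.data.dir       /disk1/dfs/data,/disk2/dfs/data
  emit_property mapred.local.dir   /disk1/mapred/local,/disk2/mapred/local
  echo '</configuration>'
}

# Item 4: contents for conf/mapred-default.xml, given the slave processor count.
render_mapred_default() {
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  emit_property mapred.map.tasks    "$(map_tasks_for "$1")"
  emit_property mapred.reduce.tasks "$(reduce_tasks_for "$1")"
  echo '</configuration>'
}

render_hadoop_site
render_mapred_default 8
```

With 8 slave processors, the rule of thumb gives 80 map tasks and 16 reduce tasks.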
+ 
