hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Confusion about the Hadoop conf/slaves file
Date Tue, 11 Apr 2006 17:13:05 GMT
Scott Simpson wrote:
> Excuse my ignorance on this issue. Say I have 5 machines in my Hadoop
> cluster and I only list two of them in the configuration file when I do a
> "fetch" or a "generate". Won't this just store the data on the two nodes
> since that is all I've listed for my crawling machines? I'm trying to crawl
> on two but store my data across all five.

So you want to use different sets of machines for dfs than for 
MapReduce?  An easy way to achieve this is to install Hadoop separately 
and start dfs only there ('bin/hadoop-daemon.sh start namenode; 
bin/hadoop-daemons.sh start datanode', or use the new bin/start-dfs.sh 
script).  Then, in your Nutch installation, start only the MapReduce 
daemons, using a different conf/slaves file ('bin/hadoop-daemon.sh start 
jobtracker; bin/hadoop-daemons.sh start tasktracker', or use the new 
bin/start-mapred.sh script).  Just make sure that your Nutch 
installation is configured to talk to the same namenode as your Hadoop 
installation, and make sure that you don't run bin/start-all.sh from 
either installation.  Does that make sense?
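
The setup described above can be sketched as the following command sequence. This is a non-authoritative sketch: the daemon commands and the start-dfs.sh/start-mapred.sh scripts are the ones named in the message, but the install paths (/opt/hadoop, /opt/nutch) and the five-vs-two machine split are illustrative assumptions from the question.

```shell
# Illustrative layout (paths are hypothetical):
#   /opt/hadoop  - standalone Hadoop install; conf/slaves lists all five machines
#   /opt/nutch   - Nutch install; conf/slaves lists only the two crawl machines

# In the Hadoop install, start only the DFS daemons:
cd /opt/hadoop
bin/hadoop-daemon.sh start namenode
bin/hadoop-daemons.sh start datanode   # started on every host in this conf/slaves
# (or equivalently, use the newer script: bin/start-dfs.sh)

# In the Nutch install, start only the MapReduce daemons:
cd /opt/nutch
bin/hadoop-daemon.sh start jobtracker
bin/hadoop-daemons.sh start tasktracker   # started on the two crawl machines
# (or equivalently: bin/start-mapred.sh)
```

Note that both installations must be configured to talk to the same namenode, and bin/start-all.sh must not be run from either one, since it would start both daemon sets against that install's own slaves list.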

