hadoop-common-user mailing list archives

From: "Scott Simpson" <Scott.Simp...@computer.org>
Subject: Confusion about the Hadoop conf/slaves file
Date: Fri, 07 Apr 2006 15:18:31 GMT

The Hadoop "conf/slaves" file seems to designate two things:

1. Where Hadoop should be running (with Nutch, that must include at least
the search nodes and the crawl nodes).
2. Which machines are used for a MapReduce operation.
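
To make the concern concrete, here is a minimal sketch of such a file. It
assumes the usual layout of one hostname per line, and every hostname below is
hypothetical. The first two entries stand in for crawl nodes and the last two
for search nodes. The file carries no inline comments because I am not certain
this version supports them.

```text
crawl1.example.com
crawl2.example.com
search1.example.com
search2.example.com
```

With a single list like this, all four machines would appear to be both
running Hadoop and eligible for MapReduce work, which is the coupling the
questions below are about.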

Suppose I want to run Nutch 0.8 searches on different machines from the ones
I crawl on. Is there a way to separate the two so that my crawling operation
(MapReduce) doesn't run on my search engine machines?

Also, is there any way to specify where the distributed data is placed on
the machines? That is, what if I want all my distributed data on different
nodes from the ones I run searches on?
