hadoop-common-dev mailing list archives

From: Doug Cutting <cutt...@apache.org>
Subject: Re: Wiki
Date: Thu, 26 Oct 2006 18:26:32 GMT
Grant Ingersoll wrote:
> I know Hadoop is separate from Nutch, but I found the Hadoop Tutorial 
> (http://wiki.apache.org/nutch/NutchHadoopTutorial) on the Nutch Wiki to 
> be quite informative in filling in some gray areas for me on how to get 
> Hadoop working, so I was wondering if it is all right to link to this 
> one, or should some effort be made (by me) to extract the relevant 
> Hadoop pieces from this link and put them in a new page on the Hadoop 
> wiki?  I know some users may be confused by the talk of Nutch in it.

That does look like a good tutorial, and I don't have a problem with 
linking to it from the Hadoop wiki.  Or, if you're feeling energetic, 
copy it to the Hadoop wiki and remove the Nutch-specific parts.  Then you 
might make the Nutch wiki page link to your page on Hadoop's wiki.

A few notes, however:

1. mapred.map.tasks and mapred.reduce.tasks should be set in 
mapred-default.xml, not in hadoop-site.xml.  Values in hadoop-site.xml 
take precedence over a job's own settings, so jobs cannot override them 
there, and Nutch sometimes does override these.

2. Config files now support variables, so setting just hadoop.tmp.dir 
in hadoop-site.xml is usually sufficient: all the other directories in 
the defaults are defined relative to it.

3. When setting HADOOP_MASTER, it's usually advisable to also set 
HADOOP_SLAVE_SLEEP=1; otherwise the rsyncs from the master can fail.
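Notes 1 and 2 both come down to how the configuration files are layered and resolved. The Python sketch below models that under stated assumptions: the files are read in the order hadoop-default.xml, mapred-default.xml, the job's own settings, then hadoop-site.xml, with later values winning, and `${name}` references are expanded against the merged result. The function `resolve`, the property values and the /data/hadoop path are hypothetical and for illustration only; this is not Hadoop's own Configuration code.

```python
import re

# ${name} references inside property values (assumed syntax for this sketch).
_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(layers):
    """Merge config layers (earliest first, later layers win), then expand ${name}."""
    merged = {}
    for layer in layers:
        merged.update(layer)

    def expand(value, depth=0):
        if depth > 20:  # crude guard against circular references
            raise ValueError("variable expansion too deep: " + value)
        # Replace each ${name} with its (recursively expanded) merged value;
        # names that are not defined are left untouched.
        return _VAR.sub(
            lambda m: expand(merged[m.group(1)], depth + 1)
            if m.group(1) in merged else m.group(0),
            value,
        )

    return {key: expand(value) for key, value in merged.items()}


# Illustrative values only, not the real shipped defaults.
HADOOP_DEFAULT = {
    "hadoop.tmp.dir": "/tmp/hadoop",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}
MAPRED_DEFAULT = {"mapred.map.tasks": "20", "mapred.reduce.tasks": "4"}
JOB = {"mapred.reduce.tasks": "1"}  # a job (e.g. from Nutch) overriding a default
HADOOP_SITE = {"hadoop.tmp.dir": "/data/hadoop"}  # the only directory set here

conf = resolve([HADOOP_DEFAULT, MAPRED_DEFAULT, JOB, HADOOP_SITE])
print(conf["mapred.reduce.tasks"])  # 1
print(conf["mapred.local.dir"])     # /data/hadoop/mapred/local

# Putting the task count in hadoop-site.xml instead defeats the job's override.
bad = resolve([HADOOP_DEFAULT, JOB, {**HADOOP_SITE, "mapred.reduce.tasks": "4"}])
print(bad["mapred.reduce.tasks"])   # 4
```

The job's override survives when the task counts live in mapred-default.xml and is lost when they sit in hadoop-site.xml, while mapred.local.dir follows hadoop.tmp.dir without being set explicitly.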

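Note 3 can be illustrated with a small Python sketch of staggered startup. It assumes that each slave rsyncs from HADOOP_MASTER as its daemon starts, and that HADOOP_SLAVE_SLEEP is a pause, in seconds, between launching successive slaves, so that they do not all contact the master at the same moment. `start_on_slaves` and its `launch` callback are hypothetical names for illustration, not part of Hadoop's scripts.

```python
import time
from typing import Callable, Iterable, List, Optional


def start_on_slaves(hosts: Iterable[str],
                    launch: Callable[[str], None],
                    slave_sleep: Optional[float] = None) -> List[str]:
    """Call launch(host) for each host, pausing slave_sleep seconds after each one.

    Hypothetical model of HADOOP_SLAVE_SLEEP: spacing out the launches keeps the
    slaves' rsyncs from all hitting HADOOP_MASTER at once.
    """
    started = []
    for host in hosts:
        launch(host)  # assumed: in practice, a backgrounded remote start command
        started.append(host)
        if slave_sleep:  # None or 0 means launch back to back
            time.sleep(slave_sleep)
    return started
```

For example, `start_on_slaves(hosts, launch, slave_sleep=1)` corresponds to HADOOP_SLAVE_SLEEP=1; leaving `slave_sleep` unset launches every slave immediately, which is the case the note warns about.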
Doug
