nutch-user mailing list archives

From muraliweb <mur...@live.com>
Subject Re: Nutch crawl does not capture pages of lower depth
Date Thu, 03 Sep 2009 08:29:07 GMT

I managed to find the problem.
The indexer.max.tokens property in nutch-default.xml was causing the
top-level pages to be skipped.
After I changed the value to around 30000, the crawler picked up all the
pages down to the configured depth.
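For reference, here is a minimal sketch of the same change written as an override. It assumes a Nutch 1.x layout, where local settings normally go in conf/nutch-site.xml. That file is loaded after nutch-default.xml and takes precedence over it, so you don't have to edit the defaults file itself. The value 30000 is the one that worked for me. The right limit depends on how large your pages are.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides, read after nutch-default.xml -->
<configuration>
  <!-- Raise the per-field token limit so long pages are not cut off at
       index time. 30000 is the value used in this thread; tune as needed. -->
  <property>
    <name>indexer.max.tokens</name>
    <value>30000</value>
  </property>
</configuration>
```

Keeping the change in nutch-site.xml also means it survives an upgrade that replaces nutch-default.xml.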



muraliweb wrote:
> 
> Nutch crawl does not pick up pages at depth 1 and 2 when it's configured
> for depth 3.
> When the crawl is configured for depth 2, it does not pick up the homepage.
> Can anyone please help?
> Thanks in advance,
> murali
> 

-- 
View this message in context: http://www.nabble.com/Nutch-crawl-does-not-capture-pages-of-lower-depth-tp25084017p25271774.html
Sent from the Nutch - User mailing list archive at Nabble.com.

