nutch-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: generate/update times and crawldb size
Date Tue, 06 Dec 2011 11:11:05 GMT
Is this on Hadoop? Are the update and generate jobs doing filtering and 
normalizing? That's usually the problem.
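
If they are, you can usually switch filtering and normalizing off for those
two jobs and only apply the filters where you really need them. A rough
sketch of what that looks like (the exact flags and property names depend on
your Nutch version, so check the usage output of bin/nutch generate and
bin/nutch updatedb first):

  # generate a segment without running URL filters/normalizers
  bin/nutch generate crawl/crawldb crawl/segments -topN 90000 -noFilter -noNorm

  # update the crawldb without filtering/normalizing: don't pass
  # -filter/-normalize, and leave crawldb.url.filters and
  # crawldb.url.normalizers at false in nutch-site.xml
  bin/nutch updatedb crawl/crawldb crawl/segments/<your_segment>

The crawldb/segment paths and the topN value above are just placeholders for
your own setup. The point is that filtering/normalizing in these jobs pushes
every URL in the crawldb through the whole filter/normalizer chain on each
run, which is what makes them slower as the db grows.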

On Tuesday 06 December 2011 11:33:49 Danicela nutch wrote:
> Hi,
> 
>  I have the impression that something is going wrong in my nutch cycle.
> 
>  4 million pages
>  5.7 GB crawldb
> 
>  One generate lasts 4:46 and takes 15 minutes longer with each new segment
> (90,000 pages produced per segment). One update lasts 7h36 and takes 45
> minutes longer with each new segment.
> 
>  Are these times normal?
> 
>  If not, what can I do to reduce these times?
> 
>  Thanks.

-- 
Markus Jelsma - CTO - Openindex
