Is this running on Hadoop? Are the update and generate jobs doing URL filtering and
normalizing? That's usually the problem: both jobs pass over the whole crawldb, so
filtering and normalizing every record adds a cost that keeps growing as the db grows.
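If they are, you can switch that off. Roughly like this (1.x trunk from memory, the
crawl/ paths and the -topN value are just examples; check the usage output of your
version for the exact flags and property names):

  # generate without URL filtering/normalizing
  bin/nutch generate crawl/crawldb crawl/segments -topN 90000 -noFilter -noNorm

  # updatedb only filters/normalizes when you pass -filter/-normalize on the
  # command line or set crawldb.url.filters / crawldb.url.normalizers to true
  # in nutch-site.xml, so leave those out
  bin/nutch updatedb crawl/crawldb crawl/segments/<your_segment>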
On Tuesday 06 December 2011 11:33:49 Danicela nutch wrote:
> Hi,
>
> I have the impression that something is going wrong in my nutch cycle.
>
> 4 million pages
> 5.7 GB crawldb
>
> One generate lasts 4h46 and takes 15 minutes more with each segment (90 000
> pages produced per segment). One update lasts 7h36 and takes 45 minutes more
> with each segment.
>
> Are these times normal?
>
> If not, what can I do to reduce these times?
>
> Thanks.
--
Markus Jelsma - CTO - Openindex