nutch-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: Too few parsed pages
Date Mon, 06 Feb 2012 15:45:41 GMT
Most likely these are db_not_modified records, which are not parsed.
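A minimal sketch of the arithmetic behind this answer follows. It assumes that readseg's FETCHED count includes "not modified" records and that those records carry no new content, so the parser skips them. Under that assumption, the PARSED count should be close to FETCHED minus the not-modified records. The status labels and the `expected_parsed` helper are hypothetical and modeled on Nutch's CrawlDatum status names. Check your Nutch version for the exact strings, and treat the toy input as illustrative only.

```python
from collections import Counter

# Hypothetical status labels modeled on Nutch's CrawlDatum names; the exact
# strings in your Nutch version may differ.
# Assumption: records in these states have no new content to parse.
NOT_MODIFIED = {"fetch_notmodified", "db_notmodified"}


def expected_parsed(fetch_statuses):
    """Given one fetch status label per URL in a segment, return
    (fetched_total, not_modified, parse_candidates)."""
    counts = Counter(fetch_statuses)
    total = sum(counts.values())
    # Counter returns 0 for labels that never occur.
    not_modified = sum(counts[s] for s in NOT_MODIFIED)
    return total, not_modified, total - not_modified


# Illustrative toy segment: most URLs were unchanged since the last crawl.
statuses = ["fetch_success"] * 10 + ["fetch_notmodified"] * 50
print(expected_parsed(statuses))  # (60, 50, 10)
```

If most of a segment's URLs were refetched and found unchanged, a large gap between FETCHED and PARSED is expected rather than a sign of parse failures.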

On Monday 06 February 2012 16:44:25 Danicela nutch wrote:
> Hi,
> 
>  When I run readseg -list on a segment, I get 60,000 'FETCHED' pages
> but only 10,000 'PARSED' pages. A month ago my segments had about 40,000
> 'PARSED' pages, and that number has dropped a little every day.
> When I count the processed pages in the segment logs, I find roughly
> these same numbers, but I see nothing unusual in the parse step that
> would explain why so few pages end up parsed.
> 
>  What could explain why so few pages are parsed?
> 
>  Thanks.

-- 
Markus Jelsma - CTO - Openindex
