nutch-user mailing list archives

From "Danicela nutch" <Danicela-nu...@mail.com>
Subject Re : Re: Too few parsed pages
Date Mon, 06 Feb 2012 16:03:52 GMT
I don't understand. What should I do?

----- Original message -----
From: Markus Jelsma
Sent: 06.02.12 16:45
To: user@nutch.apache.org
Subject: Re: Too few parsed pages

Likely db_not_modified records, they are not parsed.

On Monday 06 February 2012 16:44:25 Danicela nutch wrote:
> Hi,
>
> When I make a readseg -list on a segment, I have 60.000 'FETCHED' pages,
> but only 10.000 'PARSED' pages. One month ago, I had something like 40.000
> 'PARSED' pages in my segments, and this number reduced a little every day.
> If I look in the logs of the segments, I can find approximately these
> numbers if I count the number of treated pages. But I find nothing strange
> in the parse that could explain the fact I have so few pages in the end.
>
> What can explain the fact I have so few pages which are parsed ?
>
> Thanks.

--
Markus Jelsma - CTO - Openindex
