nutch-user mailing list archives

From "Danicela nutch" <>
Subject Too few parsed pages
Date Mon, 06 Feb 2012 15:44:25 GMT

 When I run readseg -list on a segment, it shows 60,000 'FETCHED' pages but only 10,000
'PARSED' pages. A month ago my segments had around 40,000 'PARSED' pages, and that number
has dropped a little every day since. Counting the pages processed in the segment logs gives
roughly the same numbers, but I see nothing unusual in the parse step that would explain why
so few pages are left at the end.

 What could explain why so few pages are being parsed?
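
 To make the drop easier to follow from one segment to the next, the parsed share can be
computed per segment. The following is only a minimal Python sketch. It assumes that each
data row of the segment listing ends with the FETCHED count followed by the PARSED count;
the helper name parsed_ratio and the sample row (segment name and timestamps) are
hypothetical, and only the two counts come from the figures quoted in this message.

```python
def parsed_ratio(line: str) -> float:
    """Return PARSED / FETCHED for one data row of a segment listing.

    Assumption: the last two whitespace-separated fields of the row are
    the FETCHED and PARSED counts, in that order.
    """
    fields = line.split()
    fetched, parsed = int(fields[-2]), int(fields[-1])
    if fetched == 0:
        # Avoid dividing by zero for a segment with nothing fetched.
        return 0.0
    return parsed / fetched


# Illustrative row: the segment name and timestamps are made up; the two
# trailing counts are the round figures quoted in the message.
row = "20120206000000\t60000\t2012-02-06T00:00:00\t2012-02-06T01:00:00\t60000\t10000"
print(f"{parsed_ratio(row):.1%} of fetched pages were parsed")
```

 Running this over the listing of each day's segment would show whether the parsed share
falls steadily or drops at one particular point.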

