nutch-user mailing list archives

From 黄淑明 <>
Subject Does anybody know why Nutch 1.1 parses data in both "Fetcher.output" and "ParseSegment"?
Date Tue, 18 Jan 2011 09:09:19 GMT
I use Nutch 1.1.
I want to add a plugin that parses web pages and stores the results in my
database. I added a class that implements HtmlParseFilter, but found that
HtmlParseFilter still gets called even when the page is redirected to
another page.
I thought ParseSegment.parse would be a better place for this, but why does
Nutch 1.1 call the parse function in both the Fetcher.output method and
ParseSegment.parse?

Thanks in advance.

