nutch-dev mailing list archives

From MilleBii <>
Subject Filtering ParseSegment
Date Thu, 10 Dec 2009 22:06:37 GMT
I'm thinking of developing a special ParseSegment that will filter content out
in the following way:

My scoring-plugin determines which page content to keep or drop.

So I intend to store that decision as metadata in the scoring-plugin
 with parse.getData().getContentMeta().set("KEY_KEEP", "true" or "false");
 (the value has to be a String, since Metadata.set takes String values)

and in
 instead of line 119       output.collect(url, new ParseImpl(new

I plan to emit a ParseText(null) conditionally, when
"false".equals(parse.getData().getContentMeta().get("KEY_KEEP"))
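
To make the idea concrete, here is a minimal, self-contained sketch of the flag-and-filter logic. It is not Nutch code: a plain Map stands in for the content metadata, and the class and method names (KeepFlagSketch, markKeep, textToStore) are made up for illustration. Two choices in it are assumptions, not something established above: a page with no flag set is kept, and a dropped page gets an empty string instead of null, since I have not verified that a null text survives serialization.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map<String, String> stands in for the content metadata.
// All names here are hypothetical and are not part of Nutch.
class KeepFlagSketch {
    static final String KEY_KEEP = "KEY_KEEP";

    // Scoring side: record the keep/drop decision as a string value.
    static void markKeep(Map<String, String> contentMeta, boolean keep) {
        contentMeta.put(KEY_KEEP, Boolean.toString(keep));
    }

    // Parse side: decide which text to store for the page.
    // Assumption: a missing flag means "keep".
    // Assumption: dropped pages get "" rather than null.
    static String textToStore(Map<String, String> contentMeta, String parsedText) {
        boolean drop = "false".equals(contentMeta.get(KEY_KEEP));
        return drop ? "" : parsedText;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();

        markKeep(meta, false);
        System.out.println("[" + textToStore(meta, "page body") + "]"); // prints "[]"

        markKeep(meta, true);
        System.out.println("[" + textToStore(meta, "page body") + "]"); // prints "[page body]"
    }
}
```

In the real job the result of textToStore would be what gets wrapped in the ParseText passed to output.collect, so only the text is emptied and the rest of the parse data goes through unchanged.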

Before I start doing/testing/verifying, I'd like to check whether I'm missing
something and whether I understand the mechanics correctly.

