nutch-user mailing list archives

From Eran Zinman <>
Subject Re: Combining parsed data from two sources before indexing
Date Wed, 09 Sep 2009 04:13:29 GMT

I'm also quite interested in this feature.

I want to combine information from two different pages and I don't know
which one will be downloaded first.

I want to process them only once both have been downloaded.
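
To make the requirement concrete, here is a minimal sketch of the "wait until both pages have arrived, in whatever order" behaviour. It is plain Python and does not use any Nutch API. The class name `PairBuffer`, the `group_key` that ties related pages together, and the `on_complete` callback are all hypothetical names chosen for illustration. How such a key would be derived from real URLs, and where a buffer like this could sit in a crawl pipeline, are left open.

```python
from typing import Callable, Dict


class PairBuffer:
    """Hypothetical helper: holds parsed pages until every expected
    part of a group has arrived, then hands the whole group over."""

    def __init__(
        self,
        expected_parts: int,
        on_complete: Callable[[str, Dict[str, str]], None],
    ) -> None:
        self.expected_parts = expected_parts
        self.on_complete = on_complete
        # group_key -> {url: parsed_text} for groups that are still incomplete
        self._pending: Dict[str, Dict[str, str]] = {}

    def add(self, group_key: str, url: str, parsed_text: str) -> bool:
        """Record one parsed page. Returns True if this page completed
        its group (and on_complete was called), False otherwise."""
        parts = self._pending.setdefault(group_key, {})
        # Keyed by URL, so re-fetching the same page replaces the old
        # text instead of being counted as a second part.
        parts[url] = parsed_text
        if len(parts) < self.expected_parts:
            return False
        del self._pending[group_key]
        self.on_complete(group_key, parts)
        return True


def merge_by_url(parts: Dict[str, str]) -> str:
    """Join the parsed texts in URL order, so the result does not
    depend on which page happened to be downloaded first."""
    return " ".join(parts[url] for url in sorted(parts))
```

With `expected_parts=2`, the first call to `add` for a group only stores the page. The second call, for a different URL in the same group, triggers `on_complete` with both texts, regardless of which page came first.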


On Wed, Sep 9, 2009 at 12:51 AM, Max S <> wrote:

> Hi all,
> How can I combine parsed data from two sources before indexing them? At the
> moment, as I understand it (correct me if I'm wrong), each fetched page is
> treated as a separate document. These documents are related only by their
> inlinks / outlinks.
> What if the content has been divided across several web pages? How do I
> combine those pages before indexing them?
> Regards
> Max S
