nutch-user mailing list archives

From Eran Zinman <zze...@gmail.com>
Subject Re: Combining parsed data from two sources before indexing
Date Wed, 09 Sep 2009 04:13:29 GMT
Hi,

I'm also quite interested in this feature.

I want to combine information from two different pages and I don't know
which one will be downloaded first.

I want to process them only once both have been downloaded.
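
To make the idea concrete, here is a minimal sketch of the "wait for both, then merge" step. It is plain Python and does not use any Nutch API. The `PairJoiner` class, the `offer` method, the grouping key and the example URLs are all hypothetical names chosen for illustration, and it assumes that related pages can be mapped to a shared key.

```python
from typing import Dict, Optional


class PairJoiner:
    """Buffers parsed pages by a shared key until every expected part has arrived.

    Hypothetical helper for illustration only; not part of Nutch.
    """

    def __init__(self, expected_parts: int = 2) -> None:
        self.expected_parts = expected_parts
        # key -> {url: parsed text}, held until the group is complete
        self._pending: Dict[str, Dict[str, str]] = {}

    def offer(self, key: str, url: str, text: str) -> Optional[str]:
        """Record one parsed page; return the merged text once the group is complete."""
        parts = self._pending.setdefault(key, {})
        parts[url] = text  # a re-fetch of the same URL overwrites the old copy
        if len(parts) < self.expected_parts:
            return None  # still waiting for the other page
        del self._pending[key]
        # Merge in URL order so the result does not depend on fetch order.
        return "\n".join(parts[u] for u in sorted(parts))


joiner = PairJoiner()
# The second page happens to be fetched first; nothing is emitted yet.
first = joiner.offer("article-42", "http://example.com/a?page=2", "second half")
# Once the other page arrives, the combined text is returned.
second = joiner.offer("article-42", "http://example.com/a?page=1", "first half")
```

Here `first` is `None` and `second` holds both texts in URL order, so the result is the same whichever page is downloaded first. Where such a step would hook into the crawl is left open.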

Thanks,
Eran

On Wed, Sep 9, 2009 at 12:51 AM, Max S <maximillian009@googlemail.com> wrote:

> Hi all,
>
> How can I combine parsed data from two sources before indexing them? At the
> moment, the way I see it (correct me if I'm wrong), each fetched page is
> treated as a separate document. These documents are related only by their
> inlinks / outlinks.
>
> What if some content has been split across several web pages? How do I
> combine those pages before indexing them?
>
> Regards
> Max S
>
>
