nutch-user mailing list archives

From Max S <maximillian...@googlemail.com>
Subject Combining parsed data from two sources before indexing
Date Tue, 08 Sep 2009 21:51:27 GMT
Hi all,

How can I combine parsed data from two sources before indexing it? As I
understand it (correct me if I'm wrong), each fetched page is treated as a
separate document, and these documents are related only by their
inlinks / outlinks.

What if the content has been split across several web pages? How do I
combine those pages into one document before indexing it?
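
To make the idea concrete, here is a rough sketch of the kind of merge I
have in mind. It is plain Python and does not use any Nutch API. The
function names (group_key, page_number, combine_pages) and the "?page=N"
URL convention are assumptions made up for illustration only; the real
grouping rule would depend on how the site splits its content.

```python
import re
from collections import defaultdict

# Assumption: continuation pages end in "?page=N" or "&page=N".
PAGE_SUFFIX = re.compile(r"[?&]page=(\d+)$")


def group_key(url):
    """Hypothetical rule: drop the trailing page parameter to get a shared key."""
    return PAGE_SUFFIX.sub("", url)


def page_number(url):
    """Return the page number from the URL, or 1 when there is none."""
    match = PAGE_SUFFIX.search(url)
    return int(match.group(1)) if match else 1


def combine_pages(parsed_pages):
    """Merge (url, text) pairs that share a key into one text per key."""
    groups = defaultdict(list)
    for url, text in parsed_pages:
        groups[group_key(url)].append((page_number(url), text))
    # Join the parts of each group in page order.
    return {
        key: "\n".join(text for _, text in sorted(parts, key=lambda p: p[0]))
        for key, parts in groups.items()
    }


pages = [
    ("http://example.com/article?page=2", "second part"),
    ("http://example.com/article", "first part"),
    ("http://example.com/other", "standalone"),
]
combined = combine_pages(pages)
```

In this sketch the two "article" pages end up as a single combined text
under one key, which is the single document I would then like to index.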

Regards
Max S

