nutch-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: RSS-fecter and index individul-how can i realize this function
Date Tue, 06 Feb 2007 21:43:07 GMT
Renaud Richardet wrote:
> The use case is that you index RSS feeds, but your users can search each 
> feed entry as a single document. Does it make sense?

But each feed item also contains a link whose content will be indexed, 
and that content is generally a superset of the item.  So should two 
URLs be indexed per item?  In many cases, the best thing to do is to index 
only the linked page, not the feed item at all.  In some (rare?) cases, 
there might be items without a link, whose only content is directly in 
the feed, or where the content in the feed is complementary to that in 
the linked page.  In these cases it might be useful to combine the two 
(the feed item and the linked content), indexing both.  The proposed 
change might permit that.  Is that the case you're concerned about?
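
The three cases above can be sketched roughly as follows. This is only an illustration in Python, not Nutch code: `FeedItem`, `plan_documents`, the `combine` flag, and the `feed_url#guid` URL used for link-less items are all hypothetical names and conventions chosen for the example, and it assumes "combining" means one document under the link URL holding both texts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FeedItem:
    """Hypothetical feed entry (not a Nutch class)."""
    feed_url: str                 # URL of the feed the item came from
    guid: str                     # identifier of the item within the feed
    description: str = ""         # content carried directly in the feed
    link: Optional[str] = None    # URL of the linked page, if any


def plan_documents(item: FeedItem, linked_text: str = "",
                   combine: bool = False) -> List[Tuple[str, str]]:
    """Return the (url, text) pairs to index for a single feed item."""
    item_text = item.description.strip()

    if item.link is None:
        # No link: the feed is the only source of content, so index the
        # item itself under a made-up per-item URL (an assumption).
        return [(f"{item.feed_url}#{item.guid}", item_text)]

    if combine and item_text:
        # Feed content complements the linked page: index both texts
        # together as one document under the link URL.
        return [(item.link, item_text + "\n\n" + linked_text)]

    # Common case: the linked page is a superset of the item, so index
    # only the linked page and skip the feed item.
    return [(item.link, linked_text)]
```

With a linked item, `plan_documents(item, page_text)` yields a single entry for the linked page; passing `combine=True` merges the feed text into that same entry, so there is still only one URL indexed per item.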

Doug
