lucene-java-user mailing list archives

From "Melanie Langlois"
Subject indexing rss feeds in multiple languages
Date Thu, 22 Mar 2007 06:03:03 GMT


I saw that there are many posts on the mailing list about indexing in multiple languages, so
I will try not to post a duplicate question. In my case, I want to index RSS feeds, where one
feed contains several items in different languages, plus some data common to all the items
(date, source, ...). After reading the different posts, I think I will create one document per
item, index them all in the same index, using a language-specific analyzer for each one, and
store a lang field for language-specific searches. But I'm wondering how I should handle the
common fields. It seems I have two options:

1 : store the common data in each item. What happens if duplicate information is entered?
Is it duplicated in the index?


2 : create a separate document for the common data. In this case I will need to link this
data to all the underlying items by storing some ids. The issue is that I would need to search
the index twice if the search is done only by date, because I would need to retrieve the items
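
To make option 1 concrete, here is a minimal plain-Java sketch of copying the feed-level fields into every item before indexing. It deliberately makes no Lucene calls. The class name FeedFlattener, the field names "lang" and "content", and the sample values are all hypothetical, chosen only for illustration. Each resulting map stands for what would become one document in the index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): turns one feed into per-item field maps.
class FeedFlattener {

    // Copies the feed-level fields into every item and tags each item with its language.
    // Each element of items is a two-element array: {language code, item text}.
    static List<Map<String, String>> flatten(Map<String, String> commonFields,
                                             List<String[]> items) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String[] item : items) {
            // Start from a copy so the shared map is never modified.
            Map<String, String> doc = new LinkedHashMap<>(commonFields);
            doc.put("lang", item[0]);    // used to pick an analyzer and to filter searches
            doc.put("content", item[1]); // the language-specific text
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        Map<String, String> common = new LinkedHashMap<>();
        common.put("date", "20070322");
        common.put("source", "example-feed");

        List<String[]> items = new ArrayList<>();
        items.add(new String[] {"en", "Hello world"});
        items.add(new String[] {"fr", "Bonjour le monde"});

        for (Map<String, String> doc : flatten(common, items)) {
            System.out.println(doc);
        }
    }
}
```

With this layout every item carries its own copy of the date and source, so a search on date alone matches the item documents directly, at the cost of repeating the common values once per item.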


Thanks in advance for your help.



