lucene-java-user mailing list archives

From "Max Pfingsthorn" <>
Subject RE: Indexing multiple languages
Date Fri, 03 Jun 2005 13:06:51 GMT

You could use the ParallelReader for this if you have all documents in all languages. Then
the metadata fields can be stored in one of the field data files, while each language gets
its own field data file...


-----Original Message-----
From: Paul Libbrecht []
Sent: Friday, June 03, 2005 14:23
Subject: Re: Indexing multiple languages


On 2 June 05, at 21:42, Tansley, Robert wrote:
> It seems that there are even more options --
> 4/ One index, with a separate Lucene document for each (item,language) 
> combination, with one field that specifies the language
> 5/ One index, one Lucene document per item, with field names that 
> include the language (e.g. title_en, title_cn)
> I quite like 4, because you can search with no language constraint, or 
> with one as Paul suggests below.

You can in both cases. In the second, you need to expand the query (i.e. 
searching for carrot would search "text_en:carrot OR text_cn:carrot"), 
which, I think, is fair as long as you don't have a two-kilometer-long list of 
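
The query expansion described above can be sketched in plain Java, without 
touching the Lucene API. This is only an illustration: the class and method 
names (QueryExpansion, expand) are made up for this example, the field naming 
scheme "text_<lang>" follows the example in this message, and the term is 
assumed to need no escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds a query string that
// searches one term across every language-specific field.
class QueryExpansion {

    // Returns e.g. "text_en:carrot OR text_cn:carrot" for the term "carrot",
    // base field "text", and languages [en, cn].
    // Assumption: the term contains no characters that would need escaping
    // in the query syntax.
    static String expand(String term, String baseField, List<String> languages) {
        List<String> clauses = new ArrayList<>();
        for (String lang : languages) {
            clauses.add(baseField + "_" + lang + ":" + term);
        }
        return String.join(" OR ", clauses);
    }
}
```

The resulting string could then be handed to a query parser. The number of 
clauses grows with the number of languages, which is the cost referred to 
above.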

> However, some "non language-specific" data might need to be repeated 
> (e.g. dates), unless we had an extra Lucene document for all that.  I 
> wonder what the various pros and cons in terms of index size and 
> performance would be in each case?  I really don't have enough 
> knowledge of Lucene to have any idea...
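
To make the difference between options 4 and 5 concrete, here is a rough 
sketch that models a Lucene document as a plain map of field names to values. 
It does not use the Lucene API. The names LanguageLayouts, perLanguageDocs and 
singleDoc, and the "id" and "date" fields, are assumptions for illustration 
only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each "document" is a map of field name -> value.
class LanguageLayouts {

    // Option 4: one document per (item, language) combination, with a
    // "language" field. Non language-specific data such as the date is
    // repeated in every document.
    static List<Map<String, String>> perLanguageDocs(
            String itemId, String date, Map<String, String> titlesByLang) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map.Entry<String, String> e : titlesByLang.entrySet()) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", itemId);
            doc.put("date", date);
            doc.put("language", e.getKey());
            doc.put("title", e.getValue());
            docs.add(doc);
        }
        return docs;
    }

    // Option 5: one document per item, with the language encoded in the
    // field name (title_en, title_cn). The date is stored once.
    static Map<String, String> singleDoc(
            String itemId, String date, Map<String, String> titlesByLang) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", itemId);
        doc.put("date", date);
        for (Map.Entry<String, String> e : titlesByLang.entrySet()) {
            doc.put("title_" + e.getKey(), e.getValue());
        }
        return doc;
    }
}
```

In this model, option 4 yields one map per language with the date copied into 
each, while option 5 yields a single map whose field count grows with the 
number of languages. How that translates into actual index size and search 
performance is not something this sketch can answer.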

If you separate the indices you won't, as far as I know, be able to 
query simultaneously (e.g. some text which, as well, is new 


To unsubscribe, e-mail:
For additional commands, e-mail:
