lucene-java-user mailing list archives

From Paul Libbrecht <p...@activemath.org>
Subject Re: Indexing multiple languages
Date Fri, 03 Jun 2005 12:22:38 GMT
Robert,

On 2 June 05, at 21:42, Tansley, Robert wrote:
> It seems that there are even more options --
> 4/ One index, with a separate Lucene document for each (item,language) 
> combination, with one field that specifies the language
> 5/ One index, one Lucene document per item, with field names that 
> include the language (e.g. title_en, title_cn)
> I quite like 4, because you can search with no language constraint, or 
> with one as Paul suggests below.

You can in both cases. In the second, you need to expand the query (i.e. 
searching for carrot would search "text_en:carrot OR text_cn:carrot"), 
which, I think, is fair as long as you don't have a two-kilometer list 
of languages.
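
The expansion described above can be sketched in a few lines. This is 
only an illustration, not Lucene API: the class name 
LanguageQueryExpander and its expand method are hypothetical, and the 
sketch assumes the term is a single word with no query-syntax special 
characters. It just produces a string in the usual query-parser syntax.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: expands one search term
// across per-language fields such as text_en, text_cn.
final class LanguageQueryExpander {

    // Builds "<prefix>_<lang>:<term>" for each language and joins the
    // clauses with OR. Assumes the term needs no escaping.
    static String expand(String fieldPrefix, String term, List<String> languages) {
        return languages.stream()
                .map(lang -> fieldPrefix + "_" + lang + ":" + term)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        // prints: text_en:carrot OR text_cn:carrot
        System.out.println(expand("text", "carrot", List.of("en", "cn")));
    }
}
```

The resulting string could then be handed to whatever query parser is 
in use. The number of clauses grows with the number of languages, which 
is why a very long language list becomes a problem.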

> However, some "non language-specific" data might need to be repeated 
> (e.g. dates), unless we had an extra Lucene document for all that.  I 
> wonder what the various pros and cons in terms of index size and 
> performance would be in each case?  I really don't have enough 
> knowledge of Lucene to have any idea...

If you separate the indices you won't, as far as I know, be able to 
combine both kinds of constraint in a single query (e.g. some text 
which is also new enough).
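
With a single index, such a combined query might look like the sketch 
below. The names here are assumptions for illustration: the class 
CombinedQueryExample is hypothetical, and the field name "date" and the 
yyyyMMdd encoding are made up rather than taken from any real schema. 
It only builds a query-parser string with two required clauses.

```java
// Hypothetical helper, not part of Lucene: combines a text clause with
// a date range so that both must match the same document.
final class CombinedQueryExample {

    // "+" marks a clause as required; "[a TO b]" is an inclusive range.
    // The date field name and its format are assumptions.
    static String textAndDate(String textClause, String dateField,
                              String from, String to) {
        return "+(" + textClause + ") +" + dateField
                + ":[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        // prints: +(text_en:carrot OR text_cn:carrot) +date:[20050101 TO 20050603]
        System.out.println(textAndDate(
                "text_en:carrot OR text_cn:carrot", "date", "20050101", "20050603"));
    }
}
```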

paul

