lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Indexing multiple languages
Date Fri, 03 Jun 2005 16:23:56 GMT
Tansley, Robert wrote:
> What if we're trying to index multiple languages in the same site?  Is
> it best to have:
> 
> 1/ one index for all languages
> 2/ one index for all languages, with an extra language field so searches
> can be constrained to a particular language
> 3/ separate indices for each language?

I'd use 2/.  In particular, use the same field for the content, title, 
etc., even if they are produced by different analyzers.  Have a "lang" 
field that names the language of the document.
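A minimal sketch of that layout, written as plain Java rather than against Lucene's API. The class, the toy analyzers and their stop-word lists are hypothetical stand-ins for real per-language Lucene analyzers. Every document writes its terms to the same "title" and "content" fields, only the analyzer differs by language, and an untokenized "lang" field records the language.

```java
import java.util.*;
import java.util.function.Function;

// Toy, in-memory model of option 2/ in plain Java; this is NOT Lucene's API.
// The class, the analyzers and their stop-word lists are hypothetical.
class MultiLangIndexSketch {

    // A hypothetical analyzer: lowercase, split on non-letters, drop stop words.
    static Function<String, List<String>> analyzer(Set<String> stopWords) {
        return text -> {
            List<String> terms = new ArrayList<>();
            for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!tok.isEmpty() && !stopWords.contains(tok)) {
                    terms.add(tok);
                }
            }
            return terms;
        };
    }

    // One analyzer per language, keyed by the value stored in the "lang" field.
    static final Map<String, Function<String, List<String>>> ANALYZERS = Map.of(
            "en", analyzer(Set.of("the", "a", "of", "about")),
            "fr", analyzer(Set.of("le", "la", "de", "et")));

    // Postings keyed by "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Title and content go into the SAME field names for every language; only the
    // analyzer that produced their terms differs. "lang" is indexed untokenized.
    int addDocument(String lang, String title, String content) {
        int id = nextId++;
        // Falling back to English for unknown languages is an assumption of this sketch.
        Function<String, List<String>> langAnalyzer =
                ANALYZERS.getOrDefault(lang, ANALYZERS.get("en"));
        index(id, "title", langAnalyzer.apply(title));
        index(id, "content", langAnalyzer.apply(content));
        index(id, "lang", List.of(lang));
        return id;
    }

    private void index(int id, String field, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(id);
        }
    }

    // Ids of the documents whose given field contains the given term.
    Set<Integer> docsWith(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Set.of());
    }
}
```

Indexing an English document "A chat about pain relief" and a French one "Le pain et le chat" puts "pain" under the same "content" field for both, so one query on that field reaches either language, while the "lang" postings are what a language restriction would consult later.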

At query time, use an analyzer selected by the user's environment (e.g., 
the HTTP Accept-Language header).  If folks get false positives, where a 
term from another language that is spelled the same but means something 
different matches their query, they can use a "lang" pulldown to exclude 
documents in other languages, implemented as a Lucene Filter.
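The query side might look roughly like this, again as a self-contained toy in plain Java rather than Lucene code. It assumes the user's environment arrives as an HTTP Accept-Language header, picks the supported language with the highest q-value, analyzes the query with that language's (hypothetical) analyzer, and applies an optional "lang" restriction the way a filter would: it narrows the candidate documents without changing how terms match. All names here (Doc, STOP_WORDS, analyze, pickLanguage, search) are made up for illustration.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy model of the query side of option 2/ in plain Java; this is NOT Lucene's API.
// Doc, STOP_WORDS, analyze, pickLanguage and search are hypothetical names.
class LangAwareSearchSketch {

    // An indexed document: its "lang" value plus the terms of its "content" field.
    record Doc(int id, String lang, Set<String> contentTerms) {}

    // Hypothetical per-language analyzers reduced to stop-word lists; the keys are
    // also the languages this sketch supports.
    static final Map<String, Set<String>> STOP_WORDS = Map.of(
            "en", Set.of("the", "a", "about"),
            "fr", Set.of("le", "la", "et"));

    // Lowercase, split on non-letters, drop the chosen language's stop words.
    static List<String> analyze(String lang, String text) {
        Set<String> stop = STOP_WORDS.getOrDefault(lang, Set.of());
        List<String> terms = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!tok.isEmpty() && !stop.contains(tok)) {
                terms.add(tok);
            }
        }
        return terms;
    }

    // Pick the query language from a header such as "fr-CH, fr;q=0.9, en;q=0.8":
    // the supported primary subtag with the highest q-value wins; ties keep the
    // earlier entry, and q=0 ("not acceptable") never wins.
    static String pickLanguage(String header, String fallback) {
        if (header == null || header.isBlank()) {
            return fallback;
        }
        String best = fallback;
        double bestQ = 0;
        for (String entry : header.split(",")) {
            String[] parts = entry.trim().split(";");
            String primary = parts[0].trim().split("-")[0].toLowerCase(Locale.ROOT);
            double q = 1.0;
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0; // treat a malformed weight as "not acceptable"
                    }
                }
            }
            if (STOP_WORDS.containsKey(primary) && q > bestQ) {
                best = primary;
                bestQ = q;
            }
        }
        return best;
    }

    // Return ids of documents whose "content" holds every query term. A non-null
    // langFilter keeps only documents with that "lang" value, like a filter that
    // restricts the result set without affecting term matching.
    static List<Integer> search(List<Doc> docs, String header, String query, String langFilter) {
        List<String> terms = analyze(pickLanguage(header, "en"), query);
        return docs.stream()
                .filter(d -> langFilter == null || d.lang().equals(langFilter))
                .filter(d -> d.contentTerms().containsAll(terms))
                .map(Doc::id)
                .collect(Collectors.toList());
    }
}
```

With an English document containing "pain" (as in "pain relief") and a French one containing "pain" (bread), an English user's query "the pain" matches both; passing "en" as the filter keeps only the English document, which is the false-positive case the "lang" pulldown is meant to handle.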

Doug


