lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: multi-language searching with Solr
Date Tue, 06 May 2008 22:43:41 GMT
On 5-May-08, at 1:28 PM, Eli K wrote:

> Wouldn't this impact both indexing and search performance and the size
> of the index?
> It is also probable that I will have more than one free-text field
> later on and with at least 20 languages this approach does not seem
> very manageable.  Are there other options for making this work with
> stemming?

If you want stemming, then you have to execute one query per language  
anyway, since the stemming will be different in every language.

This is a fundamental requirement: you somehow need to track the  
language of every token if you want correct multi-language stemming.   
The easiest way to do this would be to split each language into its  
own field.  But there are other options: you could prefix every  
indexed token with the language:

en:The en:quick en:brown en:fox en:jumped ...
fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...
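
A rough sketch of the prefixing idea in Python (not Solr code). The
suffix-stripping "stemmers" are toy stand-ins for real per-language
stemmers, and the names make_stemmer, STEMMERS and tag_tokens are made up
for illustration. Unlike the raw tokens above, this version also
lowercases and stems each token before tagging it:

```python
def make_stemmer(suffixes):
    """Build a toy stemmer that strips the first matching suffix.

    Hypothetical stand-in for a real language-specific stemmer.
    """
    def stem(token):
        for suffix in suffixes:
            # Only strip when a reasonably long stem would remain.
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token
    return stem


# One stemmer per language code (assumed, illustrative suffix lists).
STEMMERS = {
    "en": make_stemmer(("ed", "ing", "s")),
    "fr": make_stemmer(("é", "er", "s")),
}


def tag_tokens(text, lang):
    """Lowercase, stem with the language's stemmer, then prefix with the language code."""
    stem = STEMMERS[lang]
    return [f"{lang}:{stem(token.lower())}" for token in text.split()]


print(" ".join(tag_tokens("The quick brown fox jumped", "en")))
# prints: en:the en:quick en:brown en:fox en:jump
```

Because the language code is part of the indexed term, an English stem
and a French stem that happen to be spelled the same stay distinct in a
single field. The query side has to apply the same stemming and prefix.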

Separate fields seem easier to me, though.
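
With separate fields, a search fans out to one clause per language field,
and each field's own analyzer takes care of the stemming. A minimal sketch
of building such a query string, assuming hypothetical field names of the
form text_en, text_fr and a single plain search term with no special
query-syntax characters (build_query is a made-up helper name):

```python
def build_query(term, languages, field_prefix="text_"):
    """Build a Lucene-style query with one clause per language field.

    Field names such as text_en are assumptions; use whatever the
    schema actually defines. No escaping is done, so `term` must be a
    plain word.
    """
    clauses = [f"{field_prefix}{lang}:{term}" for lang in languages]
    return " OR ".join(clauses)


print(build_query("jumped", ["en", "fr"]))
# prints: text_en:jumped OR text_fr:jumped
```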

-Mike