lucene-solr-user mailing list archives

From Bertrand Delacretaz <bdelacre...@apache.org>
Subject Re: Indexing of non-english text with Solr, any known limitations?
Date Wed, 12 Apr 2006 14:57:40 GMT
Hi Yonik,

Thanks very much for your replies!

On 12 Apr 06, at 16:45, Yonik Seeley wrote:

> On 4/12/06, Bertrand Delacretaz <bdelacretaz@apache.org> wrote:
>
>> ...The project that I'm looking at is currently single-language
>> (French), which I assume can be handled by static configuration of
>> the appropriate analyzers.
>
> Yes, with a little bit of work (making a Solr Filter Factory or
> Tokenizer factory) you can use any Lucene filter, tokenizer, or
> analyzer.

OK. If my project actually happens, I'll do my best to contribute such
changes if they make sense for Solr.

> ...Would you need to index multiple languages in the same field?  That
> could be trickier, and it seems like you would need an analyzer that
> supported that.

The language switch would be per document, so one document might  
contain French and another one with the same field structure might  
contain German.

But the content of a single field would be in one language; I don't
see a need for language switches inside the content of a field.

Having mixed languages in a single index obviously leads to some
imprecision when searching, as the query needs to indicate which
analyzer to use. But it still allows people to search in their own
language and, when the query includes words shared between languages,
find documents in other languages.
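
To make the idea concrete, here is a minimal sketch in plain Java of one shared index where the analyzer is picked per document at indexing time and per query at search time. It deliberately does not use the Lucene or Solr APIs: the class name PerLanguageIndexSketch, the analyze() helper, and the tiny stop-word lists are all hypothetical stand-ins for real per-language analyzers, so treat it as an illustration of the behaviour rather than a proposed implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PerLanguageIndexSketch {
    // Hypothetical stand-ins for real analyzers: a tiny stop-word list per language.
    static final Map<String, Set<String>> STOP_WORDS = Map.of(
        "fr", Set.of("le", "la", "les", "de", "et"),
        "de", Set.of("der", "die", "das", "und"));

    // Toy "analyzer": lowercase, split on non-letters, drop that language's stop words.
    static List<String> analyze(String lang, String text) {
        Set<String> stops = STOP_WORDS.getOrDefault(lang, Set.of());
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !stops.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // term -> ids of the documents containing it; one index shared by all languages
    final Map<String, Set<Integer>> postings = new HashMap<>();

    // The language is chosen per document; the field structure is the same for all.
    void add(int docId, String lang, String text) {
        for (String term : analyze(lang, text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // The query states which analyzer to use; a document must contain every term.
    Set<Integer> search(String lang, String query) {
        Set<Integer> result = null;
        for (String term : analyze(lang, query)) {
            Set<Integer> hits = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        PerLanguageIndexSketch index = new PerLanguageIndexSketch();
        index.add(1, "fr", "Le restaurant de Paris");
        index.add(2, "de", "Das Restaurant und die Bar");

        // A word shared by both languages finds documents in both.
        System.out.println(index.search("fr", "restaurant")); // [1, 2]
        // The matching analyzer drops "die" and finds the German document.
        System.out.println(index.search("de", "die Bar"));    // [2]
        // The wrong analyzer keeps "die", which was never indexed: no hit.
        System.out.println(index.search("fr", "die Bar"));    // []
    }
}
```

The last two queries show the imprecision: the same query text gives different results depending on which analyzer the query asks for, while a shared word such as "restaurant" still crosses the language boundary.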

-Bertrand