lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Libbrecht <p...@hoplahup.net>
Subject Re: Best practices for multiple languages?
Date Wed, 19 Jan 2011 07:44:15 GMT

But for this, you need a skillfully designed:
- set of fields
- multiplexing analyzer
- query expansion
In one of my projects, we do not split language by fields and it's a pain... I'm having recurring
issues in one sense or the other.
- the "die" example that Oti s mentioned is a good one: stop-word in German, essential verb
in English
- I had recently issues with the contribution of the word Fourier (for the name of series):
in English it stays fourier, in French in becomes fouri. So: if the resource is contributed
in French, the indexed value is fouri, English seekers won't find it; if the resource is contributed
in English, French seekers won't find it.
So my last lesson: always have a whitespace-lowercase unstemmed field also at hand and prefer
it over the others in your query expansion.

A wiki page should probably be made.

paul


Le 19 janv. 2011 à 07:53, Vinaya Kumar Thimmappa a écrit :
> I think we should be using lucene with snowball jar's which means one index for all languages
(ofcourse size of index is always a matter of concerns).
> 
> Hope this helps.
> -vinaya
> 
> On Tuesday 18 January 2011 11:23 PM, Clemens Wyss wrote:
>> What is the "best practice" to support multiple languages, i.e. Lucene-Documents
that have multiple language content/fields?
>> Should
>> a) each language be indexed in a seperate index/directory or should
>> b) the Documents (in a single directory) hold the diverse localized fields?
>> 
>> We most often will be searching "language dependent" which (at least performance
wise) mandates one-directory-per-language...
>> 
>> Any (lucene specific) white papers on this topic?
>> 
>> Thx in advance
>> Clemens
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message