lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <>
Subject Re: Indexing of multilingual labels
Date Fri, 11 Mar 2011 12:52:30 GMT
It's not so much a matter of problems with indexing/searching
as it is with search behavior. The reason these strategies
are implemented is that using English stemming, say, on
other languages will produce "interesting" results.

There's no a-priori reason you can't index multiple languages
in the same field.

So I don't see what you would accomplish by using payloads
to indicate which language the term is in. Could you expand
a bit on what you're trying to accomplish here? Maybe there
are better solutions....


On Thu, Mar 10, 2011 at 10:29 PM, Stephane Fellah
<> wrote:
> I  am trying to index in Lucene a field that could have label of concepts in
> different languages. Most of the approaches I have seen so far are:
>   -
>   Use a single index, where each document has a field per each language it
>   uses, or
>   -
>   Use M indexes, M being the number of languages in the corpus.
> Lucene 2.9+ has a feature called Payload that allows to attach attributes to
> term. Is anyone use this mechanism to store language (or other attributes
> such as datatypes) information ? Does this approach if labels are the same
> in different languages (does it break inverted index) ? How is performance
> compared to the two other approaches ? Any pointer on source code showing
> how it is done would help.
> Thanks
> --
> Stephane Fellah, M.Sc, B.Sc
> Principal Engineer/Product Manager
> smartRealm LLC
> 201 Loudoun St. SW
> Leesburg, VA 20175
> Tel: 703 669 5514
> Cell: 571 502 8478
> Fax: 703 669 5515

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message