lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephane Fellah <>
Subject Indexing of multilingual labels
Date Fri, 11 Mar 2011 03:29:30 GMT
I  am trying to index in Lucene a field that could have label of concepts in
different languages. Most of the approaches I have seen so far are:


   Use a single index, where each document has a field per each language it
   uses, or

   Use M indexes, M being the number of languages in the corpus.

Lucene 2.9+ has a feature called Payload that allows to attach attributes to
term. Is anyone use this mechanism to store language (or other attributes
such as datatypes) information ? Does this approach if labels are the same
in different languages (does it break inverted index) ? How is performance
compared to the two other approaches ? Any pointer on source code showing
how it is done would help.


Stephane Fellah, M.Sc, B.Sc
Principal Engineer/Product Manager
smartRealm LLC
201 Loudoun St. SW
Leesburg, VA 20175
Tel: 703 669 5514
Cell: 571 502 8478
Fax: 703 669 5515

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message