lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ivan Vasilev <ivasi...@sirma.bg>
Subject Multy Language documents indexing
Date Thu, 22 Feb 2007 13:04:16 GMT
Hi All,

Our application that uses Lucene for indexing will be used to index 
documents that each of which contains parts written in different 
languages. For example some document could contain English, Chinese and 
Brazilian text. So how to index such document? Is there some best 
practice to do this?

What comes in my mind is to index 3 different Lucene Documents for the 
real document and keep in a database the meta info that these 3 
Documents are related to our real doc. For example for the myDoc.doc we 
will have in the index myDocEn.doc, myDocCn.doc and myDocBr.doc and when 
making search when the searched word is found in myDocCn.doc we will 
visualize to user myDoc.doc. Disadvantage here is that in this case the 
occurrences of the searched item will have to be recalculated. It is 
important for queries like “Red NEAR/10 fox”. So if someone knows better 
practice than this, please let me help.

Tanks in advance,
Ivan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message