lucene-java-user mailing list archives

From Ilya Zavorin <>
Subject Can I detect incorrect language selection after creating an index?
Date Mon, 27 Feb 2012 15:53:47 GMT
Suppose I have a bunch of text documents in language X, but I index them using an analyzer
for language Y. Once the index is created, is it possible to perform some sort of simple "sanity"
check to see whether the original language selection was wrong? I presume I could try searching for
some common word in language Y, but I am not sure how reliable this would be. On the other
hand, if the languages are closely related (say, X and Y are English and Spanish), I would
expect such a check to produce false matches. However, I would be happy if it worked reliably
enough for languages that use different scripts, e.g., Latin vs. Cyrillic vs. Arabic vs. Chinese.
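For the different-scripts case, one rough check is to look at which Unicode script the indexed terms are written in, rather than searching for specific words. The sketch below assumes you have already pulled a sample of terms out of a text field of the index. That extraction step depends on the Lucene version and is not shown. The class name `ScriptCheck`, its methods, and the 0.5 threshold are all hypothetical. It counts letters by `Character.UnicodeScript` (Java 8 or later) and reports the share belonging to the script you expected.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a script-based sanity check over a sample of indexed terms.
 * Assumes the terms were already extracted from the index (not shown here).
 */
public class ScriptCheck {

    /** Counts letter code points per Unicode script; digits and punctuation are skipped. */
    public static Map<Character.UnicodeScript, Integer> scriptHistogram(Iterable<String> terms) {
        Map<Character.UnicodeScript, Integer> counts = new EnumMap<>(Character.UnicodeScript.class);
        for (String term : terms) {
            term.codePoints()
                .filter(Character::isLetter)
                .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        }
        return counts;
    }

    /** Fraction of letters written in the expected script, or 0.0 if the sample has no letters. */
    public static double fractionInScript(Iterable<String> terms, Character.UnicodeScript expected) {
        Map<Character.UnicodeScript, Integer> counts = scriptHistogram(terms);
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total == 0 ? 0.0 : counts.getOrDefault(expected, 0) / (double) total;
    }

    /** Script with the most letters in the sample, or null if the sample has no letters. */
    public static Character.UnicodeScript dominantScript(Iterable<String> terms) {
        Character.UnicodeScript best = null;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> e : scriptHistogram(terms).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical sample: mostly Cyrillic terms in an index that was expected to be Latin-script.
        List<String> sample = Arrays.asList("привет", "мир", "index");
        Character.UnicodeScript expected = Character.UnicodeScript.LATIN;
        double fraction = fractionInScript(sample, expected);
        System.out.println("dominant script: " + dominantScript(sample));
        System.out.println("fraction in " + expected + ": " + fraction);
        if (fraction < 0.5) { // 0.5 is an arbitrary threshold chosen for illustration
            System.out.println("warning: indexed terms do not look like the expected script");
        }
    }
}
```

As the question anticipates, this only separates scripts: an English index built with a Spanish analyzer would still come out as LATIN and pass. Closely related languages would need something finer, such as checking term frequencies against a word list for the expected language.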

Thanks much

Ilya Zavorin