lucene-java-user mailing list archives

From Glen Newton <glen.new...@gmail.com>
Subject Re: Can I detect incorrect language selection after creating an index?
Date Mon, 27 Feb 2012 15:58:44 GMT
Do the check _before_ indexing.
Use https://code.google.com/p/language-detection/ to verify the
language of the text document before you put it in the index.
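
For the different-scripts case raised in the question (Latin vs Cyrillic vs
Arabic vs Chinese), a rough pre-indexing check can be sketched with the JDK
alone. Note that this is not the language-detection library linked above; it
is a swapped-in, much cruder technique that only looks at the dominant Unicode
script of the text, so it cannot tell English from Spanish. The class and
method names (ScriptCheck, dominantScript, matchesExpectedScript) are
hypothetical and exist only for this illustration; Java 8 or later is assumed.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene or the language-detection library.
final class ScriptCheck {

    /**
     * Returns the Unicode script used by the largest number of letters in
     * the text, or UNKNOWN if the text contains no letters at all.
     */
    static UnicodeScript dominantScript(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, punctuation, whitespace
            }
            counts.merge(UnicodeScript.of(cp), 1, Integer::sum);
        }
        UnicodeScript best = UnicodeScript.UNKNOWN;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** True if the dominant script of the text is the one the analyzer expects. */
    static boolean matchesExpectedScript(String text, UnicodeScript expected) {
        return dominantScript(text) == expected;
    }
}
```

One way to use it would be to call matchesExpectedScript(text,
UnicodeScript.CYRILLIC) before handing a document to a Russian analyzer, and
to set aside any document that fails the check instead of indexing it. For
languages that share a script, a statistical detector such as the library
above is still needed.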

-Glen Newton
http://zzzoot.blogspot.com/

On Mon, Feb 27, 2012 at 10:53 AM, Ilya Zavorin <izavorin@caci.com> wrote:
> Suppose I have a bunch of text documents in language X, but I index them
> using an analyzer for language Y. Once the index is created, is it possible
> to perform some sort of simple "sanity" check to see whether the original
> language selection was wrong? I presume I can try searching for some common
> word in language Y, but I am not sure how reliable this would be. On the
> other hand, if the languages are from the same group, say X and Y are English
> and Spanish, I should expect that this sanity check would produce a false
> match. However, I would be happy if it worked reliably enough for languages
> using different scripts, e.g. Latin vs. Cyrillic vs. Arabic vs. Chinese.
>
>
> Thanks much
>
>
>
> Ilya Zavorin


