lucene-general mailing list archives

From Artem Lukanin <ice...@mail.ru>
Subject minFuzzyLength in FuzzySuggester behaves differently for English and Russian
Date Thu, 30 May 2013 12:26:46 GMT
minFuzzyLength is measured in bytes, which I think is wrong, because one would
expect it to be measured in letters. In English the word "table" is 5 bytes in
UTF-8, but the Russian word "книга" is 10 bytes, though it has only 5 letters.
If I have English and Russian words in one field, I have to multiply
minFuzzyLength by 2 whenever the current query contains Russian letters.
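
To make the mismatch concrete, here is a minimal Python sketch (not Lucene
code). It only compares UTF-8 byte length with letter count for the two words
above, and shows the doubling workaround. The helper names, including
scaled_min_fuzzy_length, are made up for this illustration, and the Cyrillic
check assumes the basic Cyrillic block U+0400..U+04FF is enough.

```python
def utf8_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def letter_count(text: str) -> int:
    """Number of Unicode code points in text."""
    return len(text)


def scaled_min_fuzzy_length(query: str, min_letters: int) -> int:
    """Hypothetical helper for the workaround described above:
    double the threshold when the query contains Cyrillic letters."""
    has_cyrillic = any("\u0400" <= ch <= "\u04FF" for ch in query)
    return min_letters * 2 if has_cyrillic else min_letters


for word in ("table", "книга"):
    # Prints the word, its letter count, and its UTF-8 byte length.
    print(word, letter_count(word), utf8_length(word))
```

Note that the doubling only holds for text made up purely of 2-byte letters. A
query that mixes Latin and Cyrillic letters, or contains digits, would still
get a threshold that does not match its real letter count.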

Though this hack works, it is still wrong, because you cannot swap or
substitute individual bytes inside Russian letters when trying to guess
whether the query contains a typo. Every arc in the FST should be a letter,
not a byte.
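
The problem with byte-level edits can be shown with another small Python
sketch. It does not reproduce how FuzzySuggester builds its automaton. It
uses a plain Levenshtein distance (an assumption chosen for simplicity) to
compare a one-letter typo counted in letters and in UTF-8 bytes, and checks
what happens when two adjacent bytes are transposed. The misspelling "кнрга"
is an invented example.

```python
def edit_distance(a, b) -> int:
    """Plain Levenshtein distance over any two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


correct = "книга"
typo = "кнрга"  # one letter substituted: "и" (D0 B8) became "р" (D1 80)

letter_edits = edit_distance(correct, typo)
byte_edits = edit_distance(correct.encode("utf-8"), typo.encode("utf-8"))

# Transpose two adjacent bytes that belong to different letters.
swapped = bytearray(correct.encode("utf-8"))
swapped[1], swapped[2] = swapped[2], swapped[1]

print(letter_edits, byte_edits, is_valid_utf8(bytes(swapped)))
```

Here a single wrong letter costs one edit in letters but two edits in bytes,
because both bytes of the letter change. The byte transposition does not
produce a different word at all, only an invalid UTF-8 sequence.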



--
View this message in context: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-tp4067018.html
Sent from the Lucene - General mailing list archive at Nabble.com.
