lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mathieu Lecarme <math...@garambrogne.net>
Subject Re: Why exactly are fuzzy queries so slow?
Date Sun, 25 Nov 2007 11:02:23 GMT
> Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix".  
> So, this
> is some kind of "wildcard fuzzy" but not real fuzzy anymore.
>
> I understand the optimitation but right now I hardly can image a  
> reasonable
> use-case. Who care whether the levenstein distance is a the  
> beginnen, middle
> or end of word, .e.g when searching fuzzy for "philes" I want to
> find "files"...

with a ngram size of 3, "files" => fil, ile, les, and "philes" => phi,  
hil, ile, les
you restrict also the size of the word with -1, +1
when you fuzzy search "philes", you are looking for word of size 5, 6  
or 7, wich may contains its ngrams. Lucene query will send you, in  
sorted order, word with most ngrams in commons. You can clean sort  
order with a levenstein distance (to exclude anagram).
If you wont to match "ph" and "f", you have to compute with phonetic  
forms.

M.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message