lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Patrick Jung <>
Subject Problem / question concerning "Fuzzy Search"
Date Mon, 29 Mar 2010 14:57:44 GMT
Hi all,

I tried to figure out how the fuzzy search implementation
in Apache Lucene works and I'm kinda stuck here.
--> Version : Apache Lucene 3.0.1 (JAVA)

[What I want / need]
I'm looking for a way to combine a prefix-, fuzzy- and wildcard query.

Q: Is it possible to have a query like "user_input~0.5*" ?

[JavaDoc for c-tor]
  @param minimumSimilarity: a value between 0 and 1 to set
   the required similarity between the query term and the
   matching terms. For example, for a minimumSimilarity of
   0.5 a term of the same length as the query term is
   considered similar to the query term if the edit distance
   between both terms is less than length(term)*0.5

Q: Mh... what if the query term differs in it's length to the
   term in my document?

[Test case]
I have written a small test program (JUnit test case) to
explain my problem / confusion in detail:


[Examples] Search term --> Subset of expected result
  Cinamo~0.5 --> Cinema, Cinnamon [works]
  Strawbarr~0.8 --> Strawberry    [doesn't work]
As far as I understand, the "Edit distance"
(aka "Levinshtein distance") between "Strawbarr" and "Strawberry" 
is 2 (one replacement and one insertion to transform "Strawbarr" into

The query "Strawbarr~0.8" in my opinion (and from what I read from
the JavaDocs) should work just fine, because 
  len(Strawbarr)*0.8 == 9*0.8 == 7.2 ... 7.2 >= 2 ... still -- 
it doesn't work. Is that, because the length of the search term 
and the word in my document differ?

I already searched the wiki, the mailing list archive
and had a look in all the "obvious" places but had no luck
so far.

If I am missing something obvious here I would be glad to receive
some pointers into the right direction.


Benjamin Jung <>
Tel.: +49 (0)69 / 8484 65 37
Fax: +49 (0)6054 / 909 788 2
Mobil +49 (0)1577 / 159 788 3

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message