lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Patrick Jung <bpj...@terreon.de>
Subject Problem / question concerning "Fuzzy Search"
Date Mon, 29 Mar 2010 14:57:44 GMT
Hi all,


I tried to figure out how the fuzzy search implementation
in Apache Lucene works and I'm kinda stuck here.
--> Version : Apache Lucene 3.0.1 (JAVA)


[What I want / need]
I'm looking for a way to combine a prefix-, fuzzy- and wildcard query.

Q: Is it possible to have a query like "user_input~0.5*" ?


[JavaDoc for org.apache.lucene.search.FuzzyQuery c-tor]
  @param minimumSimilarity: a value between 0 and 1 to set
   the required similarity between the query term and the
   matching terms. For example, for a minimumSimilarity of
   0.5 a term of the same length as the query term is
   considered similar to the query term if the edit distance
   between both terms is less than length(term)*0.5

Q: Mh... what if the query term differs in it's length to the
   term in my document?


[Test case]
I have written a small test program (JUnit test case) to
explain my problem / confusion in detail:

--> http://eugeneciurana.com/pastebin/pastebin.php?show=42619



[Examples] Search term --> Subset of expected result
  Cinamo~0.5 --> Cinema, Cinnamon [works]
  Strawbarr~0.8 --> Strawberry    [doesn't work]
  
-->
As far as I understand, the "Edit distance"
(aka "Levinshtein distance") between "Strawbarr" and "Strawberry" 
is 2 (one replacement and one insertion to transform "Strawbarr" into
"Strawberry")

The query "Strawbarr~0.8" in my opinion (and from what I read from
the JavaDocs) should work just fine, because 
  len(Strawbarr)*0.8 == 9*0.8 == 7.2 ... 7.2 >= 2 ... still -- 
it doesn't work. Is that, because the length of the search term 
and the word in my document differ?

I already searched the wiki, the mailing list archive
and had a look in all the "obvious" places but had no luck
so far.

If I am missing something obvious here I would be glad to receive
some pointers into the right direction.
<--




Regards
-benjamin-

-- 
Benjamin Jung <bpjung@terreon.de>
Terreon, http://terreon.de/
Tel.: +49 (0)69 / 8484 65 37
Fax: +49 (0)6054 / 909 788 2
Mobil +49 (0)1577 / 159 788 3

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message