lucene-dev mailing list archives

From Guilherme Aiolfi <grad...@gmail.com>
Subject Fuzzy search always returning docs sorted by the highest match
Date Tue, 17 May 2011 20:54:39 GMT
I'm re-sending my first message because I've just received the mailing-list
confirmation. If it's a duplicate, please ignore this one.

Hi,

I want to do a fuzzy search and always return documents, no matter what the
score. To do this, I tried sorting by strdist() in Solr 3.1. It worked
great and does ALMOST exactly what I wanted. The problem is that the
supported algorithms (jw, ngram and edit) are not the best fit for my
scenario.

The best results come from StrikeAMatch (
http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
I found https://issues.apache.org/jira/browse/LUCENE-2230, which
implements what I wanted. But I was told that I should use trunk because
there are some major improvements to fuzzy search there.

I read this article explaining some of the changes:
http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html.
But I still don't think it replaces the StrikeAMatch algorithm, because
StrikeAMatch gives better results for searches like "abc" compared against
strings like "abc company inc" (edit distance > 2).
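
To make the comparison concrete, here is a minimal sketch of the
StrikeAMatch idea as I understand it from the article linked above: split
each string into words, collect adjacent letter pairs, and score with
2 * shared pairs / total pairs. The class and method names are my own
illustration; this is not the LUCENE-2230 patch and not a Lucene or Solr
API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of letter-pair similarity ("Strike a Match").
class StrikeAMatchSketch {

    // Collect adjacent character pairs from each whitespace-separated word,
    // upper-cased so the comparison is case-insensitive.
    static List<String> wordLetterPairs(String text) {
        List<String> pairs = new ArrayList<>();
        for (String word : text.trim().toUpperCase().split("\\s+")) {
            for (int i = 0; i + 1 < word.length(); i++) {
                pairs.add(word.substring(i, i + 2));
            }
        }
        return pairs;
    }

    // Returns a value in [0, 1]: twice the number of shared pairs divided
    // by the total number of pairs in both strings.
    static double similarity(String a, String b) {
        List<String> pairsA = wordLetterPairs(a);
        List<String> pairsB = wordLetterPairs(b);
        int total = pairsA.size() + pairsB.size();
        if (total == 0) {
            return 0.0; // neither string has a word of two or more characters
        }
        int shared = 0;
        for (String pair : pairsA) {
            // Remove each matched pair so duplicates are counted only once.
            if (pairsB.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        System.out.println(similarity("abc", "abc company inc"));
        System.out.println(similarity("abc", "xyz"));
    }
}
```

With this measure, "abc" still shares both of its letter pairs with "abc
company inc" and gets a non-zero score, even though the edit distance
between the two strings is far above 2.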

Still, Fuad Efendi told me that StrikeAMatch is a kids' toy compared to
the state of Lucene trunk. So here I am: I want to know how 4.0 will help
me achieve what I want.

Thanks.
