lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Fuzzy search always returning docs sorted by the highest match
Date Wed, 18 May 2011 21:32:10 GMT
You aren't likely to encounter strings like "abc company inc" in a
Lucene index, as it will be tokenized into three tokens, "abc",
"company" and "inc", under most Analyzers.
So, for this exact example, you don't even need fuzzy matching.
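
To illustrate the point, here is a rough sketch in Python. It is not Lucene
code: simple_tokenize is a hypothetical stand-in for an Analyzer (it only
lowercases and splits on whitespace), and edit_distance is a plain
Levenshtein implementation, not what FuzzyQuery uses internally.

```python
def simple_tokenize(text):
    """Lowercase and split on whitespace; a rough stand-in for an Analyzer."""
    return text.lower().split()


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


field_value = "abc company inc"
query = "abc"

# What ends up as separate terms under this simplified analysis.
tokens = simple_tokenize(field_value)

# Distance against the whole untokenized string vs. the closest single token.
whole_string_distance = edit_distance(query, field_value)
best_token_distance = min(edit_distance(query, t) for t in tokens)

print(tokens)                 # ['abc', 'company', 'inc']
print(whole_string_distance)  # 12
print(best_token_distance)    # 0
```

Against the whole string the distance is far above 2, but against the
individual tokens the query "abc" is an exact match for one of them.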

Also, you might want to try the 'user' mailing list for questions about
using Lucene.

On Wed, May 18, 2011 at 00:54, Guilherme Aiolfi <gradinf@gmail.com> wrote:
> I'm re-sending my first message because I've just received the mailing-list
> confirmation. If it's a duplicate, ignore this one.
>
> Hi,
> I want to do a fuzzy search and always return documents no matter what the
> score. To do this, I tried sorting by strdist() in Solr 3.1. It worked
> great and does ALMOST exactly what I wanted. The problem is that the
> supported algorithms (jw, ngram and edit) are not the best fit for my
> scenario.
> The best results come from StrikeAMatch
> (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
> So, I found this
> link https://issues.apache.org/jira/browse/LUCENE-2230 that implements what
> I wanted. But I was told that I should use trunk because there were some
> really great improvements to fuzzy search there.
> I read this article explaining some
> changes http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html.
> But I still don't think it replaces the StrikeAMatch algo, because that one
> can give better results for searches like "abc" compared against strings
> like "abc company inc" (distance > 2).
> But still, Fuad Efendi told me that StrikeAMatch is a toy for kids compared
> to the state of Lucene trunk. So here I am: I want to know how 4.0 will help
> me achieve what I want.
> Thanks.
>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: earwin@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785
