lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: Fuzzy search always returning docs sorted by the highest match
Date Thu, 19 May 2011 01:27:43 GMT
I'm baffled, as you probably are too.

If all you want is a fuzzy match against a list of strings, Lucene is
huge overkill, and you should look elsewhere.
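For a plain list of strings, a small standalone scorer may be all that is needed. Below is a minimal Python sketch of the letter-pair similarity described in the StrikeAMatch article linked in the quoted message (a Dice coefficient over the adjacent letter pairs of each word). It is written from a reading of that article, not from the LUCENE-2230 patch, and the function names `letter_pairs`, `strike_a_match`, `levenshtein` and `rank` are hypothetical. It also computes plain Levenshtein distance to illustrate the quoted "abc" vs "abc company inc" case: the edit distance is 12, far past the "distance > 2" mentioned in the thread, while the letter-pair score is still a nonzero 1/3.

```python
from collections import Counter


def letter_pairs(text):
    """Adjacent letter pairs of every whitespace-separated word, upper-cased.

    Pairs never span word boundaries, so "abc inc" yields AB, BC, IN, NC.
    """
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def strike_a_match(a, b):
    """Letter-pair similarity: 2 * shared pairs / (pairs in a + pairs in b).

    Shared pairs are counted as a multiset intersection, so a pair that
    occurs once in `a` can match at most once in `b`.
    """
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0  # neither string contains a word of two or more letters
    shared = sum((Counter(pairs_a) & Counter(pairs_b)).values())
    return 2.0 * shared / total


def levenshtein(a, b):
    """Classic edit distance (insert/delete/substitute), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def rank(query, candidates):
    """Return every candidate, best letter-pair match first (nothing is dropped)."""
    return sorted(candidates, key=lambda c: strike_a_match(query, c), reverse=True)


if __name__ == "__main__":
    print(levenshtein("abc", "abc company inc"))               # 12
    print(round(strike_a_match("abc", "abc company inc"), 3))  # 0.333
    print(rank("abc", ["xyz corp", "abc company inc"]))        # ['abc company inc', 'xyz corp']
```

Because `rank` sorts instead of filtering, every candidate is returned, which fits the original goal of always getting documents back; the cost is one score per candidate, which is fine for a modest list of strings.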

2011/5/19 Guilherme Aiolfi <gradinf@gmail.com>:
> Well, it was about the implementation of an algorithm that was proposed by a
> user and was implemented in another way. And it was this list, not the user
> mailing list, that this developer recommended for asking this question.
> So, not entirely my fault. But I apologize for the inconvenience.
> I just want to clarify that searching for the tokens separately is not what I
> want, since those words can exist but not all in the same doc. I want to
> compare the whole phrase. For that to work, I'm not using any Analyzer.
> As I said, I've got it working, but I don't know how to use the right
> algorithm for the job.
> I'm going to redirect my question to the other mailing list.
> Thanks anyway.
>
> On Wed, May 18, 2011 at 6:32 PM, Earwin Burrfoot <earwin@gmail.com> wrote:
>>
>> You aren't likely to encounter strings like "abc company inc" in a
>> Lucene index, as it will be tokenized into three tokens "abc",
>> "company", "inc" under most Analyzers.
>> So, for this exact example you don't even need fuzzy matching.
>>
>> Also, maybe you should try the 'user' mailing list for questions regarding
>> the use of Lucene.
>>
>> On Wed, May 18, 2011 at 00:54, Guilherme Aiolfi <gradinf@gmail.com> wrote:
>> > I'm re-sending my first message because I've just received the
>> > mailing-list confirmation. If it's a duplicate, forget about this one.
>> >
>> > Hi,
>> > I want to do a fuzzy search and always return documents, no matter what
>> > the score. To do this, I tried sorting by strdist() in Solr 3.1. It
>> > worked great and does ALMOST exactly what I wanted. The problem is that
>> > the supported algorithms (jw, ngram and edit) are not the best fit for
>> > my scenario.
>> > The best results come from StrikeAMatch
>> > (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
>> > So, I found https://issues.apache.org/jira/browse/LUCENE-2230, which
>> > implemented what I wanted. But I was told that I should use trunk,
>> > because there was some really great news about fuzzy search there.
>> > I read this article explaining some of the changes:
>> > http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html.
>> > But I still don't think it replaces the StrikeAMatch algorithm, because
>> > that one gives the best results in searches like "abc" compared to
>> > strings like "abc company inc" (edit distance > 2).
>> > Still, Fuad Efendi told me that StrikeAMatch is a toy for kids compared
>> > to the state of Lucene trunk. So here I am: I want to know how 4.0 will
>> > help me achieve what I want.
>> > Thanks.
>> >
>> >
>> >
>>
>>
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко
>> E-Mail/Jabber: earwin@gmail.com
>> Phone: +7 (495) 683-567-4
>> ICQ: 104465785
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: earwin@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785


