lucene-dev mailing list archives

From Guilherme Aiolfi <grad...@gmail.com>
Subject Re: Fuzzy search always returning docs sorted by the highest match
Date Wed, 18 May 2011 23:50:12 GMT
Well, it was about the implementation of an algorithm that was proposed by a
user and was implemented in another way. And this list, not the user mailing
list, was recommended by that developer as the place to ask this question.

So, not entirely my fault. But I apologize for the inconvenience.

I just want to clarify that searching for the tokens separately is not what I
want, since those words can all exist without being in the same doc. I want to
compare the whole phrase. For that to work I am not using any Analyzer.

As I said, I've got it working, but I don't know how to use the right
algorithm for the job.
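
For reference, the comparison I have in mind is roughly the following. This is
only a sketch of the idea from the Strike a Match article (shared adjacent
letter pairs, counted per word), not the LUCENE-2230 patch and not anything
that exists in Lucene or Solr; the class and method names are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene/Solr: a sketch of the
// "Strike a Match" similarity (Dice coefficient over letter pairs).
public class StrikeAMatch {

    // Collects adjacent letter pairs for each whitespace-separated word,
    // after upper-casing. Pairs never span a word boundary.
    private static List<String> wordLetterPairs(String s) {
        List<String> pairs = new ArrayList<>();
        for (String word : s.trim().toUpperCase().split("\\s+")) {
            for (int i = 0; i < word.length() - 1; i++) {
                pairs.add(word.substring(i, i + 2));
            }
        }
        return pairs;
    }

    // Returns 2 * |shared pairs| / (|pairs of a| + |pairs of b|), in [0, 1].
    // Returns 0.0 when neither string yields any pair.
    public static double similarity(String a, String b) {
        List<String> pairsA = wordLetterPairs(a);
        List<String> pairsB = wordLetterPairs(b);
        int total = pairsA.size() + pairsB.size();
        if (total == 0) {
            return 0.0;
        }
        int shared = 0;
        for (String pair : pairsA) {
            // Remove each matched pair so duplicates are only counted once.
            if (pairsB.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }
}
```

With this, similarity("abc", "abc company inc") is still well above zero even
though the edit distance between the two strings is large, which is the
behaviour I am after when comparing whole phrases.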

I'm going to redirect my question to the other mailing list.

Thanks anyway.

On Wed, May 18, 2011 at 6:32 PM, Earwin Burrfoot <earwin@gmail.com> wrote:

> You aren't likely to encounter strings like "abc company inc" in a
> Lucene index, as it will be tokenized into three tokens, "abc",
> "company", "inc", under most Analyzers.
> So, for this exact example you don't even need fuzzy matching.
>
> Also, maybe you should try 'user' mailing list for questions regarding
> the use of Lucene.
>
> On Wed, May 18, 2011 at 00:54, Guilherme Aiolfi <gradinf@gmail.com> wrote:
> > I'm re-sending my first message because I've just received the mailing-list
> > confirmation. If it's a duplicate, forget about this one.
> >
> > Hi,
> > I want to do a fuzzy search and always return documents no matter what the
> > score. So, to do this, I tried sorting by strdist() in Solr 3.1. It worked
> > great and does ALMOST exactly what I wanted. The problem is that the
> > supported algorithms (jw, ngram and edit) are not the best fit for my
> > scenario.
> > The best results come from StrikeAMatch
> > (http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
> > So, I've found this link, https://issues.apache.org/jira/browse/LUCENE-2230,
> > which implemented what I wanted. But I was told that I should use trunk
> > because there was some really great news about fuzzy search there.
> > I read this article explaining some changes:
> > http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html
> > But I still don't think it replaces the StrikeAMatch algo, because that one
> > can give the best results in searches like "abc" compared to strings like
> > "abc company inc" (distance > 2).
> > But still, Fuad Efendi told me that StrikeAMatch is toys for kids compared to
> > the state of Lucene trunk. So here I am; I want to know how 4.0 will help
> > achieve what I want.
> > Thanks.
> >
> >
> >
>
>
>
> --
> Kirill Zakharenko/Кирилл Захаренко
> E-Mail/Jabber: earwin@gmail.com
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
>
