lucene-dev mailing list archives

From Mark Bennett <mbenn...@ideaeng.com>
Subject Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
Date Sun, 11 Nov 2012 23:03:55 GMT
+1 but with a harsh warning, maybe even in the log?

In other words, it's not just performance, but also accuracy.  I believe it
breaks (in some manner) if the number of matching words passes 32k.

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Sun, Nov 11, 2012 at 1:43 PM, Yonik Seeley <yonik@lucidworks.com> wrote:

> Thinking about this a bit more, I'm somewhat sympathetic to the
> performance arguments when the user is using the minSimilarity type of
> parameter (i.e. a number less than 1) since it's not obvious what
> algorithm would be invoked (i.e. what the resulting edit distance
> requested is), and the user is not requesting a specific edit distance
> in any case (it's fuzzy ;-)
>
> When edit distance is used directly however (i.e. param >= 1), things
> are both predictable and easy to document - there are no surprises.
> So perhaps what makes the most sense is this:
> - if minSimilarity < 1, then calculate the max edit distance based on
> a parameter fuzzy.maxDistance or something (which would default to 2).
>  Use SlowFuzzyQuery if the result is >=3
> - if minSimilarity is 1 or 2, use FuzzyQuery
> - if minSimilarity is >=3, use SlowFuzzyQuery
>
> -Yonik
> http://lucidworks.com
>
>
> On Sun, Nov 11, 2012 at 10:32 PM, Yonik Seeley <yonik@lucidworks.com> wrote:
> > On Sun, Nov 11, 2012 at 4:18 PM, Jack Krupansky <jack@basetechnology.com> wrote:
> >> Okay, so maybe this is simply a case where “an adjustment” was made to
> >> Lucene and Solr did not make a corresponding “adjustment” to compensate to
> >> “preserve” functionality. Solr users cannot easily override factory methods,
> >> but of course the Solr query parser can and probably should.
> >
> > Right - and Solr attempts to preserve external interfaces (HTTP apis
> > and query languages) even across major versions.
> > It could be argued that this is a regression - a loss of the ability
> > to use higher edit distances.
> > I'd support adding a fallback to SlowFuzzyQuery when the edit distance
> > turns out to be > 2.  I'd even argue that it should do it by default
> > to retain the old behavior.  Basically from the user perspective it
> > would look like edit distances of <= 2 were sped up.
> >
> > -Yonik
> > http://lucidworks.com
>
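
For reference, the selection rule proposed in the quoted message (fast
FuzzyQuery for edit distances up to 2, SlowFuzzyQuery for 3 or more) could
be sketched roughly as below. This is only an illustration, not code from
Lucene or Solr: the class FuzzyDispatch, the Kind enum, and both method
names are hypothetical, and the way a similarity below 1 is turned into an
edit distance (scaled by term length, then capped at the proposed
fuzzy.maxDistance value) is an assumption, since the thread does not spell
that step out.

```java
// Hypothetical sketch of the proposed dispatch rule; not Lucene/Solr code.
final class FuzzyDispatch {

    /** Which implementation the sketch would pick. */
    enum Kind { FAST, SLOW }

    /** Largest edit distance the fast FuzzyQuery is said to handle in the thread. */
    static final int MAX_FAST_EDITS = 2;

    /**
     * Turns the user-supplied fuzzy parameter into a requested edit distance.
     * A value >= 1 is read as an explicit edit distance. A value < 1 is read
     * as a minimum similarity; scaling it by the term length is an ASSUMPTION,
     * and maxDistance stands in for the proposed fuzzy.maxDistance setting.
     */
    static int requestedEdits(float param, int termLength, int maxDistance) {
        if (param >= 1f) {
            return (int) param;
        }
        int edits = (int) ((1.0 - param) * termLength);
        return Math.min(edits, maxDistance);
    }

    /** Picks the slow implementation only when more than two edits are requested. */
    static Kind choose(float param, int termLength, int maxDistance) {
        int edits = requestedEdits(param, termLength, maxDistance);
        return edits > MAX_FAST_EDITS ? Kind.SLOW : Kind.FAST;
    }

    public static void main(String[] args) {
        System.out.println(choose(2f, 6, 2));    // explicit distance 2
        System.out.println(choose(3f, 6, 2));    // explicit distance 3
        System.out.println(choose(0.5f, 10, 2)); // similarity, capped at 2
        System.out.println(choose(0.5f, 10, 4)); // similarity, cap raised to 4
    }
}
```

With the default cap of 2, a similarity-style parameter never selects the
slow path in this sketch; it only does so if the cap is raised to 3 or
more, which seems to match the intent of "Use SlowFuzzyQuery if the result
is >=3".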
