lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidworks.com>
Subject Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
Date Sun, 11 Nov 2012 21:43:28 GMT
Thinking about this a bit more, I'm somewhat sympathetic to the
performance arguments when the user is using the minSimilarity type of
parameter (i.e. a number less than 1) since it's not obvious what
algorithm would be invoked (i.e. what the resulting edit distance
requested is), and the user is not requesting a specific edit distance
in any case (it's fuzzy ;-)

When edit distance is used directly however (i.e. param >= 1), things
are both predictable and easy to document - there are no surprises.
So perhaps what makes the most sense is this:
- if minSimilarity < 1, then calculate the max edit distance based on
  a parameter fuzzy.maxDistance or something (which would default to 2).
  Use SlowFuzzyQuery if the result is >= 3
- if minSimilarity is 1 or 2 (i.e. a direct edit distance), use FuzzyQuery
- if minSimilarity is >= 3, use SlowFuzzyQuery
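
A rough sketch of that selection rule in Java, just to make the three cases concrete. Everything named here (FuzzyQueryChooser, Choice, toEditDistance, choose) is made up for illustration, and the similarity-to-distance formula (scale by term length, then cap at maxDistance) is only an assumption about how fuzzy.maxDistance might be applied; it is not existing Lucene or Solr code.

```java
// Hypothetical sketch of the proposed rule; not actual Lucene/Solr code.
final class FuzzyQueryChooser {
    enum Choice { FUZZY_QUERY, SLOW_FUZZY_QUERY }

    // The fast (automaton-based) FuzzyQuery handles edit distances up to 2.
    static final int FAST_MAX_EDITS = 2;

    // param >= 1 is taken directly as an edit distance.
    // param < 1 is treated as a similarity and converted; the formula and the
    // cap by maxDistance are ASSUMPTIONS made for this sketch.
    static int toEditDistance(float param, int termLength, int maxDistance) {
        if (param >= 1f) {
            return (int) param;
        }
        int edits = (int) ((1.0 - param) * termLength);
        return Math.min(edits, maxDistance);
    }

    // Fall back to the slow implementation only when more than 2 edits are needed.
    static Choice choose(float param, int termLength, int maxDistance) {
        int edits = toEditDistance(param, termLength, maxDistance);
        return edits > FAST_MAX_EDITS ? Choice.SLOW_FUZZY_QUERY : Choice.FUZZY_QUERY;
    }
}
```

With the default cap of 2, a similarity-style parameter would never reach SlowFuzzyQuery under this sketch; only an explicit distance of 3 or more, or a raised fuzzy.maxDistance, would.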

-Yonik
http://lucidworks.com


On Sun, Nov 11, 2012 at 10:32 PM, Yonik Seeley <yonik@lucidworks.com> wrote:
> On Sun, Nov 11, 2012 at 4:18 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>> Okay, so maybe this is simply a case where “an adjustment” was made to
>> Lucene and Solr did not make a corresponding “adjustment” to compensate to
>> “preserve” functionality. Solr users cannot easily override factory methods,
>> but of course the Solr query parser can and probably should.
>
> Right - and Solr attempts to preserve external interfaces (HTTP APIs
> and query languages) even across major versions.
> It could be argued that this is a regression - a loss of the ability
> to use higher edit distances.
> I'd support adding a fallback to SlowFuzzyQuery when the edit distance
> turns out to be > 2.  I'd even argue that it should do it by default
> to retain the old behavior.  Basically from the user perspective it
> would look like edit distances of <= 2 were sped up.
>
> -Yonik
> http://lucidworks.com


