lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Javadoc issue: FuzzyQuery should note the editing distance limit of 2 and refer to SlowFuzzyQuery
Date Thu, 13 Sep 2012 17:45:12 GMT
I don't agree with suggesting the slow, unscalable approach.

This query supports edit distances of up to 2, including transpositions.
Anything beyond that is basically going to match a significant portion
of the term dictionary and not really be useful.
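
To make "up to 2 edits, including transpositions" concrete, here is a
minimal sketch of an edit distance in which a swap of two adjacent
characters counts as a single edit (the optimal string alignment
variant). It only illustrates the metric. It is not how FuzzyQuery
evaluates matches, which is automaton-based as the quoted message
notes. The class name, the method names, and the MAX_EDITS constant are
hypothetical and only mirror the limit of 2 discussed in this thread.

```java
public class EditDistanceSketch {
    // Hypothetical constant mirroring the limit of 2 discussed in this thread.
    static final int MAX_EDITS = 2;

    // Optimal string alignment distance: an insertion, deletion, substitution,
    // or swap of two adjacent characters each costs 1.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "en" <-> "ne".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    // True when the candidate is within the assumed limit of 2 edits.
    static boolean withinLimit(String term, String candidate) {
        return distance(term, candidate) <= MAX_EDITS;
    }
}
```

With this sketch, distance("lucene", "lucnee") is 1 because the swap
counts once, while distance("kitten", "sitting") is 3 and so falls
outside the limit.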

If someone has special data where this makes sense, they should use an
n-gram indexing technique or the spellchecker module, or BLAST or
something other than Lucene.
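
As a rough illustration of the n-gram idea, the sketch below splits a
term into overlapping character n-grams. Indexing such grams lets
loosely similar terms be found through the grams they share, without
enumerating every term within a large edit distance. The class and
method names are hypothetical, and this is plain Java rather than any
Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Returns the overlapping character n-grams of a term, in order.
    // A term shorter than n yields an empty list.
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }
}
```

For example, ngrams("lucene", 3) yields luc, uce, cen, ene.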

As for the constant of 2: it is actually in the javadocs, but you
have to click CONSTANT VALUES to see it.

On Thu, Sep 13, 2012 at 12:03 PM, Jack Krupansky
<jack@basetechnology.com> wrote:
> The automaton support for FuzzyQuery added the severe limitation to
> FuzzyQuery of an editing distance of 2 that needs to be documented in the
> Javadoc. A reference to SlowFuzzyQuery is also needed in the Javadoc, even
> though that class is deprecated.
>
> The constructor Javadoc does say “maxEdits - must be >= 0 and <=
> LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE”, but neither the text nor
> that link documents the extreme limitation of 2. I mean, a casual reader
> might reasonably expect that it is just some big number like
> Integer.MAX_VALUE. The rationale from the Jira should be succinctly stated,
> at the class level as well.
>
> Relevant Jira:
> https://issues.apache.org/jira/browse/LUCENE-4024
>
> -- Jack Krupansky



-- 
lucidworks.com


