lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4024) FuzzyQuery should never do edit distance > 2
Date Wed, 02 May 2012 18:48:50 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4024:
--------------------------------

    Attachment: LUCENE-4024.patch

I agree: keeping this crazy floating-point specification of distance compatible with 3.x
is hairy.

But I think this is all a huge trap. Attached is a patch that:
* removes the slow capability from FuzzyTermsEnum
* cleans up FuzzyQuery: removes the float ctors, allows transpositions as primitive edits, etc.
* adds a deprecated SlowFuzzyQuery to sandbox/ that has the old ctors
* adds a deprecated SlowFuzzyTermsEnum that it uses, which extends FuzzyTermsEnum and adds
the slow fallback.

I added a (deprecated) static helper method to FuzzyQuery that converts the old float
similarity to a number of edits, capped at what the automata support (this is used to easily
cut over the query parsers).
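
For readers following along, here is a rough sketch of what such a conversion could look
like. It is not the code in the attached patch. The class, method, and constant names are
illustrative. It assumes that a value below 1 is a similarity fraction of the term length,
that a value of 1 or more is a raw edit count, and that the automata support at most 2
edits (per the issue title).

```java
// Illustrative sketch only; names and the cap of 2 are assumptions, not the patch's code.
final class FuzzyEditsSketch {

    // Assumed maximum edit distance the automata can handle (taken from the issue title).
    static final int MAX_SUPPORTED_EDITS = 2;

    /**
     * Converts an old-style float "minimum similarity" into a number of edits.
     * Values >= 1 are treated as a raw edit count; values below 1 are treated
     * as a fraction of the term length. The result is capped at MAX_SUPPORTED_EDITS.
     */
    static int floatToEdits(float minimumSimilarity, int termLength) {
        if (minimumSimilarity >= 1f) {
            // Already an edit count: truncate to int and cap.
            return (int) Math.min(minimumSimilarity, (float) MAX_SUPPORTED_EDITS);
        }
        // Fraction of the term that may differ, truncated toward zero, then capped.
        int edits = (int) ((1.0 - minimumSimilarity) * termLength);
        return Math.min(edits, MAX_SUPPORTED_EDITS);
    }

    public static void main(String[] args) {
        // Old-style similarity 0.5 on a 6-character term, then a raw edit count of 3.
        System.out.println(floatToEdits(0.5f, 6));
        System.out.println(floatToEdits(3f, 10));
    }
}
```

With a cap like this, a request that would previously have triggered the slow path is
silently clamped instead. Whether clamping or throwing is the better behavior is a
separate design question.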

All tests pass, but the patch needs javadocs. In particular, I think we should adjust the query
syntax and mark the old ~0.xxx stuff as deprecated, since the query parsers can already do ~1
and ~2 now. Then we can really clean up for 5.0.

P.S. The patch is huge since I didn't use SVN adds/removes, but that makes it easy to apply.
                
> FuzzyQuery should never do edit distance > 2
> --------------------------------------------
>
>                 Key: LUCENE-4024
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4024
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-4024.patch
>
>
> Edit distance 1 and 2 are now very, very fast compared to 3.x (100X-200X faster) ... but
> edit distance 3 will fall back to the super-slow scan of all terms from 3.x, which is not
> graceful degradation.
> Not sure how to fix it ... maybe we have a SlowFuzzyQuery?  And FuzzyQuery throws an
> exception if you try to ask it to be slow?  Or, we add a boolean (off by default) that you
> must turn on to allow the slow one..?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

