lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7439) Should FuzzyQuery match short terms too?
Date Thu, 15 Sep 2016 19:47:20 GMT


ASF subversion and git services commented on LUCENE-7439:

Commit c79d44f82814d6d798450a422f73f42891cb1ef5 in lucene-solr's branch refs/heads/master
from Mike McCandless

LUCENE-7439: move CHANGES entry

> Should FuzzyQuery match short terms too?
> ----------------------------------------
>                 Key: LUCENE-7439
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.3
>         Attachments: LUCENE-7439.patch, LUCENE-7439.patch, LUCENE-7439.patch
> Today, if you ask {{FuzzyQuery}} to match {{abcd}} with edit distance 2, it will fail
> to match the term {{ab}} even though it's 2 edits away.
> Its javadocs explain this:
> {noformat}
>  * <p>NOTE: terms of length 1 or 2 will sometimes not match because of how the
>  * distance between two terms is computed.  For a term to match, the edit distance between
>  * the terms must be less than the minimum length term (either the input term, or
>  * the candidate term).  For example, FuzzyQuery on term "abcd" with maxEdits=2 will
>  * not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not
>  * match an indexed term "abc".
> {noformat}
> On the one hand, I can see that this behavior is sort of justified: 50% of the
> characters are different, so this is a very "weak" match. On the other hand, it's quite
> unexpected: edit distance is an exact measure, so the terms should have matched.
> It seems like the behavior is caused by internal implementation details about how the
> relative (floating point) score is computed.  I think we should fix it, so that edit distance
> 2 does in fact match all terms with edit distance <= 2.
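
To make the two examples from the javadoc note concrete, here is a minimal, self-contained sketch that does not use Lucene at all. The class and method names (FuzzyShortTermDemo, matchesWithLengthRule, matchesByDistanceOnly) are hypothetical, and the code uses plain Levenshtein distance as a simplification; it only illustrates the rule as the javadoc words it ("edit distance must be less than the minimum length term") next to the behavior proposed in this issue, and is not how FuzzyQuery is implemented internally.

```java
public class FuzzyShortTermDemo {

    // Plain Levenshtein distance (insert, delete, substitute) using two rolling rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical model of the rule in the javadoc note: the distance must be
    // within maxEdits AND strictly less than the shorter term's length.
    public static boolean matchesWithLengthRule(String query, String term, int maxEdits) {
        int d = levenshtein(query, term);
        return d <= maxEdits && d < Math.min(query.length(), term.length());
    }

    // Hypothetical model of the proposed behavior: only the edit distance matters.
    public static boolean matchesByDistanceOnly(String query, String term, int maxEdits) {
        return levenshtein(query, term) <= maxEdits;
    }

    public static void main(String[] args) {
        String[][] pairs = { {"abcd", "ab"}, {"a", "abc"} };
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1]
                + ": distance=" + levenshtein(p[0], p[1])
                + ", lengthRule=" + matchesWithLengthRule(p[0], p[1], 2)
                + ", distanceOnly=" + matchesByDistanceOnly(p[0], p[1], 2));
        }
    }
}
```

Both pairs are exactly 2 edits apart, so the length rule rejects them while the distance-only check accepts them, which is the mismatch the issue describes.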

This message was sent by Atlassian JIRA
