lucene-dev mailing list archives

From "Michael McCandless (JIRA)"
Subject [jira] [Created] (LUCENE-7439) Should FuzzyQuery match short terms too?
Date Thu, 08 Sep 2016 13:56:20 GMT
Michael McCandless created LUCENE-7439:

             Summary: Should FuzzyQuery match short terms too?
                 Key: LUCENE-7439
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: master (7.0), 6.3

Today, if you ask {{FuzzyQuery}} to match {{abcd}} with edit distance 2, it will fail to match
the term {{ab}} even though it's 2 edits away.

Its javadocs explain this:

 * <p>NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled
 * distance between two terms is computed.  For a term to match, the edit distance between
 * the terms must be less than the minimum length term (either the input term, or
 * the candidate term).  For example, FuzzyQuery on term "abcd" with maxEdits=2 will
 * not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not
 * match an indexed term "abc".

On the one hand, I can see that this behavior is somewhat justified: 50% of the characters
are different, so this is a very "weak" match. On the other hand, it's quite unexpected:
edit distance is an exact measure, so the terms should have matched.

The behavior seems to be caused by an internal implementation detail: how the relative
(floating point) score is computed.  I think we should fix it, so that a maximum edit
distance of 2 does in fact match all terms with edit distance <= 2.
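
To make the difference concrete, here is a minimal standalone sketch that contrasts the rule described in the javadocs (edit distance must be less than the length of the shorter term) with the proposed rule (edit distance <= maxEdits only). It does not use any Lucene classes. The class and method names are hypothetical, and it uses plain Levenshtein distance without transpositions, which is an assumption that does not affect the two javadoc examples.

```java
// Illustrative sketch only; not Lucene code. All names here are hypothetical.
final class FuzzyShortTermSketch {

    // Plain Levenshtein distance (insert, delete, substitute), two-row DP.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Rule as stated in the javadocs: within maxEdits AND strictly less than
    // the length of the shorter of the two terms.
    static boolean matchesDocumentedRule(String query, String candidate, int maxEdits) {
        int d = editDistance(query, candidate);
        int minLength = Math.min(query.length(), candidate.length());
        return d <= maxEdits && d < minLength;
    }

    // Proposed rule: the edit distance alone decides.
    static boolean matchesEditDistanceOnly(String query, String candidate, int maxEdits) {
        return editDistance(query, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"abcd", "ab"}, {"a", "abc"}, {"abcd", "abcf"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1]
                + ": distance=" + editDistance(p[0], p[1])
                + ", documented rule=" + matchesDocumentedRule(p[0], p[1], 2)
                + ", edit distance only=" + matchesEditDistanceOnly(p[0], p[1], 2));
        }
    }
}
```

For {{abcd}} vs {{ab}} the distance is 2 and the shorter term has length 2, so the documented rule rejects the term (2 is not less than 2) while the edit-distance-only rule accepts it. The same happens for {{a}} vs {{abc}}.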
