lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bugzi...@apache.org
Subject DO NOT REPLY [Bug 32942] New: - Fuzzy query scoring issues
Date Tue, 04 Jan 2005 21:34:19 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG·
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=32942>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND·
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=32942

           Summary: Fuzzy query scoring issues
           Product: Lucene
           Version: 1.2rc5
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Search
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: markharw00d@yahoo.co.uk


Queries which automatically produce multiple terms (wildcard, range, prefix, 
fuzzy etc)currently suffer from two problems:

1) Scores for matching documents are significantly smaller than term queries 
because of the volume of terms introduced (A match on query Foo~ is 0.1 
whereas a match on query Foo is 1).
2) The rarer forms of expanded terms are favoured over those of more common 
forms because of the IDF. When using Fuzzy queries for example, rare mis-
spellings typically appear in results before the more common correct spellings.


I will attach a patch that corrects the issues identified above by 
1) Overriding Similarity.coord to counteract the downplaying of scores 
introduced by expanding terms.
2) Taking the IDF factor of the most common form of expanded terms as the 
basis of scoring all other expanded terms.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message