lucene-dev mailing list archives

From "Marco Dissel (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-329) Fuzzy query scoring issues
Date Thu, 05 Oct 2006 19:10:26 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-329?page=comments#action_12440219 ] 
            
Marco Dissel commented on LUCENE-329:
-------------------------------------

>> but assuming "clean data" with no mis-spellings, scoring "rare" terms higher seems
>> like the ideal behavior

An exact match of a term should rank higher than fuzzy-matched terms. At least that's
the expected behaviour in my situation, and I think it is also the most common expectation.
We could also make this optional, so that we don't affect the current scoring algorithm.

PS: I'm using the Lucene.NET port, so once this behaviour is changed or extended in Java, we (or
George) can implement it in the .NET version as well.
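
To make the suggestion concrete, here is a minimal sketch of an opt-in "exact match first"
ranking. It is plain Python rather than Lucene or Lucene.NET code; the function name
rank_expansions, the flag prefer_exact and the weights are all hypothetical and only
illustrate the idea.

```python
def rank_expansions(query_term, weights, prefer_exact=False):
    """Order expanded terms by weight, optionally putting the exact match first.

    weights maps each expanded term to an illustrative score contribution
    (made-up numbers here, standing in for IDF-driven weights).
    """
    if prefer_exact:
        # Sort on (is_exact, weight): the exact term wins regardless of weight.
        key = lambda term: (term == query_term, weights[term])
    else:
        # Current behaviour in this sketch: weight alone decides the order.
        key = lambda term: weights[term]
    return sorted(weights, key=key, reverse=True)


# Hypothetical weights: the rare misspellings outweigh the correct spelling.
weights = {"lucene": 4.0, "lucen": 8.8, "lucine": 9.5}

default_order = rank_expansions("lucene", weights)
exact_first_order = rank_expansions("lucene", weights, prefer_exact=True)
```

With the flag off the order is unchanged from weight-only ranking, which is the point of
making the behaviour optional.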


> Fuzzy query scoring issues
> --------------------------
>
>                 Key: LUCENE-329
>                 URL: http://issues.apache.org/jira/browse/LUCENE-329
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.2rc5
>         Environment: Operating System: All
> Platform: All
>            Reporter: Mark Harwood
>         Assigned To: Lucene Developers
>         Attachments: patch.txt
>
>
> Queries which automatically produce multiple terms (wildcard, range, prefix, 
> fuzzy etc.) currently suffer from two problems:
> 1) Scores for matching documents are significantly smaller than term queries 
> because of the volume of terms introduced (A match on query Foo~ is 0.1 
> whereas a match on query Foo is 1).
> 2) The rarer forms of expanded terms are favoured over those of more common 
> forms because of the IDF. When using Fuzzy queries for example, rare 
> mis-spellings typically appear in results before the more common correct spellings.
> I will attach a patch that corrects the issues identified above by 
> 1) Overriding Similarity.coord to counteract the downplaying of scores 
> introduced by expanding terms.
> 2) Taking the IDF factor of the most common form of expanded terms as the 
> basis of scoring all other expanded terms.
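
The two problems and the two proposed fixes can be illustrated with a small sketch. This is
plain Python, not the attached patch and not Lucene code. It assumes the classic
formulas idf = 1 + ln(numDocs / (docFreq + 1)) and coord = overlap / maxOverlap, which is
how I understand the default similarity to behave; the document frequencies and term names
are invented.

```python
import math


def idf(doc_freq, num_docs):
    """Assumed classic IDF: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def coord(overlap, max_overlap):
    """Assumed classic coord: fraction of query clauses the document matched."""
    return overlap / max_overlap


# Hypothetical index statistics for the expansions of a fuzzy query.
NUM_DOCS = 10_000
doc_freqs = {"lucene": 500, "lucen": 3, "lucine": 1}

# Problem 1: a document matching 1 of 10 expanded terms is scaled by 0.1.
default_coord = coord(1, 10)
# Fix 1 (sketch): override coord so that term expansion does not shrink the score.
overridden_coord = 1.0

# Problem 2: per-term IDF rewards the rare misspellings.
per_term_idf = {term: idf(df, NUM_DOCS) for term, df in doc_freqs.items()}

# Fix 2 (sketch): score every expansion with the IDF of the most common form.
most_common = max(doc_freqs, key=doc_freqs.get)
shared_idf = {term: idf(doc_freqs[most_common], NUM_DOCS) for term in doc_freqs}
```

Under these assumptions the rare form "lucine" gets a larger per-term IDF than "lucene",
while with the shared IDF all three expansions are weighted equally.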


        


