lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-329) Fuzzy query scoring issues
Date Mon, 15 Feb 2010 15:01:28 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833828#action_12833828 ]

Robert Muir commented on LUCENE-329:
------------------------------------

bq. The problem with ignoring IDF completely is that it doesn't help balance partial matches
where there is >1 fuzzy element in the query, e.g. in a query for John~ Patitucci~ I'm probably
more interested in a partial match on the rarer surname than a partial match on the common
forename. Obliterating IDF completely as a factor would lose this feature (available in FuzzyLikeThisQuery).

Mark, it wouldn't lose any features. We would simply provide another option, just as we do with
the MultiTermQuery rewrites for other queries, so users can choose what they want to use.
It's just an additional choice.

> Fuzzy query scoring issues
> --------------------------
>
>                 Key: LUCENE-329
>                 URL: https://issues.apache.org/jira/browse/LUCENE-329
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.2rc5
>         Environment: Operating System: All
> Platform: All
>            Reporter: Mark Harwood
>            Assignee: Lucene Developers
>            Priority: Minor
>         Attachments: patch.txt
>
>
> Queries which automatically produce multiple terms (wildcard, range, prefix,
> fuzzy etc.) currently suffer from two problems:
> 1) Scores for matching documents are significantly smaller than term queries 
> because of the volume of terms introduced (A match on query Foo~ is 0.1 
> whereas a match on query Foo is 1).
> 2) The rarer forms of expanded terms are favoured over those of more common
> forms because of the IDF. When using Fuzzy queries, for example, rare
> misspellings typically appear in results before the more common correct spellings.
> I will attach a patch that corrects the issues identified above by:
> 1) Overriding Similarity.coord to counteract the downplaying of scores 
> introduced by expanding terms.
> 2) Taking the IDF factor of the most common form of expanded terms as the 
> basis of scoring all other expanded terms.
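
For readers following the thread, the two effects described in the issue can be illustrated with a small standalone sketch. It does not use Lucene and is not the attached patch. It assumes the classic formulas idf = log(numDocs / (docFreq + 1)) + 1 and coord = overlap / maxOverlap. The class name, the method names, the example terms, the document frequencies, and the figure of ten expanded terms are all hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FuzzyIdfSketch {

    // Assumed classic IDF form: rarer terms (low docFreq) get a higher value.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Assumed classic coord form: fraction of query clauses that matched.
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // Idea 2 from the issue: score every expanded term with the IDF of the
    // most common form (the one with the highest document frequency).
    static double sharedIdf(Map<String, Integer> docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int docFreq : docFreqs.values()) {
            maxDocFreq = Math.max(maxDocFreq, docFreq);
        }
        return idf(maxDocFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 10000; // hypothetical index size

        // Hypothetical expansion of a fuzzy query, with made-up frequencies.
        Map<String, Integer> expanded = new LinkedHashMap<>();
        expanded.put("smith", 500); // common, correct spelling
        expanded.put("smyth", 20);  // rarer variant
        expanded.put("smiht", 1);   // rare misspelling

        double shared = sharedIdf(expanded, numDocs);
        for (Map.Entry<String, Integer> e : expanded.entrySet()) {
            System.out.printf("%-6s perTermIdf=%.3f sharedIdf=%.3f%n",
                    e.getKey(), idf(e.getValue(), numDocs), shared);
        }

        // Problem 1: a document matching 1 of 10 expanded terms (hypothetical
        // count) is scaled down by coord, compared with a single-term query.
        System.out.printf("coord(1, 10)=%.1f coord(1, 1)=%.1f%n",
                coord(1, 10), coord(1, 1));
    }
}
```

With per-term IDF the rare misspelling receives the largest weight, which is the ordering problem reported above. With a shared IDF all expanded forms are weighted equally, so the misspelling no longer outranks the correct spelling on IDF alone.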

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



