lucene-dev mailing list archives

From Christian Kohlschütter (JIRA) <j...@apache.org>
Subject [jira] Commented: (LUCENE-954) Toggle score normalization in Hits
Date Fri, 22 Feb 2008 10:34:19 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571339#action_12571339 ]

Christian Kohlschütter commented on LUCENE-954:
-----------------------------------------------

You are right, Yonik.

Hits currently tries to "hide" this by normalizing the scores to a maximum of 1, simply by
dividing the "raw" scores by the maximum score returned.

This is why scores from different Hits result sets are currently not comparable with each
other. The suggested patch resolves this problem.
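
To illustrate the effect, here is a minimal, self-contained sketch. It does not use the Lucene
API: the class name, the normalize() helper and the sample scores are all made up for this
example, and the helper only mimics the behaviour described above (divide by the maximum score
when that maximum exceeds 1).

```java
// Hypothetical sketch, not Lucene code: mimics per-result-set score normalization.
final class NormalizationMergeSketch {

    /** Scales scores so the maximum is 1.0, but only if the maximum exceeds 1.0. */
    static float[] normalize(float[] raw) {
        float max = 0.0f;
        for (float s : raw) {
            max = Math.max(max, s);
        }
        if (max <= 1.0f) {
            return raw.clone(); // nothing to do, scores are returned unchanged
        }
        float scoreNorm = 1.0f / max;
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] * scoreNorm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two sources with homogeneous statistics; the raw scores are invented sample data.
        float[] sourceA = {4.0f, 2.0f}; // top score > 1, so this set gets normalized
        float[] sourceB = {0.8f, 0.4f}; // top score <= 1, so this set is left alone

        float[] normA = normalize(sourceA);
        float[] normB = normalize(sourceB);

        // Raw scores: A's second hit (2.0) outranks B's first hit (0.8).
        System.out.println("raw:        " + (sourceA[1] > sourceB[0]));
        // Normalized scores: A's second hit (0.5) now ranks below B's first hit (0.8).
        System.out.println("normalized: " + (normA[1] > normB[0]));
    }
}
```

Merging on the normalized values therefore reverses the relative order of hits that the raw
scores had ranked correctly.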


> Toggle score normalization in Hits
> ----------------------------------
>
>                 Key: LUCENE-954
>                 URL: https://issues.apache.org/jira/browse/LUCENE-954
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>         Environment: any
>            Reporter: Christian Kohlschütter
>         Attachments: hits-scoreNorm.patch
>
>
> The current implementation of the "Hits" class sometimes performs score normalization.
> In particular, whenever the top-ranked score is greater than 1.0, the scores are
> normalized to a maximum of 1.0.
> In this case, Hits may return different score results than TopDocs-based methods.
> In my scenario (a federated search system), Hits delivered just plain wrong results.
> I was merging results from several sources, all having homogeneous statistics (similar
> to MultiSearcher, but over the Internet using HTTP/XML-based protocols).
> Sometimes, some of the sources had a top score greater than 1, so I ended up with
> garbled results.
> I suggest adding a switch to enable/disable this score normalization at runtime.
> My patch (attached) has an additional performance benefit, since score normalization now
> occurs only when Hits#score() is called, not when creating the Hits result list. Whenever
> scores are not required, you save one multiplication per retrieved hit (i.e., at least
> 100 multiplications with the current implementation of Hits).
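
A rough sketch of the idea described above (a runtime switch plus normalization deferred until
a score is requested) might look as follows. This is not the attached patch: the class, its
constructor and the setNormalize() method are hypothetical names chosen for illustration only.

```java
// Hypothetical sketch, not the attached patch: lazy, switchable score normalization.
final class LazyScoreSketch {
    private final float[] rawScores; // raw scores are stored untouched
    private final float scoreNorm;   // factor applied on demand in score()
    private boolean normalize = true;

    /** maxScore is assumed to be supplied by the search that produced rawScores. */
    LazyScoreSketch(float[] rawScores, float maxScore) {
        this.rawScores = rawScores.clone();
        // Only one factor is computed here; no per-hit multiplication happens yet.
        this.scoreNorm = maxScore > 1.0f ? 1.0f / maxScore : 1.0f;
    }

    /** Enables or disables normalization at runtime. */
    void setNormalize(boolean normalize) {
        this.normalize = normalize;
    }

    /** Returns the score of the n-th hit, normalizing only when asked to. */
    float score(int n) {
        float raw = rawScores[n];
        return normalize ? raw * scoreNorm : raw;
    }

    public static void main(String[] args) {
        LazyScoreSketch hits = new LazyScoreSketch(new float[]{4.0f, 2.0f}, 4.0f);
        System.out.println(hits.score(1)); // normalized score
        hits.setNormalize(false);
        System.out.println(hits.score(1)); // raw score, suitable for merging across sources
    }
}
```

With this layout, a caller that never asks for a score pays for no multiplications at all, and
a federated merger can turn normalization off to work with the raw scores.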


