lucene-dev mailing list archives

From Christian Kohlschütter (JIRA) <j...@apache.org>
Subject [jira] Commented: (LUCENE-954) Toggle score normalization in Hits
Date Fri, 22 Feb 2008 12:10:19 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571367#action_12571367 ]

Christian Kohlschütter commented on LUCENE-954:
-----------------------------------------------

Grant,

Sorry, I was perhaps not clear enough about this.

The distribution of scores in one Hits instance is currently not comparable to the distribution
of scores in another Hits instance, even if the underlying statistics are comparable/compatible/identical.
This is because the scores are always normalized so that they never exceed 1.0.
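
To illustrate the effect, here is a minimal stand-alone sketch. It does not use Lucene; the class and method names are hypothetical, and the scaling rule is only my reading of the behaviour described in this issue (scale by 1/max when the top score exceeds 1.0).

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; this is not Lucene's Hits class.
final class ScoreNormalizationSketch {

    // Scales the scores so that the maximum becomes 1.0,
    // but only when the top score exceeds 1.0.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] sourceA = {2.0f, 1.0f}; // top score > 1.0, gets rescaled
        float[] sourceB = {0.8f, 0.4f}; // top score <= 1.0, left untouched
        System.out.println(Arrays.toString(normalize(sourceA))); // [1.0, 0.5]
        System.out.println(Arrays.toString(normalize(sourceB))); // [0.8, 0.4]
    }
}
```

With raw scores, the second hit of source A (1.0) outranks the first hit of source B (0.8). After normalization it drops to 0.5 and ranks below it, although the underlying statistics are the same.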

As I said, my Federated Search system provides homogeneous statistics, but it does not use
MultiSearcher for this. Instead, it uses a variant of the SRU/SRW/XCQL protocols ("SRX/FS"),
where all communication is done via HTTP and XML. This includes the exchange of
Term/DF statistics. In the end, the system makes several distributed indexes appear as a single
(read: federated) index. Hits is used to merge the results from each index.

In the simplest case, the results from every Hits object (one per source) are simply merged
by score in descending order. With the current implementation of Lucene Hits, these scores
are not comparable across instances. With the patch, they are (at least when score normalization
is turned off).
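
The simplest merge could look roughly like the following sketch. It is plain Java without Lucene; ScoredHit and mergeByScore are hypothetical names, and the scores are assumed to be raw (not normalized) and based on homogeneous statistics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch of a score-based merge; not Lucene code.
final class FederatedMergeSketch {

    // One hit as reported by a single source (illustrative type).
    static final class ScoredHit {
        final String source;
        final String docId;
        final float score;

        ScoredHit(String source, String docId, float score) {
            this.source = source;
            this.docId = docId;
            this.score = score;
        }
    }

    // Concatenates the per-source lists and sorts by score, highest first.
    // This is only meaningful if the scores are comparable across sources.
    static List<ScoredHit> mergeByScore(List<List<ScoredHit>> perSource) {
        List<ScoredHit> merged = new ArrayList<>();
        for (List<ScoredHit> hits : perSource) {
            merged.addAll(hits);
        }
        merged.sort((x, y) -> Float.compare(y.score, x.score));
        return merged;
    }
}
```

If one source has already rescaled its scores to a maximum of 1.0 and another has not, the same merge silently produces a wrong order.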

If you need more information about the Federated Search system, we can indeed move the discussion
to the mailing list. However, I think the problem is not really specific to my needs. Even
if you have two Hits instances locally, you might want to be able to compare the scores (or
merge the results) from Hits instance A to those from Hits instance B (in particular, when
they are from the same index). This is also not possible right now.


> Toggle score normalization in Hits
> ----------------------------------
>
>                 Key: LUCENE-954
>                 URL: https://issues.apache.org/jira/browse/LUCENE-954
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>         Environment: any
>            Reporter: Christian Kohlschütter
>         Attachments: hits-scoreNorm.patch
>
>
> The current implementation of the "Hits" class sometimes performs score normalization.
> In particular, whenever the top-ranked score is greater than 1.0, the scores are
> normalized to a maximum of 1.0.
> In this case, Hits may return different scores than TopDocs-based methods.
> In my scenario (a federated search system), Hits delivered just plain wrong results.
> I was merging results from several sources, all having homogeneous statistics (similar
> to MultiSearcher, but over the Internet using HTTP/XML-based protocols).
> Sometimes, some of the sources had a top score greater than 1, so I ended up with
> garbled results.
> I suggest adding a switch to enable/disable this score normalization at runtime.
> My patch (attached) has an additional performance benefit, since score normalization now
> occurs only when Hits#score() is called, not when creating the Hits result list. Whenever
> scores are not required, you save one multiplication per retrieved hit (i.e., at least 100
> multiplications with the current implementation of Hits).
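
The idea of a runtime switch combined with lazy normalization can be sketched as follows. This is not the attached patch and not Lucene's Hits class; LazyHitsSketch and setScoreNormalization are hypothetical names chosen for illustration, and the input is assumed to be sorted by score in descending order.

```java
// Hypothetical stand-alone sketch; names and structure are assumptions,
// not the contents of hits-scoreNorm.patch.
final class LazyHitsSketch {
    private final float[] rawScores; // assumed sorted, highest first
    private final float scoreNorm;   // computed once, applied on demand
    private boolean normalize = true;

    LazyHitsSketch(float[] rawScoresDescending) {
        this.rawScores = rawScoresDescending.clone();
        float max = rawScores.length > 0 ? rawScores[0] : 0f;
        // Only rescale when the top score exceeds 1.0.
        this.scoreNorm = max > 1.0f ? 1.0f / max : 1.0f;
    }

    // Runtime switch: turn normalization on or off.
    void setScoreNormalization(boolean enabled) {
        this.normalize = enabled;
    }

    // The multiplication happens here, only for hits whose score is requested.
    float score(int n) {
        return normalize ? rawScores[n] * scoreNorm : rawScores[n];
    }
}
```

For raw scores {4.0, 1.0}, score(1) yields 0.25 with normalization enabled and 1.0 with it disabled. Callers that never ask for a score never pay for the multiplication.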

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



