lucene-dev mailing list archives

From "Luc Vanlerberghe (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-469) (Parallel-)MultiSearcher: using Sort object changes the scores
Date Mon, 21 Nov 2005 18:13:42 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-469?page=all ]

Luc Vanlerberghe updated LUCENE-469:
------------------------------------

    Attachment: TestMultiSearcher.patch

This adds a test to TestMultiSearcher (and ParallelMultiSearcher, since
TestParallelMultiSearcher runs this code too) demonstrating the problem.

Two document sets are created, each with ten documents, along with a query that
matches exactly one document in each set.
Since the documents in the second set have more terms, the scores for those
documents should be lower.
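
To illustrate why more terms should mean a lower score, here is a minimal sketch, assuming the default similarity's length normalization of 1/sqrt(numTerms). The class name and the term counts (4 and 16) are hypothetical and are not taken from the patch; the real index also stores norms in a lossy one-byte encoding, which this sketch ignores.

```java
// Hypothetical stand-alone sketch; it does not use any Lucene classes.
class LengthNormSketch {

    // Assumed formula: the field norm shrinks with the square root
    // of the number of terms in the field.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // A short document (4 terms) versus a longer one (16 terms),
        // both matching the query term once.
        System.out.println(lengthNorm(4));   // prints 0.5
        System.out.println(lengthNorm(16));  // prints 0.25
    }
}
```

With every other scoring factor held equal, the longer document ends up with the smaller score, which is the difference the test relies on.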

Putting all documents in one index confirms this, and the scores from that single
index are used to check the ones obtained by MultiSearcher when the document sets
are put in two different indexes.

Using searcher.search(query), the results are OK;
using searcher.search(query, Sort.RELEVANCE), they are not (both scores are 1.0).


> (Parallel-)MultiSearcher: using Sort object changes the scores
> --------------------------------------------------------------
>
>          Key: LUCENE-469
>          URL: http://issues.apache.org/jira/browse/LUCENE-469
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: CVS Nightly - Specify date in submission
>  Environment: 21 november 2005, revision 345901
>     Reporter: Luc Vanlerberghe
>  Attachments: TestMultiSearcher.patch
>
> Example: 
> Hits hits=multiSearcher.search(query);
> returns different scores for some documents than
> Hits hits=multiSearcher.search(query, Sort.RELEVANCE);
> (both for MultiSearcher and ParallelMultiSearcher)
> The documents returned will be the same and in the same order, but the scores in the
> second case will seem out of order.
> Inspecting the Explanation objects shows that the scores themselves are OK, but there's
> a bug in the normalization of the scores.
> The document with the highest score should have score 1.0, so all document scores are
> divided by the highest score (assuming the highest score was > 1.0).
> However, for MultiSearcher and ParallelMultiSearcher, this normalization factor is applied
> *per index*, before merging the results together (the merge itself is OK, though).
> An example: if you use
> Hits hits=multiSearcher.search(query, Sort.RELEVANCE);
> for a MultiSearcher with two subsearchers, the first document will have score 1.0.
> The next documents from the same subsearcher will have decreasing scores.
> The first document from the other subsearcher will, however, have score 1.0 again!
> The same applies for other Sort objects, but it is less visible.
> I will post a TestCase demonstrating the problem and suggested patches to solve it in
> a moment...
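
The per-index normalization described in the issue above can be modeled without Lucene. The following is a simplified sketch, not the MultiSearcher code: the class and method names are hypothetical, the raw scores are made up, and hits are assumed to be merged by raw score. It contrasts normalizing each index before the merge with normalizing once after the merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model; it does not use any Lucene classes.
class NormalizationSketch {

    // Scale scores so the top one is 1.0; leave them alone if the top one is <= 1.0.
    static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float factor = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * factor;
        }
        return out;
    }

    // Merge hits from two indexes by descending raw score and return the
    // reported score of each hit in merged order.
    static float[] merge(float[] rawA, float[] reportedA, float[] rawB, float[] reportedB) {
        List<float[]> hits = new ArrayList<>();
        for (int i = 0; i < rawA.length; i++) {
            hits.add(new float[] {rawA[i], reportedA[i]});
        }
        for (int i = 0; i < rawB.length; i++) {
            hits.add(new float[] {rawB[i], reportedB[i]});
        }
        hits.sort((x, y) -> Float.compare(y[0], x[0]));
        float[] out = new float[hits.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = hits.get(i)[1];
        }
        return out;
    }

    // The behavior reported in the issue: normalize per index, then merge.
    static float[] perIndexNormalization(float[] rawA, float[] rawB) {
        return merge(rawA, normalize(rawA), rawB, normalize(rawB));
    }

    // The expected behavior: merge the raw scores, then normalize once.
    static float[] globalNormalization(float[] rawA, float[] rawB) {
        return normalize(merge(rawA, rawA, rawB, rawB));
    }

    public static void main(String[] args) {
        float[] indexA = {8.0f, 4.0f};  // made-up raw scores from the first subsearcher
        float[] indexB = {2.0f, 1.0f};  // made-up raw scores from the second subsearcher
        System.out.println(Arrays.toString(perIndexNormalization(indexA, indexB)));
        // prints [1.0, 0.5, 1.0, 0.5]
        System.out.println(Arrays.toString(globalNormalization(indexA, indexB)));
        // prints [1.0, 0.5, 0.25, 0.125]
    }
}
```

In the first output the top hit of the second index reports 1.0 again, even though it ranks third overall; normalizing once after the merge keeps the reported scores in descending order.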

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

